This adds the latency histogram to the column_family swagger
definitions.
The definitions are based on the ColumnFamilyMetrics.
It adds the following commands:
get_read_latency_histogram
get_all_read_latency_histogram
get_write_latency_histogram
get_all_write_latency_histogram
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
The histogram object is used both as a general counter for the number of
events and for statistics and sampling.
This changes the histogram implementation so that it supports sparse
sampling while keeping the total number of events accurate.
The implementation includes the following:
Remove the template nature of the histogram, as it is used only for
timers, and use the name ihistogram instead.
If in the future we need a histogram for other types, we can use the
histogram name for it.
A total counter was added that counts the number of events that are part
of the statistics calculation.
Helper methods were added to the ihistogram to handle the latency
counter object.
According to the sample mask, it marks the latency object as started
when the counter and the mask are non-zero. Its mark method accepts the
latency object: if the latency was not started, no sample is added and
only the 'count' counter, which counts the total number of events, is
incremented.
This should make the impact of latency calculation negligible.
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
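The sampling scheme described in the message above can be sketched roughly as
follows. This is a minimal illustration, not the actual implementation: the
class layouts, the record_events() helper, and the reading of "the counter and
the mask are non-zero" as a bitwise AND are all assumptions made for this
example.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical stand-in for the latency counter object.
class latency_counter {
    using clock = std::chrono::steady_clock;
    clock::time_point _start{};
    clock::time_point _stop{};
    bool _started = false;
public:
    void start() { _start = clock::now(); _started = true; }
    void stop() { _stop = clock::now(); }
    // Returns true if the start timer was set.
    bool is_start() const { return _started; }
    clock::duration latency() const { return _stop - _start; }
};

// Sketch of a non-template histogram with sparse sampling.
class ihistogram {
    uint64_t _sample_mask;
public:
    uint64_t count = 0;   // every event, sampled or not
    uint64_t total = 0;   // only events that took part in the statistics
    std::chrono::steady_clock::duration sum{0};

    explicit ihistogram(uint64_t sample_mask) : _sample_mask(sample_mask) {}

    // Assumption: an event is sampled when (count & mask) is non-zero.
    void set_latency(latency_counter& lc) const {
        if (count & _sample_mask) {
            lc.start();
        }
    }

    // Adds a sample only if the latency counter was started;
    // the event count is incremented in both cases.
    void mark(latency_counter& lc) {
        if (lc.is_start()) {
            lc.stop();
            sum += lc.latency();
            ++total;
        }
        ++count;
    }
};

// Hypothetical helper: records n events through the histogram.
inline void record_events(ihistogram& h, int n) {
    assert(n >= 0);
    for (int i = 0; i < n; ++i) {
        latency_counter lc;
        h.set_latency(lc);
        // ... the timed operation would run here ...
        h.mark(lc);
    }
}
```

With a mask of 1, every second event is timed, while count still reflects all
events.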
When doing a sparse latency check, it is required to know if a latency
object was started.
This returns true if the start timer was set.
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
storage_service is a singleton, and wants a database for initialization.
On the other hand, database is a proper object that is created and
destroyed for each test. As a result storage_service ends up using
a destroyed object.
Work around this by:
- leaking the database object so that storage_service has something
to play with
- doing the second phase of storage_service initialization only once
(endsize / (1024*1024)) is an integer calculation, so if endsize is
lower than 1024^2, the result would be 0.
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
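The truncation described above can be reproduced in isolation. The helper
names below are hypothetical, and converting to floating point before dividing
is only one possible way to avoid the problem; the message does not say which
fix was chosen.

```cpp
#include <cassert>
#include <cstdint>

// Integer division: any endsize below 1024^2 yields 0.
inline uint64_t size_in_mb_truncated(uint64_t endsize) {
    return endsize / (1024 * 1024);
}

// One possible fix (an assumption): divide in floating point.
inline double size_in_mb(uint64_t endsize) {
    return static_cast<double>(endsize) / (1024 * 1024);
}

// Worked example: 512 KiB is reported as 0 MB by the integer version.
inline void example() {
    assert(size_in_mb_truncated(512 * 1024) == 0);
    assert(size_in_mb(512 * 1024) == 0.5);
}
```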
I made the mistake of running scylla on a spinning disk. Since a disk
can serve about 100 reads/second, that set the tone for the whole benchmark.
Fix by improving cache preload when flushing a memtable. If we can detect
that a mutation is not part of any sstable (other than the one we just wrote),
we can insert it into the cache.
After this, running a mixed cassandra-stress returns the expected results,
even on a spinning disk.
Add a FIXME about something I'm unsure about - does repair only need to
repair this node, or should it also make an effort to repair the other nodes
(or more accurately, their specific token-ranges being repaired) if we're
already communicating with them?
Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
If a stream failed, print a clear error message that repair failed, instead
of ignoring it and letting Seastar's generic "warning, exception was ignored"
be the only thing the user will see.
Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
The previous repair code exchanged data with the other nodes which have
one arbitrary token. This will only work correctly when all the nodes
replicate all the data. In a more realistic scenario, the node being
repaired holds copies of several token ranges, and each of these ranges
has a different set of replicas we need to perform the repair with.
So this patch does the right thing - we perform a separate repair_range()
for each of the local ranges, and each of those will find a (possibly)
different set of nodes to communicate with.
Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
This patch adds a method get_ranges() to replication-strategy.
It returns the list of token ranges held by the given endpoint.
It will be used by the replication code, which needs to know
in particular which token ranges are held by *this* node.
This function is the analogue of Origin's getAddressRanges().get(endpoint).
As in Origin, also here the implementation is not meant to be efficient,
and will not be used in the fast path.
Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
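As a rough illustration of what a get_ranges()-style lookup computes, here is
a self-contained sketch. Everything in it is an assumption made for the
example: the ring is a plain map from token to endpoint, replicas are chosen
by walking the ring clockwise to the first rf distinct endpoints, and a range
is written as a (previous token, token] pair. Like the function described
above, it favors simplicity over efficiency.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

using token = int64_t;
using endpoint = std::string;
// (first, second]: starts after the previous token, ends at this token.
using token_range = std::pair<token, token>;

// Hypothetical placement rule: the first rf distinct endpoints found
// when walking the ring clockwise from token t.
inline std::vector<endpoint> natural_endpoints(
        const std::map<token, endpoint>& ring, token t, std::size_t rf) {
    std::vector<endpoint> result;
    auto it = ring.find(t);
    if (it == ring.end()) {
        return result;
    }
    for (std::size_t i = 0; i < ring.size() && result.size() < rf; ++i) {
        if (std::find(result.begin(), result.end(), it->second) == result.end()) {
            result.push_back(it->second);
        }
        if (++it == ring.end()) {
            it = ring.begin();  // wrap around the ring
        }
    }
    return result;
}

// Returns every range for which ep is one of the replicas.
inline std::vector<token_range> get_ranges(
        const std::map<token, endpoint>& ring, const endpoint& ep, std::size_t rf) {
    assert(rf > 0);
    std::vector<token_range> ranges;
    if (ring.empty()) {
        return ranges;
    }
    token prev = ring.rbegin()->first;  // the first range wraps around
    for (const auto& entry : ring) {
        const token t = entry.first;
        const auto eps = natural_endpoints(ring, t, rf);
        if (std::find(eps.begin(), eps.end(), ep) != eps.end()) {
            ranges.emplace_back(prev, t);
        }
        prev = t;
    }
    return ranges;
}
```

A caller could then iterate over the returned ranges and handle each one
separately, each with its own set of replicas.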
Now, make_local_reader does not need partition_range to be alive when we
read the mutation reader. No need to store it in stream_detail for its
lifetime.
Since all the gossip callbacks (e.g., on_change) are executed inside a
seastar::async context, we can wait for operations such as updating the
system table to complete.
This way, do_before_change_notifications and do_on_change_notifications
run under seastar::async.
Now, before_change callbacks are inside seastar::async context.
It is easier to futurize apply_new_states and handle_major_state_change.
Now, on_change, on_join and on_restart callbacks are inside
seastar::async context.
It is not correct to use _scheduled_gossip_task.armed() to tell whether
gossip is enabled, since the timer sets _armed = false before calling
the timer callback.
It was working correctly because we did not actually check the is_enabled()
flag inside the timer callback but inside send_gossip_digest_syn()'s
continuation, and by that time the timer is armed again.
Use a standalone flag instead.
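The pitfall can be shown with a toy model. toy_timer and toy_gossiper below
are hypothetical stand-ins written for this example, not Seastar or gossiper
code; the only behavior taken from the message is that the timer clears its
armed state before invoking the callback.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Toy timer: like the timer described above, it clears _armed
// before calling the callback.
class toy_timer {
    bool _armed = false;
    std::function<void()> _callback;
public:
    void set_callback(std::function<void()> cb) { _callback = std::move(cb); }
    void arm() { _armed = true; }
    bool armed() const { return _armed; }
    void fire() {
        assert(_callback != nullptr);
        if (!_armed) {
            return;
        }
        _armed = false;
        _callback();
    }
};

struct toy_gossiper {
    toy_timer _scheduled_gossip_task;
    bool _enabled = false;  // the standalone flag
    bool armed_seen_in_callback = true;
    bool enabled_seen_in_callback = false;

    toy_gossiper() {
        _scheduled_gossip_task.set_callback([this] {
            // Inside the callback armed() is already false, so it cannot
            // be used to tell whether gossip is enabled.
            armed_seen_in_callback = _scheduled_gossip_task.armed();
            enabled_seen_in_callback = is_enabled();
            if (is_enabled()) {
                _scheduled_gossip_task.arm();  // schedule the next round
            }
        });
    }
    toy_gossiper(const toy_gossiper&) = delete;  // callback captures this

    void start() { _enabled = true; _scheduled_gossip_task.arm(); }
    void stop() { _enabled = false; }
    bool is_enabled() const { return _enabled; }
};
```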
Similar to a mutation_reader, but limited: it only returns whether a key
is sure not to exist in some mutation source. Non-blocking and expected
to execute fast. Corresponds to an sstable bloom filter.
To avoid ambiguity, it doesn't return a bool but one of two longer, less
ambiguous values: "definitely_doesnt_exists" or "maybe_exists".
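A minimal sketch of such a checker follows. Only the two result names come
from the message; the enum name, the std::function alias, and the helpers are
assumptions made for this example. A std::set stands in for the bloom filter,
so unlike a real filter it never reports a false "maybe_exists".

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Result names as given in the message; the enum name is hypothetical.
enum class presence { definitely_doesnt_exists, maybe_exists };

using presence_checker = std::function<presence(const std::string&)>;

// Hypothetical checker backed by an exact key set instead of a bloom filter.
inline presence_checker make_set_checker(std::set<std::string> keys) {
    return [keys = std::move(keys)](const std::string& key) {
        return keys.count(key) ? presence::maybe_exists
                               : presence::definitely_doesnt_exists;
    };
}

// One plausible use: keep only the keys that are certainly absent from
// the other mutation source.
inline std::vector<std::string> keys_safe_to_cache(
        const std::vector<std::string>& flushed, const presence_checker& checker) {
    assert(checker != nullptr);
    std::vector<std::string> result;
    for (const auto& key : flushed) {
        if (checker(key) == presence::definitely_doesnt_exists) {
            result.push_back(key);
        }
    }
    return result;
}
```

Returning an enum rather than a bool means call sites read as
`== presence::definitely_doesnt_exists`, which cannot be confused with its
opposite.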
* seastar 6f1dd3c...887f72d (8):
> finally(): don't discard any exception
> dpdk: check the resulting cluster for non-i40e NICs
> reactor: avoid SIGPIPE when writing to a socket
> memory: Don't run reclaimers if free memory is above the threshold
> core: Add missing include to transfer.hh
> dhcp: print the "sending discover" message only once
> reactor: count io_threaded_fallback statistic
> future: finally(): don't let the exceptional future to be ignored
"This series enables incremental eviction of data from cache. The eviction is
controlled by the LSA tracker, which considers evictable regions as part of
its reclaim() method."