"This adds the get_snitch_name and update_snitch functionality to the API. After
this series it would be possible to return the snitch name and to update the
snitch."
"This series cleans the streaming_histogram and the estimated histogram that
were imported from Origin. It then uses them to get the min and max row size
estimates in the API."
"Histograms are used to collect latency information, in Origin, many of the
operations are timed, which is a potential performance issue. This series adds
an option to sample the operations, so that a small fraction will be timed and
most will only be counted.
This gives an estimate for the statistics while keeping an accurate count of
the total events, with negligible performance impact.
The first users of the modified histogram are the column families, for their
read and write operations."
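A minimal sketch of the sampling idea described above, not the code in this series: every operation is counted, but only one in every sample_every is timed. The names SampledLatencyCounter, sample_every and run are hypothetical and chosen only for illustration.

```python
import time


class SampledLatencyCounter:
    """Counts every operation but keeps latencies only for sampled ones."""

    def __init__(self, sample_every=128):
        self.sample_every = sample_every  # hypothetical sampling knob
        self.count = 0                    # exact number of operations
        self.samples = []                 # latencies (seconds) of timed operations

    def should_time(self):
        # Decide before the operation starts whether it will be timed.
        return self.count % self.sample_every == 0

    def mark(self, latency=None):
        # Always count; record a latency only when one was measured.
        self.count += 1
        if latency is not None:
            self.samples.append(latency)

    def estimated_mean(self):
        # Estimate based on the sampled operations only.
        return sum(self.samples) / len(self.samples) if self.samples else 0.0


def run(counter, op, clock=time.perf_counter):
    """Run op(), timing it only when the counter asks for a sample."""
    if counter.should_time():
        start = clock()
        result = op()
        counter.mark(clock() - start)
    else:
        result = op()
        counter.mark()
    return result
```

With sample_every=4, ten operations yield an exact count of 10 and three timed samples, so the clock is read for only a fraction of the operations.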
Conflicts:
database.hh
[in v2: 1. Fixed a few small bugs.
2. Added rudimentary support for parallel/sequential repair.
3. Verified that code works correctly with Asias's fix to streaming]
This patch adds the capability to track repair operations which we have
started, and check whether they are still running or completed (successfully
or unsuccessfully).
As before, one starts a repair with the REST API:
curl -X GET --header "Content-Type: application/json" --header "Accept: application/json" "http://127.0.0.1:10000/storage_service/repair_async/try1"
where "try1" is the name of the keyspace. This returns a repair id -
a small integer starting with 0. This patch adds support for a similar
request to *query* the status of a previously started repair, by adding
the "id=..." option to the query, which enquires about the status of the
repair with this id. For example,
curl -i -X GET --header "Content-Type: application/json" --header "Accept: application/json" "http://127.0.0.1:10000/storage_service/repair_async/try1?id=0"
gets the current status of this repair 0. This status can be RUNNING,
SUCCESSFUL or FAILED, or an HTTP 400 "unknown repair id ..." in case an
invalid id is passed (not the id of any real repair that was previously
started).
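The status tracking described above can be sketched roughly as follows. This is an illustration under assumptions, not the patch's code: RepairTracker, start, done, get and handle_status_request are hypothetical names; only the three status strings, the ids starting at 0 and the HTTP 400 "unknown repair id" behaviour come from the description.

```python
import itertools
from enum import Enum


class RepairStatus(Enum):
    RUNNING = "RUNNING"
    SUCCESSFUL = "SUCCESSFUL"
    FAILED = "FAILED"


class RepairTracker:
    """Hands out small integer ids (starting at 0) and remembers their status."""

    def __init__(self):
        self._next_id = itertools.count(0)
        self._status = {}

    def start(self):
        repair_id = next(self._next_id)
        self._status[repair_id] = RepairStatus.RUNNING
        return repair_id

    def done(self, repair_id, succeeded):
        self._status[repair_id] = (
            RepairStatus.SUCCESSFUL if succeeded else RepairStatus.FAILED
        )

    def get(self, repair_id):
        if repair_id not in self._status:
            raise ValueError(f"unknown repair id {repair_id}")
        return self._status[repair_id]


def handle_status_request(tracker, repair_id):
    """Hypothetical handler: returns (HTTP status code, response body)."""
    try:
        return 200, tracker.get(repair_id).value
    except ValueError as err:
        return 400, str(err)
```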
This patch also adds two alternative code-paths in the main repair flow
do_repair_start(): One where each range is repaired one after another,
and one where all the ranges are repaired in parallel. At the moment, the
enabled code is the parallel version, just as before this patch. But having
both will also be useful for implementing the "parallel" vs "sequential" repair
options of Cassandra.
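The difference between the two code-paths can be illustrated with a small asyncio sketch. It only mimics the control flow; repair_range, repair_sequential and repair_parallel are hypothetical names and do not correspond to functions in do_repair_start().

```python
import asyncio


async def repair_range(token_range, log):
    # Stand-in for repairing one range; the sleep yields to other tasks.
    log.append(("start", token_range))
    await asyncio.sleep(0)
    log.append(("end", token_range))


async def repair_sequential(ranges, log):
    # Each range is repaired only after the previous one has finished.
    for token_range in ranges:
        await repair_range(token_range, log)


async def repair_parallel(ranges, log):
    # All ranges are started together and awaited as a group.
    await asyncio.gather(*(repair_range(r, log) for r in ranges))
```

In the sequential variant each "end" precedes the next "start"; in the parallel variant all ranges start before any of them ends.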
Note that if you try to use repair, you are likely to run into a bug in
the streaming code which results in Scylla either crashing or a repair
hanging (never realising it finished). Asias already has a fix for this bug,
and will hopefully publish it soon, but it is unrelated to the repair code
so I think this patch can independently be committed.
Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
This adds the API implementation for the row size histogram.
It adds a map_cf method that performs a map operation over all column
families on the different shards.
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
This adds the estimated_histogram to the utils definition file.
The estimated_histogram holds a list of buckets and a list of bucket
offsets.
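To illustrate the buckets/offsets layout, here is a small sketch in the spirit of such a histogram. It is an assumption-laden illustration, not the definition added by this patch: the class name, the extra overflow bucket and the min_estimate/max_estimate methods are hypothetical.

```python
from bisect import bisect_left


class EstimatedHistogramSketch:
    """Bucket i counts values in (offsets[i-1], offsets[i]]; last bucket overflows."""

    def __init__(self, bucket_offsets):
        self.bucket_offsets = sorted(bucket_offsets)
        # One extra bucket collects values above the largest offset.
        self.buckets = [0] * (len(self.bucket_offsets) + 1)

    def add(self, value):
        # First bucket whose offset is >= value, or the overflow bucket.
        self.buckets[bisect_left(self.bucket_offsets, value)] += 1

    def min_estimate(self):
        # Lower bound of the first non-empty bucket (0 if the histogram is empty).
        for i, count in enumerate(self.buckets):
            if count:
                return 0 if i == 0 else self.bucket_offsets[i - 1] + 1
        return 0

    def max_estimate(self):
        # Upper bound of the last non-empty bucket; None means overflow.
        for i in range(len(self.buckets) - 1, -1, -1):
            if self.buckets[i]:
                if i == len(self.bucket_offsets):
                    return None
                return self.bucket_offsets[i]
        return 0
```

The estimates are bucket bounds rather than exact values, which is what makes the histogram cheap to keep.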
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
This adds the implementation for min and max row size in column family.
It uses the column family map reduce helper function with the additional
functions to get the min and max row size.
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
These helper functions wrap the std min and max templates for int64_t, which
makes it easier to pass them as values when needed.
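The point of the wrappers is that the reducer is passed around as an ordinary value. In C++ a function template such as std::min cannot be passed without naming a specific instantiation, hence the int64_t wrappers. The Python sketch below only shows the shape of the idea; min_int64, max_int64, map_reduce_cf and the dictionary keys are hypothetical names.

```python
from functools import reduce

INT64_MAX = 2**63 - 1


def min_int64(a, b):
    # Plain two-argument function, easy to hand to a reducer.
    return min(a, b)


def max_int64(a, b):
    return max(a, b)


def map_reduce_cf(column_families, mapper, initial, reducer):
    """Map each column family to a value, then fold the values with reducer."""
    return reduce(reducer, map(mapper, column_families), initial)
```

For example, map_reduce_cf(cfs, lambda cf: cf["min_row_size"], INT64_MAX, min_int64) folds the per-column-family minimum into a single value.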
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
With the change in column_family stats, the API needs to get the counters
from the read and write histograms.
It also adds the implementation for the read and write latency histograms.
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
This change makes sure that when there are no results (i.e. all the
histograms that are summed are empty) the returned result will be a zeroed
histogram and not an empty object.
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
This adds the latency histogram to the column_family swagger
definitions.
The definitions are based on the ColumnFamilyMetrics.
It adds the following commands:
get_read_latency_histogram
get_all_read_latency_histogram
get_write_latency_histogram
get_all_write_latency_histogram
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
This patch adds the beginning of node repair support. Repair is initiated
on a node using the REST API, for example to repair all the column families
in the "try1" keyspace, you can use:
curl -X GET --header "Content-Type: application/json" --header "Accept: application/json" "http://127.0.0.1:10000/storage_service/repair_async/try1"
I tested that the repair already works (exchanges mutations with all other
replicas, and successfully repairs them), so I think it can be committed,
but more work is needed to complete it:
1. Repair options are not yet supported (range repair, sequential/parallel
repair, choice of hosts, datacenters and column families, etc.).
2. *All* the data of the keyspace is exchanged - Merkle Trees (or an
alternative optimization) and partial data exchange haven't been
implemented yet.
3. Full repair for nodes with multiple separate ranges is not yet
implemented correctly. E.g., consider 10 nodes with vnodes and RF=2,
so each vnode's range has a different host as a replica, so we need
to exchange each key range separately with a different remote host.
4. Our repair operation returns a numeric operation id (like Origin),
but we don't yet provide any means to use this id to check on ongoing
repairs like Origin allows.
5. Error handling, logging, etc., needs to be improved.
6. SMP nodes (with multiple shards) should work correctly (thanks to
Asias's latest patch for SMP mutation streaming) but haven't been
tested.
7. Incremental repair is not supported (see
http://www.datastax.com/dev/blog/more-efficient-repairs)
Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
This adds the API implementation for the read and write counters, the number
of pending flushes and the memtable switch count.
The implementation uses a helper function to perform map and map_reduce
on column_family.
The get_uuid helper method now supports both colon notations (i.e.
either as a ":" or as %3A)
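Accepting both notations amounts to URL-decoding the name before splitting it. The sketch below assumes, for illustration only, that the name has the form keyspace:column_family; parse_cf_name is a hypothetical helper and not the get_uuid method itself.

```python
from urllib.parse import unquote


def parse_cf_name(name):
    """Split 'ks:cf' or 'ks%3Acf' into (keyspace, column_family)."""
    # unquote() turns %3A (or %3a) into ':' and leaves a literal ':' alone.
    keyspace, sep, column_family = unquote(name).partition(":")
    if not sep or not keyspace or not column_family:
        raise ValueError(f"expected <keyspace>:<column family>, got {name!r}")
    return keyspace, column_family
```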
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
This adds the read and write counters to the column_family swagger
definitions.
It adds the following commands:
get_read
get_all_read
get_write
get_all_write
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
This adds the implementation of the histograms for the storage proxy.
After this patch the following URLs will be available:
/storage_proxy/metrics/read/latency/histogram
/storage_proxy/metrics/range/latency/histogram
/storage_proxy/metrics/write/latency/histogram
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
This adds the read, write and range histograms to the storage_proxy.
It adds the following commands:
get_read_metrics_latency_histogram
get_range_metrics_latency_histogram
get_write_metrics_latency_histogram
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
This patch generalizes the sum helper function to accept any field as
long as it supports the + operator and can be parsed as JSON.
It adds a sum function to sum histograms. It does so by adding the
totals, adding the sums, setting the min and max, setting the
average and variance, and combining the samples.
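One way to combine two histograms along these lines is sketched below. It is not the code from this patch: the field names are hypothetical, and the variance is merged with the standard pairwise (population variance) formula, which is an assumption about how "setting the average and variance" could be done. Starting the fold from a zeroed histogram also means that summing nothing yields a zeroed histogram rather than an empty object.

```python
from dataclasses import dataclass, field
from functools import reduce
from typing import List


@dataclass
class Histogram:
    count: int = 0            # number of events
    total: float = 0.0        # sum of all values
    minimum: float = 0.0
    maximum: float = 0.0
    mean: float = 0.0
    variance: float = 0.0     # population variance
    sample: List[float] = field(default_factory=list)

    def __add__(self, other):
        # An empty side contributes nothing.
        if other.count == 0:
            return self
        if self.count == 0:
            return other
        n = self.count + other.count
        delta = other.mean - self.mean
        # Merge the sums of squared deviations of both sides.
        m2 = (self.variance * self.count + other.variance * other.count
              + delta * delta * self.count * other.count / n)
        return Histogram(
            count=n,
            total=self.total + other.total,
            minimum=min(self.minimum, other.minimum),
            maximum=max(self.maximum, other.maximum),
            mean=(self.total + other.total) / n,
            variance=m2 / n,
            sample=self.sample + other.sample,
        )


def from_values(values):
    """Build a histogram summary from raw values (helper for the example)."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return Histogram(n, sum(values), min(values), max(values), mean, var, list(values))


def sum_histograms(histograms):
    # Folding from a zeroed histogram keeps the result well-formed when empty.
    return reduce(lambda a, b: a + b, histograms, Histogram())
```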
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
The utils file will hold general definitions that need to be used by
multiple modules.
As a start, it holds the histogram definition.
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
config.hh changes rapidly, so don't force lots of recompiles by including it.
Need to place seed_provider_type in namespace scope, so we can forward
declare it for that.
It should not be called directly: external callers should be calling flush()
instead.
To be sure it doesn't happen again, make seal_active_memtable private.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
In many of our column_family APIs, we need to pass a pointer to the database.
The only reason we do that is so we can properly handle the commit log entries
after we seal the current memtables into sstables.
Now that we store a pointer to the commit log in the CF itself at the time it
is created, we no longer have to do it. As a result, the APIs are a lot
cleaner, with no gratuitous parameters.
My motivation for this was the flush method, but as a result, apply() also gets
cleaner.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
This adds an implementation to the storage_service counters. The
implementation uses the stats object inside the storage_proxy.
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
A common scenario in the API is to get a single value from a distributed
object that has a get_stats method.
The helper function takes the object and a function that returns a
single value from the stats object, and performs the map_reduce.
It would return a future that can be used as a return value from the
API.
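The shape of such a helper can be sketched with asyncio standing in for the future-based map_reduce. Everything here is hypothetical (Stats, Shard, sum_stats and the field names); it only illustrates passing a getter that picks a single value out of each shard's stats.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Stats:
    read_timeouts: int = 0
    write_timeouts: int = 0


class Shard:
    """Stand-in for one shard's instance of a distributed object."""

    def __init__(self, stats):
        self._stats = stats

    async def get_stats(self):
        return self._stats


async def sum_stats(shards, getter):
    # Map: fetch the stats of every shard. Reduce: add up the selected field.
    per_shard = await asyncio.gather(*(s.get_stats() for s in shards))
    return sum(getter(stats) for stats in per_shard)
```

A caller would write something like await sum_stats(shards, lambda s: s.read_timeouts) and return the awaited result from the API handler.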
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
The API needs to call the storage_proxy. For that, a reference to the
distributed storage_proxy is added to the context and is set in main.
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
This adds a stub implementation for the metrics of the storage proxy.
After this patch the following URLs will be available:
/storage_service/metrics/cas_write/contention
/storage_service/metrics/cas_write/condition_not_met
/storage_service/metrics/cas_read/unfinished_commit
/storage_service/metrics/cas_read/contention
/storage_service/metrics/cas_read/condition_not_met
/storage_service/metrics/read/timeouts
/storage_service/metrics/read/unavailables
/storage_service/metrics/range/timeouts
/storage_service/metrics/range/unavailables
/storage_service/metrics/write/timeouts
/storage_service/metrics/write/unavailables
The implementation returns 0 for all queries.
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
This adds the storage definition to the storage proxy swagger definition
file.
It adds the definitions for the following commands:
get_cas_write_metrics_unfinished_commit
get_cas_write_metrics_contention
get_cas_write_metrics_condition_not_met
get_cas_read_metrics_unfinished_commit
get_cas_read_metrics_contention
get_cas_read_metrics_condition_not_met
get_read_metrics_timeouts
get_read_metrics_unavailables
get_range_metrics_timeouts
get_range_metrics_unavailables
get_write_metrics_timeouts
get_write_metrics_unavailables
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
This adds a stub implementation for the commit log metrics.
The calls return the correct value type with a stub value.
After this patch the following URLs will be available:
/commitlog/metrics/completed_tasks
/commitlog/metrics/pending_tasks
/commitlog/metrics/total_commit_log_size
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
This adds the commit log swagger definition to the commit log
definition file.
The API is based on the CommitLogMetrics.
The following commands were added:
get_completed_tasks
get_pending_tasks
get_total_commit_log_size
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
"The column family metrics are a set of data related to the column family.
This series adds an API based on the ColumnFamilyMetrics mbean.
It has a stub implementation, just so the JMX proxy would get a response."
"This series adds the hinted handoff and hinted handoff metrics API with a stub
implementation. The API definition was based on the HintedHandOffMetricsMBean
and the HintedHandoffMetrics."
Conflicts:
api/api.cc
configure.py
"This series adds the cache service metrics API. It is based on the CacheMetrics
definitions.
There are statistics for key, row and counters that will be exposed in the
API. This series contains a stub implementation that returns the correct types
but with a stub value."
This adds a stub implementation for the hinted handoff metrics.
The stubbed methods return the correct type, but with a stub value.
After this patch the following paths will be available:
/hinted_handoff/metrics/create_hint/{addr}
/hinted_handoff/metrics/not_stored_hints/{addr}
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>