Avi asked us not to use an atomic integer to produce ids for repair
operations. The existing code also had another bug: it could return an
id immediately, but because start_repair() had not yet started running
code on cpu 0, the new id was not yet registered, and if we called
repair_get_status() for this id too quickly, the call could fail.
The solution for both issues is for start_repair() to return not
an int but a future<int>: the integer id is incremented on cpu 0 (so
no atomics are needed), and then returned by fulfilling the future.
Note that the future returned by start_repair() does not wait for the
repair to finish, only for its id to be registered and usable in a
call to repair_get_status().
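The ordering this commit relies on can be sketched outside Seastar. This is a simplified model, not the actual code. Cpu 0 is modeled as a single-threaded task queue. The names id_future, cpu0_repair_service and run_pending() are hypothetical, and id_future is a toy stand-in for seastar::future<int>. The future becomes ready only after the id has been registered, and the counter is a plain int because only the cpu 0 queue touches it.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <map>
#include <memory>
#include <optional>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical, simplified stand-in for seastar::future<int>: a shared slot
// that becomes ready when the task running on "cpu 0" fulfils it.
struct id_future {
    std::shared_ptr<std::optional<int>> slot;
    bool available() const { return slot->has_value(); }
    int get() const { return slot->value(); }
};

enum class repair_status { RUNNING, SUCCESSFUL, FAILED };

// Hypothetical model of cpu 0 as a single-threaded task queue. Only tasks on
// this queue touch next_id_ and status_, so a plain int is enough.
class cpu0_repair_service {
    std::deque<std::function<void()>> tasks_;
    int next_id_ = 0;
    std::map<int, repair_status> status_;

public:
    // Returns immediately; the future becomes ready once the id is
    // registered, not when the repair finishes.
    id_future start_repair(std::string keyspace) {
        auto slot = std::make_shared<std::optional<int>>();
        tasks_.push_back([this, slot, keyspace = std::move(keyspace)] {
            int id = next_id_++;                  // no atomics: runs only on cpu 0
            status_[id] = repair_status::RUNNING; // register the id first...
            *slot = id;                           // ...then fulfil the future
            (void)keyspace;                       // the repair of `keyspace` would start here
        });
        return id_future{slot};
    }

    // Drains the queue, standing in for the cpu 0 reactor loop.
    void run_pending() {
        while (!tasks_.empty()) {
            auto task = std::move(tasks_.front());
            tasks_.pop_front();
            task();
        }
    }

    repair_status repair_get_status(int id) const {
        auto it = status_.find(id);
        if (it == status_.end()) {
            throw std::invalid_argument("unknown repair id " + std::to_string(id));
        }
        return it->second;
    }
};
```

A caller that waits for the future before calling repair_get_status() can no longer observe an unregistered id.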
Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
"This series address issues #59 and #23.
It moves the API configuration from the command line argument to the general
config, it also move the api-doc directory to be configurable instead of hard
coded."
Fixes#59Fixes#23
This patch addresses issue #155: it adds a helper function that throws a
bad parameter exception if a keyspace does not exist.
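A minimal sketch of such a validation helper follows. The names bad_param_exception, database, has_keyspace() and validate_keyspace() are hypothetical, chosen for illustration. The assumption is that the HTTP layer maps this exception type to a 400 Bad Request.

```cpp
#include <cassert>
#include <set>
#include <stdexcept>
#include <string>

// Hypothetical exception type that an HTTP layer would map to 400 Bad Request.
class bad_param_exception : public std::invalid_argument {
public:
    using std::invalid_argument::invalid_argument;
};

// Hypothetical stand-in for the database's keyspace registry.
struct database {
    std::set<std::string> keyspaces;
    bool has_keyspace(const std::string& name) const {
        return keyspaces.count(name) > 0;
    }
};

// Throws bad_param_exception if the keyspace does not exist; otherwise
// returns normally so the handler can go on with the request.
void validate_keyspace(const database& db, const std::string& name) {
    if (!db.has_keyspace(name)) {
        throw bad_param_exception("Keyspace " + name + " does not exist");
    }
}
```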
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
$ curl -X POST --header "Content-Type: application/json" --header "Accept: application/json" "http://127.0.0.1:10000/storage_service/gossiping"
btw, the description looks incorrect:
POST /storage_service/gossiping
allows a user to recover a forcibly 'killed' node
This adds the ownership method implementation to the storage_service
API. After the patch the following url will be supported:
GET /storage_service/ownership/{keyspace}
GET /storage_service/ownership/
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
This adds the API for get_effective_ownership and
get_ownership in storage_service.
It is based on the StorageServiceMBean definition.
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
This adds the column family mean row size, in both the per-column-family
and the total versions. It uses the ratio_helper class to calculate the
mean over all the shards.
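Why a ratio helper is needed for a mean "over all the shards" can be shown with a small sketch. The struct shard_row_stats and the function mean_row_size() are hypothetical. The sketch assumes each shard reports a total row size and a row count. It sums both across shards and divides once, instead of averaging per-shard means, which would give a shard with few rows the same weight as a busy one.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical per-shard statistics for one column family.
struct shard_row_stats {
    int64_t total_row_bytes; // sum of the sizes of all rows on this shard
    int64_t row_count;       // number of rows on this shard
};

// Mean row size over all shards: sum sizes and counts separately, then
// divide once. Averaging per-shard means would over-weight small shards.
int64_t mean_row_size(const std::vector<shard_row_stats>& shards) {
    int64_t bytes = 0;
    int64_t rows = 0;
    for (const auto& s : shards) {
        bytes += s.total_row_bytes;
        rows += s.row_count;
    }
    return rows == 0 ? 0 : bytes / rows;
}
```

For shards {1000 bytes, 10 rows} and {10 bytes, 1 row}, this yields 1010 / 11 = 91 (integer division). The mean of the per-shard means would be 55.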
This distinguishes between the async repair method that starts the repair,
which will now be a POST request, and the method that checks the command's
progress, which will now be a GET request.
After the change, each operation takes only the parameters it needs.
The GET returns an enum based on repair_status.
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
This patch uses the now existing infrastructure to expose statistics about the bloom
filters hit/miss rates.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Add an API function that returns the count of sstables in L0 if the
leveled compaction strategy is enabled, and 0 otherwise. Currently, we
don't support the leveled compaction strategy, so the function that
returns the count of sstables in L0 always returns zero.
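The intended semantics can be sketched in a few lines. The enum compaction_strategy, the struct sstable and the function sstables_in_level_0() are hypothetical. The sketch counts level-0 sstables only when the strategy is leveled. As the commit notes, leveled compaction is not supported yet, so in practice only the early return is reached.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

enum class compaction_strategy { size_tiered, leveled };

// Hypothetical sstable descriptor; only the level matters for this count.
struct sstable {
    int level;
};

// Count of sstables in level 0 if leveled compaction is in use, 0 otherwise.
int sstables_in_level_0(compaction_strategy strategy, const std::vector<sstable>& sstables) {
    if (strategy != compaction_strategy::leveled) {
        return 0; // the only path taken while leveled compaction is unsupported
    }
    return static_cast<int>(std::count_if(sstables.begin(), sstables.end(),
                                          [](const sstable& s) { return s.level == 0; }));
}
```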
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
"This series expose statistics from the row_cache in the cache_service API.
After this series the following methods will be available:
get_row_hits
get_row_requests
get_row_hit_rate
get_row_size
get_row_entries"
This adds a stub implementation for the storage service metrics. The
implementation returns the correct type with a stub value.
After this patch the following URLs will be available:
/storage_service/metrics/load
/storage_service/metrics/exceptions
/storage_service/metrics/hints_in_progress
/storage_service/metrics/total_hints
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
This adds the storage service metrics, which are based on the
StorageServiceMetrics class.
The following commands were added:
get_metrics_load
get_exceptions
get_total_hints_in_progress
get_total_hints
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
This adds the implementation for get_row_hits, get_row_requests,
get_row_hit_rate, row_entries, row_size and row_capacity.
The implementation is based on the column-family map-reduce.
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
Some of the APIs need to return a ratio.
The ratio_holder struct is a helper class that counts the total and the
sub-total. It implements the json::jsonable virtual class with a
to_json method that returns the ratio.
The struct is mainly used with a map-reduce method.
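A sketch of the idea follows, under stated assumptions: the member names sub and total, the operator+= used as the reduce step, and the free function reduce_shards() are all hypothetical. to_json() here is a toy stand-in for the json::jsonable method and just prints the ratio. The point is that per-shard sub-totals and totals are summed in the reduce, and the division happens once at the end.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical sketch of a ratio holder: it accumulates a sub-total and a
// total (e.g. row cache hits and requests) so that per-shard values can be
// summed in a map-reduce, and the ratio is computed only at the end.
struct ratio_holder {
    double sub = 0;
    double total = 0;

    // Reduce step: combine two partial results by adding both components.
    ratio_holder& operator+=(const ratio_holder& other) {
        sub += other.sub;
        total += other.total;
        return *this;
    }
    double ratio() const { return total == 0 ? 0 : sub / total; }
    // Stand-in for json::jsonable::to_json(): serializes only the ratio.
    std::string to_json() const { return std::to_string(ratio()); }
};

// Folds the per-shard holders produced by the "map" step.
ratio_holder reduce_shards(const std::vector<ratio_holder>& per_shard) {
    ratio_holder acc;
    for (const auto& h : per_shard) {
        acc += h;
    }
    return acc;
}
```

With one shard at 1 hit out of 2 requests and another at 2 out of 2, the combined hit rate is 3 / 4 = 0.75.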
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
"This adds the get_snitch_name and update_snitch functionality to the API. After
this series it would be possible to return the snitch name and to update the
snitch."
"This series cleans the streaming_histogram and the estimated histogram that
were importad from origin, it then uses it to get the estimated min and max row
estimation in the API."
"Histograms are used to collect latency information, in Origin, many of the
operations are timed, this is a potential performance issue. This series adds
an option to sample the operations, where small amount will be timed and the
most will only be counted.
This will give an estimation for the statistics, while keeping an accurate
count of the total events and have neglectible performance impact.
The first to use the modified histogram are the column family for their read
and write."
Conflicts:
database.hh
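The sampling scheme in the merge description above can be sketched as follows. The class sampled_histogram, its run() method and the "one in N" policy are hypothetical, not the series' actual code. Every operation increments an exact counter, but the clock is read only for one in sample_every operations.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch: every operation is counted exactly, but only one in
// `sample_every` is timed, so the clock is read rarely.
class sampled_histogram {
    uint64_t count_ = 0;           // exact number of operations
    uint64_t sample_every_;        // time one operation out of this many
    std::vector<int64_t> samples_; // latencies (microseconds) of timed ops

public:
    explicit sampled_histogram(uint64_t sample_every) : sample_every_(sample_every) {
        assert(sample_every > 0);
    }

    template <typename Op>
    void run(Op&& op) {
        if (count_++ % sample_every_ != 0) {
            op(); // counted only, not timed
            return;
        }
        auto start = std::chrono::steady_clock::now();
        op();
        auto elapsed = std::chrono::duration_cast<std::chrono::microseconds>(
            std::chrono::steady_clock::now() - start);
        samples_.push_back(elapsed.count());
    }

    uint64_t count() const { return count_; }
    std::size_t sampled() const { return samples_.size(); }
};
```

With sample_every = 10, running 100 operations leaves count() at 100 and records 10 timed samples. Latency statistics are estimated from those samples, while the event count stays exact.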
[in v2: 1. Fixed a few small bugs.
2. Added rudimentary support for parallel/sequential repair.
3. Verified that the code works correctly with Asias's fix to streaming]
This patch adds the capability to track repair operations which we have
started, and check whether they are still running or completed (successfully
or unsuccessfully).
As before, one starts a repair with the REST API:
curl -X GET --header "Content-Type: application/json" --header "Accept: application/json" "http://127.0.0.1:10000/storage_service/repair_async/try1"
where "try1" is the name of the keyspace. This returns a repair id,
a small integer starting with 0. This patch adds support for a similar
request to *query* the status of a previously started repair, by adding
the "id=..." option to the query, which asks about the status of the
repair with this id. For example,
curl -i -X GET --header "Content-Type: application/json" --header "Accept: application/json" "http://127.0.0.1:10000/storage_service/repair_async/try1?id=0"
gets the current status of repair 0. This status can be RUNNING,
SUCCESSFUL or FAILED, or an HTTP 400 "unknown repair id ..." error if an
invalid id is passed (not the id of any real repair that was previously
started).
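The status lifecycle above can be modeled in a small sketch. The class repair_tracker and its start(), finish() and query() methods are hypothetical. query() returns an (HTTP status code, body) pair instead of going through a real HTTP server. Ids start at 0, and each repair moves from RUNNING to SUCCESSFUL or FAILED. An id that was never handed out yields a 400.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

enum class repair_status { RUNNING, SUCCESSFUL, FAILED };

// Hypothetical tracker: ids are small integers starting at 0, and each
// repair moves from RUNNING to SUCCESSFUL or FAILED once it completes.
class repair_tracker {
    std::map<int, repair_status> status_;
    int next_id_ = 0;

public:
    int start() {
        int id = next_id_++;
        status_[id] = repair_status::RUNNING;
        return id;
    }

    void finish(int id, bool ok) {
        status_.at(id) = ok ? repair_status::SUCCESSFUL : repair_status::FAILED;
    }

    // Returns an (HTTP status code, body) pair for a status query.
    std::pair<int, std::string> query(int id) const {
        auto it = status_.find(id);
        if (it == status_.end()) {
            return {400, "unknown repair id " + std::to_string(id)};
        }
        switch (it->second) {
        case repair_status::RUNNING:    return {200, "RUNNING"};
        case repair_status::SUCCESSFUL: return {200, "SUCCESSFUL"};
        case repair_status::FAILED:     return {200, "FAILED"};
        }
        return {500, "invalid status"};
    }
};
```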
This patch also adds two alternative code paths in the main repair flow,
do_repair_start(): one where each range is repaired one after another,
and one where all the ranges are repaired in parallel. At the moment, the
enabled code is the parallel version, just as before this patch. But having
both code paths will also be useful for implementing the "parallel" vs
"sequential" repair options of Cassandra.
Note that if you try to use repair, you are likely to run into a bug in
the streaming code which results in Scylla either crashing or a repair
hanging (never realising it has finished). Asias already has a fix for this bug,
and will hopefully publish it soon, but it is unrelated to the repair code,
so I think this patch can be committed independently.
Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
This adds the API implementation of the row size histogram.
It adds a map_cf method that performs a map operation over all column
families on the different shards.
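The shape of such a map operation can be sketched without Seastar. This is a model under stated assumptions. Each shard is a plain map from a hypothetical "keyspace.table" name to a hypothetical cf_stats struct. The map_cf() shown here is a toy version that runs sequentially rather than on each shard's cpu. It applies a mapper to every column family on every shard and collects the results for the caller to reduce.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical per-column-family statistics kept separately on each shard.
struct cf_stats {
    int64_t live_rows;
};
// One shard's column families, keyed by "keyspace.table".
using shard_cfs = std::map<std::string, cf_stats>;

// Applies `mapper` to every column family on every shard and collects the
// results; a caller then reduces them (e.g. sums or merges histograms).
template <typename Mapper>
auto map_cf(const std::vector<shard_cfs>& shards, Mapper mapper) {
    using result_t = decltype(mapper(std::declval<const cf_stats&>()));
    std::vector<result_t> results;
    for (const auto& shard : shards) {
        for (const auto& entry : shard) {
            results.push_back(mapper(entry.second));
        }
    }
    return results;
}
```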
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
This adds the estimated_histogram to the utils definition file.
The estimated_histogram holds a list of buckets and a list of buckets
offsets.
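A minimal sketch of how a value lands in a bucket follows. It assumes, as in Origin's EstimatedHistogram, that each offset is the inclusive upper bound of its bucket and that one extra bucket catches values above the last offset. The class estimated_histogram shown here is a toy version, and the small offsets used below are made up for illustration. The real offsets grow geometrically.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical sketch: bucket_offsets_[i] is the inclusive upper bound of
// bucket i; one extra overflow bucket holds values above the last offset.
class estimated_histogram {
    std::vector<int64_t> bucket_offsets_;
    std::vector<int64_t> buckets_;

public:
    explicit estimated_histogram(std::vector<int64_t> offsets)
        : bucket_offsets_(std::move(offsets)), buckets_(bucket_offsets_.size() + 1, 0) {}

    void add(int64_t value) {
        // The first offset >= value selects the bucket; end() is the overflow bucket.
        auto it = std::lower_bound(bucket_offsets_.begin(), bucket_offsets_.end(), value);
        ++buckets_[it - bucket_offsets_.begin()];
    }

    const std::vector<int64_t>& buckets() const { return buckets_; }
};
```

With offsets {1, 2, 4, 8}, adding 1, 3, 8 and 100 fills buckets 0, 2, 3 and the overflow bucket.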
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
This adds the implementation for min and max row size in column family.
It uses the column family map-reduce helper function with an additional
function to get the min and max row size.
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
This helper wraps the std::min and std::max templates for int64_t, which
makes it easier to pass them as values where needed.
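The motivation for such wrappers comes from a C++ rule. std::min and std::max are overloaded function templates, so even std::max<int64_t> names several overloads and cannot be passed where a single function value is expected, such as the reducer of std::accumulate. A plain non-template function has exactly one type. The names min_int64, max_int64, min_row_size and max_row_size below are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Plain wrappers with a single, fixed signature, usable as function values.
int64_t min_int64(int64_t a, int64_t b) { return std::min(a, b); }
int64_t max_int64(int64_t a, int64_t b) { return std::max(a, b); }

// Reduce per-shard maxima with the wrapper; row sizes are non-negative,
// so 0 is a safe starting value.
int64_t max_row_size(const std::vector<int64_t>& per_shard) {
    return std::accumulate(per_shard.begin(), per_shard.end(), int64_t{0}, max_int64);
}

// Reduce per-shard minima; an empty input reports 0 rather than INT64_MAX.
int64_t min_row_size(const std::vector<int64_t>& per_shard) {
    if (per_shard.empty()) {
        return 0;
    }
    return std::accumulate(per_shard.begin() + 1, per_shard.end(), per_shard.front(), min_int64);
}
```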
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
With the change in column_family stats, the API needs to get the counter
from the read and write histograms.
It also adds the implementation for the read and write latency histogram.
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
This change makes sure that when there are no results (i.e. all the
histograms that are summed are empty), the returned result is a zeroed
histogram and not an empty object.
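One way to get this behavior is sketched below. The struct histogram and the function sum_histograms() are hypothetical, and the sketch assumes all histograms share a known bucket count. The sum starts from a zeroed histogram of that size, so summing nothing yields all zeros rather than an empty object.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical histogram with a fixed number of buckets.
struct histogram {
    std::vector<int64_t> buckets;
};

// Sums histograms bucket by bucket. Starting from a zeroed histogram of the
// expected size means an empty input yields all zeros, not an empty object.
histogram sum_histograms(const std::vector<histogram>& parts, std::size_t nbuckets) {
    histogram result{std::vector<int64_t>(nbuckets, 0)};
    for (const auto& h : parts) {
        for (std::size_t i = 0; i < nbuckets && i < h.buckets.size(); ++i) {
            result.buckets[i] += h.buckets[i];
        }
    }
    return result;
}
```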
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
This adds the latency histogram to the column_family swagger
definitions.
The definitions are based on the ColumnFamilyMetrics.
It adds the following commands:
get_read_latency_histogram
get_all_read_latency_histogram
get_write_latency_histogram
get_all_write_latency_histogram
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>