$ curl -X POST --header "Content-Type: application/json" --header "Accept:
application/json" "http://127.0.0.1:10000/storage_service/gossiping"
btw, the description looks incorrect:
POST /storage_service/gossiping
allows a user to recover a forcibly 'killed' node
This adds the ownwership method implementation to the storage_service
API. After the patch the following url will be supported:
GET /storage_service/ownership/{keyspace}
GET /storage_service/ownership/
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
This adds the API for get_effective_ownership and
get_ownership in storage_service.
It is based on the StorageServiceMBean definition.
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
This adds the column family mean row size in the per column family and
the total version. I uses the ratio_helper class to calculate the mean
over all the shrades.
This distinguish between the async repair that starts the repair, that
will now be a POST request and the method that check on the command
progress that will now be a GET command.
After the change each operation would get the parameters that it needs.
The GET will return an enum based on the repair_status.
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
This patch uses the now existing infrastructure to expose statistics about the bloom
filters hit/miss rates.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
Adding to API function to return count of sstables in L0 if leveled
compaction strategy is enabled, 0 otherwise. Currently, we don't
support leveled compaction strategy, so function to return count of
sstables in L0 always return zero.
Signed-off-by: Raphael S. Carvalho <raphaelsc@cloudius-systems.com>
"This series expose statistics from the row_cache in the cache_service API.
After this series the following methods will be available:
get_row_hits
get_row_requests
get_row_hit_rate
get_row_size
get_row_entries"
This adds a stub implementation for the storge service metrics. The
implementation returns the currect type with a stub value.
After this patch the following url will be available:
/storage_service/metrics/load
/storage_service/metrics/exceptions
/storage_service/metrics/hints_in_progress
/storage_service/metrics/total_hints
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
This adds the storage service metrics that is based on the
StorageServiceMetrics class.
The following command where added:
get_metrics_load
get_exceptions
get_total_hints_in_progress
get_total_hints
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
This adds the implementation for get_row_hits, get_row_requests,
get_row_hit_rate, row_enries, row_size and row_capacity
The implementation is based on the column-family map reduce
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
Some of the APIs need to return a ratio.
The ratio_holder struct is a helper class that counts the total and the
sub totat, it implements the json::jsonable virtual class with a
to_json method that return the ratio.
The main usage of the sturct is with a map-reduce method.
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
"This adds the get_snitch_name and update_snitch functionality to the API. After
this series it would be possible to return the snitch name and to update the
snitch."
"This series cleans the streaming_histogram and the estimated histogram that
were importad from origin, it then uses it to get the estimated min and max row
estimation in the API."
"Histograms are used to collect latency information, in Origin, many of the
operations are timed, this is a potential performance issue. This series adds
an option to sample the operations, where small amount will be timed and the
most will only be counted.
This will give an estimation for the statistics, while keeping an accurate
count of the total events and have neglectible performance impact.
The first to use the modified histogram are the column family for their read
and write."
Conflicts:
database.hh
[in v2: 1. Fixed a few small bugs.
2. Added rudementary support parallel/sequential repair.
3. Verified that code works correctly with Asias's fix to streaming]
This patch adds the capability to track repair operations which we have
started, and check whether they are still running or completed (successfully
or unsuccessfully).
As before one starts a repair with the REST api:
curl -X GET --header "Content-Type: application/json" --header "Accept: application/json" "http://127.0.0.1:10000/storage_service/repair_async/try1"
where "try1" is the name of the keyspace. This returns a repair id -
a small integer starting with 0. This patch adds support for similar
request to *query* the status of a previously started repair, by adding
the "id=..." option to the query, which enquires about the status of the
repair with this id: For example.,
curl -i -X GET --header "Content-Type: application/json" --header "Accept: application/json" "http://127.0.0.1:10000/storage_service/repair_async/try1?id=0"
gets the current status of this repair 0. This status can be RUNNING,
SUCCESSFUL or FAILED, or a HTTP 400 "unknown repair id ..." in case an
invalid id is passed (not the id of any real repair that was previously
started).
This patch also adds two alternative code-paths in the main repair flow
do_repair_start(): One where each range is repaired one after another,
and one where all the ranges are repaired in parallel. At the moment, the
enabled code is the parallel version, just as before this patch. But the
will also be useful for implementing the "parallel" vs "sequential" repair
options of Cassandra.
Note that if you try to use repair, you are likely to run into a bug in
the streaming code which results in Scylla either crashing or a repair
hanging (never realising it finished). Asias already has a fix this this bug,
and will hopefully publish it soon, but it is unrelated to the repair code
so I think this patch can independently be committed.
Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
This adds the implementation to in the API to the row size histogram.
It adds a map_cf method that perform a map operation over all column
family on the different shards.
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
This adds the estimated_histogram to the utils definition file.
The estimated_histogram holds a list of buckets and a list of buckets
offsets.
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
This adds the implementation for min and max row size in column family.
It uses the column family map redudce helper function with the addtional
function to get the min and max row size.
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
This helper function wraps the std min and max template for int64_t, it
makes it easier to pass them as a value in need.
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
With the change in column_family stats, the API needs to get the counter
from the read and write histogram.
It also adds the implementation for the read and write latency histogram.
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
This change make sure that when there are no results (ie. all the
histogram that are summed are empty) the return result will be a zerroed
histogram and not an empty object.
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
This adds the latency histogram to the column_family swagger
definitions.
The definitions are based on the ColumnFamilyMetrics.
It adds the following commands:
get_read_latency_histogram
get_all_read_latency_histogram
get_write_latency_histogram
get_all_write_latency_histogram
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
This patch adds the beginning of node repair support. Repair is initiated
on a node using the REST API, for example to repair all the column families
in the "try1" keyspace, you can use:
curl -X GET --header "Content-Type: application/json" --header "Accept: application/json" "http://127.0.0.1:10000/storage_service/repair_async/try1"
I tested that the repair already works (exchanges mutations with all other
replicas, and successfully repairs them), so I think can be committed,
but will need more work to be completed
1. Repair options are not yet supported (range repair, sequential/parallel
repair, choice of hosts, datacenters and column families, etc.).
2. *All* the data of the keyspace is exchanged - Merkle Trees (or an
alternative optimization) and partial data exchange haven't been
implemented yet.
3. Full repair for nodes with multiple separate ranges is not yet
implemented correctly. E.g., consider 10 nodes with vnodes and RF=2,
so each vnode's range has a different host as a replica, so we need
to exchange each key range separately with a different remote host.
4. Our repair operation returns a numeric operation id (like Origin),
but we don't yet provide any means to use this id to check on ongoing
repairs like Origin allows.
5. Error hangling, logging, etc., needs to be improved.
6. SMP nodes (with multiple shards) should work correctly (thanks to
Asias's latest patch for SMP mutation streaming) but haven't been
tested.
7. Incremental repair is not supported (see
http://www.datastax.com/dev/blog/more-efficient-repairs)
Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
This adds the API implementation for the read, write, number of
panding flushes and memtable switch count.
The implementation uses a helper function to perform map and map_reduce
on column_family.
The get_uuid helper method now supports both colon notations (i.e.
either as a ":" or as %3A)
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>