This adds a stub implementation for the storge service metrics. The
implementation returns the currect type with a stub value.
After this patch the following url will be available:
/storage_service/metrics/load
/storage_service/metrics/exceptions
/storage_service/metrics/hints_in_progress
/storage_service/metrics/total_hints
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
"This adds the get_snitch_name and update_snitch functionality to the API. After
this series it would be possible to return the snitch name and to update the
snitch."
[in v2: 1. Fixed a few small bugs.
2. Added rudementary support parallel/sequential repair.
3. Verified that code works correctly with Asias's fix to streaming]
This patch adds the capability to track repair operations which we have
started, and check whether they are still running or completed (successfully
or unsuccessfully).
As before one starts a repair with the REST api:
curl -X GET --header "Content-Type: application/json" --header "Accept: application/json" "http://127.0.0.1:10000/storage_service/repair_async/try1"
where "try1" is the name of the keyspace. This returns a repair id -
a small integer starting with 0. This patch adds support for similar
request to *query* the status of a previously started repair, by adding
the "id=..." option to the query, which enquires about the status of the
repair with this id: For example.,
curl -i -X GET --header "Content-Type: application/json" --header "Accept: application/json" "http://127.0.0.1:10000/storage_service/repair_async/try1?id=0"
gets the current status of this repair 0. This status can be RUNNING,
SUCCESSFUL or FAILED, or a HTTP 400 "unknown repair id ..." in case an
invalid id is passed (not the id of any real repair that was previously
started).
This patch also adds two alternative code-paths in the main repair flow
do_repair_start(): One where each range is repaired one after another,
and one where all the ranges are repaired in parallel. At the moment, the
enabled code is the parallel version, just as before this patch. But the
will also be useful for implementing the "parallel" vs "sequential" repair
options of Cassandra.
Note that if you try to use repair, you are likely to run into a bug in
the streaming code which results in Scylla either crashing or a repair
hanging (never realising it finished). Asias already has a fix this this bug,
and will hopefully publish it soon, but it is unrelated to the repair code
so I think this patch can independently be committed.
Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
This patch adds the beginning of node repair support. Repair is initiated
on a node using the REST API, for example to repair all the column families
in the "try1" keyspace, you can use:
curl -X GET --header "Content-Type: application/json" --header "Accept: application/json" "http://127.0.0.1:10000/storage_service/repair_async/try1"
I tested that the repair already works (exchanges mutations with all other
replicas, and successfully repairs them), so I think can be committed,
but will need more work to be completed
1. Repair options are not yet supported (range repair, sequential/parallel
repair, choice of hosts, datacenters and column families, etc.).
2. *All* the data of the keyspace is exchanged - Merkle Trees (or an
alternative optimization) and partial data exchange haven't been
implemented yet.
3. Full repair for nodes with multiple separate ranges is not yet
implemented correctly. E.g., consider 10 nodes with vnodes and RF=2,
so each vnode's range has a different host as a replica, so we need
to exchange each key range separately with a different remote host.
4. Our repair operation returns a numeric operation id (like Origin),
but we don't yet provide any means to use this id to check on ongoing
repairs like Origin allows.
5. Error hangling, logging, etc., needs to be improved.
6. SMP nodes (with multiple shards) should work correctly (thanks to
Asias's latest patch for SMP mutation streaming) but haven't been
tested.
7. Incremental repair is not supported (see
http://www.datastax.com/dev/blog/more-efficient-repairs)
Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
config.hh changes rapidly, so don't force lots of recompiles by including it.
Need to place seed_provider_type in namespace scope, so we can forward
declare it for that.
It should not be called directly: externall callers should be calling flush()
instead.
To be sure it doesn't happen again, make seal_active_memtable private.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
In much of our column_families APIs, we need to pass a pointer to the database.
The only reason we do that, is so we can properly handle the commit log entries
after we seal the current memtables into sstables.
Now that we store a pointer to the commit log in the CF itself at the time it
is created, we no longer have to do it. As a result, the APIs are a lot
cleaner, with no gratuitous parameters.
My motivation for this was the flush method, but as a result, apply() also gets
cleaner.
Signed-off-by: Glauber Costa <glommer@cloudius-systems.com>
This adds the following implementation to the storage_service API:
get_leaving_nodes
get_moving_nodes
get_joining_nodes
get_all_data_file_locations
get_saved_caches_location
get_host_id_map
get_current_generation_number
get_keyspaces
force_keyspace_flush
force_keyspace_compaction
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
This adds a stub implementation of the storage service, to simplify
future implementation, variables that should be used in the
implementation are taken and stored.
Implementation return the currect type, but with stub values.
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>
This adds implementation to the added storage service definitions.
After this patch, the following calls will be supported:
/storage_service/tokens
/storage_service/tokens/{endpoint}
/storage_service/commitlog
/storage_service/tokens_endpoint
Signed-off-by: Amnon Heiman <amnon@cloudius-systems.com>