Requested by Avi. The added benefit is that the code for repairing
all the ranges in parallel is now identical to the code for repairing
the ranges one by one - just replace do_for_each with parallel_for_each -
with no need for a different implementation using semaphores like I had
before this patch.
Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
[in v2: 1. Fixed a few small bugs.
2. Added rudimentary support for parallel/sequential repair.
3. Verified that the code works correctly with Asias's fix to streaming]
This patch adds the capability to track repair operations which we have
started, and check whether they are still running or completed (successfully
or unsuccessfully).
As before, one starts a repair with the REST API:
curl -X GET --header "Content-Type: application/json" --header "Accept: application/json" "http://127.0.0.1:10000/storage_service/repair_async/try1"
where "try1" is the name of the keyspace. This returns a repair id -
a small integer starting at 0. This patch adds support for a similar
request to *query* the status of a previously started repair, by adding
an "id=..." option to the query, which asks about the status of the
repair with that id. For example,
curl -i -X GET --header "Content-Type: application/json" --header "Accept: application/json" "http://127.0.0.1:10000/storage_service/repair_async/try1?id=0"
gets the current status of repair 0. This status can be RUNNING,
SUCCESSFUL or FAILED, or an HTTP 400 "unknown repair id ..." in case an
invalid id is passed (i.e., not the id of any repair that was previously
started).
This patch also adds two alternative code-paths in the main repair flow
do_repair_start(): One where each range is repaired one after another,
and one where all the ranges are repaired in parallel. At the moment, the
enabled code path is the parallel version, just as before this patch. But the
sequential version will also be useful for implementing the "parallel" vs.
"sequential" repair options of Cassandra.
Note that if you try to use repair, you are likely to run into a bug in
the streaming code which results in Scylla either crashing or a repair
hanging (never realising it finished). Asias already has a fix for this bug,
and will hopefully publish it soon, but it is unrelated to the repair code,
so I think this patch can be committed independently.
Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
handle_exception() shouldn't really discard the future's value automatically,
and in an upcoming version of Seastar, it won't. So instead of
sp.execute().handle_exception(...)
(where execute() returns a future which is *not* future<>)
We need to write
sp.execute().discard_result().handle_exception(...)
This already works in today's Seastar (the extra discard_result()
doesn't cause any harm), and will be necessary when handle_exception()
in Seastar is improved (I'll send a patch soon).
Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
Add a FIXME about something I'm unsure about - does repair only need to
repair this node, or should it also make an effort to repair the other
nodes (or more accurately, the specific token ranges of theirs being
repaired) if we're already communicating with them?
Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
If a stream failed, print a clear error message that the repair failed,
instead of ignoring it and letting Seastar's generic "warning, exception
was ignored" be the only thing the user sees.
Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
The previous repair code exchanged data with the other nodes holding
one arbitrary token. This only works correctly when all the nodes
replicate all the data. In a more realistic scenario, the node being
repaired holds copies of several token ranges, and each of these ranges
has a different set of replicas we need to perform the repair with.
So this patch does the right thing - we perform a separate repair_range()
for each of the local ranges, and each of those will find a (possibly)
different set of nodes to communicate with.
Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
This patch adds the beginning of node repair support. Repair is initiated
on a node using the REST API; for example, to repair all the column
families in the "try1" keyspace, you can use:
curl -X GET --header "Content-Type: application/json" --header "Accept: application/json" "http://127.0.0.1:10000/storage_service/repair_async/try1"
I tested that the repair already works (it exchanges mutations with all the
other replicas, and successfully repairs them), so I think it can be
committed, but more work is needed to complete it:
1. Repair options are not yet supported (range repair, sequential/parallel
repair, choice of hosts, datacenters and column families, etc.).
2. *All* the data of the keyspace is exchanged - Merkle Trees (or an
alternative optimization) and partial data exchange haven't been
implemented yet.
3. Full repair for nodes with multiple separate ranges is not yet
implemented correctly. E.g., consider 10 nodes with vnodes and RF=2:
each vnode's range has a different host as a replica, so we need
to exchange each key range separately with a different remote host.
4. Our repair operation returns a numeric operation id (like Origin),
but we don't yet provide any means to use this id to check on ongoing
repairs like Origin allows.
5. Error handling, logging, etc., need to be improved.
6. SMP nodes (with multiple shards) should work correctly (thanks to
Asias's latest patch for SMP mutation streaming) but haven't been
tested.
7. Incremental repair is not supported (see
http://www.datastax.com/dev/blog/more-efficient-repairs)
Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>