Commit Graph

8 Commits

Author SHA1 Message Date
Nadav Har'El
07480c75e6 repair: use parallel_for_each instead of semaphore
Requested by Avi. The added benefit is that the code for repairing
all the ranges in parallel is now identical to the code of repairing
the ranges one by one - just replace do_for_each with parallel_for_each,
and no need for a different implementation using semaphores like I had
before this patch.
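
A minimal sketch of the idea above, using only the C++ standard library rather than Seastar: the per-range body is the same in both variants, and only the iteration strategy differs. The names token_range, repair_fn, repair_sequentially and repair_in_parallel are hypothetical, and std::async merely stands in for Seastar's parallel_for_each; this is not the code from the patch.

```cpp
#include <cassert>
#include <functional>
#include <future>
#include <vector>

// Hypothetical stand-in for a token range; not Scylla's actual type.
struct token_range { long start; long end; };

// Hypothetical per-range repair callback.
using repair_fn = std::function<void(const token_range&)>;

// Sequential variant (the role do_for_each plays in the commit message):
// each range is repaired only after the previous one has finished.
void repair_sequentially(const std::vector<token_range>& ranges,
                         const repair_fn& repair_range) {
    assert(repair_range != nullptr);
    for (const auto& r : ranges) {
        repair_range(r);
    }
}

// Parallel variant (the role parallel_for_each plays): start the repair of
// every range first, then wait for all of them to finish.
void repair_in_parallel(const std::vector<token_range>& ranges,
                        const repair_fn& repair_range) {
    assert(repair_range != nullptr);
    std::vector<std::future<void>> pending;
    pending.reserve(ranges.size());
    for (const auto& r : ranges) {
        pending.push_back(std::async(std::launch::async,
                                     [&repair_range, &r] { repair_range(r); }));
    }
    for (auto& f : pending) {
        f.get();  // rethrows an exception thrown by that range's repair
    }
}
```

Switching between the two modes then amounts to calling one function instead of the other with the same callback, which mirrors the do_for_each / parallel_for_each swap described above.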

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
2015-08-20 10:51:57 +03:00
Nadav Har'El
4e3dbef512 repair: conform to coding style
Use "_" prefix on class member "status".

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
2015-08-20 10:51:56 +03:00
Nadav Har'El
5a02eeaba9 v2: repair: track ongoing repairs
[in v2: 1. Fixed a few small bugs.
        2. Added rudimentary support for parallel/sequential repair.
        3. Verified that the code works correctly with Asias's fix to streaming]

This patch adds the capability to track repair operations which we have
started, and check whether they are still running or completed (successfully
or unsuccessfully).

As before one starts a repair with the REST api:

   curl -X GET --header "Content-Type: application/json" --header "Accept: application/json" "http://127.0.0.1:10000/storage_service/repair_async/try1"

where "try1" is the name of the keyspace. This returns a repair id -
a small integer starting with 0. This patch adds support for a similar
request to *query* the status of a previously started repair, by adding
the "id=..." option to the query, which enquires about the status of the
repair with this id. For example,

    curl -i -X GET --header "Content-Type: application/json" --header "Accept: application/json" "http://127.0.0.1:10000/storage_service/repair_async/try1?id=0"

gets the current status of repair 0. This status can be RUNNING,
SUCCESSFUL or FAILED. If an invalid id is passed (not the id of any real
repair that was previously started), the request instead returns an
HTTP 400 "unknown repair id ..." error.
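
The bookkeeping described above can be sketched roughly as follows. This is an illustration only: the repair_tracker class, its method names, and the use of std::invalid_argument for an unknown id are assumptions, not the code in this patch. The REST layer would be the part that translates the exception into the HTTP 400 response.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <unordered_map>

// The three states a tracked repair can be in, as named in the commit message.
enum class repair_status { RUNNING, SUCCESSFUL, FAILED };

// Hypothetical tracker; the class and method names are illustrative only.
class repair_tracker {
    std::unordered_map<int, repair_status> _status;  // "_" prefix on members
    int _next_id = 0;
public:
    // Registers a new repair as RUNNING and returns its id (0, 1, 2, ...).
    int start() {
        int id = _next_id++;
        _status.emplace(id, repair_status::RUNNING);
        return id;
    }

    // Records the outcome of a repair that was previously started.
    void done(int id, bool succeeded) {
        auto it = _status.find(id);
        assert(it != _status.end());
        it->second = succeeded ? repair_status::SUCCESSFUL
                               : repair_status::FAILED;
    }

    // Returns the status of a repair, or throws for an id that was never
    // handed out by start().
    repair_status get(int id) const {
        auto it = _status.find(id);
        if (it == _status.end()) {
            throw std::invalid_argument("unknown repair id " + std::to_string(id));
        }
        return it->second;
    }
};
```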

This patch also adds two alternative code-paths in the main repair flow
do_repair_start(): One where each range is repaired one after another,
and one where all the ranges are repaired in parallel. At the moment, the
enabled code is the parallel version, just as before this patch. But having
both code paths will also be useful for implementing the "parallel" vs "sequential" repair
options of Cassandra.

Note that if you try to use repair, you are likely to run into a bug in
the streaming code which results in Scylla either crashing or a repair
hanging (never realising it finished). Asias already has a fix for this bug,
and will hopefully publish it soon, but it is unrelated to the repair code
so I think this patch can independently be committed.

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
2015-08-16 14:23:02 +03:00
Nadav Har'El
75384413f3 repair: fix use of handle_exception()
handle_exception() should not really discard the future's value automatically,
and in an upcoming version of Seastar, it won't. So instead of

	sp.execute().handle_exception(...)

(where execute() returns a future which is *not* future<>),
we need to write

	sp.execute().discard_result().handle_exception(...)

This already works in today's Seastar (the extra discard_result()
doesn't cause any harm), and will be necessary when handle_exception()
in Seastar is improved (I'll send a patch soon).

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
2015-08-12 17:46:41 +03:00
Nadav Har'El
a5ce8108f2 repair: add FIXME
Add a FIXME about something I'm unsure about - does repair only need to
repair this node, or should it also make an effort to repair the other nodes
(or more accurately, their specific token-ranges being repaired) if we're
already communicating with them?

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
2015-08-10 12:16:56 +03:00
Nadav Har'El
7a8ed228c7 repair: better error message
If a stream failed, print a clear error message that repair failed, instead
of ignoring it and letting Seastar's generic "warning, exception was ignored"
be the only thing the user will see.

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
2015-08-10 12:16:56 +03:00
Nadav Har'El
71a3a0c026 repair: repair each local range separately
The previous repair code exchanged data with the other nodes which have
one arbitrary token. This will only work correctly when all the nodes
replicate all the data. In a more realistic scenario, the node being
repaired holds copies of several token ranges, and each of these ranges
has a different set of replicas we need to perform the repair with.

So this patch does the right thing - we perform a separate repair_range()
for each of the local ranges, and each of those will find a (possibly)
different set of nodes to communicate with.
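
As a rough sketch of why each local range needs its own repair: every range can map to a different replica set, so the peers have to be computed per range. The types and the plan_repair function below are hypothetical, as is the lookup table keyed by range start; the patch itself uses repair_range() and Scylla's own replication metadata.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-ins for Scylla's range and endpoint types.
struct token_range { long start; long end; };
using host = std::string;

// One planned repair: a local range and the other replicas that hold it.
struct range_repair {
    token_range range;
    std::set<host> peers;
};

// For each local range, look up its replicas (here: a table keyed by the
// range's start token, an assumption made for this sketch) and drop the
// local node, leaving the peers to exchange data with for that range.
std::vector<range_repair> plan_repair(
        const std::vector<token_range>& local_ranges,
        const std::map<long, std::set<host>>& replicas_by_range_start,
        const host& self) {
    std::vector<range_repair> plan;
    for (const auto& r : local_ranges) {
        auto it = replicas_by_range_start.find(r.start);
        assert(it != replicas_by_range_start.end());
        std::set<host> peers = it->second;
        peers.erase(self);
        plan.push_back({r, peers});
    }
    return plan;
}
```

With two local ranges replicated on {A, B} and {A, C}, node A would get a plan that repairs the first range with B and the second with C, rather than exchanging everything with one arbitrary set of nodes.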

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
2015-08-10 12:16:55 +03:00
Nadav Har'El
34b1cc42cd Initial repair support
This patch adds the beginning of node repair support. Repair is initiated
on a node using the REST API, for example to repair all the column families
in the "try1" keyspace, you can use:

    curl -X GET --header "Content-Type: application/json" --header "Accept: application/json" "http://127.0.0.1:10000/storage_service/repair_async/try1"

I tested that the repair already works (exchanges mutations with all other
replicas, and successfully repairs them), so I think it can be committed,
but it will need more work to be completed:

 1. Repair options are not yet supported (range repair, sequential/parallel
    repair, choice of hosts, datacenters and column families, etc.).

 2. *All* the data of the keyspace is exchanged - Merkle Trees (or an
    alternative optimization) and partial data exchange haven't been
    implemented yet.

 3. Full repair for nodes with multiple separate ranges is not yet
    implemented correctly. E.g., consider 10 nodes with vnodes and RF=2,
    so each vnode's range has a different host as a replica, so we need
    to exchange each key range separately with a different remote host.

 4. Our repair operation returns a numeric operation id (like Origin),
    but we don't yet provide any means to use this id to check on ongoing
    repairs like Origin allows.

 5. Error handling, logging, etc., needs to be improved.

 6. SMP nodes (with multiple shards) should work correctly (thanks to
    Asias's latest patch for SMP mutation streaming) but haven't been
    tested.

 7. Incremental repair is not supported (see
    http://www.datastax.com/dev/blog/more-efficient-repairs)

Signed-off-by: Nadav Har'El <nyh@cloudius-systems.com>
2015-08-05 13:26:36 +03:00