storage_service: allow concurrent tablet migration in tablets/move API

Currently, move_tablet() waits for the topology state machine to become
idle, so only one tablet can be moved at a time. It should also be allowed
to start a migration when the current transition state is

- topology::transition_state::tablet_migration or
- topology::transition_state::tablet_draining

so that tablet movements can be started in parallel. This is useful when
scripting a custom rebalancing algorithm.

In this change, we wait until the topology state machine is idle or
is in either of the two states above.
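
The new wait condition can be sketched in isolation as follows. This is
a simplified model, not the Scylla code: topology_model, can_start_move()
and the other_transition enumerator are hypothetical names introduced for
illustration, and treating "busy" as "a transition state is set" is an
assumption. Only the two tablet states come from this commit.

```cpp
#include <optional>

// Hypothetical stand-in for topology::transition_state. Only the two
// tablet states are taken from the commit; other_transition is an
// assumed placeholder for any other transition.
enum class transition_state {
    tablet_migration,
    tablet_draining,
    other_transition,
};

// Simplified model of the topology: an empty optional means "idle".
// (Assumption: busy is equivalent to having a transition state set.)
struct topology_model {
    std::optional<transition_state> tstate;
    bool is_busy() const { return tstate.has_value(); }
};

// Returns true when a tablet move may stop waiting and proceed:
// the state machine is idle, or it is already handling tablets.
bool can_start_move(const topology_model& t) {
    if (!t.is_busy()) {
        return true;
    }
    return *t.tstate == transition_state::tablet_migration
        || *t.tstate == transition_state::tablet_draining;
}
```

In this model an idle topology and both tablet states let the move
proceed, while any other transition keeps the caller waiting.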

Fixes #16437
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17203
Author: Kefu Chai
Date: 2024-02-07 19:25:29 +08:00
Committed by: Tomasz Grabiec
Parent: 082ad51b71
Commit: 876478b84f

@@ -5228,7 +5228,12 @@ future<> storage_service::move_tablet(table_id table, dht::token token, locator:
     auto guard = co_await _group0->client().start_operation(&_abort_source);
     while (_topology_state_machine._topology.is_busy()) {
-        rtlogger.debug("move_tablet(): topology state machine is busy");
+        const auto tstate = *_topology_state_machine._topology.tstate;
+        if (tstate == topology::transition_state::tablet_draining ||
+            tstate == topology::transition_state::tablet_migration) {
+            break;
+        }
+        rtlogger.debug("move_tablet(): topology state machine is busy: {}", tstate);
         release_guard(std::move(guard));
         co_await _topology_state_machine.event.wait();
         guard = co_await _group0->client().start_operation(&_abort_source);
@@ -5241,6 +5246,9 @@ future<> storage_service::move_tablet(table_id table, dht::token token, locator:
     auto last_token = tmap.get_last_token(tid);
     auto gid = locator::global_tablet_id{table, tid};
+    if (tmap.get_tablet_transition_info(tid)) {
+        throw std::runtime_error(format("Tablet {} is in transition", gid));
+    }
     if (!locator::contains(tinfo.replicas, src)) {
         throw std::runtime_error(format("Tablet {} has no replica on {}", gid, src));
     }
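
Because the wait loop may now exit while other tablets are still
migrating, the requested tablet itself may already be in transition, which
is what the added check rejects. A minimal standalone sketch of the two
per-tablet checks, assuming a much simplified data model: tablet_model,
check_can_move() and the string host names are hypothetical and are not
the types used in the diff.

```cpp
#include <algorithm>
#include <stdexcept>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical simplified tablet: an id plus the hosts holding replicas.
struct tablet_model {
    int id;
    std::vector<std::string> replicas;
};

// Throws if the tablet cannot be moved away from `src`:
//  - it is already being moved by a concurrent request, or
//  - `src` does not hold a replica of it.
void check_can_move(const tablet_model& t,
                    const std::unordered_set<int>& in_transition,
                    const std::string& src) {
    if (in_transition.count(t.id) != 0) {
        throw std::runtime_error(
            "Tablet " + std::to_string(t.id) + " is in transition");
    }
    if (std::find(t.replicas.begin(), t.replicas.end(), src) == t.replicas.end()) {
        throw std::runtime_error(
            "Tablet " + std::to_string(t.id) + " has no replica on " + src);
    }
}
```

The order mirrors the diff: the transition check runs before the replica
check, so a second move request for a tablet that is already migrating is
rejected regardless of the source replica it names.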