Commit Graph

15 Commits

Author SHA1 Message Date
Tomasz Grabiec
551cc0233d tablets, raft topology: Add support for decommission with tablets
Load balancer will recognize decommissioning nodes and will
move tablet replicas away from such nodes with highest priority.

Topology changes have now an extra step called "tablet draining" which
calls the load balancer. The step will execute tablet migration track
as long as there are nodes which require draining. It will not do regular
load balancing.

If load balancer is unable to find new tablet replicas, because RF
cannot be met or availability is at risk due to insufficient node
distribution in racks, it will throw an exception. Currently, topology
change will retry in a loop. We should make this error cause topology
change to be paused so that admin becomes aware of the problem and
issues an abort on the topology change. There is no infrastructure for
aborts yet, so this is not implemented.
2023-09-14 13:05:49 +02:00
Tomasz Grabiec
8565af4dd3 tablet_allocator: Compute load sketch lazily
This allows any node to act as a target later.
2023-09-14 13:04:49 +02:00
Tomasz Grabiec
1c595ab7f4 tablet_allocator: Set node id correctly
It was unset and unused.
2023-09-14 13:04:49 +02:00
Tomasz Grabiec
389573543e tablet_allocator: Make migration_plan a class
It will be extended with more fields so that load balancer can
communicate more information to the coordinator.
2023-09-14 13:04:47 +02:00
Tomasz Grabiec
d5539e080d tablets: Implement cleanup step
This change adds a stub for tablet cleanup on the replica side and wires
it into the tablet migration process.

The handling on replica side is incomplete because it doesn't remove
the actual data yet. It only flushes the memtables, so that all data
is in sstables and none requires a memtable flush.

This patch is necessary to make decommission work. Otherwise, a
memtable flush would happen when the decommissioned node is put in the
drained state (as in nodetool drain) and it would fail on missing host
id mapping (node is no longer in topology), which is examined by the
tablet sharder when producing sstable sharding metadata. Leading to
abort due to failed memtable flush.
2023-09-14 12:45:10 +02:00
Tomasz Grabiec
f827cfd5b6 tablet_allocator: unregister metrics when leadership is lost
So that graphs are not polluted with stale metrics from past leaders.
2023-08-05 21:48:08 +02:00
Tomasz Grabiec
d653cbae53 tablets: load_balancer: Export metrics 2023-08-05 21:48:08 +02:00
Tomasz Grabiec
67c7aadded service, raft: Move balance_tablets() to tablet_allocator
The implementation will access metrics registered from tablet_allocator.
2023-08-05 21:48:08 +02:00
Tomasz Grabiec
cb0d763a22 tablet_allocator: Start even if tablets feature is not enabled
topology coordinator will call it. Rather than spreading ifs there,
it's simpler to start it and disable functionality in the tablet
allocator.
2023-08-05 21:48:08 +02:00
Tomasz Grabiec
3f221b1f05 tablets: load_balancer: Remove double logging 2023-07-31 01:45:23 +02:00
Tomasz Grabiec
96d06b58df tests: tablets: Check that load balancing is interrupted by topology change
We add a special mode of load balancing, enabled through error
injection, which causes it to continuously generate plans. This
should keep the topology coordinator continuously in the tablet
migration track.

We enable this mode in test_tablets.py:test_bootstrap before
bootstrapping nodes to see that bootstrap request interrupts
tablet migration track. If this would not be the case, the
test will hang.
2023-07-31 01:45:23 +02:00
Tomasz Grabiec
fe181b3bac tablets: Balance tablets concurrently with active migrations
After this change, the load balancer can make progress with active
migrations. If the algorithm is called with active tablet migrations
in tablet metadata, those are treated by load balancer as if they were
already completed. This allows the algorithm to incrementally make
decision which when executed with active migrations will produce the
desired result.

Overload of shards is limited by the fact that the algorithm tracks
streaming concurrency on both source and target shards of active
migrations and takes concurrency limit into account when producing new
migrations.

The coordinator executes the load balancer on edges of tablet state
machine stransitions. This allows new migrations to be started as soon
as tablets finish streaming.

The load balancer is also continuously invoked as long as it produces
a non-empty plan. This is in order to saturate the cluster with
streaming. A single make_plan() call is still not saturating, due
to the way algorithm is implemented.
2023-07-31 01:45:23 +02:00
Tomasz Grabiec
e338679266 tablets: Add formatter for tablet_migration_info 2023-07-31 01:45:23 +02:00
Tomasz Grabiec
6f4a35f9ae service: tablet_allocator: Introduce tablet load balancer
Will be invoked by the topology coordinator later to decide
which tablets to migrate.
2023-07-25 21:08:51 +02:00
Tomasz Grabiec
5e89f2f5ba service: Introduce tablet_allocator
Currently, responsible for injecting mutations of system.tablets to
schema changes.

Note that not all migrations are handled currently. Dependant view or
cdc table drops are not handled.
2023-04-24 10:49:37 +02:00