Files
scylladb/service
Raphael S. Carvalho a2eed4bb45 service: Use optimistic replicas in all_sibling_tablet_replicas_colocated
all_sibling_tablet_replicas_colocated was using committed ti.replicas to
decide whether sibling tablets are co-located and merge can be finalized.
This caused a false non-co-located window when a co-located pair was moved
by the load balancer: as both tablets migrate together, their del_transition
commits may land in different Raft rounds. After the first commit, ti.replicas
diverge temporarily (one tablet shows the new position, the other the old),
causing all_sibling_tablet_replicas_colocated to return false. This clears
finalize_resize, allowing the load balancer to start new cascading migrations
that delay merge finalization by tens of seconds.

Fix this by using the optimistic replica view (trinfo->next when transitioning,
ti.replicas otherwise) — the same view the load balancer uses for load
accounting — so finalize_resize stays populated throughout an in-flight
migration and no spurious cascades are triggered.

Steps that lead to the problem:

1. Merge is triggered. The load balancer generates co-location migrations
   for all sibling pairs that are not yet on the same shard. Some pairs
   finish co-location before others.

2. Once all pairs are co-located in committed state,
   all_sibling_tablet_replicas_colocated returns true and finalize_resize
   is set. Meanwhile the load balancer may have already started a regular
   LB migration on one co-located pair (both tablets are stable and the
   load balancer is free to move them).

3. The LB migration moves both tablets together (colocated_tablets). Their
   two del_transition commits land in separate Raft rounds. After the first
   commit, ti.replicas[t1] = new position but ti.replicas[t2] = old position.

4. In this window, all_sibling_tablet_replicas_colocated sees the pair as
   NOT co-located, clears finalize_resize, and the load balancer generates
   new migrations for other tablets to rebalance the load that the pair
   move created.

5. Those new migrations can take tens of seconds to stream, keeping the
   coordinator in handle_tablet_migration mode and preventing
   maybe_start_tablet_resize_finalization from being called. The merge
   finalization is delayed until all those cascaded migrations complete.

Fixes https://scylladb.atlassian.net/browse/SCYLLADB-821.
Fixes https://scylladb.atlassian.net/browse/SCYLLADB-1459.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Closes scylladb/scylladb#29465
2026-04-15 14:40:15 +03:00
..
2026-04-12 19:46:33 +03:00
2026-04-12 19:46:33 +03:00
2026-04-12 19:46:33 +03:00
2026-04-12 19:46:33 +03:00
2026-04-12 19:46:33 +03:00
2026-04-12 19:46:33 +03:00
2026-04-12 19:46:33 +03:00
2026-04-12 19:46:33 +03:00
2026-04-12 19:46:33 +03:00
2026-04-12 19:46:33 +03:00
2026-04-12 19:46:33 +03:00
2026-04-12 19:46:33 +03:00
2026-04-12 19:46:33 +03:00