scylladb/service at ae0208e35cf36db33be0e4eae7d88689a91fd576 - scylladb - Anomalous Gitea

Botond Dénes 5573c3b18e Merge 'tablets: Fix deadlock in background storage group merge fiber' from Tomasz Grabiec

When it deadlocks, groups stop merging and the compaction group merge
backlog runs away.

Also, graceful shutdown will be blocked on it.

Found by the flaky unit test
test_merge_chooses_best_replica_with_odd_count, which timed out in 1
of 100 runs.

Reason for deadlock:

When storage groups are merged, the main compaction group of the new
storage group takes a compaction lock, which is appended to
_compaction_reenablers_for_merging, and released when the merge
completion fiber is done with the whole batch.

If more than one merge cycle accumulates for the fiber, a deadlock
occurs. The lock order is as follows:

Initial state:

 cg0: main
 cg1: main
 cg2: main
 cg3: main

After 1st merge:

 cg0': main [locked], merging_groups=[cg0.main, cg1.main]
 cg1': main [locked], merging_groups=[cg2.main, cg3.main]

After 2nd merge:

 cg0'': main [locked], merging_groups=[cg0'.main [locked], cg0.main, cg1.main, cg1'.main [locked], cg2.main, cg3.main]

The merge completion fiber will try to stop cg0'.main, which blocks
on the compaction lock. That lock is held by the reenabler in
_compaction_reenablers_for_merging, hence the deadlock.
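
The lock order above can be reproduced with a small toy model. This is only a sketch under stated assumptions: the types pending_group and storage_group and the functions initial_groups(), merge_cycle() and completion_fiber_would_block() are hypothetical names invented for illustration, not ScyllaDB code. The compaction lock is reduced to a boolean and nothing actually blocks.

```cpp
#include <cstddef>
#include <initializer_list>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model types, not ScyllaDB's real classes.
struct pending_group {
    std::string name;
    bool locked;   // true if a reenabler still holds its compaction lock
};

struct storage_group {
    std::string name;
    bool main_locked;                          // lock taken at merge time
    std::vector<pending_group> merging_groups; // groups the fiber must stop
};

// Initial state: cg0..cgN-1, each with an unlocked main group.
std::vector<storage_group> initial_groups(std::size_t count) {
    std::vector<storage_group> groups;
    for (std::size_t i = 0; i < count; ++i) {
        groups.push_back({"cg" + std::to_string(i), false, {}});
    }
    return groups;
}

// One merge cycle: adjacent pairs become one new storage group whose main
// compaction group is locked. The old main groups, and anything they were
// still merging, move into merging_groups with their lock state unchanged.
std::vector<storage_group> merge_cycle(const std::vector<storage_group>& groups,
                                       std::size_t generation) {
    std::vector<storage_group> merged;
    for (std::size_t i = 0; i + 1 < groups.size(); i += 2) {
        storage_group sg{"cg" + std::to_string(i / 2) + std::string(generation, '\''),
                         true, {}};
        for (const storage_group* src : {&groups[i], &groups[i + 1]}) {
            sg.merging_groups.push_back({src->name + ".main", src->main_locked});
            sg.merging_groups.insert(sg.merging_groups.end(),
                                     src->merging_groups.begin(),
                                     src->merging_groups.end());
        }
        merged.push_back(std::move(sg));
    }
    return merged;
}

// The completion fiber has to stop every group in merging_groups. In this
// model, stopping a group that is still locked can never finish.
bool completion_fiber_would_block(const std::vector<storage_group>& groups) {
    for (const storage_group& sg : groups) {
        for (const pending_group& pg : sg.merging_groups) {
            if (pg.locked) {
                return true;
            }
        }
    }
    return false;
}
```

Starting from initial_groups(4), one call to merge_cycle() leaves only unlocked groups in merging_groups, so the fiber could finish. A second call before completion puts the locked cg0'.main and cg1'.main into merging_groups of cg0'', which is the state in which the fiber gets stuck.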

The fix is to wait for the background merge to finish before starting
the next merge. This is achieved by holding the old erm in the
background merge and doing a topology barrier from the merge
finalizing transition.
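
The effect of waiting between merges can be sketched with a counting model. It assumes, purely for illustration, that the barrier amounts to draining the background merge before the next merge starts. It does not model the erm or the topology barrier themselves, and merge_state, start_merge(), try_complete() and merges_complete() are hypothetical names rather than ScyllaDB APIs.

```cpp
#include <cstddef>

// Hypothetical bookkeeping for the merge fiber, not ScyllaDB's real state.
struct merge_state {
    std::size_t groups;             // current number of storage groups
    std::size_t locked_live = 0;    // live main groups holding the lock
    std::size_t locked_pending = 0; // locked groups queued for stopping
};

// Foreground merge: halves the group count. Main groups that are still
// locked from an unfinished earlier merge become pending groups.
void start_merge(merge_state& s) {
    s.locked_pending += s.locked_live;
    s.groups /= 2;
    s.locked_live = s.groups;
}

// Background completion: it can only stop unlocked groups, so any locked
// pending group means it would wait forever. On success the reenablers
// for the whole batch are released.
bool try_complete(merge_state& s) {
    if (s.locked_pending > 0) {
        return false;
    }
    s.locked_live = 0;
    return true;
}

// Runs `cycles` merges. With `barrier` set, each merge waits for the
// background completion of the previous one, which is the idea of the fix.
bool merges_complete(std::size_t groups, std::size_t cycles, bool barrier) {
    merge_state s{groups};
    for (std::size_t c = 0; c < cycles; ++c) {
        start_merge(s);
        if (barrier && !try_complete(s)) {
            return false;
        }
    }
    return try_complete(s);
}
```

In this model merges_complete(4, 2, false) reports that completion gets stuck, while merges_complete(4, 2, true) and the single-cycle case merges_complete(4, 1, false) both finish.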

Background merge is supposed to be a relatively quick operation: it
stops compaction groups, so it may wait for active requests, but it
shouldn't prolong the barrier indefinitely.

Tablet tests which trigger merge need to be adjusted to call the
barrier, otherwise they will be vulnerable to the deadlock.

Fixes SCYLLADB-928

Backport to >= 2025.4 because it's the earliest vulnerable version, due to f9021777d8.

Closes scylladb/scylladb#29007

* github.com:scylladb/scylladb:
  tablets: Fix deadlock in background storage group merge fiber
  replica: table: Propagate old erm to storage group merge
  test: boost: tablets_test: Save tablet metadata when ACKing split resize decision
  storage_service: Extract local_topology_barrier()

2026-03-20 09:05:52 +02:00

broadcast_tables/experimental

treewide: Move replica related files to replica directory

2025-09-18 08:00:35 +03:00

direct_failure_detector

treewide: #include Seastar headers with angle brackets

2026-01-13 14:56:15 +02:00

mv: allow setting concurrency in PRUNE MATERIALIZED VIEW

2025-11-27 00:02:28 +01:00

Merge 'cql3: pin prepared cache entry in prepare() to avoid invalid weak handle race' from Alex Dathskovsky

2026-03-11 12:09:23 +01:00

service level: fix crash during migration to driver server level

2026-03-13 11:24:26 +01:00

raft: service: reload auth cache before service levels

2026-03-18 09:06:20 +01:00

strong_consistency

strong consistency: redirect requests to live replicas from the same rack

2026-03-12 17:48:54 +01:00

address_map.hh

code: Stop using seastar::compat::source_location

2025-11-27 19:10:11 +02:00

cache_hitrate_calculator.hh

…

cas_shard.hh

…

client_routes.cc

db: api: service: Fix ClientConnectorError in test_client_routes
The bug was caused by capturing local variables by reference in lambdas passed to with_retry(), which is a coroutine. When the coroutine suspends, the lambda frame exits and the referenced locals are destroyed, leading to use-after-lifetime issues. This change fixes the problem by ensuring safe ownership across suspension points and also refactors how route_keys and route_entries are passed from the caller. Previously they were passed as const lvalue references, which cannot be moved and therefore ended up being repeatedly copied across function calls and lambda invocations. The new approach avoids unnecessary copies and makes the lifetime semantics explicit and safe.

2025-12-22 14:52:47 +02:00

client_routes.hh

db: api: service: Fix ClientConnectorError in test_client_routes
The bug was caused by capturing local variables by reference in lambdas passed to with_retry(), which is a coroutine. When the coroutine suspends, the lambda frame exits and the referenced locals are destroyed, leading to use-after-lifetime issues. This change fixes the problem by ensuring safe ownership across suspension points and also refactors how route_keys and route_entries are passed from the caller. Previously they were passed as const lvalue references, which cannot be moved and therefore ended up being repeatedly copied across function calls and lambda invocations. The new approach avoids unnecessary copies and makes the lifetime semantics explicit and safe.

2025-12-22 14:52:47 +02:00

client_state.cc

Merge 'Add CQL forwarding for strongly consistent tables' from Wojciech Mitros

2026-03-13 15:03:10 +01:00

client_state.hh

Merge 'Add CQL forwarding for strongly consistent tables' from Wojciech Mitros

2026-03-13 15:03:10 +01:00

CMakeLists.txt

service level: remove version 1 service level code

2026-03-10 10:46:48 +02:00

endpoint_lifecycle_subscriber.hh

service: transport: add CLIENT_ROUTES_CHANGE event

2025-12-15 18:19:37 +01:00

load_broadcaster.hh

code: Replace distributed<> with sharded<>

2025-09-19 12:22:51 +02:00

load_meter.hh

code: Replace distributed<> with sharded<>

2025-09-19 12:22:51 +02:00

maintenance_mode.hh

…

mapreduce_service.cc

treewide: Move query related files to a new query directory

2025-09-16 23:40:47 +03:00

mapreduce_service.hh

code: Replace distributed<> with sharded<>

2025-09-19 12:22:51 +02:00

memory_limiter.hh

…

migration_listener.hh

migration_listener: fix deadlock in nested notifications

2025-12-17 14:00:28 +01:00

migration_manager.cc

Merge 'Remove the rest of pre raft topology code' from Gleb Natapov

2026-03-11 10:24:20 +02:00

migration_manager.hh

migration_manager: remove unused forward definitions

2026-03-10 10:46:48 +02:00

misc_services.cc

replica/table: keep track of total pre-compression file size

2025-11-13 00:49:57 +01:00

query_state.hh

…

session.cc

…

session.hh

…

state_id.hh

…

storage_proxy_fwd.hh

storage_proxy: introduce node_local_only flag

2025-07-24 19:48:08 +02:00

storage_proxy_stats.hh

storage_proxy_stats: add fenced_out_requests metric

2025-09-15 11:24:53 +02:00

storage_proxy.cc

Merge 'Improve debuggability of test/cluster/test_data_resurrection_in_memtable.py' from Botond Dénes

2026-03-17 13:35:19 +01:00

storage_proxy.hh

storage_proxy: Add snapshot_keyspace method

2026-02-23 11:27:15 +01:00

storage_service.cc

Merge 'tablets: Fix deadlock in background storage group merge fiber' from Tomasz Grabiec

2026-03-20 09:05:52 +02:00

storage_service.hh

Merge 'tablets: Fix deadlock in background storage group merge fiber' from Tomasz Grabiec

2026-03-20 09:05:52 +02:00

tablet_allocator_fwd.hh

…

tablet_allocator.cc

tablet options: Add max_tablet_count tablet option to enforce tablet count upper bounds

2026-03-03 11:19:24 +03:00

tablet_allocator.hh

Merge 'Improve load balancer logging and other minor cleanups' from Tomasz Grabiec

2026-01-29 08:25:17 +02:00

tablet_operation.hh

…

task_manager_module.cc

tasks: pass token_metadata_ptr to task_manager::virtual_task::impl::get_children

2026-03-18 15:37:24 +01:00

task_manager_module.hh

tasks: service: add global_topology_request_virtual_task

2025-12-16 13:31:22 +01:00

topology_coordinator.cc

tablets: Fix deadlock in background storage group merge fiber

2026-03-12 22:45:01 +01:00

topology_coordinator.hh

topology_coordinator: pass raft_topology_cmd by value

2026-02-16 08:57:42 +01:00

topology_guard.hh

…

topology_mutation.cc

cdc: drop usage of cdc_local table and v1 generation definition

2026-03-10 10:39:59 +02:00

topology_mutation.hh

cdc: drop usage of cdc_local table and v1 generation definition

2026-03-10 10:39:59 +02:00

topology_state_machine.cc

raft topology: drop upgrade_state and its type from the topology state machine since it is not used any longer

2026-03-10 10:09:39 +02:00

topology_state_machine.hh

cdc: drop usage of cdc_local table and v1 generation definition

2026-03-10 10:39:59 +02:00

view_update_backlog_broker.hh

…