scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-26 11:30:36 +00:00

Files

Asias He 4e7202ee32 repair: Fix deadlock when topology coordinator steps down in the middle

Consider this:

1) n1 is the topology coordinator
2) n1 schedules and executes a tablet repair with session id s1 for a
tablet on n3 and n4.
3) n3 and n4 take the lock and store it in _rs._repair_compaction_locks[s1]
4) n1 steps down before it executes
locator::tablet_transition_stage::end_repair
5) n2 becomes the new topology coordinator
6) n2 runs locator::tablet_transition_stage::repair again
7) n3 and n4 try to take the lock again and hang, since the lock is
already taken.

To avoid the deadlock, we can throw in step 7 so that n2 will
proceed to the end_repair stage and release the lock. After that, the
scheduler can schedule the tablet repair request again.
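
The scenario and the fix can be illustrated with a minimal Python sketch.
This is not the ScyllaDB implementation (which is C++). The class and
function names below are hypothetical stand-ins, and rescheduling under a
new session id "s2" is an assumption made for illustration. It only shows
the idea of failing fast when a session's lock is already held instead of
waiting for it.

```python
class RepairSessionLocks:
    """Hypothetical stand-in for the per-session lock table on one replica."""

    def __init__(self):
        self._locks = {}  # session id -> opaque lock token

    def take(self, session_id):
        # Step 7: a repeated request for the same session fails fast
        # instead of waiting on a lock that nobody is going to release.
        if session_id in self._locks:
            raise RuntimeError(f"repair lock for session {session_id} already taken")
        self._locks[session_id] = object()

    def release(self, session_id):
        # Models the end_repair stage; releasing a missing lock is a no-op.
        self._locks.pop(session_id, None)


def run_repair_stage(replicas, session_id):
    """Hypothetical coordinator step: True if every replica took the lock."""
    try:
        for replica in replicas:
            replica.take(session_id)
        return True
    except RuntimeError:
        return False


def run_end_repair_stage(replicas, session_id):
    """Hypothetical coordinator step: release the session lock everywhere."""
    for replica in replicas:
        replica.release(session_id)


n3, n4 = RepairSessionLocks(), RepairSessionLocks()
first = run_repair_stage([n3, n4], "s1")   # n1 runs repair, then steps down
second = run_repair_stage([n3, n4], "s1")  # n2 reruns repair: error, no hang
run_end_repair_stage([n3, n4], "s1")       # n2 proceeds to end_repair
third = run_repair_stage([n3, n4], "s2")   # a rescheduled repair succeeds
```

In this sketch the second attempt returns an error rather than blocking,
which lets the new coordinator move on to the release step.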

Fixes #26346

Closes scylladb/scylladb#27163

(cherry picked from commit da5cc13e97)

Closes scylladb/scylladb#27337

2025-12-01 13:06:02 +01:00

auth_cluster

test: add test_anonymous_user to test_raft_service_levels

2025-10-30 20:10:47 +01:00

dtest

test: dtest: limits_test.py: test_max_cells log level

2025-10-01 22:40:34 +02:00

lwt

tests(lwt): new test for LWT testing during tablet resize

2025-11-04 12:47:24 +01:00

alternator: Fix tag name to request vnodes

2025-11-26 20:42:07 +02:00

object_store

streaming: fix loop break condition in tablet_sstable_streamer::stream

2025-11-25 11:42:34 +02:00

random_failures

db/view: Require rf_rack_valid_keyspaces when creating view

2025-10-06 13:19:54 +00:00

tasks

test: Fix drain api in task_manager_client.py

2025-08-11 10:10:07 +08:00

__init__.py

…

conftest.py

pylib: extract upgrade helpers from test_sstable_compression_dictionaries_upgrade.py

2025-09-15 12:34:45 +02:00

suite.yaml

test: dtest: limits_test.py: make the tests work

2025-10-01 22:40:29 +02:00

test_aggregation.py

test/pylib: servers_add: add auto_rack_dc parameter

2025-03-30 19:23:40 +03:00

test_alternator.py

test/cluster: modify test to not fail on 2025.4 branch

2025-11-26 20:42:09 +02:00

test_automatic_cleanup.py

test/pylib: servers_add: add auto_rack_dc parameter

2025-03-30 19:23:40 +03:00

test_bad_initial_token.py

test: cluster: add test_bad_initial_token

2025-04-25 12:25:15 +02:00

test_batchlog_manager.py

db/batchlog: Drop batch if table has been dropped

2025-09-23 07:48:59 +02:00

test_blocked_bootstrap.py

…

test_boot_after_ip_change.py

db/config: add tablets_mode_for_new_keyspaces option

2025-03-24 14:54:45 +02:00

test_boot_nodes.py

test: Add test_boot_nodes.py

2025-07-10 10:56:53 +08:00

test_bti_index.py

test/cluster/test_bti_index.py: avoid a race with CQL tracing

2025-10-20 10:32:58 +03:00

test_cdc_generation_clearing.py

test_cdc_generation_clearing: wait for generations to propagate

2025-06-09 12:59:04 +02:00

test_cdc_generation_data.py

raft_group0: split shutdown into abort_and_drain and destroy

2025-07-25 17:16:14 +02:00

test_cdc_generation_publishing.py

test_cdc_generation_publishing: fix to read monotonically

2025-05-30 08:35:56 +02:00

test_cdc_with_alter.py

test: test concurrent writes with column drop with cdc preimage

2025-11-16 09:29:27 +01:00

test_cdc_with_tablets.py

test: cdc: extend cdc with tablets tests

2025-10-30 02:44:47 +00:00

test_change_ip.py

test/cluster: Adjust simple tests to RF-rack-validity

2025-05-10 16:30:18 +02:00

test_change_replication_factor_1_to_0.py

test: cluster: deflake consistency checks after decommission

2025-09-09 19:01:12 +02:00

test_change_rpc_address.py

test/cluster: Adjust simple tests to RF-rack-validity

2025-05-10 16:30:18 +02:00

test_cluster_features.py

test/pylib: servers_add: add auto_rack_dc parameter

2025-03-30 19:23:40 +03:00

test_commitlog_segment_data_resurrection.py

…

test_commitlog.py

…

test_concurrent_schema.py

test/pylib: servers_add: add auto_rack_dc parameter

2025-03-30 19:23:40 +03:00

test_config_live_updates.py

test: add test for live updates of generic server config

2025-06-23 17:56:26 +02:00

test_config.py

…

test_conflicting_keys_read_repair.py

test/cluster: Adjust simple tests to RF-rack-validity

2025-05-10 16:30:18 +02:00

test_coordinator_queue_management.py

test.py: rework log_browsing for dtest migration

2025-05-19 11:50:55 +00:00

test_crash_coordinator_before_streaming.py

…

test_data_resurrection_after_cleanup.py

test: cluster: deflake consistency checks after decommission

2025-09-09 19:01:12 +02:00

test_data_resurrection_in_memtable.py

test/cluster: Adjust simple tests to RF-rack-validity

2025-05-10 16:30:18 +02:00

test_decommission.py

test: cluster: deflake consistency checks after decommission

2025-09-09 19:01:12 +02:00

test_deprecating_cluster_features.py

…

test_describe.py

cql3: Represent create_statement using managed_string

2025-07-01 12:58:02 +02:00

test_different_group0_ids.py

test.py: rewrite the wait_for_first_completed

2025-10-22 18:12:52 +02:00

test_encryption.py

db/config: add tablets_mode_for_new_keyspaces option

2025-03-24 14:54:45 +02:00

test_error_becoming_voter.py

…

test_fencing.py

test_fencing: add test_lwt_fencing_upgrade

2025-09-15 12:34:45 +02:00

test_global_ignore_nodes.py

test/cluster: Adjust simple tests to RF-rack-validity

2025-05-10 16:30:18 +02:00

test_gossip_boot.py

db/config: add tablets_mode_for_new_keyspaces option

2025-03-24 14:54:45 +02:00

test_gossiper_empty_self_id_on_shadow_round.py

gossiper: fix empty initial local node state

2025-09-08 11:38:31 +02:00

test_gossiper_orphan_remover.py

gossiper: move force_remove_endpoint to work on host id

2025-04-06 18:39:24 +03:00

test_gossiper_race.py

gossiper: check for a race condition in do_apply_state_locally

2025-09-08 11:38:30 +02:00

test_gossiper.py

test/pylib: servers_add: add auto_rack_dc parameter

2025-03-30 19:23:40 +03:00

test_group0_schema_versioning.py

db/config: add tablets_mode_for_new_keyspaces option

2025-03-24 14:54:45 +02:00

test_hints.py

test.py: rewrite the wait_for_first_completed

2025-10-22 18:12:52 +02:00

test_incremental_repair.py

repair: Fix deadlock when topology coordinator steps down in the middle

2025-12-01 13:06:02 +01:00

test_initial_token.py

test/cluster/conftest: cluster_con: provide default values for port and use_ssl

2025-08-22 09:51:24 +03:00

test_ip_mappings.py

…

test_keyspace_rf.py

test/cqlpy: add keyspace creation default replication factor tests

2025-08-28 01:42:34 +02:00

test_long_join.py

test: improve async execution in test_long_join

2025-09-08 17:14:37 +02:00

test_long_query_timeout_erm.py

test.py: rewrite the wait_for_first_completed

2025-10-22 18:12:52 +02:00

test_lwt_semaphore.py

…

test_maintenance_mode.py

test/cluster/test_maintenance_mode.py: Wait for initialization

2025-11-15 22:11:06 +00:00

test_major_compaction.py

compaction: fix use after free when strategy is altered during compaction

2025-10-21 00:59:33 +00:00

test_metadata_id.py

test/cluster/conftest: cluster_con: provide default values for port and use_ssl

2025-08-22 09:51:24 +03:00

test_multidc.py

Merge 'cql3: Warn when creating RF-rack-invalid keyspace' from Dawid Mędrek

2025-08-22 11:33:32 +02:00

test_mutation_schema_change.py

test/pylib: servers_add: add auto_rack_dc parameter

2025-03-30 19:23:40 +03:00

test_mv.py

tombstone_gc: don't use 'repair' mode for colocated tables

2025-11-26 08:36:52 +01:00

test_no_dc_rack_change.py

test: cluster: introduce test_no_dc_rack_change

2025-04-17 16:22:58 +02:00

test_no_removed_node_event_on_ip_change.py

test/cluster/conftest: cluster_con: provide default values for port and use_ssl

2025-08-22 09:51:24 +03:00

test_node_isolation.py

tiering (test.py): introduce tiering labels

2025-08-04 15:38:16 +03:00

test_node_ops_metrics.py

test/pylib/rest_client: fix ScyllaMetrics filtering

2025-08-10 10:16:00 +02:00

test_node_shutdown_waits_for_pending_requests.py

…

test_nodetool.py

…

test_not_enough_token_owners.py

tablets: scheduler: Balance racks separately when rf_rack_valid_keyspaces is true

2025-09-23 00:30:37 +02:00

test_query_rebounce.py

…

test_raft_cluster_features.py

test/pylib: servers_add: add auto_rack_dc parameter

2025-03-30 19:23:40 +03:00

test_raft_fix_broken_snapshot.py

db/config: add tablets_mode_for_new_keyspaces option

2025-03-24 14:54:45 +02:00

test_raft_ignore_nodes.py

…

test_raft_no_quorum.py

test: test_raft_no_quorum: test_can_restart: deflake the read barrier call

2025-10-12 21:02:02 +03:00

test_raft_recovery_basic.py

db/config: add tablets_mode_for_new_keyspaces option

2025-03-24 14:54:45 +02:00

test_raft_recovery_during_join.py

test: deflake driver reconnections in the recovery procedure tests

2025-09-22 17:21:06 +02:00

test_raft_recovery_entry_loss.py

test: test_raft_recovery_entry_loss: fix the typo in the test case name

2025-10-17 10:27:33 +00:00

test_raft_recovery_majority_loss.py

db/config: add tablets_mode_for_new_keyspaces option

2025-03-24 14:54:45 +02:00

test_raft_recovery_stuck.py

test: test_raft_recovery_stuck: ensure mutual visibility before using driver

2025-11-20 10:36:54 +02:00

test_raft_recovery_user_data.py

test: deflake driver reconnections in the recovery procedure tests

2025-09-22 17:21:06 +02:00

test_raft_snapshot_request.py

…

test_raft_snapshot_truncation.py

…

test_raft_voters.py

test/cluster/conftest: cluster_con: provide default values for port and use_ssl

2025-08-22 09:51:24 +03:00

test_random_tables.py

test/pylib: servers_add: add auto_rack_dc parameter

2025-03-30 19:23:40 +03:00

test_read_repair.py

test/cluster/test_read_repair: write 100 rows in trace test

2025-06-27 16:23:08 +03:00

test_refresh.py

Add nodetool refresh --scope option

2025-05-29 16:12:09 +03:00

test_remove_alive_node.py

…

test_remove_rpc_client_with_pending_requests.py

test/cluster: Adjust simple tests to RF-rack-validity

2025-05-10 16:30:18 +02:00

test_repair.py

test/cluster/test_repair: test_vnode_keyspace_describe_ring: verify that describe_ring results agree with natural_endpoints

2025-08-21 11:48:17 +03:00

test_replace_alive_node.py

test/pylib: servers_add: add auto_rack_dc parameter

2025-03-30 19:23:40 +03:00

test_replace_ignore_nodes.py

db/config: add tablets_mode_for_new_keyspaces option

2025-03-24 14:54:45 +02:00

test_replace_with_encryption.py

…

test_replace_with_same_ip_twice.py

…

test_replace.py

test.py: rework log_browsing for dtest migration

2025-05-19 11:50:55 +00:00

test_restart_cluster.py

…

test_resurrection.py

test: port of test and reproducer for resurrection during file based streaming

2025-03-30 13:39:40 +03:00

test_reversed_queries_during_simulated_upgrade_process.py

test/cluster: Adjust simple tests to RF-rack-validity

2025-05-10 16:30:18 +02:00

test_rpc_compression.py

test/cluster: Adjust simple tests to RF-rack-validity

2025-05-10 16:30:18 +02:00

test_select_from_mutation_fragments.py

test/cluster: Adjust simple tests to RF-rack-validity

2025-05-10 16:30:18 +02:00

test_shutdown_hang.py

…

test_snapshot.py

test: add type creation to test_snapshot

2025-07-10 10:46:55 +02:00

test_sstable_cleanup_stop.py

compaction: Fix stop of sstable cleanup

2025-09-11 08:55:10 +03:00

test_sstable_compression_config.py

test/cluster: Add test for default SSTable compressor

2025-11-04 15:41:40 +02:00

test_sstable_compression_dictionaries_autotrain.py

test/cluster: Adjust simple tests to RF-rack-validity

2025-05-10 16:30:18 +02:00

test_sstable_compression_dictionaries_basic.py

db/config: Deprecate sstable_compression_dictionaries_allow_in_ddl

2025-11-04 15:40:46 +02:00

test_sstable_compression_dictionaries_upgrade.py

pylib: extract upgrade helpers from test_sstable_compression_dictionaries_upgrade.py

2025-09-15 12:34:45 +02:00

test_sstable_set.py

test: Verify partitioned set store split and unsplit correctly

2025-04-29 15:47:33 -03:00

test_start_bootstrapped_with_invalid_seed.py

…

test_streaming_deadlock.py

test: limit test_streaming_deadlock_removenode concurrency

2025-09-19 12:50:20 +03:00

test_table_desc_read_barrier.py

…

test_table_drop.py

test: test table drop during flush

2025-04-23 14:29:28 +02:00

test_tablet_repair_scheduler.py

repair: Avoid too many fragments in a single repair_row_on_wire

2025-07-29 13:43:53 +08:00

test_tablet_stats.py

topology_coordinator: Make tablet_load_stats_refresh_interval configurable

2025-07-31 14:31:55 +03:00

test_tablets2.py

replica: Fail timed-out single-key read on cleaned up tablet replica

2025-11-21 17:50:21 +03:00

test_tablets_colocation.py

tombstone_gc: don't use 'repair' mode for colocated tables

2025-11-26 08:36:52 +01:00

test_tablets_cql.py

test/cluster: Adjust simple tests to RF-rack-validity

2025-05-10 16:30:18 +02:00

test_tablets_intranode.py

…

test_tablets_lwt.py

test_tablets_lwt: add test_tablets_merge_waits_for_lwt

2025-10-24 12:22:20 +02:00

test_tablets_merge.py

test_tablets_merge: test_tablet_split_merge_with_many_tables: reduce number of tables in debug mode

2025-09-29 15:30:13 +03:00

test_tablets_migration.py

test: enable load balancing on a single node in test_restart_leaving_replica_during_cleanup

2025-09-11 13:19:56 +02:00

test_tablets_removenode.py

test/cluster: Disable rf_rack_valid_keyspaces in problematic tests

2025-05-10 16:30:49 +02:00

test_tablets.py

topology_coordinator: fix log message

2025-10-24 12:21:21 +02:00

test_tls.py

test/cluster: Adjust simple tests to RF-rack-validity

2025-05-10 16:30:18 +02:00

test_tombstone_gc.py

test: test group0 tombstone GC in the Raft-based recovery procedure

2025-10-22 17:13:34 +00:00

test_topology_failure_recovery.py

db/config: add tablets_mode_for_new_keyspaces option

2025-03-24 14:54:45 +02:00

test_topology_ops_encrypted.py

test: cluster: deflake consistency checks after decommission

2025-09-09 19:01:12 +02:00

test_topology_ops.py

test: cluster: deflake consistency checks after decommission

2025-09-09 19:01:12 +02:00

test_topology_recovery_basic.py

test.py: apply the nightly label on test_topology_recovery_basic

2025-09-01 14:16:29 +02:00

test_topology_recovery_majority_loss.py

test/cluster/conftest: cluster_con: provide default values for port and use_ssl

2025-08-22 09:51:24 +03:00

test_topology_rejoin.py

test/pylib: servers_add: add auto_rack_dc parameter

2025-03-30 19:23:40 +03:00

test_topology_remove_decom.py

raft_topology: Modify the conditional logic in remove node operation to enhance concurrency for raft enabled clusters.

2025-09-17 15:23:32 +05:30

test_topology_remove_garbage_group0.py

db/config: add tablets_mode_for_new_keyspaces option

2025-03-24 14:54:45 +02:00

test_topology_schema.py

test/pylib: servers_add: add auto_rack_dc parameter

2025-03-30 19:23:40 +03:00

test_topology_smp.py

test/cluster: Adjust simple tests to RF-rack-validity

2025-05-10 16:30:18 +02:00

test_topology_upgrade_not_stuck_after_recent_removal.py

db/config: add tablets_mode_for_new_keyspaces option

2025-03-24 14:54:45 +02:00

test_topology_upgrade_stuck.py

test.py: rewrite the wait_for_first_completed

2025-10-22 18:12:52 +02:00

test_topology_upgrade.py

db/config: add tablets_mode_for_new_keyspaces option

2025-03-24 14:54:45 +02:00

test_truncate_concurrent_writes.py

truncate: add test for truncate with concurrent writes

2025-08-05 13:54:14 +02:00

test_truncate_with_drop.py

system_keyspace: Prune dropped tables from truncation on start/drop

2025-09-03 07:25:34 +03:00

test_truncate_with_tablets.py

topology coordinator: allow running multiple global commands in parallel

2025-06-11 11:29:33 +03:00

test_unfinished_writes_during_shutdown.py

storage_service: Cancel all write requests on storage_proxy shutdown

2025-07-22 15:03:30 +02:00

test_view_build_status.py

test/cluster: add view build status tests

2025-08-27 10:23:04 +02:00

test_view_building_coordinator.py

service/storage_service: migrate staging sstables in view building

2025-11-17 10:28:35 +00:00

test_write_query_during_cql_server_shutdown.py

generic_server: Two-step connection shutdown.

2025-07-28 10:08:06 +02:00

test_writes_to_previous_cdc_generations.py

…

test_zero_token_nodes_multidc.py

test/cluster/conftest: cluster_con: provide default values for port and use_ssl

2025-08-22 09:51:24 +03:00

test_zero_token_nodes_no_replication.py

test/cluster/conftest: cluster_con: provide default values for port and use_ssl

2025-08-22 09:51:24 +03:00

test_zero_token_nodes_topology_ops.py

test/cluster/test_zero_token_nodes_topology_ops: Adjust to RF-rack-validity

2025-05-10 16:30:34 +02:00

util.py

Revert "Merge 'transport: service_level_controller: create and use driver service level' from Andrzej Jackowski"

2025-09-22 09:32:46 +03:00