scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-21 15:22:13 +00:00

Patryk Jędrzejczak 1ed3f5c4af Merge 'storage_service: cancel write handlers during drain to prevent shutdown deadlock' from Petr Gusev

Fixes a shutdown deadlock where a node hangs because `stale_versions_in_use()` blocks on stale `token_metadata` versions held by write handlers whose `MUTATION_DONE` responses can never arrive (transport is already stopped).

Two manifestations depending on whether the shutting-down node is the topology coordinator:
- Coordinator: `do_drain` → `wait_for_group0_stop` deadlocks because the topology coordinator fiber is stuck in `barrier_and_drain` → `stale_versions_in_use()`.
- Non-coordinator: `ss::stop` → `uninit_messaging_service` deadlocks because the `barrier_and_drain` RPC handler holds the gate open.

The non-coordinator case was fixed in PR #24714 (cancel all write requests on storage_proxy shutdown), but its test never actually failed: the write handler always captured the current token_metadata version because `pause_before_barrier_and_drain` used `one_shot=True`, so only the first `barrier_and_drain` was paused. The topology state hadn't advanced by that point, so the write handler's ERM version matched the current version and `stale_versions_in_use()` returned immediately. The coordinator case was not covered at all.

Cancel all write response handlers on all shards right after `stop_transport()` in `do_drain()`. This releases their ERMs and the associated stale token_metadata versions, unblocking `stale_versions_in_use()`.
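The effect of cancelling the handlers can be illustrated with a toy asyncio model. This is only a sketch, not ScyllaDB code: `VersionTracker`, `write_handler`, and `drain` are hypothetical names, and an `asyncio.Event` that is never set stands in for a `MUTATION_DONE` response that cannot arrive once the transport is stopped.

```python
import asyncio


class VersionTracker:
    """Toy stand-in for tracking which metadata versions are still held."""

    def __init__(self) -> None:
        self.current = 1
        self._holders: dict[int, int] = {}
        self._changed = asyncio.Event()

    def acquire(self) -> int:
        # A handler pins the version that is current when it starts.
        version = self.current
        self._holders[version] = self._holders.get(version, 0) + 1
        return version

    def release(self, version: int) -> None:
        self._holders[version] -= 1
        if self._holders[version] == 0:
            del self._holders[version]
        self._changed.set()

    async def wait_no_stale_versions(self) -> None:
        # Block while any held version is older than the current one.
        while any(v < self.current for v in self._holders):
            self._changed.clear()
            await self._changed.wait()


async def write_handler(tracker: VersionTracker, response: asyncio.Event) -> None:
    version = tracker.acquire()
    try:
        await response.wait()  # never set in this model
    finally:
        tracker.release(version)  # also runs when the task is cancelled


async def drain(cancel_handlers: bool) -> bool:
    """Return True if the barrier completes, False if it would hang."""
    tracker = VersionTracker()
    response = asyncio.Event()
    handler = asyncio.create_task(write_handler(tracker, response))
    await asyncio.sleep(0)  # let the handler pin version 1
    tracker.current = 2     # topology advances, version 1 becomes stale
    if cancel_handlers:
        handler.cancel()
        await asyncio.gather(handler, return_exceptions=True)
    try:
        await asyncio.wait_for(tracker.wait_no_stale_versions(), timeout=0.1)
        return True
    except asyncio.TimeoutError:
        return False
    finally:
        handler.cancel()
        await asyncio.gather(handler, return_exceptions=True)
```

In this model, `asyncio.run(drain(cancel_handlers=False))` times out and returns `False`, while `asyncio.run(drain(cancel_handlers=True))` returns `True` because cancellation releases the pinned stale version before the barrier waits.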

Fix the test so that the write handler holds a stale version: use `one_shot=False`, let the first `barrier_and_drain` through (version still current), then wait for the second one (version now stale). Extend it to cover both coordinator and non-coordinator shutdown on the same 2-node cluster.
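The difference between the two `one_shot` settings can be sketched with a small model. `ToyInjection` and `paused_calls` are hypothetical names used only for illustration; the sketch assumes, as the description above implies, that a one-shot injection stops firing after its first hit.

```python
class ToyInjection:
    """Toy model of an injection point that may pause its call site."""

    def __init__(self, one_shot: bool) -> None:
        self.one_shot = one_shot
        self.enabled = True

    def check(self) -> bool:
        """Return True if the current call should pause."""
        if not self.enabled:
            return False
        if self.one_shot:
            # A one-shot injection disables itself after the first hit.
            self.enabled = False
        return True


def paused_calls(one_shot: bool, calls: int) -> list[int]:
    """Return the 1-based indices of the calls that would be paused."""
    injection = ToyInjection(one_shot)
    return [i for i in range(1, calls + 1) if injection.check()]
```

Here `paused_calls(True, 3)` gives `[1]`, so only the first call is paused, while `paused_calls(False, 3)` gives `[1, 2, 3]`, which lets a test release the first call and then wait on the second.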

Also includes supporting changes:
- error_injection: release wait_for_message waiters on disable() so the test can atomically unblock paused handlers
- error_injection: add non-shared mode to wait_for_message for per-invocation message semantics
- scylla_cluster.py: allow stop() to bypass start_stop_lock so SIGKILL works while stop_gracefully is blocked
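
The first supporting change, releasing waiters when an injection is disabled, can be modelled roughly as follows. This is a sketch under assumptions, not the `error_injection` implementation: `WaitableInjection`, `receive_message`, and `demo` are hypothetical names, and the non-shared mode is not modelled.

```python
import asyncio


class WaitableInjection:
    """Toy injection whose waiters wake on a message or on disable()."""

    def __init__(self) -> None:
        self.enabled = True
        self._wake = asyncio.Event()

    def receive_message(self) -> None:
        self._wake.set()

    def disable(self) -> None:
        # Disabling also wakes every waiter, so none stays blocked forever.
        self.enabled = False
        self._wake.set()

    async def wait_for_message(self) -> bool:
        """Return True if woken by a message, False if released by disable()."""
        await self._wake.wait()
        return self.enabled


async def demo() -> list[bool]:
    injection = WaitableInjection()
    waiters = [asyncio.create_task(injection.wait_for_message()) for _ in range(3)]
    await asyncio.sleep(0)  # let all three waiters block
    injection.disable()     # one call releases all of them
    return await asyncio.gather(*waiters)
```

A single `disable()` call unblocks every paused waiter in this model, which is the property the test relies on to release paused handlers atomically.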

Fixes: SCYLLADB-1842
Refs: scylladb/scylladb#23665

backports: SCYLLADB-1842 reported a failure in 2025.1, so we need to backport to all versions starting from 2025.1

Closes scylladb/scylladb#29882

* https://github.com/scylladb/scylladb:
  storage_service: cancel write handlers during drain to prevent shutdown deadlock
  test_unfinished_writes_during_shutdown: extend to cover coordinator shutdown
  test_unfinished_writes_during_shutdown: fix to reproduce the shutdown deadlock
  test_unfinished_writes_during_shutdown: await add_last_node_task instead of cancelling it
  test_unfinished_writes_during_shutdown: add timeout and deadlock detection for shutdown_task
  test: scylla_cluster: allow stop() to bypass start_stop_lock
  error_injection: add non-shared mode to wait_for_message
  error_injection: release waiters when injection is disabled

2026-05-21 15:43:36 +02:00

auth_cluster

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

dtest

Merge 'test: limits: optimize test_max_cells to avoid large allocations and fragmentation' from Dario Mirovic

2026-05-15 18:12:48 +02:00

lwt

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

object_store

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

random_failures

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

storage

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

tasks

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

__init__.py

…

conftest.py

test.py: rewrite resource gather

2026-05-18 12:23:40 +02:00

test_aggregation.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_alternator_proxy_protocol.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_alternator.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_audit.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_automatic_cleanup.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_bad_initial_token.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_batchlog_manager.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_blocked_bootstrap.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_boot_nodes.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_bootstrap_with_quick_group0_join.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_bti_index.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_cdc_generation_clearing.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_cdc_generation_data.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_cdc_generation_publishing.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_cdc_with_alter.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_cdc_with_tablets.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_change_ip.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_change_replication_factor_1_to_0.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_change_rpc_address.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_client_routes.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_cluster_features.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_commitlog_segment_data_resurrection.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_commitlog.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_compaction_backpressure.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_concurrent_schema.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_config_live_updates.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_config.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_config.yaml

test: remove dead suite subclasses and legacy execution pipeline

2026-05-17 22:16:31 +03:00

test_conflicting_keys_read_repair.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_coordinator_queue_management.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_counter_write_timeout_metric.py

…

test_counters_with_tablets.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_crash_coordinator_before_streaming.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_create_table_during_node_shutdown.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_data_resurrection_after_cleanup.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_data_resurrection_in_memtable.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_decommission_kill_then_replace.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_decommission.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_deprecating_cluster_features.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_describe.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_different_group0_ids.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_encryption.py

…

test_ensure_committed_by_group0.py

schema: ensure committed_by_group0 is set for all non-system tables on boot

2026-05-21 10:22:07 +02:00

test_error_becoming_voter.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_failure_after_group0_server_registration.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_fencing.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_global_ignore_nodes.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_gossiper_empty_self_id_on_shadow_round.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_gossiper_orphan_remover.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_gossiper_race.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_gossiper.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_group0_recovers_after_partial_command_application.py

…

test_guardrails.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_hints.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_incremental_repair.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_initial_token.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_internode_compression.py

…

test_ip_mappings.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_keyspace_rf.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_left_node_notification.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_logstor.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_long_join.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_long_query_timeout_erm.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_lwt_semaphore.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_maintenance_mode.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_major_compaction.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_metadata_id.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_multidc.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_mutation_schema_change.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_mv.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_no_dc_rack_change.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_no_removed_node_event_on_ip_change.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_node_isolation.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_node_ops_metrics.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_node_shutdown_waits_for_pending_requests.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_nodetool.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_not_enough_token_owners.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_prepare_race.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_proxy_protocol.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_query_rebounce.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_raft_cluster_features.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_raft_ignore_nodes.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_raft_no_quorum.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_raft_recovery_during_join.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_raft_recovery_entry_loss.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_raft_recovery_user_data.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_raft_snapshot_request.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_raft_snapshot_truncation.py

test: fix flaky test_raft_snapshot_truncation by waiting for async log truncation

2026-05-21 10:50:00 +03:00

test_raft_voters.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_random_tables.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_read_repair.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_refresh.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_remove_alive_node.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_remove_rpc_client_with_pending_requests.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_repair.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_replace_alive_node.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_replace_with_encryption.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_replace_with_same_ip_twice.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_replace.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_replica_exceptions.py

…

test_rest_api_on_startup.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_restart_cluster.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_resurrection.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_reversed_queries_during_simulated_upgrade_process.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_rpc_compression.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_select_from_mutation_fragments.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_shutdown_hang.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_size_based_load_balancing.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_snapshot_with_tablets.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_snapshot.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_sstable_cleanup_stop.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_sstable_compression_config.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_sstable_compression_dictionaries_autotrain.py

…

test_sstable_compression_dictionaries_basic.py

…

test_sstable_compression_dictionaries_upgrade.py

test: update get_scylla_2025_1_executable() to use 2025.1.12

2026-05-12 23:20:55 +02:00

test_sstable_set.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_start_bootstrapped_with_invalid_seed.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_streaming_deadlock.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_strong_consistency.py

strong_consistency: cache leader location for non-replica nodes

2026-05-21 10:32:56 +02:00

test_table_desc_read_barrier.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_table_drop.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_tablet_repair_scheduler.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_tablet_stats.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_tablets2.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_tablets_colocation.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_tablets_cql.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_tablets_intranode.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_tablets_lwt.py

Merge 'storage_service: cancel write handlers during drain to prevent shutdown deadlock' from Petr Gusev

2026-05-21 15:43:36 +02:00

test_tablets_merge.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_tablets_migration.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_tablets_parallel_decommission.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_tablets_removenode.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_tablets.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_tls.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_tombstone_gc.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_tools_perf.py

test/cluster: remove now-redundant expected_server_up_state=SERVING

2026-05-05 18:56:37 +03:00

test_topology_failure_recovery.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_topology_ops_encrypted.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_topology_ops_with_rf_rack_valid.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_topology_ops.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_topology_rejoin.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_topology_remove_decom.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_topology_schema.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_topology_smp.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_truncate_concurrent_writes.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_truncate_with_drop.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_truncate_with_tablets.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_ttl_row.py

test: wait for TTL scheduling sanity metric

2026-05-12 12:38:25 +03:00

test_unfinished_writes_during_shutdown.py

Merge 'storage_service: cancel write handlers during drain to prevent shutdown deadlock' from Petr Gusev

2026-05-21 15:43:36 +02:00

test_uninitialized_conns_semaphore.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_vector_store.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_view_build_status.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_view_building_coordinator.py

Merge 'db/view/view_building_coordinator: add flag to mark if any remote work was finished' from Michał Jadwiszczak

2026-05-21 15:11:58 +02:00

test_vnodes_to_tablets_migration.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_write_query_during_cql_server_shutdown.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_writes_to_previous_cdc_generations.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_zero_token_nodes_multidc.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_zero_token_nodes_no_replication.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

test_zero_token_nodes_topology_ops.py

test.py: remove redundant pytest.mark.asyncio decorators

2026-05-21 10:36:47 +03:00

util.py

test: fix flaky test_kill_coordinator_during_op

2026-04-30 21:27:56 +03:00