scylladb/test/cluster at 3ce7e250cc211f2b7b349f2bf84b49b77986ffdc - scylladb - Anomalous Gitea

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-23 01:50:35 +00:00

Nadav Har'El 3ce7e250cc alternator: fix schema "concurrent modification" errors

In ScyllaDB, schema modification operations use "optimistic locking":
A schema operation reads the current schema, decides what it wants to do
and prepares changes to the schema, and then attempts to commit those
changes - but only if the schema hasn't changed since the first read.
If the schema has already been changed by some other node - we need to
try again. In a loop.
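The read-prepare-commit loop described above can be sketched as a toy Python model. This is not ScyllaDB's code: the real implementation is C++ and commits through Raft group 0. `SchemaStore`, `ConcurrentModification` and `modify_schema()` are hypothetical names chosen for illustration, and the integer version stands in for whatever state the real commit checks.

```python
class ConcurrentModification(Exception):
    """Hypothetical stand-in for group0_concurrent_modification."""


class SchemaStore:
    """Toy versioned schema shared by all nodes (stands in for group 0)."""

    def __init__(self):
        self.version = 0
        self.tables = {}

    def read(self):
        # Return the current version together with a private copy of the schema.
        return self.version, dict(self.tables)

    def commit(self, expected_version, new_tables):
        # Optimistic check: apply only if nobody committed since our read().
        if self.version != expected_version:
            raise ConcurrentModification(
                f"expected version {expected_version}, found {self.version}")
        self.tables = new_tables
        self.version += 1


def modify_schema(store, prepare, max_attempts=10):
    """Read, prepare and commit a schema change, retrying on conflicts.

    Returns the number of attempts that were needed.
    """
    for attempt in range(1, max_attempts + 1):
        version, tables = store.read()
        new_tables = prepare(tables)
        try:
            store.commit(version, new_tables)
            return attempt
        except ConcurrentModification:
            continue  # the schema changed under us: read it again and retry
    raise ConcurrentModification(f"gave up after {max_attempts} attempts")


# Usage: simulate another node committing between our read and our commit.
store = SchemaStore()
state = {"interfered": False}


def create_table(tables):
    if not state["interfered"]:
        state["interfered"] = True
        version, other = store.read()                  # the "other node" reads...
        store.commit(version, {**other, "other": {}})  # ...and commits first
    return {**tables, "mytable": {}}


attempts = modify_schema(store, create_table)
print(attempts, sorted(store.tables), store.version)  # 2 ['mytable', 'other'] 2
```

The first attempt fails because its snapshot is stale; the second attempt re-reads the schema, sees the other node's change, and commits on top of it.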

In Alternator, there are six operations that perform schema modification:
CreateTable, DeleteTable, UpdateTable, TagResource, UntagResource and
UpdateTimeToLive. All of them were missing this loop. We knew about
this - and even had a FIXME in all of these places. So when concurrent
schema modifications on different nodes contend, any of these operations
may fail with an error like:

   Internal server error: service::group0_concurrent_modification
   (Failed to apply group 0 change due to concurrent modification).

This problem had a very minor effect, if any, on real users because the
DynamoDB SDK automatically retries operations that fail with retryable
errors - like this "Internal server error" - and the schema operation
will most likely succeed upon retry. However, as shown in issue #13152,
these failures were annoying in our CI, where tests - which disable
request retries - failed on these errors.
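For comparison, a client-side retry of a retryable error might look like the sketch below. It is a generic exponential-backoff loop written for illustration, not the DynamoDB SDK's actual retry policy; `call_with_retries()`, `RetryableError` and `make_flaky()` are hypothetical names. Passing `max_attempts=1` mimics a test that disables request retries: the first error surfaces immediately.

```python
import time


class RetryableError(Exception):
    """Hypothetical stand-in for a retryable server error such as HTTP 500."""


def call_with_retries(op, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call op(), retrying RetryableError with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return op()
        except RetryableError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ... with the defaults


def make_flaky(failures):
    """Return an operation that fails `failures` times, then succeeds."""
    state = {"calls": 0}

    def op():
        state["calls"] += 1
        if state["calls"] <= failures:
            raise RetryableError("Internal server error")
        return "ok"

    return op


delays = []
print(call_with_retries(make_flaky(2), max_attempts=3, sleep=delays.append))  # ok
print(delays)  # [1.0, 2.0]

try:
    call_with_retries(make_flaky(1), max_attempts=1)
except RetryableError as e:
    print("failed without retries:", e)  # failed without retries: Internal server error
```

The `sleep` parameter is injected so the example can record delays instead of actually waiting.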

This patch fixes all six operations by adding the missing loop (the
last three operations all use one common function, db::modify_tags(),
so they are fixed by a single change).
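The structure described above, where three operations share one helper so that a single retry loop fixes all of them, might look like the following toy Python model. It is not Alternator's C++ code: `modify_tags()` here only borrows the name of db::modify_tags() from the text, while `SchemaStore`, `tag_resource()`, `untag_resource()`, `update_time_to_live()` and the `"ttl"` tag key are hypothetical.

```python
class ConcurrentModification(Exception):
    """Hypothetical stand-in for a concurrent-modification error."""


class SchemaStore:
    """Toy versioned store of per-table tags (hypothetical)."""

    def __init__(self):
        self.version = 0
        self.tags = {}  # table name -> {tag key: tag value}

    def read(self):
        # Copy each table's tags so callers can mutate them before committing.
        return self.version, {t: dict(kv) for t, kv in self.tags.items()}

    def commit(self, expected_version, new_tags):
        if self.version != expected_version:
            raise ConcurrentModification()
        self.tags = new_tags
        self.version += 1


def modify_tags(store, table, update, max_attempts=10):
    """The one retry loop shared by every tag-modifying operation."""
    for _ in range(max_attempts):
        version, all_tags = store.read()
        update(all_tags.setdefault(table, {}))  # mutate this table's tag copy
        try:
            store.commit(version, all_tags)
            return
        except ConcurrentModification:
            continue  # stale snapshot: re-read and re-apply the update
    raise ConcurrentModification(f"gave up after {max_attempts} attempts")


def tag_resource(store, table, new_tags):
    modify_tags(store, table, lambda tags: tags.update(new_tags))


def untag_resource(store, table, keys):
    def remove(tags):
        for key in keys:
            tags.pop(key, None)
    modify_tags(store, table, remove)


def update_time_to_live(store, table, attribute):
    def set_ttl(tags):
        tags["ttl"] = attribute  # hypothetical tag key, for illustration only
    modify_tags(store, table, set_ttl)


store = SchemaStore()
tag_resource(store, "t", {"a": "1", "b": "2"})
untag_resource(store, "t", ["a"])
update_time_to_live(store, "t", "expires")
print(store.tags)  # {'t': {'b': '2', 'ttl': 'expires'}}
```

Because each operation only supplies an `update` callback, the retry logic lives in exactly one place.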

The patch also includes reproducing tests for all these operations -
the new tests all fail before this patch, and pass with it.

These new tests are much more reliable reproducers than the dtests
we had, which reproduced the problem only sometimes - very rarely.
Moreover, the new tests reproduce the bug separately for each of the
six operations, so if we had forgotten to fix one of them, one of the
tests would have continued to fail. Of course, I checked this during
development.

The new tests are in the test/cluster framework, not test/alternator,
because this problem can only be reproduced in a multi-node cluster:
a single node serializes its own schema modifications, so collisions
only happen when more than one node attempts schema modifications at
the same time.
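Why several nodes collide while one node does not can be illustrated with a deterministic toy interleaving in which every "node" reads the schema before any of them commits. This is a hypothetical Python model (`SchemaStore` and `create_tables_concurrently()` are invented for illustration), not the new test/cluster tests themselves; `retry=False` corresponds to the old code without the loop.

```python
class ConcurrentModification(Exception):
    """Hypothetical stand-in for a concurrent-modification error."""


class SchemaStore:
    """Toy versioned schema shared by all nodes (hypothetical)."""

    def __init__(self):
        self.version = 0
        self.tables = set()

    def read(self):
        return self.version, set(self.tables)

    def commit(self, expected_version, new_tables):
        # Reject the commit if another node changed the schema since read().
        if self.version != expected_version:
            raise ConcurrentModification()
        self.tables = new_tables
        self.version += 1


def create_tables_concurrently(store, names, retry):
    """Each name is a table created by a different node at the same time.

    Every node takes its snapshot before any node commits. Returns the
    names whose creation failed.
    """
    snapshots = [(name, *store.read()) for name in names]
    failed = []
    for name, version, tables in snapshots:
        while True:
            try:
                store.commit(version, tables | {name})
                break
            except ConcurrentModification:
                if not retry:
                    failed.append(name)  # old behavior: error goes to the user
                    break
                version, tables = store.read()  # new behavior: re-read, retry
    return failed


# Without the loop, every node except the first one to commit fails.
print(create_tables_concurrently(SchemaStore(), ["t1", "t2", "t3"], retry=False))
# ['t2', 't3']

# With the loop, all three operations eventually succeed.
store = SchemaStore()
print(create_tables_concurrently(store, ["t1", "t2", "t3"], retry=True))  # []
print(sorted(store.tables))  # ['t1', 't2', 't3']
```

With a single name there is only one snapshot, so no commit can be stale, which mirrors why the bug needs more than one node.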

Fixes #13152

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#23827

2025-05-05 09:59:08 +03:00

Merge 'Add tablet enforcing option' from Benny Halevy

2025-04-03 16:32:19 +03:00

Merge 'mv: make base_info in view schemas immutable' from Wojciech Mitros

2025-04-27 19:12:12 +03:00

backup: Add test for invalid endpoint

2025-04-17 16:31:43 +03:00

random_failures

raft/test: disable the stop_before_becoming_raft_voter test

2025-04-07 12:23:25 +02:00

test_tablet_tasks: use injection to revoke resize

2025-04-30 07:04:57 +03:00

__init__.py

…

conftest.py

test.py: fix gathering logs in case of fail

2025-04-21 13:12:35 +03:00

suite.yaml

Merge 'Add tablet enforcing option' from Benny Halevy

2025-04-03 16:32:19 +03:00

test_aggregation.py

test/pylib: servers_add: add auto_rack_dc parameter

2025-03-30 19:23:40 +03:00

test_alternator.py

alternator: fix schema "concurrent modification" errors

2025-05-05 09:59:08 +03:00

test_automatic_cleanup.py

test/pylib: servers_add: add auto_rack_dc parameter

2025-03-30 19:23:40 +03:00

test_bad_initial_token.py

test: cluster: add test_bad_initial_token

2025-04-25 12:25:15 +02:00

test_blocked_bootstrap.py

…

test_boot_after_ip_change.py

db/config: add tablets_mode_for_new_keyspaces option

2025-03-24 14:54:45 +02:00

test_cdc_generation_clearing.py

…

test_cdc_generation_data.py

…

test_cdc_generation_publishing.py

…

test_change_ip.py

…

test_change_replication_factor_1_to_0.py

db/config: add tablets_mode_for_new_keyspaces option

2025-03-24 14:54:45 +02:00

test_change_rpc_address.py

…

test_cluster_features.py

test/pylib: servers_add: add auto_rack_dc parameter

2025-03-30 19:23:40 +03:00

test_commitlog_segment_data_resurrection.py

…

test_commitlog.py

…

test_compacting_reader_tombstone_gc.py

…

test_concurrent_schema.py

test/pylib: servers_add: add auto_rack_dc parameter

2025-03-30 19:23:40 +03:00

test_config.py

…

test_conflicting_keys_read_repair.py

…

test_coordinator_queue_management.py

…

test_crash_coordinator_before_streaming.py

…

test_data_resurrection_after_cleanup.py

…

test_data_resurrection_in_memtable.py

test/cluster: add test_data_resurrection_in_memtable.py

2025-04-08 00:11:36 -04:00

test_decommission.py

…

test_deprecating_cluster_features.py

…

test_different_group0_ids.py

db/config: add tablets_mode_for_new_keyspaces option

2025-03-24 14:54:45 +02:00

test_encryption.py

db/config: add tablets_mode_for_new_keyspaces option

2025-03-24 14:54:45 +02:00

test_error_becoming_voter.py

…

test_fencing.py

db/config: add tablets_mode_for_new_keyspaces option

2025-03-24 14:54:45 +02:00

test_global_ignore_nodes.py

…

test_gossip_boot.py

db/config: add tablets_mode_for_new_keyspaces option

2025-03-24 14:54:45 +02:00

test_gossiper_orphan_remover.py

gossiper: move force_remove_endpoint to work on host id

2025-04-06 18:39:24 +03:00

test_gossiper.py

test/pylib: servers_add: add auto_rack_dc parameter

2025-03-30 19:23:40 +03:00

test_group0_schema_versioning.py

db/config: add tablets_mode_for_new_keyspaces option

2025-03-24 14:54:45 +02:00

test_hints.py

db/hints: Cancel draining when stopping node

2025-03-13 11:55:15 +02:00

test_initial_token.py

…

test_ip_mappings.py

…

test_long_join.py

…

test_lwt_semaphore.py

…

test_maintenance_mode.py

…

test_major_compaction.py

…

test_multidc.py

test/cluster/test_multidc: Clean up RF-rack-valid keyspaces tests

2025-03-31 09:38:42 +03:00

test_mutation_schema_change.py

test/pylib: servers_add: add auto_rack_dc parameter

2025-03-30 19:23:40 +03:00

test_mv.py

test/pylib: servers_add: add auto_rack_dc parameter

2025-03-30 19:23:40 +03:00

test_no_dc_rack_change.py

test: cluster: introduce test_no_dc_rack_change

2025-04-17 16:22:58 +02:00

test_no_removed_node_event_on_ip_change.py

…

test_node_isolation.py

…

test_node_ops_metrics.py

…

test_node_shutdown_waits_for_pending_requests.py

…

test_nodetool.py

…

test_not_enough_token_owners.py

…

test_query_rebounce.py

…

test_raft_cluster_features.py

test/pylib: servers_add: add auto_rack_dc parameter

2025-03-30 19:23:40 +03:00

test_raft_fix_broken_snapshot.py

db/config: add tablets_mode_for_new_keyspaces option

2025-03-24 14:54:45 +02:00

test_raft_ignore_nodes.py

…

test_raft_no_quorum.py

raft: make group0 Raft operation timeout configurable

2025-04-15 10:57:39 +03:00

test_raft_recovery_basic.py

db/config: add tablets_mode_for_new_keyspaces option

2025-03-24 14:54:45 +02:00

test_raft_recovery_during_join.py

test: add tests for the Raft-based recovery procedure

2025-03-14 13:53:05 +01:00

test_raft_recovery_entry_loss.py

test: add tests for the Raft-based recovery procedure

2025-03-14 13:53:05 +01:00

test_raft_recovery_majority_loss.py

db/config: add tablets_mode_for_new_keyspaces option

2025-03-24 14:54:45 +02:00

test_raft_recovery_stuck.py

db/config: add tablets_mode_for_new_keyspaces option

2025-03-24 14:54:45 +02:00

test_raft_recovery_user_data.py

db/config: add tablets_mode_for_new_keyspaces option

2025-03-24 14:54:45 +02:00

test_raft_snapshot_request.py

…

test_raft_snapshot_truncation.py

…

test_raft_voters.py

raft/test: add the upgrade test for limited voters feature

2025-04-07 12:31:37 +02:00

test_random_tables.py

test/pylib: servers_add: add auto_rack_dc parameter

2025-03-30 19:23:40 +03:00

test_read_repair.py

wip

2025-04-17 03:01:17 -04:00

test_remove_alive_node.py

…

test_remove_rpc_client_with_pending_requests.py

…

test_repair.py

db/config: add tablets_mode_for_new_keyspaces option

2025-03-24 14:54:45 +02:00

test_replace_alive_node.py

test/pylib: servers_add: add auto_rack_dc parameter

2025-03-30 19:23:40 +03:00

test_replace_ignore_nodes.py

db/config: add tablets_mode_for_new_keyspaces option

2025-03-24 14:54:45 +02:00

test_replace_with_encryption.py

…

test_replace_with_same_ip_twice.py

…

test_replace.py

…

test_restart_cluster.py

…

test_resurrection.py

test: port of test and reproducer for resurrection during file based streaming

2025-03-30 13:39:40 +03:00

test_reversed_queries_during_simulated_upgrade_process.py

…

test_rpc_compression.py

…

test_select_from_mutation_fragments.py

…

test_shutdown_hang.py

…

test_snapshot.py

…

test_sstable_compression_dictionaries_autotrain.py

test_sstable_compression_dictionaries_autotrain: raise the timeout

2025-04-29 22:09:14 +03:00

test_sstable_compression_dictionaries_basic.py

test: add test_sstable_compression_dictionaries_basic.py

2025-04-01 00:07:30 +02:00

test_start_bootstrapped_with_invalid_seed.py

…

test_table_desc_read_barrier.py

…

test_table_drop.py

test: test table drop during flush

2025-04-23 14:29:28 +02:00

test_tablet_repair_scheduler.py

…

test_tablet_stats.py

virtual-tables: Introduce system.load_per_node

2025-04-09 20:21:51 +02:00

test_tablets2.py

test: Test truncate during topology change

2025-04-16 09:10:22 +03:00

test_tablets_cql.py

test_tablets_cql: test_alter_dropped_tablets_keyspace: extend expected error

2025-04-23 18:54:22 +03:00

test_tablets_intranode.py

…

test_tablets_merge.py

service: Introduce rack-aware co-location migrations for tablet merge

2025-03-16 22:45:00 +02:00

test_tablets_migration.py

db/config: add tablets_mode_for_new_keyspaces option

2025-03-24 14:54:45 +02:00

test_tablets_removenode.py

test: tablets: Fix flakiness due to ungraceful shutdown

2025-03-27 09:44:07 +03:00

test_tablets.py

test: test_tablets: wait for cql

2025-04-24 21:25:29 +03:00

test_tls.py

…

test_tombstone_gc.py

…

test_topology_failure_recovery.py

db/config: add tablets_mode_for_new_keyspaces option

2025-03-24 14:54:45 +02:00

test_topology_ops_encrypted.py

db/config: add tablets_mode_for_new_keyspaces option

2025-03-24 14:54:45 +02:00

test_topology_ops.py

db/config: add tablets_mode_for_new_keyspaces option

2025-03-24 14:54:45 +02:00

test_topology_recovery_basic.py

test: mark tests with the gossip-based recovery procedure

2025-03-14 13:53:05 +01:00

test_topology_recovery_majority_loss.py

…

test_topology_rejoin.py

test/pylib: servers_add: add auto_rack_dc parameter

2025-03-30 19:23:40 +03:00

test_topology_remove_decom.py

…

test_topology_remove_garbage_group0.py

db/config: add tablets_mode_for_new_keyspaces option

2025-03-24 14:54:45 +02:00

test_topology_schema.py

test/pylib: servers_add: add auto_rack_dc parameter

2025-03-30 19:23:40 +03:00

test_topology_smp.py

…

test_topology_upgrade_not_stuck_after_recent_removal.py

db/config: add tablets_mode_for_new_keyspaces option

2025-03-24 14:54:45 +02:00

test_topology_upgrade_stuck.py

db/config: add tablets_mode_for_new_keyspaces option

2025-03-24 14:54:45 +02:00

test_topology_upgrade.py

db/config: add tablets_mode_for_new_keyspaces option

2025-03-24 14:54:45 +02:00

test_truncate_with_tablets.py

topology_coordinator: Use shorter fault-injection overloads

2025-04-10 14:05:46 +03:00

test_view_build_status.py

db/config: add tablets_mode_for_new_keyspaces option

2025-03-24 14:54:45 +02:00

test_writes_to_previous_cdc_generations.py

…

test_zero_token_nodes_multidc.py

…

test_zero_token_nodes_no_replication.py

…

test_zero_token_nodes_topology_ops.py

test: test_zero_token_nodes_topology_ops: use host IDs for ignored nodes

2025-04-24 20:17:19 +03:00

util.py

test: Ignore DEBUG,TRACE,INFO level messages when checking for failed mutations.

2025-04-18 16:17:41 +03:00