scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-03 22:55:46 +00:00

Nadav Har'El d33bb6ea00 Merge 'test: fix race window test flakiness from residual re-repair' from Avi Kivity

Fix the persistent flakiness in `test_incremental_repair_race_window_promotes_unrepaired_data` (SCYLLADB-1478, reopened).

After restarting servers[1], the topology coordinator can initiate a **residual re-repair** when it sees tablets stuck in the `repair` stage. This re-repair flushes memtables on all replicas and marks post-repair data as repaired, contaminating the test state and masking the compaction-merge bug the test is designed to detect. The assertion then fails on the *next* retry because the previous attempt's re-repair left behind repaired sstables containing post-repair keys.

Two earlier fix attempts did not resolve the problem:

1. **Propagating `current_key` through the exception**: this correctly advanced the key counter on retry, but the contaminated tablet metadata from the prior re-repair (repaired sstables with post-repair keys) was still present, causing assertion failures on the next attempt.

2. **DROP TABLE + CREATE TABLE between retries**: the tablet metadata (`sstables_repaired_at`, repair stage) is tied to the tablet identity, and recreating the table in the same keyspace still showed residual state issues.

Instead of trying to clean up contaminated state, each retry creates a **completely fresh keyspace** (unique name via `create_new_test_keyspace`). This gives entirely new tablets with no residual repair metadata from prior attempts. Combined with broader detection of coordinator changes and residual re-repairs, the test reliably retries before any contamination can cause false failures.
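The retry strategy described above can be sketched as follows. This is a minimal, self-contained illustration and not the test's actual code: `ContaminatedStateError`, `unique_keyspace_name`, and `run_with_fresh_keyspace` are hypothetical names, and the UUID-based naming only stands in for whatever `create_new_test_keyspace` does to produce a unique keyspace.

```python
import uuid


class ContaminatedStateError(Exception):
    """Hypothetical signal raised by an attempt that detects untrustworthy state."""


def unique_keyspace_name(prefix="test_ks"):
    # Stand-in for the unique naming that create_new_test_keyspace provides.
    return f"{prefix}_{uuid.uuid4().hex}"


def run_with_fresh_keyspace(attempt, max_attempts=5):
    """Call attempt(keyspace) until it succeeds, using a new keyspace each time."""
    last_error = None
    for _ in range(max_attempts):
        keyspace = unique_keyspace_name()
        try:
            return attempt(keyspace)
        except ContaminatedStateError as exc:
            # No cleanup: abandon the keyspace and start over with new tablets.
            last_error = exc
    raise RuntimeError(f"gave up after {max_attempts} attempts") from last_error
```

The point of the design is that nothing from a failed attempt is reused: a new keyspace means new tablets, so no repair metadata can carry over into the next attempt.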

The detection is now comprehensive:
- **Broadened coordinator check**: any coordinator change (`new_coord != coord`), not just migration to servers[1]
- **Re-repair detection** at three points: post-restart, during the compaction poll, and after injection release — grep for `"Initiating tablet repair host="` in the coordinator log
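The two checks above might look roughly like the sketch below. It assumes the coordinator log is available as a plain string and that a character offset (`mark`) was recorded before the step being checked; the real test presumably uses the test framework's log helpers instead. The function names are hypothetical; only the log message is taken from the description.

```python
# Log message quoted in the description above.
REPAIR_MARKER = "Initiating tablet repair host="


def coordinator_changed(coord, new_coord):
    # Any change counts, not only a move to one particular server.
    return new_coord != coord


def rerepair_seen(log_text, mark=0):
    """Return True if the marker occurs in log_text at or after offset mark."""
    return log_text.find(REPAIR_MARKER, mark) != -1


def must_retry(coord, new_coord, log_text, mark=0):
    """Combine both signals: either one means the attempt cannot be trusted."""
    return coordinator_changed(coord, new_coord) or rerepair_seen(log_text, mark)
```

Calling `must_retry` at each of the three points (post-restart, during the compaction poll, and after injection release) with the same `mark` lets a retry start as soon as either signal appears.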

The fix is split into two commits:

1. **`test: extract _setup_table_for_race_window helper`**: a pure code-movement refactor that extracts the keyspace + table + data + repair1 + data + flush sequence into a reusable helper. It is easy to verify that it changes no behavior.

2. **`test: fix race window test flakiness from residual re-repair`**: the actual fix, consisting of broadened detection logic, the re-repair grep at three points, and a fresh-keyspace retry on exception.

Passed 1000 consecutive runs with the fix applied. Without the fix, about 2% flakiness was observed in debug mode.
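As a rough sanity check on these numbers, the snippet below computes how likely 1000 consecutive passes would be if the flakiness were still present. It assumes that runs fail independently and that the roughly 2% rate applies to the conditions of the 1000-run check; the description does not state which build mode those runs used.

```python
flake_rate = 0.02  # "about 2%" from the description, treated as exact here
runs = 1000

# Probability that every run passes if each run fails independently at flake_rate.
p_all_pass = (1 - flake_rate) ** runs
print(f"{p_all_pass:.1e}")
```

Under those assumptions the probability is on the order of 10^-9, so a clean streak of this length would be very unlikely without the fix.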

Fixes: SCYLLADB-1478

So far, we haven't observed flakiness of this test on branches, so not backporting yet. Will backport if seen.

Closes scylladb/scylladb#29721

* github.com:scylladb/scylladb:
  test: fix race window test flakiness from residual re-repair
  test: extract _setup_table_for_race_window helper for race window test

2026-05-03 14:47:19 +03:00

| Name | Last commit | Date |
|---|---|---|
| `alternator` | Merge 'test.py: migrate all bare skips to typed skip markers' from Artsiom Mishuta | 2026-04-22 15:48:27 +03:00 |
| `boost` | test: rename sstable_tablet_streaming.cc to match the naming convention | 2026-04-30 11:16:39 +03:00 |
| `broadcast_tables` | LICENSE: Update to version 1.1 | 2026-04-12 19:46:33 +03:00 |
| `cluster` | Merge 'test: fix race window test flakiness from residual re-repair' from Avi Kivity | 2026-05-03 14:47:19 +03:00 |
| `cql` | Merge 'service: Support adding/removing a datacenter with tablets by changing RF' from Aleksandra Martyniuk | 2026-04-22 01:46:11 +02:00 |
| `cqlpy` | Merge 'test.py: migrate all bare skips to typed skip markers' from Artsiom Mishuta | 2026-04-22 15:48:27 +03:00 |
| `ldap` | test: ldap: add test for pruner crash during shutdown | 2026-04-24 13:34:09 +02:00 |
| `lib` | compaction: Restrict tombstone GC sstable set to repaired sstables for tombstone_gc=repair mode | 2026-04-20 16:59:09 -03:00 |
| `manual` | test: auth_cluster: use safe_driver_shutdown() for Cluster teardown | 2026-04-21 17:45:11 +02:00 |
| `nodetool` | test: migrate runtime pytest.skip() to typed skip_env() | 2026-04-19 11:09:29 +02:00 |
| `perf` | audit: split startup into construction and storage phases | 2026-04-28 18:58:42 +02:00 |
| `pylib` | test: add --keep-duplicates and assign RUN_ID via shared cache | 2026-04-29 02:36:05 +00:00 |
| `pylib_test` | test.py: fix framework test | 2026-04-25 18:04:55 +02:00 |
| `raft` | LICENSE: Update to version 1.1 | 2026-04-12 19:46:33 +03:00 |
| `resource` | test/ldap: add LDAP filter-injection reproducers | 2026-04-08 13:53:49 +02:00 |
| `rest_api` | test: migrate runtime pytest.skip() to typed skip_env() | 2026-04-19 11:09:29 +02:00 |
| `scylla_gdb` | test: migrate runtime pytest.skip() to typed skip_bug() | 2026-04-19 11:10:42 +02:00 |
| `unit` | LICENSE: Update to version 1.1 | 2026-04-12 19:46:33 +03:00 |
| `vector_search` | cql3: statement_restrictions: prepare statement_restrictions for capturing this | 2026-04-19 20:57:03 +03:00 |
| `__init__.py` | test.py: delete dead code in test.py | 2026-04-16 22:08:31 +02:00 |
| `CMakeLists.txt` | test/cmake: add missing tests to boost test suite | 2026-03-29 16:17:45 +03:00 |
| `conftest.py` | test.py: remove testpy_test_fixture_scope | 2026-04-16 22:08:33 +02:00 |
| `pytest.ini` | test: exclude pylib_test from default test runs | 2026-04-22 11:38:40 +02:00 |
| `README.md` | test: rename "cql-pytest" to "cqlpy" | 2024-11-06 16:48:36 +02:00 |

README.md

Scylla in-source tests.

For details on how to run the tests, see docs/dev/testing.md

Shared C++ utilities and libraries are in lib/; shared Python code is in pylib/.

- alternator - Python tests which connect to a single server and use the DynamoDB API
- unit, boost, raft - unit tests in C++
- cqlpy - Python tests which connect to a single server and use CQL
- topology* - tests that set up clusters and add/remove nodes
- cql - approval tests that use CQL and pre-recorded output
- rest_api - tests for Scylla REST API Port 9000
- scylla-gdb - tests for scylla-gdb.py helper script
- nodetool - tests for C++ implementation of nodetool

If you can use an existing folder, consider adding your test to it. New folders should be used for new large categories/subsystems, or when the test environment is significantly different from that of any existing suite, e.g. you plan to start scylladb with a different configuration, and you intend to add many tests and would like them to reuse an existing Scylla cluster (clusters can be reused for tests within the same folder).

To add a new folder, create a new directory, then copy a suite.ini from an existing folder into it and edit it.