scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-30 19:46:48 +00:00

Files

Piotr Dulikowski 8dfd455001 Merge 'strong consistency: fix drop table blocking on stuck writes and handle timeout in update()' from Petr Gusev

- Fix table drop blocking for the full client timeout when in-flight writes can't reach quorum
- Handle unhandled timeout exception in the wait-for-leader loop during group startup

When a strongly consistent table is dropped, `schedule_raft_group_deletion()` calls `g->close()`, which waits for all in-flight operations to release their gate holders. But other nodes may have already destroyed their raft servers for this group, so an in-flight write on the leader cannot reach quorum and hangs until the client timeout expires (on the order of seconds), unnecessarily delaying group deletion.

Additionally, the wait-for-leader loop in groups_manager::update() uses abort_on_expiry with a 60-second timeout but never catches the exception if it fires, leaving the group in an indeterminate state.

SCYLLADB-2080 fix:
- Reorder `schedule_raft_group_deletion`: initiate gate close (prevents new operations), then abort the raft server (unblocks stuck writes by causing `raft::stopped_error`), then await the gate future (resolves immediately since holders are released).
- Handle `raft::stopped_error` in the coordinator's top-level catch blocks (both write and read paths): if the table no longer exists, return `no_such_column_family` (CQL layer converts to InvalidRequest: unconfigured table). Otherwise fall through to the default timeout handling.
- Replace gate->hold() with try_hold() + on_internal_error in acquire_server, with a comment explaining why the gate can never be closed at that point (table removal in `schema_applier::commit_on_shard` precedes gate closure, with no scheduling point in between).
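
The reordering described above can be illustrated with a small asyncio model. This is only a sketch of the ordering idea, not the actual Seastar/C++ code: `Gate`, `Server`, `Stopped`, `write`, and `delete_group` are hypothetical stand-ins invented for this example, and `Stopped` merely plays the role of `raft::stopped_error`.

```python
import asyncio


class GateClosed(Exception):
    """Raised when something tries to enter a gate that has started closing."""


class Stopped(Exception):
    """Hypothetical stand-in for raft::stopped_error in this toy model."""


class Gate:
    """Toy gate: counts holders; closing completes once all holders have left."""

    def __init__(self):
        self._holders = 0
        self._closed = False
        self._drained = asyncio.Event()
        self._drained.set()

    def enter(self):
        if self._closed:
            raise GateClosed()
        self._holders += 1
        self._drained.clear()

    def leave(self):
        self._holders -= 1
        if self._holders == 0:
            self._drained.set()

    def close(self):
        """Reject new holders right away; return a task that finishes when drained."""
        self._closed = True
        return asyncio.create_task(self._drained.wait())


class Server:
    """Toy server whose add_entry() can never reach quorum."""

    def __init__(self):
        self._aborted = asyncio.Event()

    async def add_entry(self):
        # Only abort() unblocks this call, mimicking a write stuck without quorum.
        await self._aborted.wait()
        raise Stopped()

    def abort(self):
        self._aborted.set()


async def write(gate, server):
    gate.enter()
    try:
        await server.add_entry()
        return "ok"
    except Stopped:
        return "stopped"
    finally:
        gate.leave()


async def delete_group(gate, server):
    closed = gate.close()  # 1. initiate gate close: no new operations may enter
    server.abort()         # 2. abort the server: stuck writes fail with Stopped
    await closed           # 3. await the gate: holders have been released


async def demo():
    gate, server = Gate(), Server()
    writer = asyncio.create_task(write(gate, server))
    await asyncio.sleep(0)  # let the writer enter the gate and block in add_entry()
    await asyncio.wait_for(delete_group(gate, server), timeout=1.0)
    return await writer


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In this model, awaiting the gate before calling `abort()` would leave `delete_group` blocked until the one-second timeout, because nothing else ever releases the stuck writer.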

Timeout handling fix:
- Use `coroutine::as_future` in the wait-for-leader loop to catch timeout exceptions gracefully — log a warning and break out instead of propagating unhandled.
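
As a rough analogue of this timeout handling, the asyncio sketch below catches the timeout inside the loop, logs a warning, and leaves the loop instead of letting the exception escape. The names `wait_for_leader`, `has_leader`, and `state_changed` are assumptions made up for this illustration; the real code uses Seastar's `coroutine::as_future` with `abort_on_expiry`, which this sketch does not reproduce.

```python
import asyncio
import logging

log = logging.getLogger("groups_manager_sketch")


async def wait_for_leader(has_leader, state_changed, timeout):
    """Return True once has_leader() holds, or False if `timeout` seconds pass first.

    has_leader:    hypothetical callable returning bool
    state_changed: hypothetical callable returning an awaitable that completes
                   when the group state may have changed
    """
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    found = True
    while not has_leader():
        remaining = max(deadline - loop.time(), 0)
        try:
            await asyncio.wait_for(state_changed(), timeout=remaining)
        except asyncio.TimeoutError:
            # Handle the expiry here: warn and leave the loop rather than
            # propagating the exception to the caller.
            log.warning("no leader after %.2fs, giving up waiting", timeout)
            found = False
            break
    return found


async def demo_timeout():
    never_set = asyncio.Event()  # models a group that never elects a leader
    return await wait_for_leader(lambda: False, never_set.wait, timeout=0.05)


if __name__ == "__main__":
    print(asyncio.run(demo_timeout()))
```

The caller receives an ordinary return value and can decide what to do with a group that has no leader yet, rather than being unwound by an exception in the middle of an update.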

Includes a cluster test reproducer (test_drop_table_unblocks_stuck_write) that:
1. Pauses a write on the leader before add_entry
2. Drops the table (follower destroys its group immediately)
3. Resumes the write — verifies it fails promptly with InvalidRequest ("unconfigured table") instead of hanging for 15 seconds

backport: no need, strong consistency is not released yet

Fixes: SCYLLADB-2080

Closes scylladb/scylladb#30105

* github.com:scylladb/scylladb:
  strong consistency/groups_manager: handle timeout in update() wait-for-leader loop
  strong consistency: abort raft server before gate close when dropping a table
  test/cluster: rewrite test_queries_while_dropping_table for SCYLLADB-2080

2026-05-28 09:59:20 +02:00

alternator

tree: add missing -present to copyright headers

2026-05-21 10:57:42 +02:00

boost

Merge 'treewide: replace deprecated smp::count and smp::all_cpus() with new APIs' from Avi Kivity

2026-05-27 09:42:06 +03:00

broadcast_tables

LICENSE: Update to version 1.1

2026-04-12 19:46:33 +03:00

cluster

Merge 'strong consistency: fix drop table blocking on stuck writes and handle timeout in update()' from Petr Gusev

2026-05-28 09:59:20 +02:00

cql

test: remove dead suite subclasses and legacy execution pipeline

2026-05-17 22:16:31 +03:00

cqlpy

Merge 'index: fix local vector index locality detection after schema reload' from Michał Hudobski

2026-05-27 15:34:57 +03:00

dist_test

docker: fix coredump collection when host uses pipe-based core_pattern

2026-05-12 14:16:22 +03:00

ldap

tree: add missing -present to copyright headers

2026-05-21 10:57:42 +02:00

lib

treewide: replace deprecated smp::count and smp::all_cpus() with new APIs

2026-05-26 17:35:20 +03:00

manual

schema_builder: make shard_count an explicit constructor parameter

2026-05-26 11:55:56 +03:00

nodetool

test/nodetool: fix mock server port race by using a fixed port on a unique IP

2026-05-04 15:33:19 +02:00

perf

Merge 'treewide: replace deprecated smp::count and smp::all_cpus() with new APIs' from Avi Kivity

2026-05-27 09:42:06 +03:00

pylib

test: fix format string typo in error logging in ldap_server.py

2026-05-27 17:22:21 +03:00

pylib_test

test/pylib: add cached Scylla package installer

2026-05-17 17:43:56 +03:00

raft

treewide: replace deprecated smp::count and smp::all_cpus() with new APIs

2026-05-26 17:35:20 +03:00

resource

test/ldap: add LDAP filter-injection reproducers

2026-04-08 13:53:49 +02:00

rest_api

test: migrate runtime pytest.skip() to typed skip_env()

2026-04-19 11:09:29 +02:00

scylla_gdb

Update seastar submodule

2026-05-20 13:47:12 +03:00

unit

treewide: replace deprecated smp::count and smp::all_cpus() with new APIs

2026-05-26 17:35:20 +03:00

vector_search

test/vector_search: migrate zero-vector query rescoring test to pytest; delete rescoring_test.cc

2026-05-26 00:37:54 +02:00

__init__.py

test.py: delete dead code in test.py

2026-04-16 22:08:31 +02:00

CMakeLists.txt

test/cmake: add missing tests to boost test suite

2026-03-29 16:17:45 +03:00

conftest.py

test.py: remove testpy_test_fixture_scope

2026-04-16 22:08:33 +02:00

pytest.ini

test: exclude pylib_test from default test runs

2026-04-22 11:38:40 +02:00

README.md

…

README.md

Scylla in-source tests.

For details on how to run the tests, see docs/dev/testing.md

Shared C++ utilities and libraries are in lib/; shared Python code is in pylib/.

- alternator - Python tests which connect to a single server and use the DynamoDB API
- unit, boost, raft - unit tests in C++
- cqlpy - Python tests which connect to a single server and use CQL
- topology* - tests that set up clusters and add/remove nodes
- cql - approval tests that use CQL and pre-recorded output
- rest_api - tests for Scylla REST API Port 9000
- scylla-gdb - tests for scylla-gdb.py helper script
- nodetool - tests for C++ implementation of nodetool

If you can use an existing folder, consider adding your test to it. New folders should be used for new large categories or subsystems, or when the test environment differs significantly from that of any existing suite. For example, you plan to start scylladb with a different configuration, you intend to add many tests, and you would like them to reuse an existing Scylla cluster (clusters can be reused for tests within the same folder).

To add a new folder, create a new directory, then copy a suite.ini into it and edit it.