scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-02 13:06:57 +00:00

Author	SHA1	Message	Date
Botond Dénes	ae17596c2a	Merge 'Demote log level on split failure during shutdown' from Raphael Raph Carvalho Since commit `509f2af8db`, gate_closed_exception can be triggered for ongoing split during shutdown. The commit is correct, but it causes split failure on shutdown to log an error, which causes CI instability. Previously, aborted_exception would be triggered instead which is logged as warning. Let's do the same. Fixes https://scylladb.atlassian.net/browse/SCYLLADB-951. Fixes https://github.com/scylladb/scylladb/issues/24850. Only 2026.1 is affected. Closes scylladb/scylladb#29032 * github.com:scylladb/scylladb: replica: Demote log level on split failure during shutdown service: Demote log level on split failure during shutdown	2026-03-18 16:21:05 +02:00
Dawid Mędrek	a8dd13731f	Merge 'Improve debuggability of test/cluster/test_data_resurrection_in_memtable.py' from Botond Dénes This test was observed to fail in CI recently but there is not enough information in the logs to figure out what went wrong. This PR makes a few improvements to make the next investigation easier, should it be needed: * storage-service: add table name to mutation write failure error messages. * database: the `database_apply` error injection used to cause trouble, catching writes to bystander tables, making tests flaky. To eliminate this, it gained a filter to apply only to non-system keyspaces. Unfortunately, this still allows it to catch writes to the trace tables. While this should not fail the test, it reduces observability, as some traces disappear. Improve this error injection to apply only to the selected table. Also merge it with the `database_apply_wait` error injection, to streamline the code a bit. * test/test_data_resurrection_in_memtable.py: dump data from the table, before the checks for expected data, so if checks fail, the data in the table is known. Refs: SCYLLADB-812 Refs: SCYLLADB-870 Fixes: SCYLLADB-1050 (by restricting `database_apply` error injection, so it doesn't affect writes to system traces) Backport: test related improvement, no backport Closes scylladb/scylladb#28899 * github.com:scylladb/scylladb: test/cluster/test_data_resurrection_in_memtable.py: dump rows before check replica/database: consolidate the two database_apply error injections service/storage_proxy: add name of table to error message for write errors	2026-03-17 13:35:19 +01:00
Raphael S. Carvalho	ee87b66033	replica: Demote log level on split failure during shutdown Dtest failed with: table - Failed to load SSTable .../me-3gyn_0qwi_313gw2n2y90v2j4fcv-big-Data.db of origin memtable due to std::runtime_error (Cannot split .../me-3gyn_0qwi_313gw2n2y90v2j4fcv-big-Data.db because manager has compaction disabled, reason might be out of space prevention), it will be unlinked... The reason is that the error above is being triggered when the cause is shutdown, not out of space prevention. Let's distinguish between the two cases and log the error with warning level on shutdown. Fixes https://github.com/scylladb/scylladb/issues/24850. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2026-03-16 12:03:17 -03:00
Łukasz Paszkowski	826fd5d6c3	test/storage: harden out-of-space prevention tests around restart and disk-utilization transitions The tests in test_out_of_space_prevention.py are flaky. Three issues contribute: 1. After creating/removing the blob file that simulates disk pressure, the tests immediately checked derived state (e.g., "compaction_manager - Drained") without first confirming the disk space monitor had detected the utilization change. Fix: explicitly wait for "Reached/Dropped below critical disk utilization level" right after creating/removing the blob file, before checking downstream effects. 2. Several tests called `manager.driver_connect()` or omitted reconnection entirely after `server_restart()` / `server_start()`. The pre-existing driver session can silently reconnect multiple times, causing subsequent CQL queries to fail. Fix: call `reconnect_driver()` after every node restart. Additionally, call `wait_for_cql_and_get_hosts()` where CQL is used afterward, to ensure all connection pools are established. 3. Some log assertions used marks captured before a restart, so they could match pre-restart messages or miss messages emitted in the correct post-restart window. Fix: refresh marks at the right points. Apart from that, the patch fixes a typo: autotoogle -> autotoggle. Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-655 Closes scylladb/scylladb#28626	2026-03-09 14:45:09 +02:00
Botond Dénes	f375aae257	replica/database: consolidate the two database_apply error injections Into a single database_apply one. Add three parameters: * ks_name and cf_name to filter the tables to be affected * what - what to do: throw or wait This leads to smaller footprint in the code and improved filtering for table names at the cost of some extra error injection params in the tests.	2026-03-05 11:44:02 +02:00
Andrei Chekun	6ae58c6fa6	test.py: move storage tests to cluster subdirectory Move the storage test suite from test/storage/ to test/cluster/storage/ to consolidate related cluster-based tests. This removes the standalone test/storage/suite.yaml, as the tests will use the cluster's test configuration. Initially these tests were in cluster, but they were moved outside so that the first iteration could use unshare. Now that they handle volumes another way, without unshare, they should be in cluster. Closes scylladb/scylladb#28634	2026-02-23 16:14:15 +02:00

6 Commits
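
Commit 826fd5d6c3 notes that log assertions using marks captured before a restart can match pre-restart messages. The following is a minimal, self-contained sketch of that failure mode. It is not ScyllaDB's test API: `FakeLog`, `mark()`, and `grep()` are hypothetical names invented for illustration, and only the "compaction_manager - Drained" message text comes from the commit message.

```python
import re


class FakeLog:
    """Toy stand-in for a server log (hypothetical, not the ScyllaDB test API)."""

    def __init__(self):
        self.lines = []

    def append(self, line):
        self.lines.append(line)

    def mark(self):
        # A mark is simply the current end-of-log position.
        return len(self.lines)

    def grep(self, pattern, from_mark=0):
        # Return the lines at or after the mark that match the pattern.
        rx = re.compile(pattern)
        return [line for line in self.lines[from_mark:] if rx.search(line)]


log = FakeLog()

stale_mark = log.mark()                      # captured before the restart
log.append("compaction_manager - Drained")   # emitted before the restart
log.append("node restarted")                 # hypothetical restart marker
fresh_mark = log.mark()                      # refreshed after the restart

# The stale mark matches the pre-restart message; the fresh mark does not.
stale_hits = log.grep(r"compaction_manager - Drained", from_mark=stale_mark)
fresh_hits = log.grep(r"compaction_manager - Drained", from_mark=fresh_mark)

# Only a message emitted after the restart is visible from the fresh mark.
log.append("compaction_manager - Drained")
post_restart_hits = log.grep(r"compaction_manager - Drained", from_mark=fresh_mark)
```

In this sketch, an assertion based on `stale_mark` would pass even if the node never drained after the restart, which is why refreshing the mark after the restart matters.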