- Fix variable name error: host[0] → hosts[0] on line 98
- Add missing await keywords for async operations on lines 209 and 385
- Rename class random_content_file to RandomContentFile (PascalCase)
- Fix function name typo: test_autotoogle_compaction → test_autotoggle_compaction
Co-authored-by: mykaul <4655593+mykaul@users.noreply.github.com>
This test starts a 3-node cluster and creates a large blob file so that one
node reaches critical disk utilization, triggering write rejections on that
node. The test then writes data with CL=QUORUM and validates that the data:
- did not reach the critically utilized node
- did reach the remaining two nodes
By default, tables use speculative retries to determine when coordinators may
query additional replicas.
Since the validation uses CL=ONE, it is possible that an additional request
is sent to satisfy the consistency level. As a result:
- the first check may fail if the additional request is sent to a node that
already contains data, making it appear as if data reached the critically
utilized node
- the second check may fail if the additional request is sent to the critically
utilized node, making it appear as if data did not reach the healthy node
The patch fixes the flakiness by disabling the speculative retries.
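A minimal sketch of how the fix could be applied from a Python test (keyspace and table names here are hypothetical; the `speculative_retry = 'NONE'` option value is standard Cassandra/Scylla CQL):

```python
def disable_speculative_retry(keyspace: str, table: str) -> str:
    """Build the CQL that turns off speculative retries for a table.

    'NONE' prevents the coordinator from ever querying additional
    replicas speculatively, so a CL=ONE validation read hits exactly
    one replica.
    """
    return f"ALTER TABLE {keyspace}.{table} WITH speculative_retry = 'NONE'"
```

A test would pass the resulting statement to its CQL session, e.g. `session.execute(disable_speculative_retry("ks", "t"))`.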
Fixes https://github.com/scylladb/scylladb/issues/27212
Closes scylladb/scylladb#27488
Introduce a new compaction_type enum value: `Major`.
This type will be used by the next patches to differentiate between
major compaction and regular compaction (compaction_type::Compaction).
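For illustration only, the distinction the next patches rely on can be mirrored in Python (the actual enum is the C++ `compaction_type` in ScyllaDB; these Python names are assumptions based on the text above):

```python
from enum import Enum, auto

class CompactionType(Enum):
    """Illustrative mirror of the C++ compaction_type enum."""
    COMPACTION = auto()  # regular compaction (compaction_type::Compaction)
    MAJOR = auto()       # major compaction (the new compaction_type::Major)
```

Having a distinct value lets later code branch on major vs. regular compaction instead of inferring the kind from context.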
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
This is a follow-up of the previous fix: https://github.com/scylladb/scylladb/pull/26030
The test test_user_writes_rejection starts a 3-node cluster and
creates a large file on one of the nodes, to trigger the out-of-space
prevention mechanism, which should reject writes on that node.
It waits for the log message 'Setting critical disk utilization mode: true'
and then executes a write expecting the node to reject it.
Currently, the message is logged before the `_critical_disk_utilization`
variable is actually updated. This causes the test to fail sporadically
if it runs quickly enough.
The fix splits the logging into two steps:
1. "Asked to set critical disk utilization mode" - logged before any action
2. "Set critical disk utilization mode" - logged after `_critical_disk_utilization` has been updated
The tests are updated to wait for the second message.
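The two-step pattern can be sketched as follows (class and method names are illustrative, not ScyllaDB's actual C++ code):

```python
class DiskSpaceMonitor:
    """Sketch of the two-step logging pattern described above."""

    def __init__(self):
        self.critical_disk_utilization = False
        self.log = []

    def set_critical_disk_utilization(self, value: bool):
        # Step 1: logged before any state change
        self.log.append(f"Asked to set critical disk utilization mode: {value}")
        self.critical_disk_utilization = value
        # Step 2: logged only after the flag is updated, so a test
        # that waits for this line can rely on the new state being visible
        self.log.append(f"Set critical disk utilization mode: {value}")
```

A test waiting for the second message can no longer race with the flag update.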
Fixes https://github.com/scylladb/scylladb/issues/26004
Closes scylladb/scylladb#26392
The test starts a 3-node cluster and immediately creates a big file
on the first node in order to trigger the out-of-space prevention,
which disables compaction, including SPLIT compaction.
In order to trigger a SPLIT compaction, a keyspace with 1 initial tablet
is created, followed by an ALTER statement with `tablets = {'min_tablet_count': 2}`.
This triggers a resize decision that should not finalize due to
disabled compaction on the first node.
The test is flaky because the keyspace is created with RF=1 and there
is no guarantee that the tablet replica will be located on the first
node, the one with critical disk utilization. If it is not, the split
is finalized and the test fails, because it expects the split to be
blocked.
Change to RF=3. This ensures there is exactly one tablet replica on
each node, including the one with critical disk utilization, so the
SPLIT is blocked until the disk utilization on the first node drops
below the critical level.
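A sketch of the DDL described above (the keyspace name is hypothetical, and whether the `min_tablet_count` option is applied at keyspace or table level may vary between Scylla versions; this sketch follows the keyspace-level reading of the commit message):

```python
KS = "test_ks"  # hypothetical keyspace name

# RF=3 guarantees a tablet replica on every node of the 3-node cluster,
# including the one with critical disk utilization.
create_ks = (
    f"CREATE KEYSPACE {KS} WITH replication = "
    "{'class': 'NetworkTopologyStrategy', 'replication_factor': 3} "
    "AND tablets = {'initial': 1}"
)

# Raising min_tablet_count above the current tablet count triggers a
# resize (split) decision, which the critically utilized node blocks.
alter_tablets = f"ALTER KEYSPACE {KS} WITH tablets = {{'min_tablet_count': 2}}"
```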
Fixes: https://github.com/scylladb/scylladb/issues/25861
Closes scylladb/scylladb#26225
Due to missing functionality in PythonTest, `unshare` is never used
to mount volumes. As a consequence:
+ volumes are created with sudo, which is undesired
+ they are not cleaned up automatically
Even with the missing support in place, the approach of mounting
volumes with `unshare` would not work, as the HTTP server, the pool of
clusters, and the Scylla cluster manager are started outside of the new
namespace. Thus the cluster would have no access to volumes created
with `unshare`.
The new approach, which works with and without dbuild and does not
require sudo, uses the following three commands to mount a volume:
truncate -s 100M /tmp/mydevice.img
mkfs.ext4 /tmp/mydevice.img
fuse2fs /tmp/mydevice.img test/
Additionally, proper cleanup is performed, i.e. servers are stopped
gracefully and volumes are unmounted after the tests using them have
completed.
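The create/mount/cleanup steps can be sketched as command lists that a test fixture might run (the helper name is hypothetical; the commands themselves come from the description above, plus `fusermount -u` for the unmount):

```python
def volume_commands(img: str, mountpoint: str, size: str = "100M"):
    """Commands to create, mount, and later unmount a volume without sudo."""
    setup = [
        ["truncate", "-s", size, img],   # create a sparse backing file
        ["mkfs.ext4", img],              # format it as ext4
        ["fuse2fs", img, mountpoint],    # mount via FUSE, no root needed
    ]
    cleanup = [
        ["fusermount", "-u", mountpoint],  # unmount after tests complete
    ]
    return setup, cleanup
```

A fixture would run the `setup` commands before the tests and the `cleanup` commands afterwards, e.g. via `subprocess.run`.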
Fixes: https://github.com/scylladb/scylladb/issues/25906
Closes scylladb/scylladb#26065
The test starts a 3-node cluster and immediately creates a big file
on one of the nodes to trigger the out-of-space prevention, which
starts rejecting writes on that node. Then a write is executed and
checked: it should not reach the node with critical disk utilization
but should reach the remaining nodes (RF=3 is set).
However, when not specified, a default LOCAL_ONE consistency level
is used. This means that only one node is required to acknowledge the
write.
After the write, the test checks whether the write:
+ did NOT reach the node with critical disk utilization (works)
+ did reach the remaining nodes
This can cause the test to fail sporadically, as the write might not
yet have arrived at the last node.
Use CL=QUORUM instead.
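The quorum arithmetic behind the fix, as a small sketch: for RF=3, QUORUM requires 2 acknowledgements, and since the critically utilized node rejects the write, both healthy replicas must acknowledge it before the write succeeds.

```python
def quorum(rf: int) -> int:
    """Replica acknowledgements required by CL=QUORUM for a given RF."""
    return rf // 2 + 1

# With RF=3: LOCAL_ONE needs 1 ack, QUORUM needs 2. With one node
# rejecting writes, the 2 acks can only come from the two healthy
# replicas, so neither of them can lag behind after a successful write.
```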
Fixes: https://github.com/scylladb/scylladb/issues/26004
Closes scylladb/scylladb#26030
The storage submodule contains tests that require mounted volumes
to be executed. The volumes are created automatically with the
`volumes_factory` fixture.
The tests in this suite are executed with the custom launcher
`unshare -mr pytest`
Test scenarios (when one node reaches critical disk utilization):
1. Reject user table writes
2. Disable/enable compaction
3. Reject split compactions
4. New split compactions not triggered
5. Abort tablet repair
6. Disable/enable incoming tablet migrations
7. Restart a node while a tablet split is triggered