Tests in test_out_of_space_prevention.py spend a large fraction of
time creating a random “blob” file to cross the 0.8 critical disk
utilization threshold. With 100MB volumes this requires writing
~70–80MB of data, which is slow inside Docker/Podman-backed volumes.
Most tests only use ~11MB of data, so large volumes are unnecessary.
Reduce the test volume size to 20MB so the critical threshold is
reached at ~16MB and the blob file is much smaller.
This cuts ~5–6s per test.
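For a rough sense of the numbers (the constants and helper below are illustrative, not the test's actual code), with a 20MB volume the blob only needs to cover the gap up to the ~16MB threshold:
```python
import os

VOLUME_SIZE = 20 * 1024 * 1024      # illustrative 20MB test volume
CRITICAL_UTILIZATION = 0.8          # out-of-space prevention threshold

def blob_size(used_bytes: int) -> int:
    """Bytes of random data needed to push utilization just past 0.8."""
    return max(0, int(VOLUME_SIZE * CRITICAL_UTILIZATION) - used_bytes + 1)

def write_blob(path: str, used_bytes: int) -> None:
    # With ~11MB already used, this writes only ~5MB instead of the ~70-80MB
    # needed with a 100MB volume.
    with open(path, "wb") as f:
        f.write(os.urandom(blob_size(used_bytes)))
```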
Set `--tablet-load-stats-refresh-interval-in-seconds=1` for this module's
clusters so that it applies to all tests. This significantly reduces runtime
for the slowest cases:
- test_reject_split_compaction: 75.62s -> 23.04s
- test_split_compaction_not_triggered: 69.36s -> 22.98s
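A minimal sketch of the idea, assuming a pylib-style manager fixture (the exact fixture and argument names are assumptions, not the module's actual code):
```python
# Every server started for this module's tests gets the faster refresh, so
# the tablet load balancer reacts to size changes within ~1 second.
CMDLINE = ["--tablet-load-stats-refresh-interval-in-seconds=1"]

async def start_cluster(manager, nodes: int = 3):
    return [await manager.server_add(cmdline=CMDLINE) for _ in range(nodes)]
```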
Finish the deprecation of the skip_mode function in favor of
pytest.mark.skip_mode. This PR only cleans up and migrates leftover tests
that still use the old skip_mode form.
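A before/after sketch of one such migration (the test name, mode and reason text are made up, and the old helper's exact form is an assumption):
```python
import pytest

# Old, function/decorator style:
#
#     @skip_mode('release', 'error injection is not supported in release mode')
#     async def test_something(manager):
#         ...

# New, plain pytest marker style:
@pytest.mark.skip_mode('release', 'error injection is not supported in release mode')
async def test_something(manager):
    ...
```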
Closes scylladb/scylladb#28299
Reaching critical disk utilization on the destination means that draining
either caused it or at least works against relieving it. So it's
better to cancel those requests. In the case of decommission, if critical
disk utilization was caused by it due to insufficient capacity, aborting the
decommission will bring capacity back to the system and rebalancing
will relieve the critical disk utilization.
The current code:
```
try:
    cql.execute(f"INSERT INTO {cf} (pk, t) VALUES (-1, 'x')", host=host[0], execution_profile=cl_one_profile).result()
except Exception:
    pass
```
contains a typo: `host=host[0]`, which throws an exception because a Host
object is not subscriptable. The test does not fail because the except
block is too broad and suppresses all exceptions.
Fixing the typo alone is insufficient. The write still succeeds because
the remaining nodes are UP and the query uses CL=ONE, so no failure
should be expected.
Another source of flakiness is data verification:
```
SELECT * FROM {cf} WHERE pk = 0;
```
Even when a coordinator is explicitly provided, using CL=ONE does not
guarantee a local read. The coordinator may forward the read request to
another replica, causing the verification to fail nondeterministically.
This patch rewrites the tests to address these issues:
- Fix the typo: `host[0]` to `hosts[0]`
- Verify data using `MUTATION_FRAGMENTS({cf})` which guarantees a local
read on the coordinator node
- Reconnect the driver after node restart
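A rough sketch of such a check (the keyspace/table name, the hosts list, and the execution profile are placeholders from the surrounding test):
```python
# MUTATION_FRAGMENTS() exposes the coordinator's own data, so pinning the
# query to hosts[0] makes the verification strictly local to that node.
rows = list(cql.execute(
    f"SELECT pk FROM MUTATION_FRAGMENTS({cf}) WHERE pk = 0",
    host=hosts[0],
    execution_profile=cl_one_profile,
))
assert rows, "expected pk=0 to be present locally on hosts[0]"
```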
Fixes https://github.com/scylladb/scylladb/issues/27933
Closes scylladb/scylladb#27934
The skip_mode function works only on individual test functions and only in
cluster tests. This is OK when we need to skip one test, but it cannot be used
with pytestmark to automatically mark all tests in the file. The goal of this
PR is to migrate skip_mode to a dynamic pytest.mark that can be used as an
ordinary mark.
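With the marker form, a whole module can be skipped per build mode, roughly like this (the reason string is illustrative):
```python
import pytest

# Applies to every test in the file, which the old function-style skip_mode
# could not do.
pytestmark = pytest.mark.skip_mode(
    "release", "error injection is not supported in release mode"
)
```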
Closes scylladb/scylladb#27853
[avi: apply to test/cluster/test_tablets.py::test_table_creation_wakes_up_balancer]
Split prepare can run concurrently with repair.
Consider this:
1) split prepare starts
2) incremental repair starts
3) split prepare finishes
4) incremental repair produces unsplit sstable
5) split does not happen on the sstable produced by repair
5.1) that sstable is not marked as repaired yet
5.2) it might belong to the repairing set (has compaction disabled)
6) split executes
7) the repairing or repaired set has an unsplit sstable
If the split was acked to the coordinator (meaning the prepare phase finished),
repair must make sure that all sstables it produces are split.
That does not happen today with incremental repair, because it disables
split on sstables belonging to the repairing group, and there's a window
where sstables produced by repair belong to that group.
To solve the problem, we want the invariant that all sealed sstables
are split.
To achieve this, streaming consumers are patched to produce unsealed
sstables, and the new variant add_new_sstable_and_update_cache() takes
care of splitting the sstable while it's unsealed.
If no split is needed, the new sstable is sealed and attached.
This solution is also needed to interact nicely with out-of-space
prevention. If disk usage is critical, split must not happen on
restart, and the aforementioned invariant allows for that, since any
unsplit sstable left unsealed will be discarded on restart.
The streaming consumer will fail if disk usage is critical too.
The reason the interposer consumer doesn't fully solve the problem is
that incremental repair can start before the split, and the sstable
being produced when the split decision was emitted must be split before
being attached. So we need a solution that covers both scenarios.
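Illustrative Python-style pseudocode of the invariant (this is not the actual C++ API; only the add_new_sstable_and_update_cache() name comes from this series):
```python
def add_new_sstable_and_update_cache(table, unsealed_sst):
    # Split while the sstable is still unsealed; an unsplit, unsealed sstable
    # left behind (e.g. after a restart under critical disk usage) is simply
    # discarded, so it never violates the "all sealed sstables are split" rule.
    parts = table.split(unsealed_sst) if table.needs_split(unsealed_sst) else [unsealed_sst]
    for sst in parts:
        sst.seal()           # seal only after splitting
        table.attach(sst)    # then make it visible to reads and compaction
```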
Fixes #26041.
Fixes #27414.
Should be backported to 2025.4, which contains incremental repair.
Closes scylladb/scylladb#26528
* github.com:scylladb/scylladb:
test: Add reproducer for split vs intra-node migration race
test: Verify split failure on behalf of repair during critical disk utilization
test: boost: Add failure_when_adding_new_sstable_test
test: Add reproducer for split vs incremental repair race condition
compaction: Fail split of new sstable if manager is disabled
replica: Don't split in do_add_sstable_and_update_cache()
streaming: Leave sstables unsealed until attached to the table
replica: Wire add_new_sstables_and_update_cache() into intra-node streaming
replica: Wire add_new_sstable_and_update_cache() into file streaming consumer
replica: Wire add_new_sstable_and_update_cache() into streaming consumer
replica: Document old add_sstable_and_update_cache() variants
replica: Introduce add_new_sstables_and_update_cache()
replica: Introduce add_new_sstable_and_update_cache()
replica: Account for sstables being added before ACKing split
replica: Remove repair read lock from maybe_split_new_sstable()
compaction: Preserve state of input sstable in maybe_split_new_sstable()
Rename maybe_split_sstable() to maybe_split_new_sstable()
sstables: Allow storage::snapshot() to leave destination sstable unsealed
sstables: Add option to leave sstable unsealed in the stream sink
test: Verify unsealed sstable can be compacted
sstables: Allow unsealed sstable to be loaded
sstables: Restore sstable_writer_config::leave_unsealed
This test starts a 3-node cluster and creates a large blob file so that one
node reaches critical disk utilization, triggering write rejections on that
node. The test then writes data with CL=QUORUM and validates that the data:
- did not reach the critically utilized node
- did reach the remaining two nodes
By default, tables use speculative retries to determine when coordinators may
query additional replicas.
Since the validation uses CL=ONE, it is possible that an additional request
is sent to satisfy the consistency level. As a result:
- the first check may fail if the additional request is sent to a node that
already contains data, making it appear as if data reached the critically
utilized node
- the second check may fail if the additional request is sent to the critically
utilized node, making it appear as if data did not reach the healthy node
The patch fixes the flakiness by disabling speculative retries.
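Concretely, the table is created (or altered) with speculative retries turned off, along these lines (the table name is a placeholder):
```python
# With speculative_retry = 'NONE' the coordinator never sends the read to an
# extra replica, so the CL=ONE checks hit exactly the intended node.
cql.execute(f"ALTER TABLE {cf} WITH speculative_retry = 'NONE'")
```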
Fixes https://github.com/scylladb/scylladb/issues/27212
Closes scylladb/scylladb#27488
Introduce a new compaction_type enum value: `Major`.
This type will be used by the next patches to differentiate between
major compaction and regular compaction (compaction_type::Compaction).
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
This is a follow-up of the previous fix: https://github.com/scylladb/scylladb/pull/26030
The test test_user_writes_rejection starts a 3-node cluster and
creates a large file on one of the nodes, to trigger the out-of-space
prevention mechanism, which should reject writes on that node.
It waits for the log message 'Setting critical disk utilization mode: true'
and then executes a write expecting the node to reject it.
Currently, the message is logged before the `_critical_disk_utilization`
variable is actually updated. This causes the test to fail sporadically
if it runs quickly enough.
The fix splits the logging into two steps:
1. "Asked to set critical disk utilization mode" - logged before any action
2. "Set critical disk utilization mode" - logged after `_critical_disk_utilization` has been updated
The tests are updated to wait for the second message.
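In the tests this amounts to waiting for the second message, roughly as follows (the log-access helpers shown are assumptions about the test framework):
```python
# Wait for the message emitted *after* _critical_disk_utilization is updated,
# rather than the earlier "Asked to set ..." one.
log = await manager.server_open_log(server.server_id)
await log.wait_for("Set critical disk utilization mode")
```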
Fixes https://github.com/scylladb/scylladb/issues/26004
Closes scylladb/scylladb#26392
The test starts a 3-node cluster and immediately creates a big file
on the first node in order to trigger the out-of-space prevention to
disable compaction, including the SPLIT compaction.
In order to trigger a SPLIT compaction, a keyspace with 1 initial tablet
is created, followed by an alter statement with `tablets = {'min_tablet_count': 2}`.
This triggers a resize decision that should not finalize due to
disabled compaction on the first node.
The test is flaky because the keyspace is created with RF=1 and there
is no guarantee that the tablet replica will be located on the first node,
the one with critical disk utilization. If that is not the case, the split
is finalized and the test fails, because it expects the split to be
blocked.
Change to RF=3. This ensures there is exactly one tablet replica on
each node, including the one with critical disk utilization. So the SPLIT
is blocked until the disk utilization on the first node drops below
the critical level.
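Schematically, the schema side of the test now looks like this (names are placeholders, and whether the tablet hint goes on the keyspace or the table follows the actual test):
```python
# RF=3 puts a tablet replica on every node, including the one with critical
# disk utilization, so the split cannot be finalized behind its back.
cql.execute(
    "CREATE KEYSPACE ks WITH replication = "
    "{'class': 'NetworkTopologyStrategy', 'replication_factor': 3} "
    "AND tablets = {'initial': 1}"
)
cql.execute("CREATE TABLE ks.t (pk int PRIMARY KEY, v text)")
# Raising min_tablet_count triggers the resize (SPLIT) decision.
cql.execute("ALTER TABLE ks.t WITH tablets = {'min_tablet_count': 2}")
```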
Fixes: https://github.com/scylladb/scylladb/issues/25861
Closes scylladb/scylladb#26225
Due to missing functionality in PythonTest, `unshare` is never used
to mount volumes. As a consequence:
+ volumes are created with sudo, which is undesired
+ they are not cleaned up automatically
Even with the missing support in place, the approach of mounting
volumes with `unshare` would not work, as the HTTP server, the pool of clusters,
and the Scylla cluster manager are started outside of the new namespace.
Thus the cluster would have no access to volumes created with `unshare`.
The new approach, which works with and without dbuild and does not require
sudo, uses the following three commands to mount a volume:
truncate -s 100M /tmp/mydevice.img
mkfs.ext4 /tmp/mydevice.img
fuse2fs /tmp/mydevice.img test/
Additionally, proper cleanup is performed, i.e. servers are stopped
gracefully and volumes are unmounted after the tests using them have
completed.
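A condensed sketch of what the fixture does (paths, the volume size, and the unmount step are illustrative):
```python
import os, subprocess, tempfile

def make_volume(size: str = "100M") -> str:
    """Create and mount a small ext4 volume without sudo, via fuse2fs."""
    workdir = tempfile.mkdtemp()
    img = os.path.join(workdir, "mydevice.img")
    mountpoint = os.path.join(workdir, "mnt")
    os.mkdir(mountpoint)
    subprocess.run(["truncate", "-s", size, img], check=True)
    subprocess.run(["mkfs.ext4", "-q", img], check=True)
    subprocess.run(["fuse2fs", img, mountpoint], check=True)
    return mountpoint

def cleanup_volume(mountpoint: str) -> None:
    # FUSE mounts can be unmounted by the owning user with fusermount.
    subprocess.run(["fusermount", "-u", mountpoint], check=True)
```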
Fixes: https://github.com/scylladb/scylladb/issues/25906
Closes scylladb/scylladb#26065
The test starts a 3-node cluster and immediately creates a big file
on one of the nodes, to trigger the out of space prevention to start
rejecting writes on this node. Then a write is executed and it is checked that it
did not reach the node with critical disk utilization but did reach
the remaining nodes (it should, since RF=3 is set).
However, when not specified, a default LOCAL_ONE consistency level
is used. This means that only one node is required to acknowledge the
write.
After the write, the test checks if the write
+ did NOT reach the node with critical disk utilization (works)
+ did reach the remaining nodes
This can cause the test to fail sporadically as the write might not
yet be on the last node.
Use CL=QUORUM instead.
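For reference, a QUORUM write through the Python driver looks roughly like this (the profile name and schema are placeholders):
```python
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster, ExecutionProfile

# With RF=3, QUORUM needs two acknowledgements, so both healthy nodes must
# have the data before the test's checks run.
quorum = ExecutionProfile(consistency_level=ConsistencyLevel.QUORUM)
cluster = Cluster(execution_profiles={"quorum": quorum})
session = cluster.connect()
session.execute("INSERT INTO ks.cf (pk, t) VALUES (0, 'x')",
                execution_profile="quorum")
```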
Fixes: https://github.com/scylladb/scylladb/issues/26004
Closes scylladb/scylladb#26030
The storage submodule contains tests that require mounted volumes
to be executed. The volumes are created automatically with the
`volumes_factory` fixture.
The tests in this suite are executed with the custom launcher
`unshare -mr pytest`
Test scenarios (when one node reaches critical disk utilization):
1. Reject user table writes
2. Disable/enable compaction
3. Reject split compactions
4. New split compactions not triggered
5. Abort tablet repair
6. Disable/enable incoming tablet migrations
7. Restart a node while a tablet split is triggered