Add .set_skip_when_empty() to four metrics in the db module that are
only incremented on very rare error paths and are almost always zero:
- cache::pinned_dirty_memory_overload: described as 'should sit
constantly at 0, nonzero is indicative of a bug'
- corrupt_data::entries_reported: only fires on actual data corruption
- hints::corrupted_files: only fires on on-disk hint file corruption
- rate_limiter::failed_allocations: only fires when the rate limiter
hash table is completely full and gives up allocating, requiring
extreme cardinality pressure
These metrics create unnecessary reporting overhead when they are
perpetually zero. set_skip_when_empty() suppresses them from metrics
output until they become non-zero.
AI-Assisted: yes
Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>
get_session() was passing timeout=0.1 to patient_exclusive_cql_connection
and patient_cql_connection, leaving only 0.1 seconds for the retry loop
in retry_till_success(). Since each connection attempt can take up to 5
seconds (connect_timeout=5), the retry loop effectively got only one
attempt with no chance to retry on transient NoHostAvailable errors.
Use the default timeout=30 seconds, consistent with all other callers.
Fixes: SCYLLADB-1373
Closesscylladb/scylladb#29332
When a Scylla node starts, the scylla-image-setup.service invokes the
`scylla_swap_setup` script to provision swap. This script allocates a
swap file and creates a swap systemd unit to delegate control to
systemd. By default, systemd injects a Before=swap.target dependency
into every swap unit, allowing other services to use swap.target to wait
for swap to be enabled.
On Azure, this doesn't work so well because we store the swap file on
the ephemeral disk [1] which has network dependencies (`_netdev` mount
option, configured by cloud-init [2]). This makes the swap.target
indirectly depend on the network, leading to dependency cycles such as:
swap.target -> mnt-swapfile.swap -> mnt.mount -> network-online.target
-> network.target -> systemd-resolved.service -> tmp.mount -> swap.target
This patch breaks the cycle by removing the swap unit from swap.target
using DefaultDependencies=no. The swap unit will still be activated via
WantedBy=multi-user.target, just not during early boot.
Although this problem is specific to Azure, this patch applies the fix
to all clouds to keep the code simple.
Fixes#26519.
Fixes SCYLLADB-1257
[1] https://github.com/scylladb/scylla-machine-image/pull/426
[2] https://github.com/canonical/cloud-init/pull/1213#issuecomment-1026065501
Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>
Closesscylladb/scylladb#28504
When test_exception_safety_of_update_from_memtable was converted from
manual fail_after()/catch to with_allocation_failures() in 74db08165d,
the populate_range() call ended up inside the failure injection scope
without a scoped_critical_alloc_section guard. The other two tests
converted in the same commit (test_exception_safety_of_transitioning...
and test_exception_safety_of_partition_scan) were correctly guarded.
Without the guard, the allocation failure injector can sometimes
target an allocation point inside the cleanup path of populate_range().
In a rare corner case, this triggers a bad_alloc in a noexcept context
(reader_concurrency_semaphore::stop()), causing std::terminate.
Fixes SCYLLADB-1346
Closesscylladb/scylladb#29321
Fixes: SCYLLADB-1106
* Small fix in scylla_cluster - remove debug print
* Fix GSServer::unpublish so it does not except if publish was not called beforehand
* Improve dockerized_server so mock server logs echo to the test log to help diagnose CI failures (because we don't collect log files from mocks etc, and in any case correlation will be much easier).
No backport needed.
Closesscylladb/scylladb#29112
* github.com:scylladb/scylladb:
dockerized_service: Convert log reader to pipes and push to test log
test::cluster::conftest::GSServer: Fix unpublish for when publish was not called
scylla_cluster: Use thread safe future signalling
scylla_cluster: Remove left-over debug printout
The test was failing because the call to:
await log.wait_for('Stopping.*ongoing compactions')
was missing the 'from_mark=log_mark' argument. The log mark was updated
(line: log_mark = await log.mark()) immediately after detecting
'splitting_mutation_writer_switch_wait: waiting', and just before
launching the shutdown task. However, the wait_for call on the following
line was scanning from the beginning of the log, not from that mark.
As a result, the search immediately matched old 'Stopping N tasks for N
ongoing compactions for table system.X due to table removal' messages
emitted during initial server bootstrap (for system.large_partitions,
system.large_rows, system.large_cells), rather than waiting for the
shutdown to actually stop the user-table split compaction.
This caused the test to prematurely send the message to the
'splitting_mutation_writer_switch_wait' injection. The split compaction
was unblocked before the shutdown had aborted it, so it completed
successfully. Since the split succeeded, 'Failed to complete splitting
of table' was never logged.
Meanwhile, 'storage_service_drain_wait' was blocking do_drain() waiting
for a message. With the split already done, the test was stuck waiting
for the expected failure log that would never come (600s timeout). At
the same time, after 60s the 'storage_service_drain_wait' injection
timed out internally, triggering on_internal_error() which -- with
--abort-on-internal-error=1 -- crashed the server (exit code -6).
Fix: pass from_mark=log_mark to the wait_for('Stopping.*ongoing
compactions') call so it only matches messages that appear after the
shutdown has started, ensuring the test correctly synchronizes with the
shutdown aborting the user-table split compaction before releasing the
injection.
Fixes https://scylladb.atlassian.net/browse/SCYLLADB-1319.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Closesscylladb/scylladb#29311
Replace the random port selection with an OS-assigned port. We open
a temporary TCP socket, bind it to (ip, 0) with SO_REUSEADDR, read back
the port number the OS selected, then close the socket before launching
rest_api_mock.py.
Add reuse_address=True and reuse_port=True to TCPSite in rest_api_mock.py
so the server itself can also reclaim a TIME_WAIT port if needed.
Fixes: SCYLLADB-1275
Closesscylladb/scylladb#29314
On slow/overloaded CI machines the lowres_clock timer may not have
fired after the fixed 2x sleep, causing the assertion on
get_abort_exception() to fail. Replace the fixed sleep with
sleep(1x) + eventually_true() which retries with exponential backoff,
matching the pattern already used in test_time_based_cache_eviction.
Fixes: SCYLLADB-1311
Closesscylladb/scylladb#29299
make sure the driver is stopped even though cluster
teardown throws and avoid potential stale driver
connections entering infinite reconnect loops which
exhaust cpu resources.
Fixes: SCYLLADB-1189
Signed-off-by: Robert Bindar <robert.bindar@scylladb.com>
Closesscylladb/scylladb#29230
Debug mode shuffles task position in the queue. So the following is possible:
1) shard 1 calls manual_clock::advance(). This expires timers on shard 1 and queues a background smp call to shard 0 which will expire timers there
2) the smp::submit_to(0, ...) from shard 1 called by the test sumbits the call
3) shard 0 creates tasks for both calls, but (2) is run first, and preempts the reactor
4) shard 1 sees the completion, completes m_svc.invoke_on(1, ..)
5) shard 0 inserts the completion from (4) before task from (1)
6) the check on shard 0: m.find(id1) fails because the timer is not expired yet
To fix that, wait for timer expiration on shard 0, so that the test
doesn't depend on task execution order.
Note: I was not able to reproduce the problem locally using test.py --mode
debug --repeat 1000.
It happens in jenkins very rarely. Which is expected as the scenario which
leads to this is quite unlikely.
Fixes SCYLLADB-1265
Closesscylladb/scylladb#29290
The test exercises all five node operations (bootstrap, replace, rebuild,
removenode, decommission) and by the end only one node out of four
remains alive. The CQL driver session, however, still holds stale
references to the dead hosts in its connection pool and load-balancing
policy state.
When the new_test_keyspace context manager exits and attempts
DROP KEYSPACE, the driver routes the query to the dead hosts first,
gets ConnectionShutdown from each, and throws NoHostAvailable before
ever trying the single live node.
Fix by calling driver_connect() after the decommission step, which
closes the old session and creates a fresh one connected only to the
servers the test manager reports as running.
Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-1313.
Closesscylladb/scylladb#29306
The multishard_query_test/fuzzy_test was timing out (SIGKILL after
15 minutes) in release mode CI.
In release mode the test generates up to 64 partitions with up to
1000 clustering rows and 1000 range tombstones each. With deeply
nested randomly-generated types (e.g. frozen<map<varint,
frozen<map<frozen<tuple<...>>>>>>), this volume of data can exceed
the 15-minute CI timeout.
Reduce the release-mode clustering-row and range-tombstone
distributions from 0-1000 to 0-200. This caps the worst case at
~12,800 rows -- still 2x the devel-mode maximum (0-100) and
sufficient to exercise multi-partition paged scanning with many
pages.
Fixes: SCYLLADB-1270
No need to backport for now, only appeared on master.
Closesscylladb/scylladb#29293
* github.com:scylladb/scylladb:
test: clean up fuzzy_test_config and add comments
test: fix fuzzy_test timeout in release mode
This commit improves how test.py chohoses the default number of
parallele jobs.
This update keeps logic of selecting number of jobs from memory and cpu limits
but simplifies the heuristic so it is smoother, easier to reason about.
This avoids discontinuities such as neighboring machine sizes producing
unexpectedly different job counts, and behaves more predictably on asymmetric
machines where CPU and RAM do not scale together.
Compared to the current threshold-based version, this approach:
- avoids hard jumps around memory cutoffs
- avoids bucketed debug scaling based on CPU count
- keeps CPU and memory as separate constraints and combines them in one place
- avoids double-penalizing debug mode
- is easier to tune later by adjusting a few constants instead of rewriting branching logic
Closesscylladb/scylladb#28904
get_audit_partitions_for_operation() returns None when no audit log
rows are found. In _test_insert_failure_doesnt_report_success_assign_nodes,
this None is passed to set(), causing TypeError: 'NoneType' object is
not iterable.
The audit log entry may not yet be visible immediately after executing
the INSERT, so use wait_for() from test.pylib.util with exponential
backoff to poll until the entry appears. Import it as wait_for_async
to avoid shadowing the existing wait_for from test.cluster.dtest.dtest_class,
which has a different signature (timeout vs deadline).
Fixes SCYLLADB-1330
Closesscylladb/scylladb#29289
Switch _promoted_indexes storage in partition_index_page from
managed_vector to chunked_managed_vector to avoid large contiguous
allocations.
Avoid allocation failure (or crashes with --abort-on-internal-error)
when large partitions have enough promoted index entries to trigger a
large allocation with managed_vector.
Fixes: SCYLLADB-1315
Closesscylladb/scylladb#29283
Remove the unused timeout field from fuzzy_test_config. It was
declared, initialized per build mode, and logged, but never actually
enforced anywhere.
Document the intentionally small max_size (1024 bytes) passed to
read_partitions_with_paged_scan in run_fuzzy_test_scan: it forces
many pages per scan to stress the paging and result-merging logic.
The multishard_query_test/fuzzy_test was timing out (SIGKILL after
15 minutes) in release mode CI.
In release mode the test generates up to 64 partitions with up to
1000 clustering rows and 1000 range tombstones each. With deeply
nested randomly-generated types (e.g. frozen<map<varint,
frozen<map<frozen<tuple<...>>>>>>), this volume of data can exceed
the 15-minute CI timeout.
Reduce the release-mode clustering-row and range-tombstone
distributions from 0-1000 to 0-200. This caps the worst case at
~12,800 rows -- still 2x the devel-mode maximum (0-100) and
sufficient to exercise multi-partition paged scanning with many
pages.
Fixes: SCYLLADB-1270
The test was flaky because it stopped dc2_node immediately after an
LWT write, before cross-DC replication could complete. The LWT commit
uses LOCAL_QUORUM, which only guarantees persistence in the
coordinator's DC. Replication to the remote DC is async background
work, and CAS mutations don't store hints. Stopping dc2_node could
drop in-flight RPCs, leaving DC1 without the mutation.
Fix by polling both live DC1 nodes after the write to confirm
cross-DC replication completed before stopping dc2_node. Both nodes
must have the data so that the later ConsistentRead=True
(LOCAL_QUORUM) read on restarted node1 is guaranteed to succeed.
Fixes SCYLLADB-1267
Closesscylladb/scylladb#29287
The approach taken in 1ae2ae50a6 turned
out to be incorrect. The Raft member requesting a read barrier could
incorrectly advance its commit_idx and break linearizability. We revert that
commit in this PR.
We also remake the read barrier optimization with a completely new approach.
We make the leader replicate to the non-voting requester of a read barrier if
its `commit_idx` is behind.
Fixes https://scylladb.atlassian.net/browse/SCYLLADB-998
No backport: the issue is present only in master.
Closesscylladb/scylladb#29216
* github.com:scylladb/scylladb:
raft: speed up read barrier requested by non-voters
Revert "raft: read_barrier: update local commit_idx to read_idx when it's safe"
Fix two independent race conditions in the syslog audit test that cause intermittent `assert 2 <= 1` failures in `assert_entries_were_added`.
**Datagram ordering race:**
`UnixSockerListener` used `ThreadingUnixDatagramServer`, where each datagram spawns a new thread. The notification barrier in `get_lines()` assumes FIFO handling, but the notification thread can win the lock before an audit entry thread, so `clear_audit_logs()` misses entries that arrive moments later. Fix: switch to sequential `UnixDatagramServer`.
**Config reload race:**
The live-update path used `wait_for_config` (REST API poll on shard 0) which can return before `broadcast_to_all_shards()` completes. Fix: wait for `"completed re-reading configuration file"` in the server log after each SIGHUP, which guarantees all shards have the new config.
Fixes SCYLLADB-1277
This is CI improvement for the latest code. No need for backport.
Closesscylladb/scylladb#29282
* github.com:scylladb/scylladb:
test: cluster: wait for full config reload in audit live-update path
test: cluster: fix syslog listener datagram ordering race
`test_limit_concurrent_requests` could create far more tables than intended
because worker threads looped indefinitely and only the probe path terminated
the test. In practice, workers often hit `RequestLimitExceeded` first, but the
test kept running and creating tables, increasing memory pressure and causing
flakiness due to bad_alloc errors in logs.
Fix by replacing the old probe-driven termination with worker-driven
termination. Workers now run until any worker sees
`RequestLimitExceeded`.
Fixes SCYLLADB-1181
Closesscylladb/scylladb#29270
A joining node hung forever if the topology coordinator added it to the
group 0 configuration before the node reached `post_server_start`. In
that case, `server->get_configuration().contains(my_id)` returned true
and the node broke out of the join loop early, skipping
`post_server_start`. `_join_node_group0_started` was therefore never set,
so the node's `join_node_response` RPC handler blocked indefinitely.
Meanwhile the topology coordinator's `respond_to_joining_node` call
(which has no timeout) hung forever waiting for the reply that never came.
Fix by only taking the early-break path when not starting as a follower
(i.e. when the node is the discovery leader or is restarting). A joining
node must always reach `post_server_start`.
We also provide a regression test. It takes 6s in dev mode.
Fixes SCYLLADB-959
Closesscylladb/scylladb#29266
_apply_config_to_running_servers used wait_for_config (REST API poll)
to confirm live config updates. The REST API reads from shard 0 only,
so it can return before broadcast_to_all_shards() completes — other
shards may still have stale audit config, generating unexpected entries.
Additionally, server_remove_config_option for absent keys sent separate
SIGHUPs before server_update_config, and the single wait_for_config at
the end could match a completion from an earlier SIGHUP.
Wait for "completed re-reading configuration file" in the server log
after each SIGHUP-producing operation. This message is logged only
after both read_config() and broadcast_to_all_shards() finish,
guaranteeing all shards have the new config. Each operation gets its
own mark+wait so no stale completion is matched.
Fixes SCYLLADB-1277
UnixSockerListener used ThreadingUnixDatagramServer, which spawns a
new thread per datagram. The notification barrier in get_lines() relies
on all prior datagrams being handled before the notification. With
threading, the notification handler can win the lock before an audit
entry handler, so get_lines() returns before the entry is appended.
clear_audit_logs() then clears an incomplete buffer, and the late
entry leaks into the next test's before/after diff.
Switch to sequential UnixDatagramServer. The server thread now handles
datagrams in kernel FIFO order, so the notification is always processed
after all preceding audit entries.
Refs SCYLLADB-1277
We achieve this by making the leader replicate to the non-voting requester
of a read barrier if its commit_idx is behind.
There are some corner cases where the new `replicate_to(*opt_progress, true);`
call will be a no-op, while the corresponding call in `tick_leader()` would
result in sending the AppendEntries RPC to the follower. These cases are:
- `progress.state == follower_progress::state::PROBE && progress.probe_sent`,
- `progress.state == follower_progress::state::PIPELINE
&& progress.in_flight == follower_progress::max_in_flight`.
We could try to improve the optimization by including some of the cases above,
but it would only complicate the code without noticeable benefits (at least
for group0).
Note: this is the second attempt for this optimization. The first approach
turned out to be incorrect and was reverted in the previous commit. The
performance improvement is the same as in the previous case.
Use get_cql_exclusive(node1) so the driver only connects to node1 and
never attempts to contact the stopped node2. The test was flaky because
the driver received `Host has been marked down or removed` from node2.
Fixes: SCYLLADB-1227
Closesscylladb/scylladb#29268
The test was using time.sleep(1) (a blocking call) to wait after
scheduling the stop_compaction task, intending to let it register on
the server before releasing the sstable_cleanup_wait injection point.
However, time.sleep() blocks the asyncio event loop entirely, so the
asyncio.create_task(stop_compaction) task never gets to run during the
sleep. After the sleep, the directly-awaited message_injection() runs
first, releasing the injection point before stop_compaction is even
sent. By the time stop_compaction reaches Scylla, the cleanup has
already completed successfully -- no exception is raised and the test
fails.
Fix by replacing time.sleep(1) with await asyncio.sleep(1), which
yields control to the event loop and allows the stop_compaction task
to actually send its HTTP request before message_injection is called.
Fixes: SCYLLADB-834
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Closesscylladb/scylladb#29202
Fix the ordering of the concurrency limit check in the Alternator HTTP server so it happens before memory acquisition, and reduce test pressure to avoid LSA exhaustion on the memory-constrained test node.
The patch moves the concurrency check to right after the content-length early-out, before any memory acquisition or I/O. The check was originally placed before memory acquisition but was inadvertently moved after it during a refactoring. This allowed unlimited requests to pile up consuming memory, reading bodies, verifying signatures, and decompressing — all before being rejected. Restores the original ordering and mirrors the CQL transport (`transport/server.cc`).
Lowers `concurrent_requests_limit` from 5 to 3 and the thread multiplier from 5 to 2 (6 threads instead of 25). This is still sufficient to reliably trigger RequestLimitExceeded, while keeping flush pressure within what 512MB per shard can sustain.
Fixes https://scylladb.atlassian.net/browse/SCYLLADB-1248
Fixes https://scylladb.atlassian.net/browse/SCYLLADB-1181
The test started to fail quite recently. It affects master only. No backport is needed. We might want to consider backporting a commit moving the concurrency check earlier.
Closesscylladb/scylladb#29272
* github.com:scylladb/scylladb:
test: reduce concurrent-request-limit test pressure to avoid LSA exhaustion
alternator: check concurrency limit before memory acquisition
The test_limit_concurrent_requests dtest uses concurrent CreateTable
requests to verify Alternator's concurrency limiting. Each admitted
CreateTable triggers Raft consensus, schema mutations, and memtable
flushes—all of which consume LSA memory. On the 1 GB test node
(2 SMP × 512 MB), the original settings (limit=5, 25 threads) created
enough flush pressure to exhaust the LSA emergency reserve, producing
logalloc::bad_alloc errors in the node log. The test was always
marginal under these settings and became flaky as new system tables
increased baseline LSA usage over time.
Lower concurrent_requests_limit from 5 to 3 and the thread multiplier
from 5 to 2 (6 threads total). This is still well above the limit and
sufficient to reliably trigger RequestLimitExceeded, while keeping flush
pressure within what 512 MB per shard can sustain.
The concurrency limit check in the Alternator server was positioned after
memory acquisition (get_units), request body reading (read_entire_stream),
signature verification, and decompression. This allowed unlimited requests
to pile up consuming memory before being rejected, exhausting LSA memory
and causing logalloc::bad_alloc errors that cascade into Raft applier
and topology coordinator failures, breaking subsequent operations.
Without this fix, test_limit_concurrent_requests on a 1GB node produces
50 logalloc::bad_alloc errors and cascading failures: reads from
system.scylla_local fail, the Raft applier fiber stops, the topology
coordinator stops, and all subsequent CreateTable operations fail with
InternalServerError (500). With this fix, the cascade is eliminated --
admitted requests may still cause LSA pressure on a memory-constrained
node, but the server remains functional.
Move the concurrency check to right after the content-length early-out,
before any memory acquisition or I/O. This mirrors the CQL transport
which correctly checks concurrency before memory acquisition
(transport/server.cc).
The concurrency check was originally added in 1b8c946ad7 (Sep 2020)
*before* memory acquisition, which at the time lived inside with_gate
(after the concurrency gate). The ordering was inverted by f41dac2a3a
(Mar 2021, "avoid large contiguous allocation for request body"), which
moved get_units() earlier in the function to reserve memory before
reading the newly-introduced content stream -- but inadvertently also
moved it before the concurrency check. c3593462a4 (Mar 2025) further
worsened the situation by adding a 16MB fallback reservation for
requests without Content-Length and ungzip/deflate decompression steps
-- all before the concurrency check -- greatly increasing the memory
consumed by requests that would ultimately be rejected.
**The Bug**
Assertion failure: `SCYLLA_ASSERT(res.second)` in `raft/server.cc`
when creating a snapshot transfer for a destination that already had a
stale in-flight transfer.
**Root Cause**
If a node loses leadership and later becomes leader again before the next
`io_fiber` iteration, the old transfer from the previous term can remain
in `_snapshot_transfers` while `become_leader()` resets progress state.
When the new term emits `install_snapshot(dst)`, `send_snapshot(dst)`
tries to create a new entry for the same destination and can hit the
assertion.
**The Fix**
Abort all in-flight snapshot transfers in `process_fsm_output()` when
`term_and_vote` is persisted. A term/vote change marks existing transfers
as stale, so we clean them up before dispatching messages from that batch
and before any new snapshot transfer is started.
With cross-term cleanup moved to the term-change path, `send_snapshot()`
now asserts the within-term invariant that there is at most one in-flight
transfer per destination.
Fixes: SCYLLADB-862
Backport: The issue is reproducible in master, but is present in all
active branches.
Closesscylladb/scylladb#29092
This reverts commit c30607d80b.
With the default configuration, enabling DDL has no effect because
no `audit_keyspaces` or `audit_tables` are specified. Including DDL
in the default categories can be misleading for some customers, and
ideally we would like to avoid it.
However, DDL has been one of the default audit categories for years,
and removing it risks silently breaking existing deployments that
depend on it. Therefore, the recent change to disable DDL by default
is reverted.
Fixes: SCYLLADB-1155
Closesscylladb/scylladb#29169
test_reboot uses a custom restart function that SIGKILLs and restarts
nodes sequentially. After all nodes are back up, the test proceeded
directly to reads after wait_for_cql_and_get_hosts(), which only
confirms CQL reachability.
While a node is restarted, other nodes might execute global token
metadata barriers, which advance the topology fence version. The
restarted node has to learn about the new version before it can send
reads/writes to the other nodes. The test issues reads as soon as the
CQL port is opened, which might happen before the last restarted node
learns of the latest topology version. If this node acts as a
coordinator for reads/write before this happens, these will fail as the
other nodes will reject the ops with the outdated topology fence
version.
Fix this by replacing wait_for_cql_and_get_hosts() on the abrupt-restart
path with the more robus get_ready_cql(), which makes sure servers see
each other before refreshing the cql connection. This should ensure that
nodes have exchanged gossip and converged on topology state before any
reads are executed. The rolling_restart() path is unaffected as it
handles this internally.
Fixes: SCYLLADB-557
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Closesscylladb/scylladb#29211
`test_crashed_node_substitution` intermittently failed:
```python
assert len(gossiper_eps) == (len(server_eps) + 1)
```
The test crashed the node right after a single ACK2 handshake (`finished do_send_ack2_msg`), assuming the node state was visible to all peers. However, since gossip is eventually consistent, the update may not have propagated yet, so some nodes did not see the failed node.
This change: Wait until the gossiper state is visible on peers before continuing the test and asserting.
Fixes: [SCYLLADB-1256](https://scylladb.atlassian.net/browse/SCYLLADB-1256).
backport: this issue may affect CI for all branches, so should be backported to all versions.
[SCYLLADB-1256]: https://scylladb.atlassian.net/browse/SCYLLADB-1256?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQClosesscylladb/scylladb#29254
* github.com:scylladb/scylladb:
test: test_crashed_node_substitution: add docstring and fix whitespace
test: fix race condition in test_crashed_node_substitution
If lwt_workload() sends an update immediately after a
rolling restart, the coordinator might still see a replica as
down due to gossip lagging behind. Concurrently restarting another
node leaves only one available replica, failing the
LOCAL_QUORUM requirement for learn or eventually consistent
sp::query() in sp::cas() and resulting in
a mutation_write_failure_exception.
We fix this problem by waiting for the restarted server
to see 2 other peers. The server_change_version
doesn't do that by default -- it passes
wait_others=0 to server_start().
Fixes SCYLLADB-1136
Closesscylladb/scylladb#29234
`test_crashed_node_substitution` intermittently failed:
```
assert len(gossiper_eps) == (len(server_eps) + 1)
```
The test crashed the node right after a single ACK2 handshake
("finished do_send_ack2_msg"), assuming the node state was
visible to all peers. However, since gossip is eventually
consistent, the update may not have propagated yet, so some
nodes did not see the failed node.
This change: Wait until the gossiper state is visible on
peers before continuing the test and asserting.
Fixes: SCYLLADB-1256.
SSTable unlinking is async, so in some cases it may happen that
the upload dir is not empty immediately after refresh is done.
This patch adjusts test_refresh_deletes_uploaded_sstables so
it waits with a timeout till the upload dir becomes empty
instead of just assuming the API will sync on sstables being
gone.
Fixes SCYLLADB-1190
Signed-off-by: Robert Bindar <robert.bindar@scylladb.com>
Closesscylladb/scylladb#29215
Remove unused `pytest.mark.single_node` marker from `TestCQLAudit`.
Rename `TestCQLAudit` to `CQLAuditTester` to reflect that it is a test helper, not a test class. This avoids accidental pytest collection and subsequent warning about `__init__`.
Logs before the fixes:
```
test/cluster/test_audit.py:514: 14 warnings
/home/dario/dev/scylladb/test/cluster/test_audit.py:514: PytestCollectionWarning: cannot collect test class 'TestCQLAudit' because it has a __init__ constructor (from: cluster/test_audit.py)
@pytest.mark.single_node
```
Fixes SCYLLADB-1237
This is an addition to the latest master code. No backport needed.
Closesscylladb/scylladb#29237
* github.com:scylladb/scylladb:
test: audit: rename TestCQLAudit to CQLAuditTester
test: audit: remove unused pytest.mark.single_node
pytest tries to collect tests for execution in several ways.
One is to pick all classes that start with 'Test'. Those classes
must not have custom '__init__' constructor. TestCQLAudit does.
TestCQLAudit after migration from test/cluster/dtest is not a test
class anymore, but rather a helper class. There are two ways to fix
this:
1. Add __init__ = False to the TestCQLAudit class
2. Rename it to not start with 'Test'
Option 2 feels better because the new name itself does not convey
the wrong message about its role.
Fixes SCYLLADB-1237
Remove unused pytest.mark.single_node in TestCQLAudit class.
This is a leftover from audit tests migration from
test/cluster/dtest to test/cluster.
Refs SCYLLADB-1237
Previously, the result of when_all was discarded. when_all stores
exceptions in the returned futures rather than throwing, so the outer
catch(in_use&) could never trigger. Now we capture the when_all result
and inspect each future individually to properly detect in_use from
either stream.
Fixes https://scylladb.atlassian.net/browse/SCYLLADB-1216Closesscylladb/scylladb#29219
Change wait_for() defaults from period=1s/no backoff to period=0.1s
with 1.5x backoff capped at 1.0s. This catches fast conditions in
100ms instead of 1000ms, benefiting ~100 call sites automatically.
Add completion logging with elapsed time and iteration count.
Tested local with test/cluster/test_fencing.py::test_fence_hints (dev mode),
log output:
wait_for(at_least_one_hint_failed) completed in 0.83s (4 iterations)
wait_for(exactly_one_hint_sent) completed in 1.34s (5 iterations)
Fixes SCYLLADB-738
Closesscylladb/scylladb#29173
The test was flaky. The scenario looked like this:
1. Stop server 1.
2. Set its rf_rack_valid_keyspaces configuration option to true.
3. Create an RF-rack-invalid keyspace.
4. Start server 1 and expect a failure during start-up.
It was wrong. We cannot predict when the Raft mutation corresponding to
the newly created keyspace will arrive at the node or when it will be
processed. If the check of the RF-rack-valid keyspaces we perform at
start-up was done before that, it won't include the keyspace. This will
lead to a test failure.
Unfortunately, it's not feasible to perform a read barrier during
start-up. What's more, although it would help the test, it wouldn't be
useful otherwise. Because of that, we simply fix the test, at least for
now.
The new scenario looks like this:
1. Disable the rf_rack_valid_keyspaces configuration option on server 1.
2. Start the server.
3. Create an RF-rack-invalid keyspace.
4. Perform a read barrier on server 1. This will ensure that it has
observed all Raft mutations, and we won't run into the same problem.
5. Stop the node.
6. Set its rf_rack_valid_keyspaces configuration option to true.
7. Try to start the node and observe a failure.
This will make the test perform consistently.
---
I ran the test (in dev mode, on my local machine) three times before
these changes, and three times with them. I include the time results
below.
Before:
```
real 0m47.570s
user 0m41.631s
sys 0m8.634s
real 0m50.495s
user 0m42.499s
sys 0m8.607s
real 0m50.375s
user 0m41.832s
sys 0m8.789s
```
After:
```
real 0m50.509s
user 0m43.535s
sys 0m9.715s
real 0m50.857s
user 0m44.185s
sys 0m9.811s
real 0m50.873s
user 0m44.289s
sys 0m9.737s
```
Fixes SCYLLADB-1137
Backport: The test is present on all supported branches, and so we
should backport these changes to them.
Closesscylladb/scylladb#29218
* github.com:scylladb/scylladb:
test: cluster: Deflake test_startup_with_keyspaces_violating_rf_rack_valid_keyspaces
test: cluster: Mark test with @pytest.mark.asyncio in test_multidc.py
This PR contains two small improvements to `test_incremental_repair.py`
motivated by the sporadic failure of
`test_tablet_incremental_repair_and_scrubsstables_abort`.
The test fails with `assert 3 == 2` on `len(sst_add)` in the second
repair round. The extra SSTable has `repaired_at=0`, meaning scrub
unexpectedly produced more unrepaired SSTables than anticipated. Since
scrub (and compaction in general) logs at DEBUG level and the test did
not enable debug logging, the existing logs do not contain enough
information to determine the root cause.
**Commit 1** fixes a long-standing typo in the helper function name
(`preapre` -> `prepare`).
**Commit 2** enables `compaction=debug` for the Scylla nodes started by
`do_tablet_incremental_repair_and_ops`, which covers all
`test_tablet_incremental_repair_and_*` variants. This will capture full
compaction/scrub activity on the next reproduction, making the failure
diagnosable.
Refs: SCYLLADB-1086
Backport: test improvement, no backport
Closesscylladb/scylladb#29175
* https://github.com/scylladb/scylladb:
test/cluster/test_incremental_repair.py: enable compaction DEBUG logs in do_tablet_incremental_repair_and_ops
test/cluster/test_incremental_repair.py: fix typo preapre -> prepare
Two issues prevented the precompiled header from compiling
successfully when using CMake directly (rather than the
configure.py + ninja build system):
a) Propagate build flags to Rust binding targets reusing the
PCH. The wasmtime_bindings and inc targets reuse the PCH
from scylla-precompiled-header, which is compiled with
Seastar's flags (including sanitizer flags in
Debug/Sanitize modes). Without matching compile options,
the compiler rejects the PCH due to flag mismatch (e.g.,
-fsanitize=address). Link these targets against
Seastar::seastar to inherit the required compile options.
Closesscylladb/scylladb#28941
The test was flaky. The scenario looked like this:
1. Stop server 1.
2. Set its rf_rack_valid_keyspaces configuration option to true.
3. Create an RF-rack-invalid keyspace.
4. Start server 1 and expect a failure during start-up.
It was wrong. We cannot predict when the Raft mutation corresponding to
the newly created keyspace will arrive at the node or when it will be
processed. If the check of the RF-rack-valid keyspaces we perform at
start-up was done before that, it won't include the keyspace. This will
lead to a test failure.
Unfortunately, it's not feasible to perform a read barrier during
start-up. What's more, although it would help the test, it wouldn't be
useful otherwise. Because of that, we simply fix the test, at least for
now.
The new scenario looks like this:
1. Disable the rf_rack_valid_keyspaces configuration option on server 1.
2. Start the server.
3. Create an RF-rack-invalid keyspace.
4. Perform a read barrier on server 1. This will ensure that it has
observed all Raft mutations, and we won't run into the same problem.
5. Stop the node.
6. Set its rf_rack_valid_keyspaces configuration option to true.
7. Try to start the node and observe a failure.
This will make the test perform consistently.
---
I ran the test (in dev mode, on my local machine) three times before
these changes, and three times with them. I include the time results
below.
Before:
```
real 0m47.570s
user 0m41.631s
sys 0m8.634s
real 0m50.495s
user 0m42.499s
sys 0m8.607s
real 0m50.375s
user 0m41.832s
sys 0m8.789s
```
After:
```
real 0m50.509s
user 0m43.535s
sys 0m9.715s
real 0m50.857s
user 0m44.185s
sys 0m9.811s
real 0m50.873s
user 0m44.289s
sys 0m9.737s
```
Fixes SCYLLADB-1137
ERMs created in `calculate_vnode_effective_replication_map` have RF computed based
on the old token metadata during a topology change. The reading replicas, however,
are computed based on the new token metadata (`target_token_metadata`) when
`read_new` is true. That can create a mismatch for EverywhereStrategy during some
topology changes - RF can be equal to the number of reading replicas +-1. During
bootstrap, this can cause the
`everywhere_replication_strategy::sanity_check_read_replicas` check to fail in
debug mode.
We fix the check in this commit by allowing one more reading replica when
`read_new` is true.
Fixes https://scylladb.atlassian.net/browse/SCYLLADB-1147Closesscylladb/scylladb#29150
Before these changes, we would send mutations to the node and
immediately query the metrics to see how many hints had been written.
However, that could lead to random failures of the test: even if the
mutations have finished executing, hints are stored asynchronously, so
we don't have a guarantee they have already been processed.
To prevent such failures, we rewrite the check: we will perform multiple
checks against the metrics until we have confirmed that the hints have
indeed been written or we hit the timeout.
We're generous with the timeout: we give the test 60 seconds. That
should be enough time to avoid flakiness even on super slow machines,
and if the test does fail, we will know something is really wrong.
As a bonus, we improve the test in general too. We explicitly express
the preconditions we rely on, as well as bump the log level. If the
test fails in the future, it might be very difficult do debug it
without this additional information.
Fixes SCYLLADB-1133
Backport: The test is present on all supported branches. To avoid
running into more failures, we should backport these changes
to them.
Closesscylladb/scylladb#29191
* github.com:scylladb/scylladb:
test: cluster: Increase log level in test_write_cl_any_to_dead_node_generates_hints
test: cluster: Await all mutations concurrently in test_write_cl_any_to_dead_node_generates_hints
test: cluster: Specify min_tablet_count in test_write_cl_any_to_dead_node_generates_hints
test: cluster: Use new_test_table in test_write_cl_any_to_dead_node_generates_hints
test: cluster: Introduce auxiliary function keyspace_has_tablets
test: cluster: Deflake test_write_cl_any_to_dead_node_generates_hints
Reduce the timeout for one test to 60 minutes. The longest test we had
so far was ~10-15 minutes. So reducing this timeout is pretty safe and
should help with hanging tests.
Closesscylladb/scylladb#29212
Migrate audit tests from test/cluster/dtest to test/cluster. Optimize their execution time through cluster reuse.
The audit test suite is heavy. There are more than 70 test executions. Environment preparation is a significant part of each test case execution time.
This PR:
1. Copies audit tests from test/cluster/dtest to test/cluster, refactoring and enabling them
2. Groups tests functions by non-live cluster configuration variations to enable cluster reuse between them
- Execution time reduced from 4m 29s to 2m 47s, which is ~38% execution time decrease
3. Removes the old audit tests from test/cluster/dtest
Includes two supporting changes:
- Allow specifying `AuthProvider` in `ManagerClient.get_cql_exclusive`
- Fix server log file handling for clean clusters
Refs [SCYLLADB-573](https://scylladb.atlassian.net/browse/SCYLLADB-573)
This PR is an improvement and does not require a backport.
[SCYLLADB-573]: https://scylladb.atlassian.net/browse/SCYLLADB-573?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQClosesscylladb/scylladb#28650
* github.com:scylladb/scylladb:
test: cluster: fix log clear race condition in test_audit.py
test: pylib: shut down exclusive cql connections in ManagerClient
test: cluster: fix multinode audit entry comparison in test_audit.py
test: cluster: dtest: remove old audit tests
test: cluster: group migrated audit tests for cluster reuse
test: cluster: enable migrated audit tests and make them work
test: pylib: manager_client: specify AuthProvider in get_cql_exclusive
test: pylib: scylla cluster after_test log fix
test: audit: copy audit test from dtest
Maintenance socket path used for PGO is in the node workdir.
When the node workdir path is too long, the maintenance socket path
(workdir/cql.m) can exceed the Unix domain socket sun_path limit
and failing the PGO training pipeline.
To prevent this:
- pass an explicit --maintenance-socket override
pointing to a short determinitic path in /tmp derived from the MD5
hash of the workdir maintenance socket path
- update maintenance_socket_path to return the matching short path
so that exec_cql.py connects to the right socket
The short path socket files are cleaned up after the cluster stops.
The path is using MD5 hash of the workdir path, so it is deterministic.
Fixes SCYLLADB-1070
Closesscylladb/scylladb#29149
Commit faa0ee9844 accidentally broke the way split snapshot mutation was
frozen -- instead of appending the sub-mutation `m` the commit kept the
old variable name of `mut` which in the new code corresponds to "old"
non-split mutation
Fixes#29051
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#29052
There's a flaw in table::query() -- calling querier_opt->close() can dereferences a disengaged std::optional. The fix pretty simple. Once fixed, there are two if-s checking for querier_opt being engaged or not that are worth being merged.
The problem doesn't really shows itself becase table::query() is not called with null saved_querier, so the de-facto if is always correct. However, better to be on safe-side.
The problem doesn't show itself for real, not worth backporting
Closesscylladb/scylladb#29142
* github.com:scylladb/scylladb:
table: merge adjacent querier_opt checks in query()
table: don't close a disengaged querier in query()
Potential fix for https://github.com/scylladb/scylladb/security/code-scanning/147.
To fix the problem, add an explicit `permissions:` block to the workflow
(either at the top level or inside the `trigger-jenkins` job) that
constrains the `GITHUB_TOKEN` to the minimal necessary privileges. This
codifies least-privilege in the workflow itself instead of relying on
repository or organization defaults.
The best minimal, non‑breaking change is to define a root‑level
`permissions:` block with read‑only contents access because the job does
not perform any write operations to the repository, nor does it interact
with issues, pull requests, or other GitHub resources. A conservative,
widely accepted baseline is `contents: read`. If later steps require more
permissions, they can be added explicitly, but for this snippet, no such
need is visible.
Concretely, in `.github/workflows/trigger_jenkins.yaml`, insert:
```yaml
permissions:
contents: read
```
between the `name:` block and the `on:` block (e.g., after line 2).
No additional methods, imports, or definitions are needed since this is
a pure YAML configuration change and does not alter runtime behavior of
the existing shell steps.
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Closesscylladb/scylladb#27815
Potential fix for https://github.com/scylladb/scylladb/security/code-scanning/169.
In general, the fix is to add an explicit `permissions:` block to the
workflow (at the root level or per job) so that the `GITHUB_TOKEN` has
only the minimal scopes needed. Since this job only reads event data and
uses secrets to talk to Jenkins, we can restrict `GITHUB_TOKEN` to
read‑only repository contents.
The single best fix here is to add a top‑level `permissions:` block
right under the `name:` (and before `on:`) in
`.github/workflows/trigger-scylla-ci.yaml`, setting `contents: read`.
This applies to all jobs in the workflow, including `trigger-jenkins`,
and does not alter any existing steps or logic. No additional imports or
methods are needed, as this is purely a YAML configuration change for
GitHub Actions.
Concretely, edit `.github/workflows/trigger-scylla-ci.yaml` to insert:
```yaml
permissions:
contents: read
```
after line 1. No other lines in the file need to change.
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Closesscylladb/scylladb#27812
We increase the log level of `hints_manager` to TRACE in the test.
If it fails, it may be incredibly difficult to debug it without any
additional information.
The test relies on the assumption that mutations will be distributed
more or less uniformly over the nodes. Although in practice this should
not be possible, theoretically it's possible that there's only one
tablet allocated for the table.
To clearly indicate this precondition, we explicitly set the property
`min_tablet_count` when creating the table. This way, we have a gurantee
that the table has multiple tablets. The load balancer should now take
care of distributing them over the nodes equally. Thanks to that,
`servers[1]` will have some tablets, and so it'll be the target for some
of the mutations we perform.
The context manager is the de-facto standard in the test suite. It will
also allow us for a prettier way to conditionally enable per-table
tablet options in the following commit.
The function is adapted from its counterpart in the cqlpy test suite:
cqlpy/util.py::keyspace_has_tablets. We will use it in a commit in this
series to conditionally set tablet properties when creating a table.
It might also be useful in general.
Before these changes, we would send mutations to the node and
immediately query the metrics to see how many hints had been written.
However, that could lead to random failures of the test: even if the
mutations have finished executing, hints are stored asynchronously, so
we don't have a guarantee they have already been processed.
To prevent such failures, we rewrite the check: we will perform multiple
checks against the metrics until we have confirmed that the hints have
indeed been written or we hit the timeout.
We're generous with the timeout: we give the test 60 seconds. That
should be enough time to avoid flakiness even on super slow machines,
and if the test does fail, we will know something is really wrong.
Fixes SCYLLADB-1133
Fixtures previously ran GDB once (module scope) to find live objects
(sstables, tasks, schemas) and stored their addresses. Tests then
reused those addresses in separate GDB invocations. Sometimes these
addresses would become stale and the test would step on use-after-free
(e.g. sstables compacted away between invocations).
Fix by dropping the fixtures. The helper functions used by the fixtures
to obtain the required objects are converted to gdb convenience
functions, which can be used in the same expression as the test command
invocation. Thus, the object is aquired on-demand at the moment it is
used, so it is guaranteed to be fresh and relevant.
Fixes: SCYLLADB-1020
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Closesscylladb/scylladb#28999
The version of fmt installed on my machine refuses to work with
`std::filesystem::path` directly. Add `.string()` calls in places that
attempt to print paths directly in order to make them work.
Closesscylladb/scylladb#29148
When encrypted_data_source::get() caches a trailing block in _next, the next call takes it directly — bypassing input_stream::read(), which checks _eof. It then calls input_stream::read_exactly() on the already-drained stream. Unlike read(), read_up_to(), and consume(), read_exactly() does not check _eof when the buffer is empty, so it calls _fd.get() on a source that already returned EOS.
In production this manifested as stuck encrypted SSTable component downloads during tablet restore: the underlying chunked_download_source hung forever on the post-EOS get(), causing 4 tablets to never complete. The stuck files were always block-aligned sizes (8k, 12k) where _next gets populated and the source is fully consumed in the same call.
Fix by checking _input.eof() before calling read_exactly(). When the stream already reached EOF, buf2 is known to be empty, so the call is skipped entirely.
A comprehensive test is added that uses a strict_memory_source which fails on post-EOS get(), reproducing the exact code path that caused the production deadlock.
Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-1128
Backport to 2025.3/4 and 2026.1 is needed since it fixes a bug that may bite us in production, to be on the safe side
Closesscylladb/scylladb#29110
* github.com:scylladb/scylladb:
encryption: fix deadlock in encrypted_data_source::get()
test_lib: mark `limiting_data_source_impl` as not `final`
Fix formatting after previous patch
Fix indentation after previous patch
test_lib: make limiting_data_source_impl available to tests
The test sporadically fails because scrub produces an unexpected number
of SSTables. Compaction logs are needed to diagnose why, but were not
captured since scrub runs at DEBUG level. Enable compaction=debug for
the servers started by do_tablet_incremental_repair_and_ops so the next
reproduction provides enough information to root-cause the issue.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Without this include the file would not compile on its own. The issue
was most likely masked by the use of precompiled headers in our CI.
Closesscylladb/scylladb#29170
After the previous fix both guarding if-s start with 'if (querier_opt &&'.
Merge them into a single outer 'if (querier_opt)' block to avoid the
redundant check and make the structure easier to follow.
No functional change.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The condition guarding querier_opt->close() was:
When saved_querier is null the short-circuit makes the whole condition true
regardless of whether querier_opt is engaged. If partition_ranges is empty,
query_state::done() is true before the while-loop body ever runs, so querier_opt
is never created. Calling querier_opt->close() then dereferences a disengaged
std::optional — undefined behaviour.
Fix by checking querier_opt first:
This preserves all existing semantics (close when not saving, or when saving
wouldn't be useful) while making the no-querier path safe.
Why this doesn't surface today: the sole production call site, database::query(),
in practice. The API header documents nullptr as valid ("Pass nullptr when
queriers are not saved"), so the bug is real but latent.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
assert_entries_were_added:
- takes a "before" snapshot of the audit log
- yields to execute a statement
- takes an "after" snapshot of the audit log
- computes new rows by diffing "after" minus "before"
If an audit entry generated by prepare() arrives between the snapshot
and the diff, it inflates the new row count and the test fails with
assert 2 <= 1.
Fix by:
- Adding clear_audit_logs() at the end of prepare(), after all setup
- Waiting for the "completed re-reading configuration file" log message
after server_update_config
- Draining pending syslog lines before clearing the buffer
Refs SCYLLADB-573
get_cql_exclusive() creates a Cluster object per call, but never
records it. driver_close() cannot shut it down. The cluster's
internal scheduler thread then tries to submit work to an already
shut down executor. This causes RuntimeError:
RuntimeError: cannot schedule new futures after shutdown
Fix this by tracking every exclusive Cluster in a list and shutting
them all down in driver_close().
Refs SCYLLADB-573
assert_entries_were_added computes new audit rows by slicing the "after"
list at the length of the "before" list: rows_after[len(rows_before):].
This assumes new rows always appear at the tail of the combined sorted
list. In a multinode setup, each node generates its own event_time
timestamps. A new row from node A can sort before an old row from node
B, breaking the tail assumption. The assertion "new rows are not the
last rows in the audit table" then fires.
Fix this by splitting the before/after lists per node and computing the
new rows tail independently for each node. This guarantees that per node
ordering, which is monotonic, is respected, and the combined new rows
are sorted afterwards.
Refs SCYLLADB-573
Since audit tests have been migrated to test/cluster/test_audit.py,
old tests in test/cluster/dtest/audit_test.py have to be removed.
Refs SCYLLADB-573
This patch reorganizes the execution flow of the test functions.
They are grouped to enable cluster reuse between specific test
functions. One of the main contributors to the test execution time
is the cluster preparation. This patch significantly reduces the
total test execution time by having way less new cluster preparation
calls and more cluster reuse.
Performance increase on the developer machine is around 38%:
- before: 4m 29s
- after: 2m 47s
Fixes SCYLLADB-573
Make audit tests from test/cluster/dtest to test/cluster.
test/cluster environment has less overhead, and audit tests
are heavy, their execution taking lots of time. This patch
is part of an effort to improve audit test suite performance.
This patch refactors the tests so that they execute correctly,
as well as enables them. A follow up patch will remove the
audit tests in test/cluster/dtest.
All the tests are confirmed to be running after the change.
No dead code present.
Test test_audit_categories_invalid is not parametrized anymore.
It never used the parametrized helper class, so it just ran
the same logic three times. This is why there are now 74,
and not 76, test executions.
Refs SCYLLADB-573
This patch allows ManagerClient.get_cql_exclusive to accept AuthProvider
as parameter. This will be used in a follow up patch which migrates
audit test suite to test/cluster and requires this functionality for
some tests.
Refs SCYLLADB-573
Before any test, a pool of ScyllaCluster objects is created.
At the beginning of a test suite, a ScyllaClusterManager is created,
and given a reference to the pool.
At the end of a test suite, the ScyllaClusterManager is destroyed.
Before each test case:
- ManagerClient is constructed and connected to the ScyllaClusterManager
of that test suite
- A ScyllaCluster object is fetched from the pool
- If the pool is empty, a new ScyllaCluster object is created
- If the pool is not empty, a cached ScyllaCluster object is returned
After each test case:
- Return ScyllaCluster object from ManagerClient to the pool
- If the cluster is dirty, the pool destroys it
- If the cluster is clean, the pool caches it
- ManagerClient is destroyed
Many actions mark a cluster as dirty. Normal test execution will always
make the cluster be destroyed upon returning to the pool.
ManagerClient.mark_clean is not used in the tests. When it is used,
the flow with cluster reuse happens.
The bug is that the log file is closed even if cluster is not dirty.
This causes an error when trying to log to a reused cluster server.
The solution in this patch is to not close the log file if the cluster
is not dirty. Upon cluster reuse the log file will be open and functional.
Another approach would be to reopen the log file if closed, but this
approach seems more clean.
Refs SCYLLADB-573
This patch just copies the audit test suite from dtest and
disables it in the test config file. Later patches will
update the code and enable the test suite.
Refs SCYLLADB-573
When encrypted_data_source::get() caches a trailing block in
_next, the next call takes it directly — bypassing
input_stream::read(), which checks _eof. It then calls
input_stream::read_exactly() on the already-drained stream.
Unlike read(), read_up_to(), and consume(), read_exactly()
does not check _eof when the buffer is empty, so it calls
_fd.get() on a source that already returned EOS.
In production this manifested as stuck encrypted SSTable
component downloads during tablet restore: the underlying
chunked_download_source hung forever on the post-EOS get(),
causing 4 tablets to never complete. The stuck files were
always block-aligned sizes (8k, 12k) where _next gets
populated and the source is fully consumed in the same call.
Fix by checking _input.eof() before calling read_exactly().
When the stream already reached EOF, buf2 is known to be
empty, so the call is skipped entirely.
A comprehensive test is added that uses a strict_memory_source
which fails on post-EOS get(), reproducing the exact code
path that caused the production deadlock.
co_returnapi_error::request_limit_exceeded(format("too many in-flight requests (configured via max_concurrent_requests_per_shard): {}",_pending_requests.get_count()));
co_returnapi_error::request_limit_exceeded(format("too many in-flight requests (configured via max_concurrent_requests_per_shard): {}",_pending_requests.get_count()));
"\ttable : Audit messages written to column family named audit.audit_log.\n")
,audit_categories(this,"audit_categories",liveness::LiveUpdate,value_status::Used,"DCL,AUTH,ADMIN","Comma separated list of operation categories that should be audited.")
,audit_categories(this,"audit_categories",liveness::LiveUpdate,value_status::Used,"DCL,DDL,AUTH,ADMIN","Comma separated list of operation categories that should be audited.")
,audit_tables(this,"audit_tables",liveness::LiveUpdate,value_status::Used,"","Comma separated list of table names (<keyspace>.<table>) that will be audited.")
,audit_keyspaces(this,"audit_keyspaces",liveness::LiveUpdate,value_status::Used,"","Comma separated list of keyspaces that will be audited. All tables in those keyspaces will be audited")
,audit_unix_socket_path(this,"audit_unix_socket_path",value_status::Used,"/dev/log","The path to the unix socket used for writing to syslog. Only applicable when audit is set to syslog.")
sm::make_counter("sstable_reader_recreations",sm::description("number of times sstable reader was recreated due to memtable flush"),_stats.underlying_recreations),
sm::make_counter("sstable_partition_skips",sm::description("number of times sstable reader was fast forwarded across partitions"),_stats.underlying_partition_skips),
sm::make_counter("sstable_row_skips",sm::description("number of times sstable reader was fast forwarded within a partition"),_stats.underlying_row_skips),
sm::make_counter("pinned_dirty_memory_overload",sm::description("amount of pinned bytes that we tried to unpin over the limit. This should sit constantly at 0, and any number different than 0 is indicative of a bug"),_stats.pinned_dirty_memory_overload),
sm::make_counter("pinned_dirty_memory_overload",sm::description("amount of pinned bytes that we tried to unpin over the limit. This should sit constantly at 0, and any number different than 0 is indicative of a bug"),_stats.pinned_dirty_memory_overload).set_skip_when_empty(),
"everywhere_replication_strategy: the number of replicas for everywhere_replication_strategy is {}, "
"cannot be higher than replication factor {} + 1 during the 'read from new replicas' stage of a topology change",
read_replicas.size(),replication_factor);
}
}elseif(read_replicas.size()>replication_factor){
returnseastar::format("everywhere_replication_strategy: the number of replicas for everywhere_replication_strategy is {}, cannot be higher than replication factor {}",read_replicas.size(),replication_factor);
done=awaitcql.run_async(f"SELECT COUNT(*) FROM system_distributed.view_build_status WHERE status = 'SUCCESS' AND view_name = '{name}' ALLOW FILTERING")
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.