`e4da0afb8d5491bf995cbd1d7a7efb966c79ac34` introduces a protection
against resources that are "made up" of thin air to
`reader_concurrency_semaphore`. If there are more `_resources` than
the `_initial_resources`, it means there is a negative leak, and
`on_internal_error_noexcept` is called. In addition to it,
`_resources` is set to `std::max(_resources, _initial_resources)`.
However, the commit message of `e4da0afb8d5491bf995cbd1d7a7efb966c79ac34`
states the opposite: "The detection also clamps the
_resources to _initial_resources, to prevent any damage".
Before this commit, the protection mechanism doesn't clamp
`_resources` to `_initial_resources` but instead keeps `_resources` high,
possibly even indefinitely growing. This commit changes `std::max` to
`std::min` to make the code behave as intended.
Fixes: SCYLLADB-1014
Refs: SCYLLADB-163
Closesscylladb/scylladb#28982
(cherry picked from commit 9247dff8c2)
Closesscylladb/scylladb#28988Closesscylladb/scylladb#29196
vector_search: fix race condition on connection timeout
When a `with_connect` operation timed out, the underlying connection
attempt continued to run in the reactor. This could lead to a crash
if the connection was established/rejected after the client object had
already been destroyed. This issue was observed during the teardown
phase of a upcoming high-availability test case.
This commit fixes the race condition by ensuring the connection attempt
is properly canceled on timeout.
Additionally, the explicit TLS handshake previously forced during the
connection is now deferred to the first I/O operation, which is the
default and preferred behavior.
Fixes: SCYLLADB-832
Backports to 2026.1 and 2025.4 are required, as this issue also exists on those branches and is causing CI flakiness.
- (cherry picked from commit 3107d9083e)
Parent PR: #29031Closesscylladb/scylladb#29360
* github.com:scylladb/scylladb:
vector_search: test: fix flaky test
vector_search: fix race condition on connection timeout
`data_value::to_parsable_string()` crashes with a null pointer dereference when called on a `null` data_value. Return `"null"` instead.
Added tests after the fix. Manually checked that tests fail without the fix.
Fixes SCYLLADB-1350
This is a fix that prevents format crash. No known occurrence in production, but backport is desirable.
Closesscylladb/scylladb#29262
* github.com:scylladb/scylladb:
test: boost: test null data value to_parsable_string
cql3: fix null handling in data_value formatting
(cherry picked from commit 816f2bf163)
Closesscylladb/scylladb#29384Closesscylladb/scylladb#29434
When encrypted_data_source::get() caches a trailing block in _next, the next call takes it directly — bypassing input_stream::read(), which checks _eof. It then calls input_stream::read_exactly() on the already-drained stream. Unlike read(), read_up_to(), and consume(), read_exactly() does not check _eof when the buffer is empty, so it calls _fd.get() on a source that already returned EOS.
In production this manifested as stuck encrypted SSTable component downloads during tablet restore: the underlying chunked_download_source hung forever on the post-EOS get(), causing 4 tablets to never complete. The stuck files were always block-aligned sizes (8k, 12k) where _next gets populated and the source is fully consumed in the same call.
Fix by checking _input.eof() before calling read_exactly(). When the stream already reached EOF, buf2 is known to be empty, so the call is skipped entirely.
A comprehensive test is added that uses a strict_memory_source which fails on post-EOS get(), reproducing the exact code path that caused the production deadlock.
Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-1128
Backport to 2025.3/4 and 2026.1 is needed since it fixes a bug that may bite us in production, to be on the safe side
Closesscylladb/scylladb#29110
* github.com:scylladb/scylladb:
encryption: fix deadlock in encrypted_data_source::get()
Fix formatting after previous patch
Fix indentation after previous patch
(cherry picked from commit 3b9398dfc8)
Closesscylladb/scylladb#29198Closesscylladb/scylladb#29359
A joining node hung forever if the topology coordinator added it to the
group 0 configuration before the node reached `post_server_start`. In
that case, `server->get_configuration().contains(my_id)` returned true
and the node broke out of the join loop early, skipping
`post_server_start`. `_join_node_group0_started` was therefore never set,
so the node's `join_node_response` RPC handler blocked indefinitely.
Meanwhile the topology coordinator's `respond_to_joining_node` call
(which has no timeout) hung forever waiting for the reply that never came.
Fix by only taking the early-break path when not starting as a follower
(i.e. when the node is the discovery leader or is restarting). A joining
node must always reach `post_server_start`.
We also provide a regression test. It takes 6s in dev mode.
Fixes SCYLLADB-959
Closesscylladb/scylladb#29266
(cherry picked from commit b9f82f6f23)
Closesscylladb/scylladb#29291Closesscylladb/scylladb#29308
The test assumes that the sleep duration will be at least the value of
the sleep parameter. However, the actual sleep time can be slightly less
than requested (e.g., a 100ms sleep request might result in a 99ms
sleep).
This commit adjusts the test's time comparison to be more lenient,
preventing test flakiness.
When a `with_connect` operation timed out, the underlying connection
attempt continued to run in the reactor. This could lead to a crash
if the connection was established/rejected after the client object had
already been destroyed. This issue was observed during the teardown
phase of a upcoming high-availability test case.
This commit fixes the race condition by ensuring the connection attempt
is properly canceled on timeout.
Additionally, the explicit TLS handshake previously forced during the
connection is now deferred to the first I/O operation, which is the
default and preferred behavior.
Fixes: SCYLLADB-832
Use get_cql_exclusive(node1) so the driver only connects to node1 and
never attempts to contact the stopped node2. The test was flaky because
the driver received `Host has been marked down or removed` from node2.
Fixes: SCYLLADB-1227
Closesscylladb/scylladb#29268
(cherry picked from commit ab43420d30)
Closesscylladb/scylladb#29278Closesscylladb/scylladb#29355
The test was using time.sleep(1) (a blocking call) to wait after
scheduling the stop_compaction task, intending to let it register on
the server before releasing the sstable_cleanup_wait injection point.
However, time.sleep() blocks the asyncio event loop entirely, so the
asyncio.create_task(stop_compaction) task never gets to run during the
sleep. After the sleep, the directly-awaited message_injection() runs
first, releasing the injection point before stop_compaction is even
sent. By the time stop_compaction reaches Scylla, the cleanup has
already completed successfully -- no exception is raised and the test
fails.
Fix by replacing time.sleep(1) with await asyncio.sleep(1), which
yields control to the event loop and allows the stop_compaction task
to actually send its HTTP request before message_injection is called.
Fixes: SCYLLADB-834
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Closesscylladb/scylladb#29202
(cherry picked from commit 068a7894aa)
Closesscylladb/scylladb#29277Closesscylladb/scylladb#29356
**The Bug**
Assertion failure: `SCYLLA_ASSERT(res.second)` in `raft/server.cc`
when creating a snapshot transfer for a destination that already had a
stale in-flight transfer.
**Root Cause**
If a node loses leadership and later becomes leader again before the next
`io_fiber` iteration, the old transfer from the previous term can remain
in `_snapshot_transfers` while `become_leader()` resets progress state.
When the new term emits `install_snapshot(dst)`, `send_snapshot(dst)`
tries to create a new entry for the same destination and can hit the
assertion.
**The Fix**
Abort all in-flight snapshot transfers in `process_fsm_output()` when
`term_and_vote` is persisted. A term/vote change marks existing transfers
as stale, so we clean them up before dispatching messages from that batch
and before any new snapshot transfer is started.
With cross-term cleanup moved to the term-change path, `send_snapshot()`
now asserts the within-term invariant that there is at most one in-flight
transfer per destination.
Fixes: SCYLLADB-862
Backport: The issue is reproducible in master, but is present in all
active branches.
Closesscylladb/scylladb#29092
(cherry picked from commit 9dad68e58d)
Closesscylladb/scylladb#29264Closesscylladb/scylladb#29357
There are 3 metrics (that goes in every compaction_history entry):
total_tombstone_purge_attempt
total_tombstone_purge_failure_due_to_overlapping_with_memtable
total_tombstone_purge_failure_due_to_overlapping_with_uncompacting_sstable
When a tombstone is not expired (e.g. doesn't satisfy "gc_before" or
grace period), it can be currently accounted as failure due to
overlapping with either memtable or uncompacting sstable.
So those 2 last metrics have noise of *unexpired* tombstones.
What we should do is to only account for expired tombstones in all
those 3 metrics. We lose the info of knowing the amount of tombstones
processed by compaction, now we'll only know about the expired ones.
But those metrics were primarily added for explaining why expired
tombstones cannot be removed.
We could have alternatively added a new field
purge_failure_due_to_being_unexpired or something, but
it requires adding a new field to compaction_history.
Fixes https://scylladb.atlassian.net/browse/SCYLLADB-737.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closesscylladb/scylladb#28669
(cherry picked from commit f33f324f77)
Closesscylladb/scylladb#28743
Currently, the view_update_generator::mutate_MV function acquires a
reference to the keyspace relevant to the operation, then it calls
max_concurrent_for_each and uses that reference inside the lambda passed
to that function. max_concurrent_for_each can preempt and there is no
mechanism that makes sure that the keyspace is alive until the view
updates are generated, so it is possible that the keyspace is freed by
the time the reference is used.
Fix the issue by precomputing the necessary information based on the
keyspace reference right away, and then passing that information by
value to the other parts of the code. It turns out that we only need to
know the replication factor of the datacenter and whether the keyspace
uses a network topology strategy.
Fixes: scylladb/scylladb#28925Closesscylladb/scylladb#28928
(cherry picked from commit 42d70baad3)
Closesscylladb/scylladb#28968Closesscylladb/scylladb#29095
During raft-topology upgrade in 2026.1, service_level_controller::migrate_to_v2() returns early when system_distributed.service_levels is empty. This skips the service_level_version = 2 write, so the cluster is never marked as upgraded to service levels v2 even though there is no data to migrate. Subsequent upgrades may then fail the startup check which requires service_level_version == 2.
Remove the early return and let the migration commit the version marker even when there are no legacy service levels rows to copy.
Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-1198
backport: should be backported to all versions that can be upgraded to 2026.2
Closesscylladb/scylladb#29333
* github.com:scylladb/scylladb:
test/auth_cluster: cover empty legacy table in service level upgrade
service_levels: mark v2 migration complete on empty legacy table
(cherry picked from commit 95e422db48)
Closesscylladb/scylladb#29352
The tests in test_out_of_space_prevention.py are flaky. Three issues contribute:
1. After creating/removing the blob file that simulates disk pressure,
the tests immediately checked derived state (e.g., "compaction_manager
- Drained") without first confirming the disk space monitor had detected
the utilization change. Fix: explicitly wait for "Reached/Dropped below
critical disk utilization level" right after creating/removing the blob
file, before checking downstream effects.
2. Several tests called `manager.driver_connect()` or omitted reconnection
entirely after `server_restart()` / `server_start()`. The pre-existing
driver session can silently reconnect multiple times, causing subsequent
CQL queries to fail. Fix: call `reconnect_driver()` after every node restart.
Additionally, call `wait_for_cql_and_get_hosts()` where CQL is used afterward,
to ensure all connection pools are established.
3. Some log assertions used marks captured before a restart, so they could
match pre-restart messages or miss messages emitted in the correct post-restart
window. Fix: refresh marks at the right points.
Apart from that, the patch fixes a typo: autotoogle -> autotoggle.
Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-655Closesscylladb/scylladb#28626
(cherry picked from commit 826fd5d6c3)
Closesscylladb/scylladb#28967Closesscylladb/scylladb#29197
ERMs created in `calculate_vnode_effective_replication_map` have RF computed based
on the old token metadata during a topology change. The reading replicas, however,
are computed based on the new token metadata (`target_token_metadata`) when
`read_new` is true. That can create a mismatch for EverywhereStrategy during some
topology changes - RF can be equal to the number of reading replicas +-1. During
bootstrap, this can cause the
`everywhere_replication_strategy::sanity_check_read_replicas` check to fail in
debug mode.
We fix the check in this commit by allowing one more reading replica when
`read_new` is true.
Fixes https://scylladb.atlassian.net/browse/SCYLLADB-1147Closesscylladb/scylladb#29150
(cherry picked from commit 503a6e2d7e)
Closesscylladb/scylladb#29248Closesscylladb/scylladb#29269
The test intentionally creates huge index pages.
But since 5e7fb08bf3,
the index reader allocates a block of memory for a whole index page,
instead of incrementally allocating small pieces during index parsing.
This giant allocation causes the test to fail spuriously in CI sometimes.
Fix this by disabling sstable compression on the test table,
which puts a hard cap of 2000 keys per index page.
Fixes: SCYLLADB-1152
Closesscylladb/scylladb#29152
(cherry picked from commit f29525f3a6)
Closesscylladb/scylladb#29172Closesscylladb/scylladb#29259
The removenove initiator could have an outdated token ring (still considering
the node removed by the previous removenode a token owner) and unexpectedly
reject the operation.
Fix that by waiting for token ring and group0 consistency before removenode.
Note that the test already checks that consistency, but only for one node,
which is different from the removenode initiator.
This test has been removed in master together with the code being tested
(the gossip-based topology). Hence, the fix is submitted directly to 2026.1.
Fixes https://scylladb.atlassian.net/browse/SCYLLADB-1103
Backport to all supported branches (other than 2026.1), as the test can fail
there.
Closesscylladb/scylladb#29108
(cherry picked from commit 1398a55d16)
Closesscylladb/scylladb#29205
Move all ${{ }} expression interpolations into env: blocks so they are
passed as environment variables instead of being expanded directly into
shell scripts. This prevents an attacker from escaping the heredoc in
the Validate Comment Trigger step and executing arbitrary commands on
the runner.
The Verify Org Membership step is hardened in the same way for
defense-in-depth.
Refs: GHSA-9pmq-v59g-8fxp
Fixes: SCYLLADB-954
Closesscylladb/scylladb#28935
(cherry picked from commit 977bdd6260)
Closesscylladb/scylladb#28947
nodetool cluster repair without additional params repairs all tablet
keyspaces in a cluster. Currently, if a table is dropped while
the command is running, all tables are repaired but the command finishes
with a failure.
Modify nodetool cluster repair. If a table wasn't specified
(i.e. all tables are repaired), the command finishes successfully
even if a table was dropped.
If a table was specified and it does not exist (e.g. because it was
dropped before the repair was requested), then the behavior remains
unchanged.
Fixes: SCYLLADB-568.
Closesscylladb/scylladb#28739
(cherry picked from commit 2e68f48068)
Closesscylladb/scylladb#29006Closesscylladb/scylladb#29038
mv: allow skipping view updates when a collection is unmodified
When we generate view updates, we check whether we can skip the
entire view update if all columns selected by the view are unmodified.
However, for collection columns, we only check if they were unset
before and after the update.
In this patch we add a check for the actual collection contents.
We perform this check for both virtual and non-virtual selections.
When the column is only a virtual column in the view, it would be
enough to check the liveness of each collection cell, however for
that we'd need to deserialize the entire collection anyway, which
should be effectively as expensive as comparing all of its bytes.
Fixes: SCYLLADB-996
- (cherry picked from commit 01ddc17ab9)
Parent PR: #28839Closesscylladb/scylladb#28977
* github.com:scylladb/scylladb:
Merge 'mv: allow skipping view updates when a collection is unmodified' from Wojciech Mitros
mv: remove dead code in view_updates::can_skip_view_updates
Closesscylladb/scylladb#29094
`test_raft_no_quorum.py::test_cannot_add_new_node` is currently flaky in dev
mode. The bootstrap of the first node can fail due to `add_entry()` timing
out (with the 1s timeout set by the test case).
Other test cases in this test file could fail in the same way as well, so we
need a general fix. We don't want to increase the timeout in dev mode, as it
would slow down the test. The solution is to keep the timeout unchanged, but
set it only after quorum is lost. This prevents unexpected timeouts of group0
operations with almost no impact on the test running time.
A note about the new `update_group0_raft_op_timeout` function: waiting for
the log seems to be necessary only for
`test_quorum_lost_during_node_join_response_handler`, but let's do it
for all test cases just in case (including `test_can_restart` that shouldn't
be flaky currently).
Fixes https://scylladb.atlassian.net/browse/SCYLLADB-913Closesscylladb/scylladb#28998
(cherry picked from commit 526e5986fe)
Closesscylladb/scylladb#29068Closesscylladb/scylladb#29097
Set enable_schema_commitlog for each group0 tables.
Assert that group0 tables use schema commitlog in ensure_group0_schema
(per each command).
Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-914.
Needs backport to all live releases as all are vulnerable
Closesscylladb/scylladb#28876
* github.com:scylladb/scylladb:
test: add test_group0_tables_use_schema_commitlog
db: service: remove group0 tables from schema commitlog schema initializer
service: ensure that tables updated via group0 use schema commitlog
db: schema: remove set_is_group0_table param
(cherry picked from commit b90fe19a42)
Closesscylladb/scylladb#28916Closesscylladb/scylladb#28986
The default 100ms timeout for client readiness in tests is too
aggressive. In some test environments, this is not enough time for
client creation, which involves address resolution and TLS certificate
reading, leading to flaky tests.
This commit increases the default client creation timeout to 10 seconds.
This makes the tests more robust, especially in slower execution
environments, and prevents similar flakiness in other test cases.
Fixes: VECTOR-547
Fixes: SCYLLADB-802
Fixes: SCYLLADB-825
Fixes: SCYLLADB-826
Backport to 2025.4 and 2026.1, as the same problem occurs on these branches and can potentially make the CI flaky there as well.
- (cherry picked from commit bf369326d6)
Parent PR: #28879Closesscylladb/scylladb#28895
* github.com:scylladb/scylladb:
vector_search: test: include ANN error in assertion
vector_search: test: fix HTTPS client test flakiness
When the test fails, the assertion message does not include
the error from the ANN request.
This change enhances the assertion to include the specific ANN error,
making it easier to diagnose test failures.
The default 100ms timeout for client readiness in tests is too
aggressive. In some test environments, this is not enough time for
client creation, which involves address resolution and TLS certificate
reading, leading to flaky tests.
This commit increases the default client creation timeout to 10 seconds.
This makes the tests more robust, especially in slower execution
environments, and prevents similar flakiness in other test cases.
Fixes: VECTOR-547, SCYLLADB-802
The test is currently flaky with `reuse_ip = True`. The issue is that the
test retries replace before the first replace is rolled back and the
first replacing node is removed from gossip. The second replacing node
can see the entry of the first replacing node in gossip. This entry has
a newer generation than the entry of the node being replaced, and both
replacing nodes have the same IP as the node being replaced. Therefore,
the second replacing node incorrectly considers this entry as the entry
of the node being replaced. This entry is missing rack and DC, so the
second replace fails with
```
ERROR 2026-02-24 21:19:03,420 [shard 0:main] init - Startup failed:
std::runtime_error (Cannot replace node
8762a9d2-3b30-4e66-83a1-98d16c5dd007/127.61.127.1 with a node on
a different data center or rack.
Current location=UNKNOWN_DC/UNKNOWN_RACK, new location=dc1/rack2)
```
Fixes SCYLLADB-805
Closesscylladb/scylladb#28829
(cherry picked from commit ba7f314cdc)
Closesscylladb/scylladb#28953
This commit updates the documentation for the unified installer.
- The Open Source example is replaced with version 2025.1 (Source Available, currently supported, LTS).
- The info about CentOS 7 is removed (no longer supported).
- Java 8 is removed.
- The example for cassandra-stress is removed (as it was already removed on other installation pages).
Fixes https://github.com/scylladb/scylladb/issues/28150Closesscylladb/scylladb#28152
(cherry picked from commit 855c503c63)
Closesscylladb/scylladb#28910Closesscylladb/scylladb#28927
scylladb/scylla container image doesn't include systemctl binary, while it
is used by perftune.py script shipped within the same image.
Scylla Operator runs this script to tune Scylla nodes/containers,
expecting its all dependencies to be available in the container's PATH.
Without systemctl, the script fails on systems that run irqbalance
(e.g., on EKS nodes) as the script tries to reconfigure irqbalance and
restart it via systemctl afterwards.
Fixes: scylladb/scylla-operator#3080Closesscylladb/scylladb#28567
(cherry picked from commit b4f0eb666f)
Closesscylladb/scylladb#28845
(cherry picked from commit 4cc5c2605f)
Closesscylladb/scylladb#28890
Consider the following scenario:
1. Let nodes A,B,C form a cluster with RF=3
2. Write query with CL=QUORUM is submitted and is acknowledged by
nodes B,C
3. Follow-up read query with CL=QUORUM is sent to verify the write
from the previous step
4. Coordinator sends data/digest requests to the nodes A,B. Since the
node A is missing data, digest mismatches and data reconciliation
is triggered
5. The node A or B fails, becomes unavailable, etc
6. During reconciliation, data requests are sent to node A,B and fail
failing the entire read query
When the above scenario happens, the tests using `start_writes()` fail
with the following stacktrace:
```
...
> await finish_writes()
test/cluster/test_tablets_migration.py:259:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
test/pylib/util.py:241: in finish
await asyncio.gather(*tasks)
test/pylib/util.py:227: in do_writes
raise e
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
worker_id = 1
...
> rows = await cql.run_async(rd_stmt, [pk])
E cassandra.ReadFailure: Error from server: code=1300 [Replica(s) failed to execute read] message="Operation failed for test_1767777001181_bmsvk.test - received 1 responses and 1 failures from 2 CL=QUORUM." info={'consistency': 'QUORUM', 'required_responses': 2, 'received_responses': 1, 'failures': 1}
```
Note that when a node failure happens before/during a read query,
there is no test failure as the speculative retries are enabled
by default. Hence an additional data/digest read is sent to the third
remaining node.
However, the same speculative read is cancelled the moment, the read
query reaches CL which may trigger a read-repair.
This change:
- Retries the verification read in start_writes() on failure to mitigate
races between reads and node failures
- Adds additional logging to correlate Python exceptions with Scylla logs
Fixes https://github.com/scylladb/scylladb/issues/27478
Fixes https://github.com/scylladb/scylladb/issues/27974
Fixes https://github.com/scylladb/scylladb/issues/27494
Fixes https://github.com/scylladb/scylladb/issues/23529
Note that this change test flakiness observed during tablet transitions.
However, it serves as a workaround for a higher-level issue
https://github.com/scylladb/scylladb/issues/28125Closesscylladb/scylladb#28140
(cherry picked from commit e07fe2536e)
Closesscylladb/scylladb#28825
The futurization refactoring in 9d3755f276 ("replica: Futurize
retrieval of sstable sets in compaction_group_view") changed
maybe_wait_for_sstable_count_reduction() from a single predicated
wait:
```
co_await cstate.compaction_done.wait([..] {
return num_runs_for_compaction() <= threshold
|| !can_perform_regular_compaction(t);
});
```
to a while loop with a predicated wait:
```
while (can_perform_regular_compaction(t)
&& co_await num_runs_for_compaction() > threshold) {
co_await cstate.compaction_done.wait([this, &t] {
return !can_perform_regular_compaction(t);
});
}
```
This was necessary because num_runs_for_compaction() became a
coroutine (returns future<size_t>) and can no longer be called
inside a condition_variable predicate (which must be synchronous).
However, the inner wait's predicate — !can_perform_regular_compaction(t)
— only returns true when compaction is disabled or the table is being
removed. During normal operation, every signal() from compaction_done
wakes the waiter, the predicate returns false, and the waiter
immediately goes back to sleep without ever re-checking the outer
while loop's num_runs_for_compaction() condition.
This causes memtable flushes to hang forever in
maybe_wait_for_sstable_count_reduction() whenever the sstable run
count exceeds the threshold, because completed compactions signal
compaction_done but the signal is swallowed by the predicate.
Fix by replacing the predicated wait with a bare wait(), so that
any signal (including from completed compactions) causes the outer
while loop to re-evaluate num_runs_for_compaction().
Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-610Closesscylladb/scylladb#28801
(cherry picked from commit bb57b0f3b7)