Commit Graph

51978 Commits

Author SHA1 Message Date
Anna Stuchlik
356ca32994 doc: update the node size limit
This commit increases the node size limit from 256 to 4096 CPUs
based on be1f566488

Fixes SCYLLADB-1676

Closes scylladb/scylladb#29602

(cherry picked from commit a7b7019f90)

Closes scylladb/scylladb#29846

Closes scylladb/scylladb#29864
2026-05-13 09:36:09 +03:00
copilot-swe-agent[bot]
0e41ef82ac docs: fix typo in materialized views docs - "columns are" instead of "is"
The MV Select Statement description was missing the word "columns" and
used incorrect verb agreement, making the sentence grammatically broken
and ambiguous.

docs/cql/mv.rst: "which of the base table is included" →
"which of the base table columns are included"

Fixes #29662
Closes #29663

Co-authored-by: annastuchlik <37244380+annastuchlik@users.noreply.github.com>
(cherry picked from commit 9e7d67612c)

Closes scylladb/scylladb#29835

Closes scylladb/scylladb#29865
2026-05-13 09:35:00 +03:00
Piotr Dulikowski
2c5ae5cee0 database: add missing co_await on lock in create_local_system_table
The function database::create_local_system_table calls
get_tables_metadata().hold_write_lock(), but does not co_await the
returned future. Effectively, this code does not guarantee mutual
exclusion because it does not wait for the lock to be acquired and does
not guarantee that the lock is held long enough.

Fix this by adding the co_await that was missing.

Found by manual inspection. This code is not known to have caused any
problems so far, but it's clearly wrong - hence the fix.
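
The same class of bug can be sketched with Python's asyncio, where calling
a coroutine function without `await` likewise never takes the lock. This is an
illustrative analogue only, not the ScyllaDB code; `TablesMetadata`,
`create_table_buggy` and `create_table_fixed` are hypothetical names.

```python
import asyncio


class TablesMetadata:
    """Hypothetical stand-in for the structure guarded by a write lock."""

    def __init__(self) -> None:
        self.lock = asyncio.Lock()
        self.tables: list[str] = []


async def create_table_buggy(meta: TablesMetadata, name: str) -> bool:
    # BUG: acquire() only builds a coroutine object; without `await` it never runs.
    pending = meta.lock.acquire()
    held = meta.lock.locked()      # False: nobody actually took the lock
    meta.tables.append(name)       # unprotected modification
    pending.close()                # discard the never-started coroutine
    return held


async def create_table_fixed(meta: TablesMetadata, name: str) -> bool:
    async with meta.lock:          # awaits acquisition, releases on exit
        held = meta.lock.locked()  # True: the critical section is protected
        meta.tables.append(name)
        return held


async def demo() -> tuple[bool, bool, list[str]]:
    meta = TablesMetadata()
    buggy_held = await create_table_buggy(meta, "a")
    fixed_held = await create_table_fixed(meta, "b")
    return buggy_held, fixed_held, meta.tables
```

Both variants append the table, which is why the missing wait can go
unnoticed: only the second one holds the lock while doing so.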

Fixes: SCYLLADB-1916

Closes scylladb/scylladb#29806

(cherry picked from commit bc482bfdea)

Closes scylladb/scylladb#29815

Closes scylladb/scylladb#29833
2026-05-12 10:16:29 +02:00
Botond Dénes
d33fb5159a sstables/trie: add preemption points in trie_writer
The BTI partition index trie writer flushes all buffered nodes at the
end of each SSTable via complete_until_depth(0), called from
bti_partition_index_writer_impl::finish(). This is a tight synchronous
loop that writes trie nodes through file_writer::write(), which uses a
buffered output_stream: individual writes that fit in the buffer are
plain memcpy operations returning a ready future, so .get() never
yields. As a result the reactor can stall for several milliseconds on
large SSTables.

The entire call chain runs inside seastar::async() (via
sstable::write_components()), so seastar::thread::maybe_yield() is
safe to call here. Add it at the top of both tight loops:
- complete_until_depth(), which iterates over trie depth
- lay_out_children(), which iterates over child branches per node
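
As a rough analogue of the change, the asyncio sketch below adds an explicit
yield point to a loop whose body never blocks. It is not Seastar code:
`flush_nodes`, `yield_every` and the heartbeat task are assumptions made for
illustration, and the sketch yields unconditionally every few iterations,
whereas maybe_yield() yields only when preemption is needed.

```python
import asyncio


async def flush_nodes(nodes: list[int], out: list[int], yield_every: int = 64) -> None:
    """Write buffered nodes, giving other tasks a chance to run periodically."""
    for i, node in enumerate(nodes):
        out.append(node)              # cheap in-memory "write" that never blocks
        if i % yield_every == 0:
            await asyncio.sleep(0)    # explicit preemption point


async def demo() -> int:
    ticks = 0

    async def heartbeat() -> None:
        # Stands in for other work that must keep running during the flush.
        nonlocal ticks
        while True:
            ticks += 1
            await asyncio.sleep(0)

    hb = asyncio.create_task(heartbeat())
    out: list[int] = []
    await flush_nodes(list(range(1000)), out)
    hb.cancel()
    return ticks
```

Without the `await asyncio.sleep(0)` line the heartbeat would not run at all
until the flush finished, which is the asyncio counterpart of a reactor stall.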

Fixes SCYLLADB-1885

Closes scylladb/scylladb#29798

(cherry picked from commit d0813769ec)

Closes scylladb/scylladb#29810

Closes scylladb/scylladb#29816
2026-05-11 12:50:56 +03:00
Piotr Dulikowski
02c7f44da4 Merge 'table_helper: fix use-after-free on prepared-statement invalidation' from Marcin Maliszkiewicz
insert() held no local strong ref to the prepared modification_statement
across the suspension in execute(). On a single shard:

1. Fiber A suspends inside _insert_stmt->execute().
2. DROP TABLE / DROP KEYSPACE on the target, or LRU eviction, removes
   the prepared_statements_cache entry, releasing its strong ref.
3. Fiber B re-enters cache_table_info(), sees _prepared_stmt
   (checked_weak_ptr) invalidated, and runs _insert_stmt = nullptr,
   releasing the last strong ref. The modification_statement is freed.
4. Fiber A resumes inside execute() and touches freed *this.

Pin strong ref to _insert_stmt locally before the suspension.

Fixes https://scylladb.atlassian.net/browse/SCYLLADB-1667

Backport: all supported branches; it's a memory corruption bug, long present

Closes scylladb/scylladb#29588

* github.com:scylladb/scylladb:
  test/boost: add dummy case to table_helper_test for non-injection modes
  test/boost: add regression test for table_helper insert() UAF
  utils/error_injection: add waiters() API
  table_helper: fix use-after-free on prepared-statement invalidation

(cherry picked from commit efcc0b6376)

Closes scylladb/scylladb#29747

Closes scylladb/scylladb#29802
2026-05-10 13:58:15 +03:00
Yaniv Michael Kaul
c0bf728edb raft/group0: fix destroy assertion on startup failure
If start_server_for_group0() successfully registers a server in
_raft_gr._servers but a subsequent step (e.g. enable_in_memory_state_machine())
throws, the server is never destroyed because abort_and_drain()/destroy()
check std::get_if<raft::group_id>(&_group0) which was only set after the
entire with_scheduling_group block completed.

Move _group0.emplace<raft::group_id>() inside the lambda, immediately after
start_server_for_group() succeeds, so that cleanup paths can always find
and destroy the registered server.

This fixes the assertion:
  "raft_group_registry - stop(): server for group ... is not destroyed"

which manifests during shutdown after an upgrade where topology_state_load()
fails due to netw::unknown_address.

Backport: Yes, to 2026.1, 2026.2, as it causes a crash on upgrades

Refs: SCYLLADB-1217
Refs: CUSTOMER-340
Refs: CUSTOMER-335
Fixes: SCYLLADB-1809
Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>
AI-assisted: Yes, Opencode/Opus 4.6

Closes scylladb/scylladb#29702

(cherry picked from commit 6179406467)

Closes scylladb/scylladb#29742

Closes scylladb/scylladb#29754
scylla-2026.1.3-candidate-20260510010724 scylla-2026.1.3
2026-05-08 11:24:02 +02:00
Raphael S. Carvalho
0b47672901 repair/replica: Fix race window where post-repair data is wrongly promoted to repaired
During incremental repair, each tablet replica holds three SSTable views:
UNREPAIRED, REPAIRING, and REPAIRED.  The repair lifecycle is:

  1. Replicas snapshot unrepaired SSTables and mark them REPAIRING.
  2. Row-level repair streams missing rows between replicas.
  3. mark_sstable_as_repaired() runs on all replicas, rewriting the
     SSTables with repaired_at = sstables_repaired_at + 1 (e.g. N+1).
  4. The coordinator atomically commits sstables_repaired_at=N+1 and
     the end_repair stage to Raft, then broadcasts
     repair_update_compaction_ctrl which calls clear_being_repaired().

The bug lives in the window between steps 3 and 4.  After step 3, each
replica has on-disk SSTables with repaired_at=N+1, but sstables_repaired_at
in Raft is still N.  The classifier therefore sees:

  is_repaired(N, sst{repaired_at=N+1}) == false
  sst->being_repaired == null   (lost on restart, or not yet set)

and puts them in the UNREPAIRED view.  If a new write arrives and is
flushed (repaired_at=0), STCS minor compaction can fire immediately and
merge the two SSTables.  The output gets repaired_at = max(N+1, 0) = N+1
because compaction preserves the maximum repaired_at of its inputs.

Once step 4 commits sstables_repaired_at=N+1, the compacted output is
classified REPAIRED on the affected replica even though it contains data
that was never part of the repair scan.  Other replicas, which did not
experience this compaction, classify the same rows as UNREPAIRED.  This
divergence is never healed by future repairs because the repaired set is
considered authoritative.  The result is data resurrection: deleted rows
can reappear after the next compaction that merges unrepaired data with the
wrongly-promoted repaired SSTable.

The fix has two layers:

Layer 1 (in-memory, fast path): mark_sstable_as_repaired() now also calls
mark_as_being_repaired(session) on the new SSTables it writes.  This keeps
them in the REPAIRING view from the moment they are created until
repair_update_compaction_ctrl clears the flag after step 4, covering the
race window in the normal (no-restart) case.

Layer 2 (durable, restart-safe): a new is_being_repaired() helper on
tablet_storage_group_manager detects the race window even after a node
restart, when being_repaired has been lost from memory.  It checks:

  sst.repaired_at == sstables_repaired_at + 1
  AND tablet transition kind == tablet_transition_kind::repair

Both conditions survive restarts: repaired_at is on-disk in SSTable
metadata, and the tablet transition is persisted in Raft.  Once the
coordinator commits sstables_repaired_at=N+1 (step 4), is_repaired()
returns true and the SSTable naturally moves to the REPAIRED view.

The classifier in make_repair_sstable_classifier_func() is updated to call
is_being_repaired(sst, sstables_repaired_at) in place of the previous
sst->being_repaired.uuid().is_null() check.
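
The classification rules described above can be sketched in a few lines of
Python. This is a simplified model, not the actual implementation: the
`SSTable` and `View` types, the exact definition of `is_repaired`, and the
order of the checks in `classify` are assumptions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class View(Enum):
    UNREPAIRED = "unrepaired"
    REPAIRING = "repairing"
    REPAIRED = "repaired"


@dataclass
class SSTable:
    repaired_at: int                      # on-disk metadata; 0 means never repaired
    being_repaired: Optional[str] = None  # in-memory session id, lost on restart


def is_repaired(sstables_repaired_at: int, sst: SSTable) -> bool:
    # Assumed semantics: repaired once the committed counter has caught up.
    return 0 < sst.repaired_at <= sstables_repaired_at


def is_being_repaired(sst: SSTable, sstables_repaired_at: int,
                      in_repair_transition: bool) -> bool:
    if sst.being_repaired is not None:    # Layer 1: in-memory flag
        return True
    # Layer 2: both inputs survive a restart (SSTable metadata and Raft state).
    return in_repair_transition and sst.repaired_at == sstables_repaired_at + 1


def classify(sst: SSTable, sstables_repaired_at: int,
             in_repair_transition: bool) -> View:
    if is_being_repaired(sst, sstables_repaired_at, in_repair_transition):
        return View.REPAIRING
    if is_repaired(sstables_repaired_at, sst):
        return View.REPAIRED
    return View.UNREPAIRED
```

With N=1, an SSTable rewritten with repaired_at=2 and no in-memory flag is
kept in REPAIRING while the tablet is in a repair transition, and moves to
REPAIRED only once the counter is committed as 2.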

A new test, test_incremental_repair_race_window_promotes_unrepaired_data,
reproduces the bug by:
  - Running repair round 1 to establish sstables_repaired_at=1.
  - Injecting delay_end_repair_update to hold the race window open.
  - Running repair round 2 so all replicas complete mark_sstable_as_repaired
    (repaired_at=2) but the coordinator has not yet committed step 4.
  - Writing post-repair keys to all replicas and flushing servers[1] to
    create an SSTable with repaired_at=0 on disk.
  - Restarting servers[1] so being_repaired is lost from memory.
  - Waiting for autocompaction to merge the two SSTables on servers[1].
  - Asserting that the merged SSTable contains post-repair keys (the bug)
    and that servers[0] and servers[2] do not see those keys as repaired.

NOTE FOR MAINTAINER: Copilot initially only implemented Layer 1 (the
in-memory being_repaired guard), missing the restart scenario entirely.
I pointed out that being_repaired is lost on restart and guided Copilot
to add the durable Layer 2 check.  I also polished the implementation:
moving is_being_repaired into tablet_storage_group_manager so it can
reuse the already-held _tablet_map (avoiding an ERM lookup and try/catch),
passing sstables_repaired_at in from the classifier to avoid re-reading it,
and using compaction_group_for_sstable inside the function rather than
threading a tablet_id parameter through the classifier.

Fixes https://scylladb.atlassian.net/browse/SCYLLADB-1239.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Closes scylladb/scylladb#29244

(cherry picked from commit 16e387d5f9)
(cherry picked from commit cc3dcc4ba8)

Closes scylladb/scylladb#29411
2026-05-08 11:25:26 +03:00
Patryk Jędrzejczak
b3b333b597 Merge '[Backport 2026.1] Barrier and drain logging' from Scylladb[bot]
Add more logging to barrier and drain rpc to try and pinpoint https://github.com/scylladb/scylladb/issues/26281

Backport since we want to have it if it happens in the field.

Fixes: SCYLLADB-1837
Refs: #26281

- (cherry picked from commit 11b838e71e)
- (cherry picked from commit e88ce09372)
- (cherry picked from commit 385915c101)
- (cherry picked from commit d2b695aa64)

Parent PR: #29735

Closes scylladb/scylladb#29770

* https://github.com/scylladb/scylladb:
  session, raft_topology: add periodic warnings for hung drain and stale version waits
  session: add info-level logging to drain_closing_sessions
  raft_topology: log sub-step progress in local_topology_barrier
  raft_topology: log read_barrier progress in topology cmd handler
  token_metadata: improve stale versions diagnostics
2026-05-08 10:02:18 +02:00
Łukasz Paszkowski
342a7bfce1 db: fix system.size_estimates to aggregate sstable estimates across all shards
The estimate() function in the size_estimates virtual reader only
considered sstables local to the shard that happened to own the
keyspace's partition key token. Since sstables are distributed across
shards, this caused partition count estimates to be approximately
1/smp_count of the actual value.

This bug has been present since the virtual reader was introduced in
225648780d.

Use db.container().map_reduce0() to aggregate sstable estimates
across all shards. Each shard contributes its local count and
estimated_histogram, which are then merged to produce the correct
total.
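
The aggregation can be pictured with a small Python model of the map-reduce
step. It is a sketch under stated assumptions: `ShardEstimate`, `merge` and
`estimate_all_shards` are hypothetical names, and a `Counter` stands in for
estimated_histogram.

```python
from collections import Counter
from dataclasses import dataclass, field
from functools import reduce


@dataclass
class ShardEstimate:
    partition_count: int = 0
    # bucket -> number of partitions; a stand-in for estimated_histogram
    histogram: Counter = field(default_factory=Counter)


def merge(a: ShardEstimate, b: ShardEstimate) -> ShardEstimate:
    """Reduce step: add the counts and merge the histograms bucket by bucket."""
    return ShardEstimate(a.partition_count + b.partition_count,
                         a.histogram + b.histogram)


def estimate_all_shards(per_shard: list[ShardEstimate]) -> ShardEstimate:
    """Combine the local estimate contributed by every shard."""
    return reduce(merge, per_shard, ShardEstimate())
```

With four shards holding 100 partitions each, reading a single shard reports
100 while the merged estimate reports 400, matching the 1/smp_count error
described above.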

Also fix the `test_partitions_estimate_full_overlap` test which becomes
flaky (xpassing ~1% of runs) because autocompaction could merge the
two overlapping sstables before the size estimate was read. Wrap the
test body in nodetool.no_autocompaction_context to prevent this race.

Fixes https://scylladb.atlassian.net/browse/SCYLLADB-1179
Refs https://github.com/scylladb/scylladb/issues/9083

Closes scylladb/scylladb#29286

(cherry picked from commit 6f364fd3b7)

Closes scylladb/scylladb#29381
2026-05-08 06:38:28 +03:00
Piotr Dulikowski
8a52602ec9 Merge '[Backport 2026.1] vector_search: test: fix flaky test_dns_resolving_repeated' from Scylladb[bot]
The `vector_store_client_test_dns_resolving_repeated` test was intermittently
timing out on CI. The exact root cause is not fully understood, but the
hypothesis is that a single trigger signal can be lost somewhere (not exactly
known where). This is not an issue for the production code because the refresh
trigger is called repeatedly, whenever all configured nodes are
unreachable.

Fixes SCYLLADB-1794

Backport to 2026.1 and 2026.2, as the same CI flakiness can occur on these branches.

- (cherry picked from commit e9240587f4)
- (cherry picked from commit 44249c0a75)

Parent PR: #29752

Closes scylladb/scylladb#29796

* github.com:scylladb/scylladb:
  vector_search: test: default timeout in test_dns_resolving_repeated
  vector_search: test: fix flaky test_dns_resolving_repeated
2026-05-08 03:54:43 +02:00
Karol Nowacki
9ed728a5b3 vector_search: test: default timeout in test_dns_resolving_repeated
Replace explicit 1-second timeouts in repeat_until() with the default
STANDARD_WAIT (10s). The 1-second timeout could be too aggressive for
loaded CI environments where lowres_clock granularity (~10ms) combined
with OS scheduling delays and resource contention (-c2 -m2G) could cause
the loop to expire before the DNS refresh task completes its cycle.

This also unifies test timeouts across test cases.

(cherry picked from commit 207de967fb)
2026-05-07 17:04:15 +00:00
Karol Nowacki
b4467fb229 vector_search: test: fix flaky test_dns_resolving_repeated
Move trigger_dns_resolver() inside the repeat_until loop instead of
calling it once before the loop.

The test was intermittently timing out on CI. The exact root cause is not
fully understood, but the hypothesis is that a single trigger signal can
be lost somewhere (not exactly known where). This is not an issue for the
production code because the refresh trigger is called repeatedly, in
every query where all configured nodes are unreachable.

By triggering inside the loop, we ensure the signal is re-sent on
each iteration until the resolver actually performs the refresh and
picks up the new (failing) DNS resolution. This makes the test
resilient to timing-dependent signal loss without changing production
code.
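
The difference between triggering once and triggering on each iteration can
be shown with a small asyncio model. Everything here is hypothetical: this
`repeat_until` is not the test helper's real signature, and `FlakyResolver`
simply drops a configurable number of trigger signals.

```python
import asyncio
import time


async def repeat_until(pred, timeout: float = 10.0, period: float = 0.01) -> bool:
    """Poll `pred` until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if await pred():
            return True
        await asyncio.sleep(period)
    return False


class FlakyResolver:
    """Hypothetical resolver that loses the first `lost` trigger signals."""

    def __init__(self, lost: int) -> None:
        self.lost = lost
        self.refreshed = False

    def trigger(self) -> None:
        if self.lost > 0:
            self.lost -= 1          # this signal is lost
        else:
            self.refreshed = True   # this signal causes a refresh


async def demo() -> tuple[bool, bool]:
    # Old shape: trigger once before the loop; one lost signal means a timeout.
    once = FlakyResolver(lost=1)
    once.trigger()

    async def check_only() -> bool:
        return once.refreshed

    timed_out = not await repeat_until(check_only, timeout=0.05)

    # New shape: re-send the trigger on every iteration.
    each = FlakyResolver(lost=1)

    async def trigger_and_check() -> bool:
        each.trigger()
        return each.refreshed

    succeeded = await repeat_until(trigger_and_check, timeout=0.05)
    return timed_out, succeeded
```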

Fixes: SCYLLADB-1794
(cherry picked from commit 4722be1289)
2026-05-07 17:04:15 +00:00
Marcin Maliszkiewicz
83999f7228 Merge 'utils: loading_cache: add insert() that is a no-op when caching is disabled' from Dario Mirovic
When `permissions_validity_in_ms` is set to 0, executing a prepared statement under authentication crashes with:
```
    Assertion `caching_enabled()' failed.
        at utils/loading_cache.hh:319
        in authorized_prepared_statements_cache::insert
```

`loading_cache::get_ptr()` asserts when caching is disabled (expiry == 0), but `authorized_prepared_statements_cache::insert()` was using it purely for its side effect of populating the cache, which is meaningless when caching is off.

Add a new `loading_cache::insert(k, load)` method that is a no-op when caching is disabled and otherwise forwards to `get_ptr()`. Switch `authorized_prepared_statements_cache::insert()` to use it. This
completes the disabled-mode safety contract of the cache for the write side, mirroring the fallback that `get()` already provides for the read side.
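
A minimal Python model of the contract, assuming a synchronous cache for
brevity (the real `loading_cache` is asynchronous, and the `size()` helper
and the internals below are illustrative only):

```python
class LoadingCache:
    """Toy cache in which expiry_ms == 0 means caching is disabled."""

    def __init__(self, expiry_ms: int) -> None:
        self._expiry_ms = expiry_ms
        self._entries: dict = {}

    def caching_enabled(self) -> bool:
        return self._expiry_ms != 0

    def size(self) -> int:
        return len(self._entries)

    def get_ptr(self, key, load):
        # Only meaningful when caching is on, hence the assertion.
        assert self.caching_enabled(), "caching_enabled()"
        if key not in self._entries:
            self._entries[key] = load(key)
        return self._entries[key]

    def get(self, key, load):
        # Read side: fall back to loading directly when caching is disabled.
        if not self.caching_enabled():
            return load(key)
        return self.get_ptr(key, load)

    def insert(self, key, load) -> None:
        # Write side: nothing to populate when caching is disabled.
        if not self.caching_enabled():
            return
        self.get_ptr(key, load)
```

Callers that only want the side effect of populating the cache use
`insert()` and never reach the assertion in `get_ptr()`.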

Includes a regression test in `test/boost/loading_cache_test.cc` plus a positive test for the new `insert()` overload.

Fixes SCYLLADB-1699

The crash was introduced a long time ago and is present on all live versions, from 2025.1 onward. There are no client tickets, but it should be backported.

Closes scylladb/scylladb#29638

* github.com:scylladb/scylladb:
  test: boost: regression test for loading_cache::insert with caching disabled
  utils: loading_cache: add insert() that is a no-op when caching is disabled

(cherry picked from commit c00fee0316)

Closes scylladb/scylladb#29762

Closes scylladb/scylladb#29782
2026-05-07 10:42:59 +03:00
Marcin Maliszkiewicz
1793814914 Merge '[Backport 2026.1] auth: fix crash on ghost rows in role_permissions' from Andrzej Jackowski
This is a manual backport of https://github.com/scylladb/scylladb/pull/29757, because the changes are required on 2026.1 ASAP.

===
The auth cache crashes when it encounters rows in role_permissions that have a live row marker but no permissions column. These “ghost rows” were created by the now-removed auth v2 migration, which used INSERT (creating row markers) instead of UPDATE.

When permissions were later revoked, the row marker remained while the permissions column became null. An empty collection appears as null, since its lifetime is based only on its elements' cells.

As a result, when the cache reloads and expects the permissions column to exist, it hits a missing_column exception.

The series removes dead code that was the primary crash site, adds has() guards to the remaining access paths, and includes a test reproducer.

Fixes https://scylladb.atlassian.net/browse/SCYLLADB-1816

Backport: all supported versions 2026.1, 2025.4, 2025.1

Parent PR: https://github.com/scylladb/scylladb/pull/29757

Closes scylladb/scylladb#29771

* github.com:scylladb/scylladb:
  test: add reproducer for auth cache crash on missing permissions column
  auth: tolerate missing permissions column in authorize()
  auth: add defensive has() guard for role_attributes value column
  auth: remove unused permissions field from cache role_record
2026-05-06 19:21:57 +02:00
Gleb Natapov
269637ffa3 session, raft_topology: add periodic warnings for hung drain and stale version waits
Add periodic warning timers (every 5 minutes) to help diagnose hangs in
barrier_and_drain:

- drain_closing_sessions(): warn if semaphore acquisition or session gate
  close is taking too long, reporting the gate count to show how many
  guards are still alive.
- local_topology_barrier(): warn if stale_versions_in_use() is taking
  too long, reporting the current stale version trackers.
- session::gate_count(): new public accessor for diagnostic purposes.

These warnings help distinguish between the two possible hang points
in barrier_and_drain (stale versions vs session drain) and provide
ongoing visibility into what's blocking progress.
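
The pattern of waiting while emitting periodic warnings can be sketched with
asyncio. This is an illustration, not the Seastar implementation:
`wait_with_warnings`, its parameters and the logger name are assumptions.

```python
import asyncio
import logging

log = logging.getLogger("barrier_and_drain")


async def wait_with_warnings(awaitable, what, describe, period=300.0, warn=log.warning):
    """Await `awaitable`, emitting a warning every `period` seconds until it completes."""
    task = asyncio.ensure_future(awaitable)
    waited = 0.0
    while True:
        # asyncio.wait() returns on timeout without cancelling the task.
        done, _pending = await asyncio.wait({task}, timeout=period)
        if done:
            return task.result()
        waited += period
        warn("%s is taking too long (%.2f s so far): %s", what, waited, describe())


async def demo() -> tuple[str, int]:
    messages: list[str] = []
    gate_count = 3  # hypothetical number of session guards still alive
    result = await wait_with_warnings(
        asyncio.sleep(0.05, result="drained"),
        "session gate close",
        lambda: f"gate count {gate_count}",
        period=0.02,
        warn=lambda fmt, *args: messages.append(fmt % args),
    )
    return result, len(messages)
```

The `describe` callback is evaluated at each warning, so the message reflects
the current state (for example the gate count) rather than the state at the
start of the wait.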

(cherry picked from commit d2b695aa64)
2026-05-06 15:00:44 +03:00
Gleb Natapov
d232b08be6 session: add info-level logging to drain_closing_sessions
drain_closing_sessions() is called as part of the barrier_and_drain
topology command and can block on two things: acquiring the drain
semaphore (if another drain is in progress) and waiting for individual
sessions to close (which blocks until all session guards are released).

Previously, all logging in this function was at debug level, making it
invisible in production logs. When barrier_and_drain hangs, there is no
way to tell whether the function is waiting for the semaphore, waiting
for a specific session to close, or was never called.

Promote logging to info level and add messages at each blocking point:
before/after semaphore acquisition (with count of sessions to drain),
before/after each individual session close (with session id), and at
function completion. This makes it possible to identify the exact
session blocking a topology operation from the node log alone.

(cherry picked from commit 385915c101)
2026-05-06 15:00:44 +03:00
Gleb Natapov
485c2504ad raft_topology: log sub-step progress in local_topology_barrier
When a node processes a barrier_and_drain topology command, it performs
two potentially long-running operations inside local_topology_barrier():
waiting for stale token metadata versions to be released
(stale_versions_in_use) and draining closing sessions
(drain_closing_sessions). Either of these can hang indefinitely -- for
example, stale_versions_in_use blocks until all references to previous
token metadata versions are released, which depends on in-flight
requests completing.

Previously, the only logging was a single 'done' message at the end,
making it impossible to determine which sub-step was blocking when a
barrier_and_drain RPC appeared stuck on a node. In a recent CI failure,
a node never responded to barrier_and_drain during a removenode
operation, and the logs showed the RPC was received but nothing about
what it was waiting on internally.

Add info-level logging before each blocking sub-step, including the
topology version for correlation. This allows diagnosing hangs by
showing whether the node is stuck waiting for stale metadata versions,
stuck draining sessions, or never reached these steps at all.

(cherry picked from commit e88ce09372)
2026-05-06 15:00:44 +03:00
Gleb Natapov
026e870e54 raft_topology: log read_barrier progress in topology cmd handler
When a raft topology command (e.g. barrier_and_drain) is received by a
node, the handler first performs a raft read_barrier to ensure it sees
the latest topology state. This read_barrier can hang indefinitely if
raft cannot achieve quorum, but there was no logging around it, making
it impossible to tell whether the handler was stuck at this step or
somewhere else.

Add info-level logging before and after the read_barrier call in
raft_topology_cmd_handler, including the command type, index, and term.
This allows diagnosing hangs by showing whether the node entered the
read_barrier and whether it completed, narrowing down the root cause
when a topology command RPC appears stuck on the receiver side.

(cherry picked from commit 11b838e71e)
2026-05-06 15:00:44 +03:00
Petr Gusev
dd75f53c5d token_metadata: improve stale versions diagnostics
Before waiting on stale_versions_in_use(), we log the stale versions
the barrier_and_drain handler will wait for, along with the number of
token_metadata references representing each version.
To achieve this, we store a pointer to token_metadata in
version_tracker, traverse the _trackers list, and output all items
with a version smaller than the latest. Since token_metadata
contains the version_tracker instance, it is guaranteed to remain
alive during traversal. To count references, token_metadata now
inherits from enable_lw_shared_from_this.

This helps diagnose tablet migration stalls and allows more
deterministic tests: when a barrier is expected to block, we can
verify that the log contains the expected stale versions rather
than checking that the barrier_and_drain is blocked on
stale_versions_in_use() for a fixed amount of time.

(cherry picked from commit e39f4b399c)
2026-05-06 14:52:11 +03:00
Marcin Maliszkiewicz
2c7d9b49bc test: add reproducer for auth cache crash on missing permissions column
(cherry picked from commit 5c5306c692)
2026-05-06 13:17:06 +02:00
Marcin Maliszkiewicz
82c3509752 auth: tolerate missing permissions column in authorize()
Ghost rows in role_permissions with a live row marker but no permissions
column can occur when permissions created via INSERT (e.g. by the removed
auth v2 migration) are later revoked. The row marker survives the revoke,
leaving a row visible to queries but with permissions=null.

Add a has() guard before accessing the permissions column, matching the
pattern already used in list_all(). Return NONE permissions for such
ghost rows instead of crashing.
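
The guard can be pictured with a small Python model. The `Row` class, the
`Permission` flags and this `authorize` are hypothetical simplifications of
the real query-result and permission types.

```python
from enum import Flag, auto


class Permission(Flag):
    NONE = 0
    SELECT = auto()
    MODIFY = auto()


class Row:
    """Hypothetical result row; a null or absent column counts as missing."""

    def __init__(self, columns: dict) -> None:
        self._columns = columns

    def has(self, name: str) -> bool:
        return self._columns.get(name) is not None

    def get(self, name: str):
        if not self.has(name):
            raise KeyError(f"missing column: {name}")
        return self._columns[name]


def authorize(row: Row) -> Permission:
    # Ghost row: live row marker but permissions=null. Treat as no permissions.
    if not row.has("permissions"):
        return Permission.NONE
    perms = Permission.NONE
    for name in row.get("permissions"):
        perms |= Permission[name]
    return perms
```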

(cherry picked from commit df69a5c79b)
2026-05-06 13:17:05 +02:00
Marcin Maliszkiewicz
2306562c01 auth: add defensive has() guard for role_attributes value column
Add a has() check before accessing the value column in role_attributes
to tolerate ghost rows with missing regular columns. In practice this
is unlikely to be a problem since attributes are not typically revoked,
but the guard is added for consistency and defensive programming.

(cherry picked from commit c44625ebdf)
2026-05-06 13:17:04 +02:00
Marcin Maliszkiewicz
3f35b3bc55 auth: remove unused permissions field from cache role_record
The permissions field in role_record was populated by fetch_role() but
never read. Authorization uses cached_permissions instead, which is
loaded via the permission_loader callback. Remove the dead field and
its fetch code.

The removed code also did not check for missing columns before accessing
the permissions set, which could crash on ghost rows left by the removed
auth v2 migration. The migration used INSERT (creating row markers),
and when permissions were later revoked, the row marker survived while
the permissions column became null.

(cherry picked from commit 797bc28aae)
2026-05-06 13:16:52 +02:00
Jenkins Promoter
70fa7453d0 Update ScyllaDB version to: 2026.1.3 2026-05-03 17:38:16 +03:00
Jenkins Promoter
fe9ec0cd5a Update pgo profiles - aarch64 2026-05-01 05:03:53 +03:00
Jenkins Promoter
5f32cbf502 Update pgo profiles - x86_64 2026-05-01 04:21:07 +03:00
Botond Dénes
ecb3f254ad sstables: fix segfault in parse_assert() when message is nullptr
parse_assert() accepts an optional `message` parameter that defaults
to nullptr. When the assertion fails and message is nullptr, it is
implicitly converted to sstring via the sstring(const char*) constructor,
which calls strlen(nullptr) -- undefined behavior that manifests as a
segfault in __strlen_evex.

This turns what should be a graceful malformed_sstable_exception into a
fatal crash. In the case of CUSTOMER-279, a corrupt SSTable triggered
parse_assert() during streaming (in continuous_data_consumer::
fast_forward_to()), causing a crash loop on the affected node.

Fix by guarding the nullptr case with a ternary, passing an empty
sstring() when message is null. on_parse_error() already handles
the empty-message case by substituting "parse_assert() failed".

Fixes: SCYLLADB-1672

Closes scylladb/scylladb#29285

(cherry picked from commit cfebe17592)

Closes scylladb/scylladb#29597
2026-04-30 12:12:27 +03:00
Avi Kivity
af59e9200a build: point seastar submodule at scylla-seastar.git
This allows us to backport seastar commits as the need arises.
2026-04-30 11:51:32 +03:00
Patryk Jędrzejczak
5540a16f1b Merge 'raft: Throw stopped_error if server aborted' from Dawid Mędrek
This PR solves a series of similar problems related to executing
methods on an already aborted `raft::server`. They materialize
in various ways:

* For `add_entry` and `modify_config`, a `raft::not_a_leader` with
  a null ID will be returned IF forwarding is disabled. This wasn't
  a big problem because forwarding has always been enabled for group0,
  but it's something that's nice to fix. It might be relevant for
  strong consistency that will heavily rely on this code.

* For `wait_for_leader` and `wait_for_state_change`, the calls may
  hang and never resolve. A more detailed scenario is provided in a
  commit message.

For the last two methods, we also extend their descriptions to indicate
the new possible exception type, `raft::stopped_error`. This change is
correct since either we enter the functions and throw the exception
immediately (if the server has already been aborted), or it will be
thrown upon the call to `raft::server::abort`.

We fix both issues. A few reproducer tests have been included to verify
that the calls finish and throw the appropriate errors.
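
The intended behavior of the waiting methods can be modeled in asyncio as
follows. This is a sketch under assumptions, not the raft implementation:
`Server`, `StoppedError` and the waiter list are hypothetical.

```python
import asyncio


class StoppedError(Exception):
    """Stand-in for raft::stopped_error."""


class Server:
    def __init__(self) -> None:
        self._aborted = False
        self._leader = None
        self._waiters: list[asyncio.Future] = []

    async def wait_for_leader(self):
        if self._aborted:
            # Entered after abort: fail immediately instead of hanging.
            raise StoppedError("server aborted")
        if self._leader is not None:
            return self._leader
        fut = asyncio.get_running_loop().create_future()
        self._waiters.append(fut)
        return await fut

    def abort(self) -> None:
        self._aborted = True
        # Wake every parked waiter with the error so none of them hangs.
        for fut in self._waiters:
            if not fut.done():
                fut.set_exception(StoppedError("server aborted"))
        self._waiters.clear()


async def demo() -> tuple[bool, bool]:
    server = Server()
    waiter = asyncio.create_task(server.wait_for_leader())
    await asyncio.sleep(0)          # let the waiter park itself
    server.abort()
    try:
        await waiter
        woken_with_error = False
    except StoppedError:
        woken_with_error = True
    try:
        await server.wait_for_leader()
        late_call_failed = False
    except StoppedError:
        late_call_failed = True
    return woken_with_error, late_call_failed
```

Both paths named above are covered: a call already waiting when abort()
happens, and a call made after the server has been aborted.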

Fixes SCYLLADB-841

Backport: Although the hanging problems haven't been spotted so far
          (at least to the best of my knowledge), it's best to avoid
          running into a problem like that, so let's backport the
          changes to all supported versions. They're small enough.

Closes scylladb/scylladb#28822

* https://github.com/scylladb/scylladb:
  raft: Make methods throw stopped_error if server aborted
  raft: Throw stopped_error if server aborted
  test: raft: Introduce get_default_cluster

(cherry picked from commit bb1a798c2c)

Closes scylladb/scylladb#28903
2026-04-30 08:54:16 +03:00
Tomasz Grabiec
36637e3583 test: pylib: Ignore exceptions in wait_for()
ManagerClient::get_ready_cql() calls server_sees_others(), which waits
for servers to see each other as alive in gossip. If one of the
servers is still early in boot, the RESTful API call to
"gossiper/endpoint/live" may fail. It throws an exception, which
currently terminates the wait_for() and propagates up, failing the test.

Fix this by ignoring errors when polling inside wait_for(). In case of
timeout, we log the last exception. This should fix the problem not
only in this case, but for all uses of wait_for().

Example output:

```
pred = <function ManagerClient.server_sees_others.<locals>._sees_min_others at 0x7f022af9a140>
deadline = 1775218828.9172852, period = 1.0, before_retry = None
backoff_factor = 1.5, max_period = 1.0, label = None

    async def wait_for(
            pred: Callable[[], Awaitable[Optional[T]]],
            deadline: float,
            period: float = 0.1,
            before_retry: Optional[Callable[[], Any]] = None,
            backoff_factor: float = 1.5,
            max_period: float = 1.0,
            label: Optional[str] = None) -> T:
        tag = label or getattr(pred, '__name__', 'unlabeled')
        start = time.time()
        retries = 0
        last_exception: Exception | None = None
        while True:
            elapsed = time.time() - start
            if time.time() >= deadline:
                timeout_msg = f"wait_for({tag}) timed out after {elapsed:.2f}s ({retries} retries)"
                if last_exception is not None:
                    timeout_msg += (
                        f"; last exception: {type(last_exception).__name__}: {last_exception}"
                    )
                    raise AssertionError(timeout_msg) from last_exception
                raise AssertionError(timeout_msg)

            try:
>               res = await pred()

test/pylib/util.py:80:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

    async def _sees_min_others():
>       raise Exception("asd")
E       Exception: asd

test/pylib/manager_client.py:802: Exception

The above exception was the direct cause of the following exception:

manager = <test.pylib.manager_client.ManagerClient object at 0x7f022af7e7b0>

    @pytest.mark.asyncio
    async def test_auth_after_reset(manager: ManagerClient) -> None:
        servers = await manager.servers_add(3, config=auth_config, auto_rack_dc="dc1")
>       cql, _ = await manager.get_ready_cql(servers)

test/cluster/auth_cluster/test_auth_after_reset.py:33:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
test/pylib/manager_client.py:137: in get_ready_cql
    await self.servers_see_each_other(servers)
test/pylib/manager_client.py:820: in servers_see_each_other
    await asyncio.gather(*others)
test/pylib/manager_client.py:806: in server_sees_others
    await wait_for(_sees_min_others, time() + interval, period=.5)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

pred = <function ManagerClient.server_sees_others.<locals>._sees_min_others at 0x7f022af9a140>
deadline = 1775218828.9172852, period = 1.0, before_retry = None
backoff_factor = 1.5, max_period = 1.0, label = None

    async def wait_for(
            pred: Callable[[], Awaitable[Optional[T]]],
            deadline: float,
            period: float = 0.1,
            before_retry: Optional[Callable[[], Any]] = None,
            backoff_factor: float = 1.5,
            max_period: float = 1.0,
            label: Optional[str] = None) -> T:
        tag = label or getattr(pred, '__name__', 'unlabeled')
        start = time.time()
        retries = 0
        last_exception: Exception | None = None
        while True:
            elapsed = time.time() - start
            if time.time() >= deadline:
                timeout_msg = f"wait_for({tag}) timed out after {elapsed:.2f}s ({retries} retries)"
                if last_exception is not None:
                    timeout_msg += (
                        f"; last exception: {type(last_exception).__name__}: {last_exception}"
                    )
>                   raise AssertionError(timeout_msg) from last_exception
E                   AssertionError: wait_for(_sees_min_others) timed out after 45.30s (46 retries); last exception: Exception: asd

test/pylib/util.py:76: AssertionError
```

Fixes a failure observed in test_auth_after_reset:

```
manager = <test.pylib.manager_client.ManagerClient object at 0x7fb3740e1630>

    @pytest.mark.asyncio
    async def test_auth_after_reset(manager: ManagerClient) -> None:
        servers = await manager.servers_add(3, config=auth_config, auto_rack_dc="dc1")
        cql, _ = await manager.get_ready_cql(servers)
        await cql.run_async("ALTER ROLE cassandra WITH PASSWORD = 'forgotten_pwd'")

        logging.info("Stopping cluster")
        await asyncio.gather(*[manager.server_stop_gracefully(server.server_id) for server in servers])

        logging.info("Deleting sstables")
        for table in ["roles", "role_members", "role_attributes", "role_permissions"]:
            await asyncio.gather(*[manager.server_wipe_sstables(server.server_id, "system", table) for server in servers])

        logging.info("Starting cluster")
        # Don't try connect to the servers yet, with deleted superuser it will be possible only after
        # quorum is reached.
        await asyncio.gather(*[manager.server_start(server.server_id, connect_driver=False) for server in servers])

        logging.info("Waiting for CQL connection")
        await repeat_until_success(lambda: manager.driver_connect(auth_provider=PlainTextAuthProvider(username="cassandra", password="cassandra")))
>       await manager.get_ready_cql(servers)

test/cluster/auth_cluster/test_auth_after_reset.py:50:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
test/pylib/manager_client.py:137: in get_ready_cql
    await self.servers_see_each_other(servers)
test/pylib/manager_client.py:819: in servers_see_each_other
    await asyncio.gather(*others)
test/pylib/manager_client.py:805: in server_sees_others
    await wait_for(_sees_min_others, time() + interval, period=.5)
test/pylib/util.py:71: in wait_for
    res = await pred()
test/pylib/manager_client.py:802: in _sees_min_others
    alive_nodes = await self.api.get_alive_endpoints(server_ip)
test/pylib/rest_client.py:243: in get_alive_endpoints
    data = await self.client.get_json(f"/gossiper/endpoint/live", host=node_ip)
test/pylib/rest_client.py:99: in get_json
    ret = await self._fetch("GET", resource_uri, response_type = "json", host = host,
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <test.pylib.rest_client.TCPRESTClient object at 0x7fb2404a0650>
method = 'GET', resource = '/gossiper/endpoint/live', response_type = 'json'
host = '127.15.252.8', port = 10000, params = None, json = None, timeout = None
allow_failed = False

    async def _fetch(self, method: str, resource: str, response_type: Optional[str] = None,
                     host: Optional[str] = None, port: Optional[int] = None,
                     params: Optional[Mapping[str, str]] = None,
                     json: Optional[Mapping] = None, timeout: Optional[float] = None, allow_failed: bool = False) -> Any:
        # Can raise exception. See https://docs.aiohttp.org/en/latest/web_exceptions.html
        assert method in ["GET", "POST", "PUT", "DELETE"], f"Invalid HTTP request method {method}"
        assert response_type is None or response_type in ["text", "json"], \
                f"Invalid response type requested {response_type} (expected 'text' or 'json')"
        # Build the URI
        port = port if port else self.default_port if hasattr(self, "default_port") else None
        port_str = f":{port}" if port else ""
        assert host is not None or hasattr(self, "default_host"), "_fetch: missing host for " \
                "{method} {resource}"
        host_str = host if host is not None else self.default_host
        uri = self.uri_scheme + "://" + host_str + port_str + resource
        logging.debug(f"RESTClient fetching {method} {uri}")

        client_timeout = ClientTimeout(total = timeout if timeout is not None else 300)
        async with request(method, uri,
                           connector = self.connector if hasattr(self, "connector") else None,
                           params = params, json = json, timeout = client_timeout) as resp:
            if allow_failed:
                return await resp.json()
            if resp.status != 200:
                text = await resp.text()
>               raise HTTPError(uri, resp.status, params, json, text)
E               test.pylib.rest_client.HTTPError: HTTP error 404, uri: http://127.15.252.8:10000/gossiper/endpoint/live, params: None, json: None, body:
E               {"message": "Not found", "code": 404}

test/pylib/rest_client.py:77: HTTPError
```

Fixes: SCYLLADB-1367

Closes scylladb/scylladb#29323

(cherry picked from commit 74542be5aa)

Closes scylladb/scylladb#29338
2026-04-30 08:53:05 +03:00
Wojciech Mitros
f318968cfe view: apply existing range tombstones after exhausting the update reader
When view_update_builder::on_results() hits the path where the update
fragment reader is already exhausted, it still needs to keep tracking
existing range tombstones and apply them to encountered rows.
Otherwise a row covered by an existing range tombstone can appear
alive while generating the view update and create a spurious view row.

Update the existing tombstone state even on the exhausted-reader path
and apply the effective tombstone to clustering rows before generating
the row tombstone update. Add a cqlpy regression test covering the
partition-delete-after-range-tombstone case.

Fixes: SCYLLADB-1649

Closes scylladb/scylladb#29481

(cherry picked from commit 073710a661)

Closes scylladb/scylladb#29649
2026-04-30 08:51:55 +03:00
Roy Dahan
b7974b9b09 ci: pin GitHub Actions to commit SHAs and migrate to Node.js 24
Pin all external GitHub Actions to full commit SHAs and upgrade to
their latest major versions to reduce supply chain attack surface:

- actions/checkout: v3/v4/v5 -> v6.0.2
- actions/github-script: v7 -> v8.0.0
- actions/setup-python: v5 -> v6.2.0
- actions/upload-artifact: v4 -> v7.0.0
- astral-sh/setup-uv: v6 -> v8.0.0
- mheap/github-action-required-labels: v5.5.2 (pinned)
- redhat-plumbers-in-action/differential-shellcheck: v5.5.6 (pinned)
- codespell-project/actions-codespell: v2.2 (pinned, was @master)

Set FORCE_JAVASCRIPT_ACTIONS_TO_NODE24=true in all 21 workflows that
use JavaScript-based actions to opt into the Node.js 24 runtime now.
This resolves the deprecation warning:

  "Node.js 20 actions are deprecated. Please check if updated versions
   of these actions are available that support Node.js 24. Actions will
   be forced to run with Node.js 24 by default starting June 2nd,
   2026. Node.js 20 will be removed from the runner on September 16th,
   2026."

See: https://github.blog/changelog/2025-09-19-deprecation-of-node-20-on-github-actions-runners/

scylladb/github-automation references are intentionally left at @main
as they are org-internal reusable workflows.

Fixes: SCYLLADB-1410

Backport: Backport is required for live branches that run GH actions:
2026.1, 2025.4, 2025.1 and 2024.1

Closes scylladb/scylladb#29525

(cherry picked from commit d2d7604188)

Closes scylladb/scylladb#29525
2026-04-30 08:50:25 +03:00
Marcin Maliszkiewicz
9d95e0e6ba Merge 'storage_service: fix REST API races during shutdown and cross-shard forwarding' from Piotr Smaron
REST route removal unregisters handlers but does not wait for requests
that already entered storage_service.  A request can therefore suspend
inside an async operation, restart proceeds to tear the service down,
and the coroutine later resumes against destroyed members such as
_topology_state_machine, _group0, or _sys_ks — a use-after-destruction
bug that surfaces as UBSAN dynamic-type failures (e.g. the crash seen
from topology_state_load()).

Fix this by holding storage_service::_async_gate from the entry
boundary of every externally-triggered async operation so that stop()
drains them before teardown begins.  The gate is acquired in
run_with_api_lock, run_with_no_api_lock, and in individual REST
handlers that bypass those wrappers (reload_raft_topology_state,
mark_excluded, removenode, schema reload, topology-request
waits/abort, cleanup, ring/schema queries, SSTable dictionary
training/publish, and sampling).

Additionally, fix get_ownership() and abort_topology_request() which
forward work to shard 0 but were still referencing the caller-shard's
`this` pointer instead of the destination-shard instance, causing
silent cross-shard access to shard-local state.
Add a cluster regression test that repeatedly exercises the multi-shard
ownership REST path to cover the forwarding fix.

Fixes: SCYLLADB-1415

Should be backported to all branches, the code has been introduced around 2024.1 release.

Closes scylladb/scylladb#29373

* github.com:scylladb/scylladb:
  storage_service: fix shard-0 forwarding in REST helpers
  storage_service: gate REST-facing async operations during shutdown
  storage_service: prepare for async gate in REST handlers

(cherry picked from commit 4043d95810)

Closes scylladb/scylladb#29611
2026-04-27 13:57:32 +02:00
Avi Kivity
ea78ab34d7 Merge '[Backport 2026.1] test: fix flaky test_read_repair_with_trace_logging by reading tracing with CL=ALL' from Scylladb[bot]
Tracing events are written to system_traces.events with CL=ANY, so they are only guaranteed to be present on the local node of the query coordinator. Reading them back with the driver default (CL=LOCAL_ONE) may route the query to a replica that has not yet received all events, causing the assertion on 'digest mismatch, starting read repair' to fail intermittently.

Fix execute_with_tracing() to read tracing via the ResponseFuture API with query_cl=ConsistencyLevel.ALL, so events from all replicas are merged before the caller inspects them.

Fixes: SCYLLADB-1707

Backport: fixing flaky test, test failure only seen on master so far so no backport

- (cherry picked from commit b49cf6247f)

Parent PR: #29566

Closes scylladb/scylladb#29631

* github.com:scylladb/scylladb:
  test: fix flaky test_read_repair_with_trace_logging by reading tracing with CL=ALL
  replica/database: consolidate the two database_apply error injections
2026-04-25 20:47:46 +03:00
Pawel Pery
a8c7c9d561 vector-store: fix creating local vector search indexes with a part of the partition key
Users should be able to create a local index for Vector Search based on
only a part of the partition key. This commit provides this by removing
the 'full partition key only' requirement for custom local indexes.

The commit updates the docs to explain that a local vector index can use only
a part of the partition key.

The commit adds a cqlpy test to check the fixed functionality.

Fixes: SCYLLADB-953

Needs to be backported to 2026.1 as it is a fix for local vector indexes.

Closes scylladb/scylladb#28931

(cherry picked from commit 7883f161bb)

Closes scylladb/scylladb#29543
2026-04-25 20:47:18 +03:00
Calle Wilund
739a0f9047 Update position in dma_read(iovec) in create_file_for_seekable_source
Fixes: SCYLLADB-1706

The returned file object did not advance the file position in this read path. One-line fix.
Added a test to make sure this read path works as expected.

Closes scylladb/scylladb#29456

(cherry picked from commit c97ce32f47)

Closes scylladb/scylladb#29630
2026-04-25 16:40:32 +03:00
Botond Dénes
9a36e7f362 test: fix flaky test_read_repair_with_trace_logging by reading tracing with CL=ALL
Tracing events are written to system_traces.events with CL=ANY, so they
are only guaranteed to be present on the local node of the query
coordinator. Reading them back with the driver default (CL=LOCAL_ONE)
may route the query to a replica that has not yet received all events,
causing the assertion on 'digest mismatch, starting read repair' to fail
intermittently.

Fix execute_with_tracing() to read tracing via the ResponseFuture API
with query_cl=ConsistencyLevel.ALL, so events from all replicas are
merged before the caller inspects them.

Fixes: SCYLLADB-1707

Closes scylladb/scylladb#29566

(cherry picked from commit b49cf6247f)
2026-04-24 18:31:11 +03:00
Botond Dénes
4ce17d20df replica/database: consolidate the two database_apply error injections
Into a single database_apply one. Add three parameters:
* ks_name and cf_name to filter the tables to be affected
* what - what to do: throw or wait

This leads to smaller footprint in the code and improved filtering for
table names at the cost of some extra error injection params in the
tests.

(cherry picked from commit f375aae257)
2026-04-24 18:31:11 +03:00
Dario Mirovic
afbc00bb87 test: use DROP KEYSPACE IF EXISTS in new_test_keyspace cleanup
The new_test_keyspace context manager in test/cluster/util.py uses
DROP KEYSPACE without IF EXISTS during cleanup. The Python driver
has a known bug (scylladb/python-driver#317) where connection pool
renewal after concurrent node bootstraps causes double statement
execution. The DROP succeeds server-side, but the response is lost
when the old pool is closed. The driver retries on the new pool, and
gets a ConfigurationException with the message "Cannot drop non existing keyspace".

The CREATE KEYSPACE in create_new_test_keyspace already uses IF NOT
EXISTS as a workaround for the same driver bug. This patch applies
the same approach to fix DROP KEYSPACE.
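
The effect of the double execution can be sketched in Python with a stand-in
session. This is only an illustration under stated assumptions: FakeSession and
its retry loop are hypothetical and do not reproduce the real driver or the
helpers in test/cluster/util.py.

```python
class FakeSession:
    """Hypothetical stand-in for a CQL session (not the real driver)."""

    def __init__(self):
        self.keyspaces = set()

    def execute(self, stmt: str) -> None:
        words = stmt.split()
        name = words[-1]
        if words[0] == "CREATE":
            self.keyspaces.add(name)
        elif words[0] == "DROP":
            if_exists = "IF EXISTS" in stmt
            # Run every DROP twice to mimic "response lost, driver retries".
            for _ in range(2):
                if name not in self.keyspaces and not if_exists:
                    raise RuntimeError("Cannot drop non existing keyspace")
                self.keyspaces.discard(name)


session = FakeSession()

# With IF EXISTS the retried DROP is a harmless no-op.
session.execute("CREATE KEYSPACE IF NOT EXISTS ks1")
session.execute("DROP KEYSPACE IF EXISTS ks1")

# Without IF EXISTS the retried DROP fails although the first one succeeded.
session.execute("CREATE KEYSPACE IF NOT EXISTS ks2")
try:
    session.execute("DROP KEYSPACE ks2")
    plain_drop_failed = False
except RuntimeError:
    plain_drop_failed = True
```

In this sketch both keyspaces end up dropped; only the statement without
IF EXISTS surfaces an error to the caller.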

Fixes SCYLLADB-1538

Closes scylladb/scylladb#29487

(cherry picked from commit 40740104ab)

Closes scylladb/scylladb#29568
2026-04-24 17:58:41 +03:00
Wojciech Mitros
c89d5d5651 db/view: track range tombstones in update stream during view update building
The view update builder ignored range tombstone changes from the update
stream when all existing mutation fragments were already consumed.
The old code assumed range tombstones 'remove nothing pre-existing, so
we can ignore it', but this failed to update _update_current_tombstone.
Consequently, when a range delete and an insert within that range appeared
in the same batch, the range tombstone was not applied to the inserted row,
or was applied to a row outside the range that it covered, causing the row
to incorrectly survive or be deleted in the materialized view.

Fix by handling is_range_tombstone_change() fragments in the update-only
branch, updating _update_current_tombstone so subsequent clustering rows
correctly have the range tombstone applied to them.

Fixes SCYLLADB-1555

Closes scylladb/scylladb#29483

(cherry picked from commit 6011cb8a4c)

Closes scylladb/scylladb#29569
2026-04-24 17:57:59 +03:00
Avi Kivity
c758e3d97a test: bump multishard_query_test querier_cache TTL to 60s to avoid flake
Three test cases in multishard_query_test.cc set the querier_cache entry
TTL to 2s and then assert, between pages of a stateful paged query, that
cached queriers are still present (population >= 1) and that
time_based_evictions stays 0.

The 2s TTL is not load-bearing for what these tests exercise — they are
checking the paging-cache handoff, not TTL semantics. But on busy CI
runners (SCYLLADB-1642 was observed on aarch64 release), scheduling
jitter between saving a reader and sampling the population can exceed
2s. When that happens, the TTL fires, both saved queriers are
time-evicted, population drops to 0, and the assertion
`require_greater_equal(saved_readers, 1u)` fails. The trailing
`require_equal(time_based_evictions, 0)` check never runs because the
earlier assertion has already aborted the iteration — which is why the
Jenkins failure surfaces only as a bare "C++ failure at seastar_test.cc:93".

Reproduced deterministically in test_read_with_partition_row_limits by
injecting a `seastar::sleep(2500ms)` between the save and the sample:
the hook then reports
  population=0 inserts=2 drops=0 time_based_evictions=2 resource_based_evictions=0
and the assertion fires — matching the Jenkins symptoms exactly.

Bump the TTL to 60s in all three affected tests:

  - test_read_with_partition_row_limits (confirmed repro for SCYLLADB-1642)
  - test_read_all                       (same pattern, same invariants — suspect)
  - test_read_all_multi_range           (same pattern, same invariants — suspect)

Leave test_abandoned_read (1s TTL, actually tests TTL-driven eviction)
and test_evict_a_shard_reader_on_each_page (tests manual eviction via
evict_one(); its TTL is not load-bearing but the fix is deferred for a
separate review) unchanged.

Fixes: SCYLLADB-1683

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Closes scylladb/scylladb#29564

(cherry picked from commit f5eb99f149)

Closes scylladb/scylladb#29607
2026-04-24 17:57:11 +03:00
Michael Litvak
4e80e3e224 test/mv/test_mv_staging: wait for cql after restart
Wait for cql on all hosts after restarting a server in the test.

The problem that was observed is that the test restarts servers[1] and
doesn't wait for the cql to be ready on it. On test teardown it drops
the keyspace, trying to execute it on the host that is not ready, and
fails.

Fixes SCYLLADB-1632

Closes scylladb/scylladb#29562

(cherry picked from commit 3468e8de8b)

Closes scylladb/scylladb#29624
2026-04-24 17:56:35 +03:00
Wojciech Mitros
593a6da09c test/cluster: fix flaky test_hints_consistency_during_replace
The test creates a sync point immediately after writing 100 rows
with CL=ANY, without waiting for pending hint writes to complete.

store_hint() is fire-and-forget: it submits do_store_hint() to a gate
and returns immediately. do_store_hint() updates _last_written_rp only
after writing to the commitlog. If create_sync_point() is called before
all do_store_hint() coroutines complete, the captured replay position
is stale, and await_sync_point() returns DONE before all hints are
replayed, leaving some rows missing.

Fix by waiting for the size_of_hints_in_progress metric to reach zero
before creating the sync point, ensuring all in-flight hint writes have
completed and _last_written_rp is up to date. This follows the same
pattern already used in test_sync_point.
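
The race can be sketched with a small asyncio model. Everything here is an
assumption made for illustration: HintStore, wait_until_idle and the plain
in_progress counter are hypothetical stand-ins for the hint manager and the
size_of_hints_in_progress metric, not the actual Scylla code.

```python
import asyncio


class HintStore:
    """Hypothetical model of fire-and-forget hint writes."""

    def __init__(self):
        self.in_progress = 0      # stand-in for size_of_hints_in_progress
        self.last_written = 0     # stand-in for _last_written_rp
        self._tasks = []          # keep references so tasks are not collected

    def store_hint(self, pos: int) -> None:
        # Returns immediately; the write completes in the background.
        self.in_progress += 1
        self._tasks.append(asyncio.create_task(self._do_store_hint(pos)))

    async def _do_store_hint(self, pos: int) -> None:
        await asyncio.sleep(0.01)  # simulated commitlog write
        self.last_written = max(self.last_written, pos)
        self.in_progress -= 1

    def create_sync_point(self) -> int:
        return self.last_written


async def wait_until_idle(store: HintStore, period: float = 0.005) -> None:
    # Poll until no hint write is in flight.
    while store.in_progress > 0:
        await asyncio.sleep(period)


async def demo():
    store = HintStore()
    for pos in range(1, 101):
        store.store_hint(pos)
    stale = store.create_sync_point()   # taken before the writes finish
    await wait_until_idle(store)
    fresh = store.create_sync_point()   # taken after the writes finish
    return stale, fresh


stale, fresh = asyncio.run(demo())
```

In this model the first sync point captures position 0 while the second
captures position 100, which is the difference the fix relies on.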

Fixes: SCYLLADB-1709

Closes scylladb/scylladb#29623

(cherry picked from commit 7634d3f7d4)

Closes scylladb/scylladb#29632
2026-04-24 17:55:33 +03:00
Botond Dénes
60d5b3959d test/cluster/test_incremental_repair: fix flaky do_tablet_incremental_repair_and_ops
The log grep in get_sst_status searched from the beginning of the log
(no from_mark), so the second-repair assertions were checking cumulative
counts across both repairs rather than counts for the second repair alone.

The expected values (sst_add==2, sst_mark==2) relied on this cumulative
behaviour: 1 from the first repair + 1 from the second = 2. This works
when the second repair encounters exactly one unrepaired sstable, but
fails whenever the second repair sees two.

The second repair can see two unrepaired sstables when the 100 keys
inserted before it (via asyncio.gather) trigger a background auto-flush
before take_storage_snapshot runs. take_storage_snapshot always flushes
the memtable itself, so if an auto-flush already split the batch into two
sstables on disk, the second repair's snapshot contains both and logs
"Added sst" twice, making the cumulative count 3 instead of 2.

Fix: take a log mark per-server before each repair call and pass it to
get_sst_status so each check counts only the entries produced by that
repair. The expected values become 1/0/1 and 1/1/1 respectively,
independent of how many sstables happened to exist beforehand.

get_sst_status gains an optional from_mark parameter (default None)
which preserves existing call sites that intentionally grep from the
start of the log.
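
The difference between cumulative and per-repair counting can be sketched as
follows. count_since and the log lines are hypothetical; the real
get_sst_status greps a server log and is not reproduced here.

```python
def count_since(log_lines, needle, from_mark=None):
    """Count lines containing `needle`, starting at `from_mark` (None = whole log)."""
    start = 0 if from_mark is None else from_mark
    return sum(1 for line in log_lines[start:] if needle in line)


log = []
log.append("repair 1: Added sst a.db")

# Take the mark just before the second repair starts.
mark = len(log)
log.append("repair 2: Added sst b.db")

cumulative = count_since(log, "Added sst")                   # both repairs
second_only = count_since(log, "Added sst", from_mark=mark)  # second repair only
```

Leaving from_mark at None keeps the old whole-log behaviour, mirroring the
optional parameter described above.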

Fixes: SCYLLADB-1711

Closes scylladb/scylladb#29484

(cherry picked from commit d280517e27)

Closes scylladb/scylladb#29633
2026-04-24 17:54:34 +03:00
Benny Halevy
cdbf53e9d7 compaction_manager: fix use-after-free in postponed_compactions_reevaluation()
drain() signals the postponed_reevaluation condition variable to terminate
the postponed_compactions_reevaluation() coroutine but does not await its
completion. When enable() is called afterwards, it overwrites
_waiting_reevalution with a new coroutine, orphaning the old one. During
shutdown, really_do_stop() only awaits the latest coroutine via
_waiting_reevalution, leaving the orphaned coroutine still alive. After
sharded::stop() destroys the compaction_manager, the orphaned coroutine
resumes and reads freed memory (is_disabled() accesses _state).

Fix by introducing stop_postponed_compactions(), awaiting the reevaluation coroutine in
both drain() and stop() after signaling it, if postponed_compactions_reevaluation() is running.
It uses an std::optional<future<>> for _waiting_reevalution and std::exchange to leave
_waiting_reevalution disengaged when postponed_compactions_reevaluation() is not running.
This prevents a race between drain() and stop().

While at it, fix typo in _waiting_reevalution -> _waiting_reevaluation.

Fixes: SCYLLADB-1463
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#29443

(cherry picked from commit 05a00fe140)
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#29527
2026-04-23 17:04:54 +02:00
Marcin Maliszkiewicz
270ae28c00 Merge '[Backport 2026.1] cql3: prepare list statements metadata_id during prepare statement, send the correct metadata_id directly to the client' from Scylladb[bot]
This series makes result metadata handling for auth LIST statements consistent and adds coverage for the driver-visible behavior.

The first patch makes the result-column metadata construction shared across the affected statements, so the metadata shape used for PREPARE and EXECUTE stays uniform and easier to reason about.

The second patch adds regression coverage for both sides of the metadata-id flow:

- a Python auth-cluster test verifies that prepared LIST ROLES OF returns a non-empty result metadata id and that a later EXECUTE reuses it without METADATA_CHANGED
- a Boost transport test covers the recovery path where the client sends an empty request metadata id and the server responds with METADATA_CHANGED and the full metadata

Together these patches tighten the implementation and protect the prepared-metadata-id behavior exposed to drivers.

Fixes: SCYLLADB-1543

backport: this change should be backported to all active branches to help the driver operation

- (cherry picked from commit de19714763)

Parent PR: #29347

Closes scylladb/scylladb#29477

* github.com:scylladb/scylladb:
  test/cluster: cover prepared LIST metadata ids in one setup Precompute the expected metadata-id hashes for the prepared LIST auth and service-level statements and verify that PREPARE returns them while EXECUTE reuses the prepared metadata without METADATA_CHANGED. Run all cases in a single auth-cluster test after preparing the cluster, role, and service level once through the regular manager fixture.
  cql: expose stable result metadata for prepared LIST statements Prepared LIST statements were not calculating metadata in the PREPARE path, and sent an empty string hash to the client, causing problematic behaviour where metadata_id was not recalculated correctly. This patch moves metadata construction into get_result_metadata() for the affected LIST statements and reuses that metadata when building the result set. This gives PREPARE a stable metadata id for LIST ROLES, LIST USERS, LIST PERMISSIONS and the service-level variants. This patch also adds a new boost test that verifies that when an EXECUTE request carries an empty result metadata id while the server has a real metadata id for the result set, the response is marked METADATA_CHANGED and includes the full result metadata plus the server metadata id. This covers the recovery path for clients that send an empty or otherwise unusable metadata id instead of a matching cached one.
2026-04-22 14:06:33 +02:00
Tomasz Grabiec
819969a66a dtest/alternator: stop concurrent-requests test when workers hit limit
`test_limit_concurrent_requests` could create far more tables than intended
because worker threads looped indefinitely and only the probe path terminated
the test. In practice, workers often hit `RequestLimitExceeded` first, but the
test kept running and creating tables, increasing memory pressure and causing
flakiness due to bad_alloc errors in logs.

Fix by replacing the old probe-driven termination with worker-driven
termination. Workers now run until any worker sees
`RequestLimitExceeded`.
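
A minimal sketch of worker-driven termination, assuming a shared stop flag.
The names run_workers and create_table are hypothetical, and the fixed request
budget is a simplification standing in for the server rejecting a request; the
real dtest talks to Alternator and is not reproduced here.

```python
import threading


class RequestLimitExceeded(Exception):
    """Stand-in for the error returned by the server."""


def run_workers(n_workers: int, budget: int):
    stop = threading.Event()
    lock = threading.Lock()
    created = 0

    def create_table():
        # Simplified model: reject once the budget is used up.
        nonlocal created
        with lock:
            if created >= budget:
                raise RequestLimitExceeded()
            created += 1

    def worker():
        # Each worker stops as soon as any worker has seen the limit.
        while not stop.is_set():
            try:
                create_table()
            except RequestLimitExceeded:
                stop.set()

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return created, stop.is_set()


created, stopped = run_workers(n_workers=4, budget=10)
```

Because every worker checks the same flag, no worker keeps creating tables
after the first rejection is observed.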

Fixes SCYLLADB-1181

Closes scylladb/scylladb#29270

(cherry picked from commit b355bb70c2)

Closes scylladb/scylladb#29292
2026-04-21 19:31:22 +02:00
Ernest Zaslavsky
c5a473bf19 sstables_loader: prevent use-after-free on table drop during streaming
sstables_loader::load_and_stream holds a replica::table& reference via
the sstable_streamer for the entire streaming operation.  If the table
is dropped concurrently (e.g. DROP TABLE or DROP KEYSPACE), the
reference becomes dangling and the next access crashes with SEGV.

This was observed in a longevity-50gb-12h-master test run where a
keyspace was dropped while load_and_stream was still streaming SSTables
from a previous batch.

Fix by acquiring a stream_in_progress() phaser guard in load_and_stream
before creating the streamer.  table::stop() calls
_pending_streams_phaser.close() which blocks until all outstanding
guards are released, keeping the table alive for the duration of the
streaming operation.

Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-1352

Closes scylladb/scylladb#29403

(cherry picked from commit e5e6608f20)

Closes scylladb/scylladb#29558
2026-04-21 15:11:49 +02:00
Benny Halevy
144fdc6c9f test/cluster/dtest: fix test_scrub_static_table flakiness
Pass jvm_args=["--smp", "1"] on both cluster.start() calls to
ensure consistent shard count across restarts, avoiding resharding
on restart. Also pass wait_for_binary_proto=True to cluster.start()
to ensure the CQL port is ready before connecting.

Fixes: SCYLLADB-824

Closes scylladb/scylladb#29548

(cherry picked from commit 34adb0e069)

Closes scylladb/scylladb#29557
2026-04-21 12:54:12 +02:00
Patryk Jędrzejczak
53984ce293 Merge '[Backport 2026.1] raft: Await instead of returning future in wait_for_state_change' from Scylladb[bot]
The `try-catch` expression is pretty much useless in its current form. If we return the future, the awaiting will only be performed by the caller, completely circumventing the exception handling.

As a result, instead of handling `raft::request_aborted` with a proper error message, the user will face `seastar::abort_requested_exception` whose message is cryptic at best. It doesn't even point to the root of the problem.
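
The same pitfall exists in Python's asyncio, which makes for a compact
analogy. This is not the Seastar code: AbortRequested and RequestAborted are
hypothetical stand-ins for `seastar::abort_requested_exception` and
`raft::request_aborted`, and the function names are made up for the sketch.

```python
import asyncio


class AbortRequested(Exception):
    """Stand-in for the low-level abort exception."""


class RequestAborted(Exception):
    """Stand-in for the descriptive, domain-level exception."""


async def _wait_impl():
    raise AbortRequested("abort requested")


def wait_returning():
    try:
        # Nothing is raised here: the awaitable is handed to the caller,
        # so the except clause below can never run.
        return _wait_impl()
    except AbortRequested as e:
        raise RequestAborted("wait for state change aborted") from e


async def wait_awaiting():
    try:
        # Awaiting inside the try block lets the handler see the failure.
        return await _wait_impl()
    except AbortRequested as e:
        raise RequestAborted("wait for state change aborted") from e


def outcome(awaitable) -> str:
    try:
        asyncio.run(awaitable)
    except Exception as e:
        return type(e).__name__
    return "ok"


returned = outcome(wait_returning())
awaited = outcome(wait_awaiting())
```

Here `returned` is "AbortRequested" and `awaited` is "RequestAborted": only the
awaiting variant translates the low-level error.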

Fixes SCYLLADB-665

Backport: This is a small improvement and may help when debugging, so let's backport it to all supported versions.

- (cherry picked from commit c36623baad)

- (cherry picked from commit e4f2b62019)

- (cherry picked from commit fae71f79c2)

Parent PR: #28624

Closes scylladb/scylladb#28753

* https://github.com/scylladb/scylladb:
  test: raft: Add test_aborting_wait_for_state_change
  raft: Describe exception types for wait_for_state_change and wait_for_leader
  raft: Await instead of returning future in wait_for_state_change
2026-04-21 12:50:41 +02:00