Commit Graph

51994 Commits

Author SHA1 Message Date
Jenkins Promoter
c19c5d96bf Update pgo profiles - aarch64 2026-05-15 05:03:33 +03:00
Botond Dénes
57ff84f9e7 schema: fix DESCRIBE showing NullCompactionStrategy when compaction is disabled
When a table's compaction is disabled via 'enabled': 'false', the DESCRIBE
output incorrectly showed NullCompactionStrategy instead of the actual strategy.
This happened because schema_properties() called compaction_strategy(), which
returns compaction_strategy_type::null when compaction is disabled. Fix it by
using configured_compaction_strategy(), which always returns the real strategy
type - consistent with how schema_tables.cc serializes it to disk.

Fixes SCYLLADB-1353

Closes scylladb/scylladb#29804

(cherry picked from commit 8d6f031a4a)

Closes scylladb/scylladb#29867

Closes scylladb/scylladb#29886
2026-05-14 17:16:24 +02:00
Wojciech Mitros
d8a9fdddbd test: run test_mv_admission_control_exception on one shard
In the test we perform 2 consecutive writes where the first write
is supposed to increase the view update backlog above the mv
admission control threshold and the second one is expected to be
rejected because of that.

On each node/shard we have 2 types of view update backlogs:
 1. for deciding whether we should admit writes
 2. for propagating the backlog information to other nodes/shards.

For the second write to be rejected, it must be performed on a node
and shard which updated its backlog of type 1.

The view update backlog of type 2. is immediately increased on the
base table replica. For this backlog to be registered as a backlog
of type 1., it needs to be either carried by gossip (happening once
every second) or by attaching it to a replica write response. We
don't want to increase the runtime of tests unnecessarily, so we don't
wait and we rely on the second mechanism. The response to the first
base table write (the one causing increase in the backlog) carries
the increased backlog to the coordinator of this write. So for the
second write to observe the increased backlog, it needs to be coordinated
on the same node+shard as the first write.

We make sure that both writes are coordinated on the same node+shard by
using prepared statements combined with setting the host in `run_async`.
Both writes target the same partition and with prepared statements we
route them directly to the correct shard.

That was the idea, at least. In practice, for the driver to learn the
correct shard, it first needs to learn the token->shard mapping from
the server. For vnodes it can expect a shard by calculating the token
of the affected partition, but for tablets, it had no opportunity to
learn the tablet->shard mapping so the first write may route to any shard.
Additionally, we aren't guaranteed that the driver established connections
to all shards on all nodes at the point of any write. So if a connection
finishes establishing between the two writes, this may also cause us to
coordinate these 2 writes on different shards, leading to a missed view
backlog growth and not-rejected second write.

We fix this in this patch by running the test using one shard on each node.
This way, as long as we perform both writes on the same node, they'll also
be coordinated on the same shard. This also makes the prepared statement and
BoundStatement unnecessary — we can use SimpleStatement with
FallthroughRetryPolicy directly.

Fixes: SCYLLADB-1957

Closes scylladb/scylladb#29862

(cherry picked from commit f3cf20803b)

Closes scylladb/scylladb#29873

Closes scylladb/scylladb#29879
2026-05-14 17:15:36 +02:00
Ferenc Szili
1a7ca53620 test: fix flaky test_tablets_split_merge_with_many_tables
In debug mode, this test can timeout during tablets merge. While the
test already decreases the number of tables in debug mode (20 tables,
instead of 200 for dev mode), this is not enough, and the test can still
timeout during merge. This change reduces the number of tables from 20
to 5 in debug mode.

It also drops the log level for lead_balancer to debug. This should make
any potential future problems with this test easier to investigate.

Fixes: SCYLLADB-1863

Closes scylladb/scylladb#29682

(cherry picked from commit ec4b483e88)

Closes scylladb/scylladb#29786

Closes scylladb/scylladb#29887
2026-05-14 17:14:05 +02:00
Gleb Natapov
de9cbc89bc schema: ensure group0_schema_version is set on boot
Starting from 2024.2, schema versions are assigned by group0 using
raft state IDs instead of being calculated by hashing schema mutations
with MD5. The assigned version is persisted in system.scylla_local as
'group0_schema_version'. When this value is present, it is used as the
schema version; otherwise, the legacy calculate_schema_digest() hash is
used as a fallback.

If a cluster was upgraded from a version before 2024.2 and no schema
change was performed after the upgrade, group0_schema_version will
never have been written, keeping the cluster permanently dependent on
the legacy hashing code path. This prevents us from dropping that code.

Add migration_manager::ensure_group0_schema_version_is_set() which is
called during join_cluster after finish_setup_after_join completes (in
both the raft-topology and non-raft-topology paths). It checks whether
group0_schema_version is set. If not, and the GROUP0_SCHEMA_VERSIONING
feature is enabled, it performs a no-op schema change through group0
(announce with an empty mutation set). The announce() function
automatically appends the group0_schema_version mutation, so this is
sufficient to persist the version on all nodes via raft.

A double-check is performed: once before acquiring the group0 guard
(to avoid unnecessary work), and once after (since start_group0_operation
includes a raft barrier, ensuring any concurrently committed changes
from other nodes are applied locally). This handles the case where
multiple nodes restart simultaneously and race to set the version.

Closes scylladb/scylladb#29858
2026-05-14 15:59:07 +02:00
Botond Dénes
cb26a837ba Merge '[Backport 2026.1] Rare "Vector Store request was aborted"' from Karol Nowacki
To ensure the high-availability logic fits within the default 10s CQL timeout, the default unreachable node detection time has been decreased from 5s to 3s. Consequently, this value has been decoupled from the
read_request_timeout and dedicated configuration options have been introduced: vector_store_unreachable_node_detection_time_in_ms.

Fixes SCYLLADB-1855

Backports to 2025.4 and 2026.1 are required, as spurious CQL timeouts in high-availability scenarios are affecting
these releases in vector search XCloud deployments.

- (cherry picked from commit 9269ca9cf7)
- (cherry picked from commit c643f321af)

Parent PR: https://github.com/scylladb/scylladb/pull/28675

Closes scylladb/scylladb#29773

* github.com:scylladb/scylladb:
  vector_search: decrease default connection timeout to 3s
  vector_search: add unreachable node detection time config
2026-05-14 14:51:59 +03:00
Karol Nowacki
96a8af14a7 vector_search: decrease default connection timeout to 3s
Decrease the default connection timeout to 3s to better align with the
default CQL query timeout of 10s.

The previous timeout allowed only one failover request in high availability
scenario before hitting the CQL query timeout.
By decreasing the timeout to 3s, we can perform up to three failover requests
within the CQL query timeout, which significantly improves the chances of
successfully completing the query in high availability scenarios.

Fixes: SCYLLADB-95
(cherry picked from commit c643f321af)
2026-05-14 05:41:55 +00:00
Karol Nowacki
66a6541bc5 vector_search: add unreachable node detection time config
Add option `vector_store_unreachable_node_detection_time_in_ms` to
control parameters related to detecting unreachable vector store nodes.
This parameter is used to set the TCP connect timeout, keepalive
parameters, and TCP_USER_TIMEOUT. By configuring these parameters,
we can detect unreachable vector store nodes faster and trigger
failover mechanisms in a timely manner.

(cherry picked from commit 9269ca9cf7)
2026-05-13 16:20:48 +02:00
Raphael S. Carvalho
399e0f8cb7 mutation_compactor: Fix tombstone GC metrics to account for only expired
There are 3 metrics (that goes in every compaction_history entry):
total_tombstone_purge_attempt
total_tombstone_purge_failure_due_to_overlapping_with_memtable
total_tombstone_purge_failure_due_to_overlapping_with_uncompacting_sstable

When a tombstone is not expired (e.g. doesn't satisfy "gc_before" or
grace period), it can be currently accounted as failure due to
overlapping with either memtable or uncompacting sstable.
So those 2 last metrics have noise of *unexpired* tombstones.

What we should do is to only account for expired tombstones in all
those 3 metrics. We lose the info of knowing the amount of tombstones
processed by compaction, now we'll only know about the expired ones.
But those metrics were primarily added for explaining why expired
tombstones cannot be removed.
We could have alternatively added a new field
purge_failure_due_to_being_unexpired or something, but
it requires adding a new field to compaction_history.

Fixes https://scylladb.atlassian.net/browse/SCYLLADB-737.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#28669

(cherry picked from commit f33f324f77)
(cherry picked from commit acbdf34e5a)

Closes scylladb/scylladb#28744
2026-05-13 16:18:00 +03:00
Jenkins Promoter
044240dfb2 Update ScyllaDB version to: 2026.1.4 2026-05-13 13:47:15 +03:00
Patryk Jędrzejczak
35063d6d74 Merge 'topology_coordinator: join tablet load stats refresh in stop()' from Andrzej Jackowski
Commit 2b7aa32 (topology_coordinator: Refresh load stats after
table is created or altered) registered topology_coordinator as a
schema change listener and added on_create_column_family which
fire-and-forgets _tablet_load_stats_refresh.trigger(). The
triggered task runs on the gossip scheduling group via
with_scheduling_group and accesses the topology_coordinator via
'this'.

stop() unregisters the listener but does not wait for any
in-flight refresh task. If a notification fires between
_tablet_load_stats_refresh.join() in run() and unregister_listener
in stop(), the scheduled task can outlive the topology_coordinator
and access freed memory after run_topology_coordinator's coroutine
frame is destroyed.

Wait for the refresh to complete in stop() after unregistering the
listener, ensuring no task can fire after destruction.

Fixes SCYLLADB-1728

Backport to 2026.1 and 2026.2, because the issue was introduced in 2b7aa32

Closes scylladb/scylladb#29653

* https://github.com/scylladb/scylladb:
  test: tablet_stats: reproduce shutdown refresh race
  topology_coordinator: join tablet load stats refresh in stop()

(cherry picked from commit d9dd3bfe53)

Closes scylladb/scylladb#29686

Co-authored-by: Andrzej Jackowski <andrzej.jackowski@scylladb.com>

Closes scylladb/scylladb#29811
2026-05-13 09:42:09 +03:00
Botond Dénes
8d67d8185a Merge '[Backport 2026.1] test: fix flaky test_incremental_repair_race_window_promotes_unrepaired_data' from Raphael Raph Carvalho
The test waited for two "Finished tablet repair" log messages on the
coordinator, expecting one per tablet.  But there are two log sources
that emit messages matching this pattern:

  repair module (repair/repair.cc:2329):
    "Finished tablet repair for table=..."
  topology coordinator (topology_coordinator.cc:2083):
    "Finished tablet repair host=..."

When the coordinator is also a repair replica (always the case with
RF=3 and 3 nodes), both messages appear in the coordinator log for the
same tablet within 1ms of each other.  The test consumed both, thinking
both tablets were done, while the second tablet repair was still running.

From the CI failure logs:

  04:08:09.658 Found: repair[...]: Finished tablet repair for table=...
    global_tablet_id=e42fd650-3542-11f1-9756-85403784a622:0
  04:08:09.660 Found: raft_topology - Finished tablet repair host=...
    tablet=e42fd650-3542-11f1-9756-85403784a622:0

Both messages are for tablet :0.  Tablet :1 repair had not finished yet.

The test then wrote keys 20-29 while the second tablet repair was still
in progress.  That repair flushed the memtable (via
prepare_sstables_for_incremental_repair), including keys 20-29 in the
repair scan, and mark_sstable_as_repaired set repaired_at=2 on the
resulting sstable.  This caused the assertion failure on servers[0]:
  "should not have post-repair keys in repaired sstables, got:
   {20, 21, 22, 23, 24, 25, 26, 27, 28, 29}"

Fix by matching "Finished tablet repair host=" which is unique to the
topology coordinator message and avoids the ambiguity.

Also fix an incorrect comment that said being_repaired=null when at that
point in the test being_repaired is still set to the session_id (the
delay_end_repair_update injection prevents end_repair from running).

Fixes: [SCYLLADB-1478](https://scylladb.atlassian.net/browse/SCYLLADB-1478)

Closes https://github.com/scylladb/scylladb/pull/29444

(cherry picked from commit ebdfa10c8f)
Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-1903

[SCYLLADB-1478]: https://scylladb.atlassian.net/browse/SCYLLADB-1478?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ

Closes scylladb/scylladb#29832

* github.com:scylladb/scylladb:
  test/cluster/test_incremental_repair: add retry for residual leadership race
  test/cluster/test_incremental_repair: fix flaky coordinator-change scenario
  test: fix flaky test_incremental_repair_race_window_promotes_unrepaired_data
2026-05-13 09:38:01 +03:00
Asias He
8bd0e562be repair: Reject repair requests where start and end tokens are equal
When a user calls the repair API with identical startToken and endToken
values, the code creates a wrapping interval (T, T]. This causes
unwrap() to split it into (-inf, T] and (T, +inf), covering the entire
token ring and triggering a full repair.

Reject such requests early with an error message matching
Cassandra's behavior: "Start and end tokens must be different."

Fixes: CUSTOMER-368

Closes scylladb/scylladb#29821

(cherry picked from commit 0204372156)

Closes scylladb/scylladb#29836

Closes scylladb/scylladb#29863
2026-05-13 09:37:12 +03:00
Anna Stuchlik
356ca32994 doc: update the node size limit
This commit increases the node size limit from 256 to 4096 CPUs
based on be1f566488

Fixes SCYLLADB-1676

Closes scylladb/scylladb#29602

(cherry picked from commit a7b7019f90)

Closes scylladb/scylladb#29846

Closes scylladb/scylladb#29864
2026-05-13 09:36:09 +03:00
copilot-swe-agent[bot]
0e41ef82ac docs: fix typo in materialized views docs - "columns are" instead of "is"
The MV Select Statement description was missing the word "columns" and
used incorrect verb agreement, making the sentence grammatically broken
and ambiguous.

docs/cql/mv.rst: "which of the base table is included" →
"which of the base table columns are included"

Fixes #29662
Closes #29663

Co-authored-by: annastuchlik <37244380+annastuchlik@users.noreply.github.com>
(cherry picked from commit 9e7d67612c)

Closes scylladb/scylladb#29835

Closes scylladb/scylladb#29865
2026-05-13 09:35:00 +03:00
Piotr Dulikowski
2c5ae5cee0 database: add missing co_await on lock in create_local_system_table
The function database::create_local_system_table calls
get_tables_metadata().hold_write_lock(), but does not co_await the
returned future. Effectively, this code does not guarantee mutual
exclusion because it does not wait for the lock to be acquired and does
not guarantee that the lock is held long enough.

Fix this by adding the co_await that was missing.

Found by manual inspection. This code is not known to have caused any
problems so far, but it's clearly wrong - hence the fix.

Fixes: SCYLLADB-1916

Closes scylladb/scylladb#29806

(cherry picked from commit bc482bfdea)

Closes scylladb/scylladb#29815

Closes scylladb/scylladb#29833
2026-05-12 10:16:29 +02:00
Raphael S. Carvalho
c3d8f279dc test/cluster/test_incremental_repair: add retry for residual leadership race
There is a small race window where Raft leadership could transfer back
to servers[1] between the ensure_group0_leader_on() check and the
actual restart.  If this happens, the new coordinator re-initiates
repair and masks the compaction-merge bug.

Extract the core test logic into _do_race_window_promotes_unrepaired_data()
which directly checks get_topology_coordinator() after restart and raises
_LeadershipTransferred if servers[1] became coordinator.  The test
function calls this helper in a retry loop (up to 5 attempts).

Also add detection of residual re-repairs (where the coordinator
re-initiates tablet repair after seeing tablets stuck in the repair
stage following the topology restart) and fresh keyspace creation on
each retry attempt to avoid state contamination.

Refs: SCYLLADB-1478
Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-1903
(cherry picked from commit 2615d0e8d8)
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2026-05-11 09:38:13 -03:00
Avi Kivity
bceb054bbd test/cluster/test_incremental_repair: fix flaky coordinator-change scenario
The test_incremental_repair_race_window_promotes_unrepaired_data test
was flaky because it hardcodes servers[1] as the restart target but did
not ensure servers[1] was NOT the topology coordinator.

When servers[1] happened to be the Raft group0 leader (topology
coordinator), restarting it killed the leader, forced a new election,
and the new coordinator re-initiated tablet repair.  This re-repair
flushes memtables on all replicas via take_storage_snapshot() and marks
the resulting sstables as repaired -- causing post-repair keys to appear
in repaired sstables on servers[0] and servers[2].  The test then hit
the wrong assertion (servers[0]/[2] contaminated).

Fix: before starting the repair, check whether servers[1] is the
topology coordinator.  If so, move leadership to another server via
ensure_group0_leader_on() so that restarting servers[1] only kills a
follower -- which does not trigger an election or coordinator change.

Reproducibility was confirmed by forcing leadership to servers[1] via
ensure_group0_leader_on() and observing deterministic failure with all
three servers showing post-repair keys in repaired sstables (confirming
the re-repair scenario), then verifying the fix passes reliably.

Fixes: SCYLLADB-1478
Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-1903
(cherry picked from commit 914b70c75b)
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2026-05-11 09:31:48 -03:00
Avi Kivity
cb7ce9b4cb test: fix flaky test_incremental_repair_race_window_promotes_unrepaired_data
The test waited for two "Finished tablet repair" log messages on the
coordinator, expecting one per tablet.  But there are two log sources
that emit messages matching this pattern:

  repair module (repair/repair.cc:2329):
    "Finished tablet repair for table=..."
  topology coordinator (topology_coordinator.cc:2083):
    "Finished tablet repair host=..."

When the coordinator is also a repair replica (always the case with
RF=3 and 3 nodes), both messages appear in the coordinator log for the
same tablet within 1ms of each other.  The test consumed both, thinking
both tablets were done, while the second tablet repair was still running.

From the CI failure logs:

  04:08:09.658 Found: repair[...]: Finished tablet repair for table=...
    global_tablet_id=e42fd650-3542-11f1-9756-85403784a622:0
  04:08:09.660 Found: raft_topology - Finished tablet repair host=...
    tablet=e42fd650-3542-11f1-9756-85403784a622:0

Both messages are for tablet :0.  Tablet :1 repair had not finished yet.

The test then wrote keys 20-29 while the second tablet repair was still
in progress.  That repair flushed the memtable (via
prepare_sstables_for_incremental_repair), including keys 20-29 in the
repair scan, and mark_sstable_as_repaired set repaired_at=2 on the
resulting sstable.  This caused the assertion failure on servers[0]:
  "should not have post-repair keys in repaired sstables, got:
   {20, 21, 22, 23, 24, 25, 26, 27, 28, 29}"

Fix by matching "Finished tablet repair host=" which is unique to the
topology coordinator message and avoids the ambiguity.

Also fix an incorrect comment that said being_repaired=null when at that
point in the test being_repaired is still set to the session_id (the
delay_end_repair_update injection prevents end_repair from running).

Fixes: SCYLLADB-1478

Closes scylladb/scylladb#29444

(cherry picked from commit ebdfa10c8f)
Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-1903
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2026-05-11 09:31:43 -03:00
Botond Dénes
d33fb5159a sstables/trie: add preemption points in trie_writer
The BTI partition index trie writer flushes all buffered nodes at the
end of each SSTable via complete_until_depth(0), called from
bti_partition_index_writer_impl::finish(). This is a tight synchronous
loop that writes trie nodes through file_writer::write(), which uses a
buffered output_stream: individual writes that fit in the buffer are
plain memcpy operations returning a ready future, so .get() never
yields. As a result the reactor can stall for several milliseconds on
large SSTables.

The entire call chain runs inside seastar::async() (via
sstable::write_components()), so seastar::thread::maybe_yield() is
safe to call here. Add it at the top of both tight loops:
- complete_until_depth(), which iterates over trie depth
- lay_out_children(), which iterates over child branches per node

Fixes SCYLLADB-1885

Closes scylladb/scylladb#29798

(cherry picked from commit d0813769ec)

Closes scylladb/scylladb#29810

Closes scylladb/scylladb#29816
2026-05-11 12:50:56 +03:00
Piotr Dulikowski
02c7f44da4 Merge 'table_helper: fix use-after-free on prepared-statement invalidation' from Marcin Maliszkiewicz
insert() held no local strong ref to the prepared modification_statement
across the suspension in execute(). On a single shard:

1. Fiber A suspends inside _insert_stmt->execute().
2. DROP TABLE / DROP KEYSPACE on the target, or LRU eviction, removes
   the prepared_statements_cache entry, releasing its strong ref.
3. Fiber B re-enters cache_table_info(), sees _prepared_stmt
   (checked_weak_ptr) invalidated, and runs _insert_stmt = nullptr,
   releasing the last strong ref. The modification_statement is freed.
4. Fiber A resumes inside execute() and touches freed *this.

Pin strong ref to _insert_stmt locally before the suspension.

Fixes https://scylladb.atlassian.net/browse/SCYLLADB-1667

Backport: all supported branches, it's memory corruption bug, long present

Closes scylladb/scylladb#29588

* github.com:scylladb/scylladb:
  test/boost: add dummy case to table_helper_test for non-injection modes
  test/boost: add regression test for table_helper insert() UAF
  utils/error_injection: add waiters() API
  table_helper: fix use-after-free on prepared-statement invalidation

(cherry picked from commit efcc0b6376)

Closes scylladb/scylladb#29747

Closes scylladb/scylladb#29802
2026-05-10 13:58:15 +03:00
Yaniv Michael Kaul
c0bf728edb raft/group0: fix destroy assertion on startup failure
If start_server_for_group0() successfully registers a server in
_raft_gr._servers but a subsequent step (e.g. enable_in_memory_state_machine())
throws, the server is never destroyed because abort_and_drain()/destroy()
check std::get_if<raft::group_id>(&_group0) which was only set after the
entire with_scheduling_group block completed.

Move _group0.emplace<raft::group_id>() inside the lambda, immediately after
start_server_for_group() succeeds, so that cleanup paths can always find
and destroy the registered server.

This fixes the assertion:
  "raft_group_registry - stop(): server for group ... is not destroyed"

which manifests during shutdown after an upgrade where topology_state_load()
fails due to netw::unknown_address.

Backport: Yes, to 2026.1, 2026.2, as it causes a crash on upgrades

Refs: SCYLLADB-1217
Refs: CUSTOMER-340
Refs: CUSTOMER-335
Fixes: SCYLLADB-1809
Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>
AI-assisted: Yes, Opencode/Opus 4.6

Closes scylladb/scylladb#29702

(cherry picked from commit 6179406467)

Closes scylladb/scylladb#29742

Closes scylladb/scylladb#29754
scylla-2026.1.3-candidate-20260510010724 scylla-2026.1.3
2026-05-08 11:24:02 +02:00
Raphael S. Carvalho
0b47672901 repair/replica: Fix race window where post-repair data is wrongly promoted to repaired
During incremental repair, each tablet replica holds three SSTable views:
UNREPAIRED, REPAIRING, and REPAIRED.  The repair lifecycle is:

  1. Replicas snapshot unrepaired SSTables and mark them REPAIRING.
  2. Row-level repair streams missing rows between replicas.
  3. mark_sstable_as_repaired() runs on all replicas, rewriting the
     SSTables with repaired_at = sstables_repaired_at + 1 (e.g. N+1).
  4. The coordinator atomically commits sstables_repaired_at=N+1 and
     the end_repair stage to Raft, then broadcasts
     repair_update_compaction_ctrl which calls clear_being_repaired().

The bug lives in the window between steps 3 and 4.  After step 3, each
replica has on-disk SSTables with repaired_at=N+1, but sstables_repaired_at
in Raft is still N.  The classifier therefore sees:

  is_repaired(N, sst{repaired_at=N+1}) == false
  sst->being_repaired == null   (lost on restart, or not yet set)

and puts them in the UNREPAIRED view.  If a new write arrives and is
flushed (repaired_at=0), STCS minor compaction can fire immediately and
merge the two SSTables.  The output gets repaired_at = max(N+1, 0) = N+1
because compaction preserves the maximum repaired_at of its inputs.

Once step 4 commits sstables_repaired_at=N+1, the compacted output is
classified REPAIRED on the affected replica even though it contains data
that was never part of the repair scan.  Other replicas, which did not
experience this compaction, classify the same rows as UNREPAIRED.  This
divergence is never healed by future repairs because the repaired set is
considered authoritative.  The result is data resurrection: deleted rows
can reappear after the next compaction that merges unrepaired data with the
wrongly-promoted repaired SSTable.

The fix has two layers:

Layer 1 (in-memory, fast path): mark_sstable_as_repaired() now also calls
mark_as_being_repaired(session) on the new SSTables it writes.  This keeps
them in the REPAIRING view from the moment they are created until
repair_update_compaction_ctrl clears the flag after step 4, covering the
race window in the normal (no-restart) case.

Layer 2 (durable, restart-safe): a new is_being_repaired() helper on
tablet_storage_group_manager detects the race window even after a node
restart, when being_repaired has been lost from memory.  It checks:

  sst.repaired_at == sstables_repaired_at + 1
  AND tablet transition kind == tablet_transition_kind::repair

Both conditions survive restarts: repaired_at is on-disk in SSTable
metadata, and the tablet transition is persisted in Raft.  Once the
coordinator commits sstables_repaired_at=N+1 (step 4), is_repaired()
returns true and the SSTable naturally moves to the REPAIRED view.

The classifier in make_repair_sstable_classifier_func() is updated to call
is_being_repaired(sst, sstables_repaired_at) in place of the previous
sst->being_repaired.uuid().is_null() check.

A new test, test_incremental_repair_race_window_promotes_unrepaired_data,
reproduces the bug by:
  - Running repair round 1 to establish sstables_repaired_at=1.
  - Injecting delay_end_repair_update to hold the race window open.
  - Running repair round 2 so all replicas complete mark_sstable_as_repaired
    (repaired_at=2) but the coordinator has not yet committed step 4.
  - Writing post-repair keys to all replicas and flushing servers[1] to
    create an SSTable with repaired_at=0 on disk.
  - Restarting servers[1] so being_repaired is lost from memory.
  - Waiting for autocompaction to merge the two SSTables on servers[1].
  - Asserting that the merged SSTable contains post-repair keys (the bug)
    and that servers[0] and servers[2] do not see those keys as repaired.

NOTE FOR MAINTAINER: Copilot initially only implemented Layer 1 (the
in-memory being_repaired guard), missing the restart scenario entirely.
I pointed out that being_repaired is lost on restart and guided Copilot
to add the durable Layer 2 check.  I also polished the implementation:
moving is_being_repaired into tablet_storage_group_manager so it can
reuse the already-held _tablet_map (avoiding an ERM lookup and try/catch),
passing sstables_repaired_at in from the classifier to avoid re-reading it,
and using compaction_group_for_sstable inside the function rather than
threading a tablet_id parameter through the classifier.

Fixes https://scylladb.atlassian.net/browse/SCYLLADB-1239.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Closes scylladb/scylladb#29244

(cherry picked from commit 16e387d5f9)
(cherry picked from commit cc3dcc4ba8)

Closes scylladb/scylladb#29411
2026-05-08 11:25:26 +03:00
Patryk Jędrzejczak
b3b333b597 Merge '[Backport 2026.1] Barrier and drain logging' from Scylladb[bot]
Add more logging to barrier and drain rpc to try and pinpoint https://github.com/scylladb/scylladb/issues/26281

Bakport since we want to have it if it happens in the field.

Fixes: SCYLLADB-1837
Refs: #26281

- (cherry picked from commit 11b838e71e)
- (cherry picked from commit e88ce09372)
- (cherry picked from commit 385915c101)
- (cherry picked from commit d2b695aa64)

Parent PR: #29735

Closes scylladb/scylladb#29770

* https://github.com/scylladb/scylladb:
  session, raft_topology: add periodic warnings for hung drain and stale version waits
  session: add info-level logging to drain_closing_sessions
  raft_topology: log sub-step progress in local_topology_barrier
  raft_topology: log read_barrier progress in topology cmd handler
  token_metadata: improve stale versions diagnostics
2026-05-08 10:02:18 +02:00
Łukasz Paszkowski
342a7bfce1 db: fix system.size_estimates to aggregate sstable estimates across all shards
The estimate() function in the size_estimates virtual reader only
considered sstables local to the shard that happened to own the
keyspace's partition key token. Since sstables are distributed across
shards, this caused partition count estimates to be approximately
1/smp_count of the actual value.

This bug has been present since the virtual reader was introduced in
225648780d.

Use db.container().map_reduce0() to aggregate sstable estimates
across all shards. Each shard contributes its local count and
estimated_histogram, which are then merged to produce the correct
total.

Also fix the `test_partitions_estimate_full_overlap` test which becomes
flaky (xpassing ~1% of runs) because autocompaction could merge the
two overlapping sstables before the size estimate was read. Wrap the
test body in nodetool.no_autocompaction_context to prevent this race.

Fixes https://scylladb.atlassian.net/browse/SCYLLADB-1179
Refs https://github.com/scylladb/scylladb/issues/9083

Closes scylladb/scylladb#29286

(cherry picked from commit 6f364fd3b7)

Closes scylladb/scylladb#29381
2026-05-08 06:38:28 +03:00
Piotr Dulikowski
8a52602ec9 Merge '[Backport 2026.1] vector_search: test: fix flaky test_dns_resolving_repeated' from Scylladb[bot]
The `vector_store_client_test_dns_resolving_repeated` test was intermittently
timing out on CI. The exact root cause is not fully understood, but the
hypothesis is that a single trigger signal can be lost somewhere (not exactly
known where). This is not an issue for the production code because refresh
trigger will be called multiple times whenever all configured nodes will be
unreachable.

Fixes SCYLLADB-1794

Backport to 2026.1 and 2026.2, as the same CI flakiness can occur on these branches.

- (cherry picked from commit e9240587f4)
- (cherry picked from commit 44249c0a75)

Parent PR: #29752

Closes scylladb/scylladb#29796

* github.com:scylladb/scylladb:
  vector_search: test: default timeout in test_dns_resolving_repeated
  vector_search: test: fix flaky test_dns_resolving_repeated
2026-05-08 03:54:43 +02:00
Karol Nowacki
9ed728a5b3 vector_search: test: default timeout in test_dns_resolving_repeated
Replace explicit 1-second timeouts in repeat_until() with the default
STANDARD_WAIT (10s). The 1-second timeout could be too aggressive for
loaded CI environments where lowres_clock granularity (~10ms) combined
with OS scheduling delays and resource contention (-c2 -m2G) could cause
the loop to expire before the DNS refresh task completes its cycle.

This also unifies test timeouts across test cases.

(cherry picked from commit 207de967fb)
2026-05-07 17:04:15 +00:00
Karol Nowacki
b4467fb229 vector_search: test: fix flaky test_dns_resolving_repeated
Move trigger_dns_resolver() inside the repeat_until loop instead of
calling it once before the loop.

The test was intermittently timing out on CI. The exact root cause is not
fully understood, but the hypothesis is that a single trigger signal can
be lost somewhere (not exactly known where). This is not an issue for the
production code because refresh trigger will be called multiple times -
in every query where all configured nodes will be unreachable.

By triggering inside the loop, we ensure the signal is re-sent on
each iteration until the resolver actually performs the refresh and
picks up the new (failing) DNS resolution. This makes the test
resilient to timing-dependent signal loss without changing production
code.

Fixes: SCYLLADB-1794
(cherry picked from commit 4722be1289)
2026-05-07 17:04:15 +00:00
Marcin Maliszkiewicz
83999f7228 Merge 'utils: loading_cache: add insert() that is a no-op when caching is disabled' from Dario Mirovic
When `permissions_validity_in_ms` is set to 0, executing a prepared statement under authentication crashes with:
```
    Assertion `caching_enabled()' failed.
        at utils/loading_cache.hh:319
        in authorized_prepared_statements_cache::insert
```

`loading_cache::get_ptr()` asserts when caching is disabled (expiry == 0), but `authorized_prepared_statements_cache::insert()` was using it purely for its side effect of populating the cache, which is meaningless when caching is off.

Add a new `loading_cache::insert(k, load)` method that is a no-op when caching is disabled and otherwise forwards to `get_ptr()`. Switch `authorized_prepared_statements_cache::insert()` to use it. This
completes the disabled-mode safety contract of the cache for the write side, mirroring the fallback that `get()` already provides for the read side.

Includes a regression test in `test/boost/loading_cache_test.cc` plus a positive test for the new `insert()` overload.

Fixes SCYLLADB-1699

The crash is introduced a long time ago. It is present on all the live versions, from 2025.1 onward. No client tickets, but it should be backported.

Closes scylladb/scylladb#29638

* github.com:scylladb/scylladb:
  test: boost: regression test for loading_cache::insert with caching disabled
  utils: loading_cache: add insert() that is a no-op when caching is disabled

(cherry picked from commit c00fee0316)

Closes scylladb/scylladb#29762

Closes scylladb/scylladb#29782
2026-05-07 10:42:59 +03:00
Marcin Maliszkiewicz
1793814914 Merge '[Backport 2026.1] auth: fix crash on ghost rows in role_permissions' from Andrzej Jackowski
This is manual backport of https://github.com/scylladb/scylladb/pull/29757, because the changes are required on 2026.1 ASAP.

===
The auth cache crashes when it encounters rows in role_permissions that have a live row marker but no permissions column. These “ghost rows” were created by the now-removed auth v2 migration, which used INSERT (creating row markers) instead of UPDATE.

When permissions were later revoked, the row marker remained while the permissions column became null. An empty collection appears as null, since its lifetime is based only on its element's cells.

As a result, when the cache reloads and expects the permissions column to exist, it hits a missing_column exception.

The series removes dead code that was the primary crash site, adds has() guards to the remaining access paths, and includes a test reproducer.

Fixes https://scylladb.atlassian.net/browse/SCYLLADB-1816

Backport: all supported versions 2026.1, 2025.4, 2025.1

Parent PR: https://github.com/scylladb/scylladb/pull/29757

Closes scylladb/scylladb#29771

* github.com:scylladb/scylladb:
  test: add reproducer for auth cache crash on missing permissions column
  auth: tolerate missing permissions column in authorize()
  auth: add defensive has() guard for role_attributes value column
  auth: remove unused permissions field from cache role_record
2026-05-06 19:21:57 +02:00
Gleb Natapov
269637ffa3 session, raft_topology: add periodic warnings for hung drain and stale version waits
Add periodic warning timers (every 5 minutes) to help diagnose hangs in
barrier_and_drain:

- drain_closing_sessions(): warn if semaphore acquisition or session gate
  close is taking too long, reporting the gate count to show how many
  guards are still alive.
- local_topology_barrier(): warn if stale_versions_in_use() is taking
  too long, reporting the current stale version trackers.
- session::gate_count(): new public accessor for diagnostic purposes.

These warnings help distinguish between the two possible hang points
in barrier_and_drain (stale versions vs session drain) and provide
ongoing visibility into what's blocking progress.

(cherry picked from commit d2b695aa64)
2026-05-06 15:00:44 +03:00
Gleb Natapov
d232b08be6 session: add info-level logging to drain_closing_sessions
drain_closing_sessions() is called as part of the barrier_and_drain
topology command and can block on two things: acquiring the drain
semaphore (if another drain is in progress) and waiting for individual
sessions to close (which blocks until all session guards are released).

Previously, all logging in this function was at debug level, making it
invisible in production logs. When barrier_and_drain hangs, there is no
way to tell whether the function is waiting for the semaphore, waiting
for a specific session to close, or was never called.

Promote logging to info level and add messages at each blocking point:
before/after semaphore acquisition (with count of sessions to drain),
before/after each individual session close (with session id), and at
function completion. This makes it possible to identify the exact
session blocking a topology operation from the node log alone.

(cherry picked from commit 385915c101)
2026-05-06 15:00:44 +03:00
Gleb Natapov
485c2504ad raft_topology: log sub-step progress in local_topology_barrier
When a node processes a barrier_and_drain topology command, it performs
two potentially long-running operations inside local_topology_barrier():
waiting for stale token metadata versions to be released
(stale_versions_in_use) and draining closing sessions
(drain_closing_sessions). Either of these can hang indefinitely -- for
example, stale_versions_in_use blocks until all references to previous
token metadata versions are released, which depends on in-flight
requests completing.

Previously, the only logging was a single 'done' message at the end,
making it impossible to determine which sub-step was blocking when a
barrier_and_drain RPC appeared stuck on a node. In a recent CI failure,
a node never responded to barrier_and_drain during a removenode
operation, and the logs showed the RPC was received but nothing about
what it was waiting on internally.

Add info-level logging before each blocking sub-step, including the
topology version for correlation. This allows diagnosing hangs by
showing whether the node is stuck waiting for stale metadata versions,
stuck draining sessions, or never reached these steps at all.

(cherry picked from commit e88ce09372)
2026-05-06 15:00:44 +03:00
Gleb Natapov
026e870e54 raft_topology: log read_barrier progress in topology cmd handler
When a raft topology command (e.g. barrier_and_drain) is received by a
node, the handler first performs a raft read_barrier to ensure it sees
the latest topology state. This read_barrier can hang indefinitely if
raft cannot achieve quorum, but there was no logging around it, making
it impossible to tell whether the handler was stuck at this step or
somewhere else.

Add info-level logging before and after the read_barrier call in
raft_topology_cmd_handler, including the command type, index, and term.
This allows diagnosing hangs by showing whether the node entered the
read_barrier and whether it completed, narrowing down the root cause
when a topology command RPC appears stuck on the receiver side.

(cherry picked from commit 11b838e71e)
2026-05-06 15:00:44 +03:00
Petr Gusev
dd75f53c5d token_metadata: improve stale versions diagnostics
Before waiting on stale_versions_in_use(), we log the stale versions
the barrier_and_drain handler will wait for, along with the number of
token_metadata references representing each version.
To achieve this, we store a pointer to token_metadata in
version_tracker, traverse the _trackers list, and output all items
with a version smaller than the latest. Since token_metadata
contains the version_tracker instance, it is guaranteed to remain
alive during traversal. To count references, token_metadata now
inherits from enable_lw_shared_from_this.

This helps diagnose tablet migration stalls and allows more
deterministic tests: when a barrier is expected to block, we can
verify that the log contains the expected stale versions rather
than checking that the barrier_and_drain is blocked on
stale_versions_in_use() for a fixed amount of time.

(cherry picked from commit e39f4b399c)
2026-05-06 14:52:11 +03:00
Marcin Maliszkiewicz
2c7d9b49bc test: add reproducer for auth cache crash on missing permissions column
(cherry picked from commit 5c5306c692)
2026-05-06 13:17:06 +02:00
Marcin Maliszkiewicz
82c3509752 auth: tolerate missing permissions column in authorize()
Ghost rows in role_permissions with a live row marker but no permissions
column can occur when permissions created via INSERT (e.g. by the removed
auth v2 migration) are later revoked. The row marker survives the revoke,
leaving a row visible to queries but with permissions=null.

Add a has() guard before accessing the permissions column, matching the
pattern already used in list_all(). Return NONE permissions for such
ghost rows instead of crashing.

(cherry picked from commit df69a5c79b)
2026-05-06 13:17:05 +02:00
Marcin Maliszkiewicz
2306562c01 auth: add defensive has() guard for role_attributes value column
Add a has() check before accessing the value column in role_attributes
to tolerate ghost rows with missing regular columns. In practice this
is unlikely to be a problem since attributes are not typically revoked,
but the guard is added for consistency and defensive programming.

(cherry picked from commit c44625ebdf)
2026-05-06 13:17:04 +02:00
Marcin Maliszkiewicz
3f35b3bc55 auth: remove unused permissions field from cache role_record
The permissions field in role_record was populated by fetch_role() but
never read. Authorization uses cached_permissions instead, which is
loaded via the permission_loader callback. Remove the dead field and
its fetch code.

The removed code also did not check for missing columns before accessing
the permissions set, which could crash on ghost rows left by the removed
auth v2 migration. The migration used INSERT (creating row markers),
and when permissions were later revoked, the row marker survived while
the permissions column became null.

(cherry picked from commit 797bc28aae)
2026-05-06 13:16:52 +02:00
Jenkins Promoter
70fa7453d0 Update ScyllaDB version to: 2026.1.3 2026-05-03 17:38:16 +03:00
Jenkins Promoter
fe9ec0cd5a Update pgo profiles - aarch64 2026-05-01 05:03:53 +03:00
Jenkins Promoter
5f32cbf502 Update pgo profiles - x86_64 2026-05-01 04:21:07 +03:00
Botond Dénes
ecb3f254ad sstables: fix segfault in parse_assert() when message is nullptr
parse_assert() accepts an optional `message` parameter that defaults
to nullptr. When the assertion fails and message is nullptr, it is
implicitly converted to sstring via the sstring(const char*) constructor,
which calls strlen(nullptr) -- undefined behavior that manifests as a
segfault in __strlen_evex.

This turns what should be a graceful malformed_sstable_exception into a
fatal crash. In the case of CUSTOMER-279, a corrupt SSTable triggered
parse_assert() during streaming (in continuous_data_consumer::
fast_forward_to()), causing a crash loop on the affected node.

Fix by guarding the nullptr case with a ternary, passing an empty
sstring() when message is null. on_parse_error() already handles
the empty-message case by substituting "parse_assert() failed".

Fixes: SCYLLADB-1672

Closes scylladb/scylladb#29285

(cherry picked from commit cfebe17592)

Closes scylladb/scylladb#29597
2026-04-30 12:12:27 +03:00
Avi Kivity
af59e9200a build: point seastar submodule at scylla-seastar.git
This allows us to backport seastar commits as the need arises.
2026-04-30 11:51:32 +03:00
Patryk Jędrzejczak
5540a16f1b Merge 'raft: Throw stopped_error if server aborted' from Dawid Mędrek
This PR solves a series of similar problems related to executing
methods on an already aborted `raft::server`. They materialize
in various ways:

* For `add_entry` and `modify_config`, a `raft::not_a_leader` with
  a null ID will be returned IF forwarding is disabled. This wasn't
  a big problem because forwarding has always been enabled for group0,
  but it's something that's nice to fix. It's might be relevant for
  strong consistency that will heavily rely on this code.

* For `wait_for_leader` and `wait_for_state_change`, the calls may
  hang and never resolve. A more detailed scenario is provided in a
  commit message.

For the last two methods, we also extend their descriptions to indicate
the new possible exception type, `raft::stopped_error`. This change is
correct since either we enter the functions and throw the exception
immediately (if the server has already been aborted), or it will be
thrown upon the call to `raft::server::abort`.

We fix both issues. A few reproducer tests have been included to verify
that the calls finish and throw the appropriate errors.

Fixes SCYLLADB-841

Backport: Although the hanging problems haven't been spotted so far
          (at least to the best of my knowledge), it's best to avoid
          running into a problem like that, so let's backport the
          changes to all supported versions. They're small enough.

Closes scylladb/scylladb#28822

* https://github.com/scylladb/scylladb:
  raft: Make methods throw stopped_error if server aborted
  raft: Throw stopped_error if server aborted
  test: raft: Introduce get_default_cluster

(cherry picked from commit bb1a798c2c)

Closes scylladb/scylladb#28903
2026-04-30 08:54:16 +03:00
Tomasz Grabiec
36637e3583 test: pylib: Ignore exceptions in wait_for()
ManagerClient::get_ready_cql() calls server_sees_others(), which waits
for servers to see each other as alive in gossip. If one of the
servers is still early in boot, RESTful API call to
"gossiper/endpoint/live" may fail. It throws an exception, which
currently terminates the wait_for() and propagates up, failing the test.

Fix this by ignoring errors when polling inside wait_for. In case of
timeout, we log the last exception. This should fix the problem not
only in this case, for all uses of wait_for().

Example output:

```
pred = <function ManagerClient.server_sees_others.<locals>._sees_min_others at 0x7f022af9a140>
deadline = 1775218828.9172852, period = 1.0, before_retry = None
backoff_factor = 1.5, max_period = 1.0, label = None

    async def wait_for(
            pred: Callable[[], Awaitable[Optional[T]]],
            deadline: float,
            period: float = 0.1,
            before_retry: Optional[Callable[[], Any]] = None,
            backoff_factor: float = 1.5,
            max_period: float = 1.0,
            label: Optional[str] = None) -> T:
        tag = label or getattr(pred, '__name__', 'unlabeled')
        start = time.time()
        retries = 0
        last_exception: Exception | None = None
        while True:
            elapsed = time.time() - start
            if time.time() >= deadline:
                timeout_msg = f"wait_for({tag}) timed out after {elapsed:.2f}s ({retries} retries)"
                if last_exception is not None:
                    timeout_msg += (
                        f"; last exception: {type(last_exception).__name__}: {last_exception}"
                    )
                    raise AssertionError(timeout_msg) from last_exception
                raise AssertionError(timeout_msg)

            try:
>               res = await pred()

test/pylib/util.py:80:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

    async def _sees_min_others():
>       raise Exception("asd")
E       Exception: asd

test/pylib/manager_client.py:802: Exception

The above exception was the direct cause of the following exception:

manager = <test.pylib.manager_client.ManagerClient object at 0x7f022af7e7b0>

    @pytest.mark.asyncio
    async def test_auth_after_reset(manager: ManagerClient) -> None:
        servers = await manager.servers_add(3, config=auth_config, auto_rack_dc="dc1")
>       cql, _ = await manager.get_ready_cql(servers)

test/cluster/auth_cluster/test_auth_after_reset.py:33:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
test/pylib/manager_client.py:137: in get_ready_cql
    await self.servers_see_each_other(servers)
test/pylib/manager_client.py:820: in servers_see_each_other
    await asyncio.gather(*others)
test/pylib/manager_client.py:806: in server_sees_others
    await wait_for(_sees_min_others, time() + interval, period=.5)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

pred = <function ManagerClient.server_sees_others.<locals>._sees_min_others at 0x7f022af9a140>
deadline = 1775218828.9172852, period = 1.0, before_retry = None
backoff_factor = 1.5, max_period = 1.0, label = None

    async def wait_for(
            pred: Callable[[], Awaitable[Optional[T]]],
            deadline: float,
            period: float = 0.1,
            before_retry: Optional[Callable[[], Any]] = None,
            backoff_factor: float = 1.5,
            max_period: float = 1.0,
            label: Optional[str] = None) -> T:
        tag = label or getattr(pred, '__name__', 'unlabeled')
        start = time.time()
        retries = 0
        last_exception: Exception | None = None
        while True:
            elapsed = time.time() - start
            if time.time() >= deadline:
                timeout_msg = f"wait_for({tag}) timed out after {elapsed:.2f}s ({retries} retries)"
                if last_exception is not None:
                    timeout_msg += (
                        f"; last exception: {type(last_exception).__name__}: {last_exception}"
                    )
>                   raise AssertionError(timeout_msg) from last_exception
E                   AssertionError: wait_for(_sees_min_others) timed out after 45.30s (46 retries); last exception: Exception: asd

test/pylib/util.py:76: AssertionError
```

Fixes a failure observed in test_auth_after_reset:

```
manager = <test.pylib.manager_client.ManagerClient object at 0x7fb3740e1630>

    @pytest.mark.asyncio
    async def test_auth_after_reset(manager: ManagerClient) -> None:
        servers = await manager.servers_add(3, config=auth_config, auto_rack_dc="dc1")
        cql, _ = await manager.get_ready_cql(servers)
        await cql.run_async("ALTER ROLE cassandra WITH PASSWORD = 'forgotten_pwd'")

        logging.info("Stopping cluster")
        await asyncio.gather(*[manager.server_stop_gracefully(server.server_id) for server in servers])

        logging.info("Deleting sstables")
        for table in ["roles", "role_members", "role_attributes", "role_permissions"]:
            await asyncio.gather(*[manager.server_wipe_sstables(server.server_id, "system", table) for server in servers])

        logging.info("Starting cluster")
        # Don't try connect to the servers yet, with deleted superuser it will be possible only after
        # quorum is reached.
        await asyncio.gather(*[manager.server_start(server.server_id, connect_driver=False) for server in servers])

        logging.info("Waiting for CQL connection")
        await repeat_until_success(lambda: manager.driver_connect(auth_provider=PlainTextAuthProvider(username="cassandra", password="cassandra")))
>       await manager.get_ready_cql(servers)

test/cluster/auth_cluster/test_auth_after_reset.py:50:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
test/pylib/manager_client.py:137: in get_ready_cql
    await self.servers_see_each_other(servers)
test/pylib/manager_client.py:819: in servers_see_each_other
    await asyncio.gather(*others)
test/pylib/manager_client.py:805: in server_sees_others
    await wait_for(_sees_min_others, time() + interval, period=.5)
test/pylib/util.py:71: in wait_for
    res = await pred()
test/pylib/manager_client.py:802: in _sees_min_others
    alive_nodes = await self.api.get_alive_endpoints(server_ip)
test/pylib/rest_client.py:243: in get_alive_endpoints
    data = await self.client.get_json(f"/gossiper/endpoint/live", host=node_ip)
test/pylib/rest_client.py:99: in get_json
    ret = await self._fetch("GET", resource_uri, response_type = "json", host = host,
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <test.pylib.rest_client.TCPRESTClient object at 0x7fb2404a0650>
method = 'GET', resource = '/gossiper/endpoint/live', response_type = 'json'
host = '127.15.252.8', port = 10000, params = None, json = None, timeout = None
allow_failed = False

    async def _fetch(self, method: str, resource: str, response_type: Optional[str] = None,
                     host: Optional[str] = None, port: Optional[int] = None,
                     params: Optional[Mapping[str, str]] = None,
                     json: Optional[Mapping] = None, timeout: Optional[float] = None, allow_failed: bool = False) -> Any:
        # Can raise exception. See https://docs.aiohttp.org/en/latest/web_exceptions.html
        assert method in ["GET", "POST", "PUT", "DELETE"], f"Invalid HTTP request method {method}"
        assert response_type is None or response_type in ["text", "json"], \
                f"Invalid response type requested {response_type} (expected 'text' or 'json')"
        # Build the URI
        port = port if port else self.default_port if hasattr(self, "default_port") else None
        port_str = f":{port}" if port else ""
        assert host is not None or hasattr(self, "default_host"), "_fetch: missing host for " \
                "{method} {resource}"
        host_str = host if host is not None else self.default_host
        uri = self.uri_scheme + "://" + host_str + port_str + resource
        logging.debug(f"RESTClient fetching {method} {uri}")

        client_timeout = ClientTimeout(total = timeout if timeout is not None else 300)
        async with request(method, uri,
                           connector = self.connector if hasattr(self, "connector") else None,
                           params = params, json = json, timeout = client_timeout) as resp:
            if allow_failed:
                return await resp.json()
            if resp.status != 200:
                text = await resp.text()
>               raise HTTPError(uri, resp.status, params, json, text)
E               test.pylib.rest_client.HTTPError: HTTP error 404, uri: http://127.15.252.8:10000/gossiper/endpoint/live, params: None, json: None, body:
E               {"message": "Not found", "code": 404}

test/pylib/rest_client.py:77: HTTPError
```

Fixes: SCYLLADB-1367

Closes scylladb/scylladb#29323

(cherry picked from commit 74542be5aa)

Closes scylladb/scylladb#29338
2026-04-30 08:53:05 +03:00
Wojciech Mitros
f318968cfe view: apply existing range tombstones after exhausting the update reader
When view_update_builder::on_results() hits the path where the update
fragment reader is already exhausted, it still needs to keep tracking
existing range tombstones and apply them to encountered rows.
Otherwise a row covered by an existing range tombstone can appear
alive while generating the view update and create a spurious view row.

Update the existing tombstone state even on the exhausted-reader path
and apply the effective tombstone to clustering rows before generating
the row tombstone update. Add a cqlpy regression test covering the
partition-delete-after-range-tombstone case.

Fixes: SCYLLADB-1649

Closes scylladb/scylladb#29481

(cherry picked from commit 073710a661)

Closes scylladb/scylladb#29649
2026-04-30 08:51:55 +03:00
Roy Dahan
b7974b9b09 ci: pin GitHub Actions to commit SHAs and migrate to Node.js 24
Pin all external GitHub Actions to full commit SHAs and upgrade to
their latest major versions to reduce supply chain attack surface:

- actions/checkout: v3/v4/v5 -> v6.0.2
- actions/github-script: v7 -> v8.0.0
- actions/setup-python: v5 -> v6.2.0
- actions/upload-artifact: v4 -> v7.0.0
- astral-sh/setup-uv: v6 -> v8.0.0
- mheap/github-action-required-labels: v5.5.2 (pinned)
- redhat-plumbers-in-action/differential-shellcheck: v5.5.6 (pinned)
- codespell-project/actions-codespell: v2.2 (pinned, was @master)

Set FORCE_JAVASCRIPT_ACTIONS_TO_NODE24=true in all 21 workflows that
use JavaScript-based actions to opt into the Node.js 24 runtime now.
This resolves the deprecation warning:

  "Node.js 20 actions are deprecated. Please check if updated versions
   of these actions are available that support Node.js 24. Actions will
   be forced to run with Node.js 24 by default starting June 2nd,
   2026. Node.js 20 will be removed from the runner on September 16th,
   2026."

See: https://github.blog/changelog/2025-09-19-deprecation-of-node-20-on-github-actions-runners/

scylladb/github-automation references are intentionally left at @main
as they are org-internal reusable workflows.

Fixes: SCYLLADB-1410

Backport: Backport is required for live branches that run GH actions:
2026.1, 2025.4, 2025.1 and 2024.1

Closes scylladb/scylladb#29525

(cherry picked from commit d2d7604188)

Closes scylladb/scylladb#29525
2026-04-30 08:50:25 +03:00
Marcin Maliszkiewicz
9d95e0e6ba Merge 'storage_service: fix REST API races during shutdown and cross-shard forwarding' from Piotr Smaron
REST route removal unregisters handlers but does not wait for requests
that already entered storage_service.  A request can therefore suspend
inside an async operation, restart proceeds to tear the service down,
and the coroutine later resumes against destroyed members such as
_topology_state_machine, _group0, or _sys_ks — a use-after-destruction
bug that surfaces as UBSAN dynamic-type failures (e.g. the crash seen
from topology_state_load()).

Fix this by holding storage_service::_async_gate from the entry
boundary of every externally-triggered async operation so that stop()
drains them before teardown begins.  The gate is acquired in
run_with_api_lock, run_with_no_api_lock, and in individual REST
handlers that bypass those wrappers (reload_raft_topology_state,
mark_excluded, removenode, schema reload, topology-request
waits/abort, cleanup, ring/schema queries, SSTable dictionary
training/publish, and sampling).

Additionally, fix get_ownership() and abort_topology_request() which
forward work to shard 0 but were still referencing the caller-shard's
`this` pointer instead of the destination-shard instance, causing
silent cross-shard access to shard-local state.
Add a cluster regression test that repeatedly exercises the multi-shard
ownership REST path to cover the forwarding fix.

Fixes: SCYLLADB-1415

Should be backported to all branches, the code has been introduced around 2024.1 release.

Closes scylladb/scylladb#29373

* github.com:scylladb/scylladb:
  storage_service: fix shard-0 forwarding in REST helpers
  storage_service: gate REST-facing async operations during shutdown
  storage_service: prepare for async gate in REST handlers

(cherry picked from commit 4043d95810)

Closes scylladb/scylladb#29611
2026-04-27 13:57:32 +02:00
Avi Kivity
ea78ab34d7 Merge '[Backport 2026.1] test: fix flaky test_read_repair_with_trace_logging by reading tracing with CL=ALL' from Scylladb[bot]
Tracing events are written to system_traces.events with CL=ANY, so they are only guaranteed to be present on the local node of the query coordinator. Reading them back with the driver default (CL=LOCAL_ONE) may route the query to a replica that has not yet received all events, causing the assertion on 'digest mismatch, starting read repair' to fail intermittently.

Fix execute_with_tracing() to read tracing via the ResponseFuture API with query_cl=ConsistencyLevel.ALL, so events from all replicas are merged before the caller inspects them.

Fixes: SCYLLADB-1707

Backport: fixing flaky test, test failure only seen on master so far so no backport

- (cherry picked from commit b49cf6247f)

Parent PR: #29566

Closes scylladb/scylladb#29631

* github.com:scylladb/scylladb:
  test: fix flaky test_read_repair_with_trace_logging by reading tracing with CL=ALL
  replica/database: consolidate the two database_apply error injections
2026-04-25 20:47:46 +03:00