Commit Graph

48219 Commits

Author SHA1 Message Date
Jenkins Promoter
019eee28f8 Update pgo profiles - x86_64 2026-01-01 04:06:12 +02:00
Gleb Natapov
0c5780590a raft topology: Notify that a node was removed only once
Raft topology goes over all nodes in a 'left' state and triggers 'remove
node' notification in case id/ip mapping is available (meaning the node
left recently), but the problem is that, since the mapping is not removed
immediately, when multiple nodes are removed in succession a notification
for the same node can be sent several times. Fix that by sending
notification only if the node still exists in the peers table. It will
be removed by the first notification and following notification will not
be sent.

Closes scylladb/scylladb#27743

(cherry picked from commit 4a5292e815)

Closes scylladb/scylladb#27911
2025-12-30 11:24:02 +01:00
Gleb Natapov
207e302bbf topology coordinator: set session id for streaming at the correct time
Commit d3efb3ab6f added streaming session for rebuild, but it set
the session and request submission time. The session should be set when
request starts the execution, so this patch moved it to the correct
place.

Closes scylladb/scylladb#27757

(cherry picked from commit 04976875cc)

Closes scylladb/scylladb#27865
2025-12-28 13:34:01 +02:00
Ferenc Szili
2537b9b818 test: fix flakyness caused by TRUNCATE retries
The test test_truncate_during_topology_change tests TRUNCATE TABLE while
bootstrapping a new node. With tablets enabled TRUNCATE is a global
topology operation which needs to serialize with boostrap.

When TRUNCATE TABLE is issued, it first checks if there is an already
queued truncate for the same table. This can happen if a previous
TRUNCATE operation has timed out, and the client retried. The newly
issued truncate will only join the queued one if it is waiting to be
processed, and will fail immediatelly if the TRUNCATE is already being
processed.

In this test, TRUNCATE will be retried after a timeout (1 minute) due to
the default retry policy, and will be retried up to 3 times, while the
bootstrap is delayed by 2 minutes. This means that the test can validate
the result of a truncate which was started after bootstrap was
completed.

Because of the way truncate joins existing truncate operations, we can
also have the following scenario:
- TRUNCATE times out after one minute because the new node is being
  bootstrapped
- the client retries the TRUNCATE command which also times out after 1m
- the third attempt is received during TRUNCATE being processed which
  fails the test

This patch changes the retry policy of the TRUNCATE operation to
FallthroughRetryPolicy which guarantees that TRUNCATE will not be
retried on timeout. It also increases the timeout of the TRUNCATE from 1
to 4 minutes. This way the test will actually validate the performance
of the TRUNCATE operation which was issued during bootstrap, instead of
the subsequent, retried TRUNCATEs which could have been issued after the
bootstrap was complete.

Fixes: #26347

Closes scylladb/scylladb#27245

(cherry picked from commit d883ff2317)

Closes scylladb/scylladb#27505
2025-12-23 17:08:54 +02:00
Yaron Kaikov
5ff70064e5 auto-backport.py: modify instruction for making PR ready for review
Update the comment sent when PR has conflicts with clear instrauctions how to make the PR Ready for review

Fixes: https://scylladb.atlassian.net/browse/RELENG-152

Closes scylladb/scylladb#27547

(cherry picked from commit d3e199984e)

Closes scylladb/scylladb#27563
2025-12-22 15:17:23 +02:00
Anna Stuchlik
ac4d5a0bea doc: remove the links to the Download Center
This commit removes the remaining links to the Download Center on the website.
We no longer use it for installation, and we don't want users to infer that
something like that still exists.

Fixes https://github.com/scylladb/scylladb/issues/27753

Closes scylladb/scylladb#27756

(cherry picked from commit f65db4e8eb)

Closes scylladb/scylladb#27781
2025-12-21 19:23:26 +02:00
Emil Maskovsky
bfcac7547b test/raft: fix race condition in failure_detector_test
The test had a sporadic failure due to a broken promise exception.
The issue was in `test_pinger::ping()` which captured the promise by
move into the subscription lambda, causing the promise to be destroyed
when the lambda was destroyed during coroutine unwinding.

Simplify `test_pinger::ping()` by replacing manual abort_source/promise
logic with `seastar::sleep_abortable()`.
This removes the risk of promise lifetime/race issues and makes the code
simpler and more robust.

Fixes: scylladb/scylladb#27136

Backport to active branches: This fixes a CI test issue, so it is
beneficial to backport the fix. As this is a test-only fix, it is a low
risk change.

Closes scylladb/scylladb#27737

(cherry picked from commit 2a75b1374e)

Closes scylladb/scylladb#27780
2025-12-21 14:15:09 +02:00
Patryk Jędrzejczak
0a7d71663a Merge '[Backport 2025.2] topology_coordinator: handle seastar::abort_requested_exception alongside raft::request_aborted' from Scylladb[bot]
In several exception handlers, only `raft::request_aborted` was being caught and rethrown, while `seastar::abort_requested_exception` was falling through to the generic catch(...) block. This caused the exception to be incorrectly treated as a failure that triggers rollback, instead of being recognized as an abort signal.

For example, during tablet draining, the error log showed: "tablets draining failed with seastar::abort_requested_exception (abort requested). Aborting the topology operation"

This change adds `seastar::abort_requested_exception` handling alongside `raft::request_aborted` in all places where it was missing. When rethrown, these exceptions propagate up to the main `run()` loop where `handle_topology_coordinator_error()` recognizes them as normal abort signals and allows the coordinator to exit gracefully without triggering unnecessary rollback operations.

Fixes: scylladb/scylladb#27255

No backport: The problem was only seen in tests and not reported in customer tickets, so it's enough to fix it in the main branch.

- (cherry picked from commit 37e3dacf33)

Parent PR: #27314

Closes scylladb/scylladb#27661

* https://github.com/scylladb/scylladb:
  topology_coordinator: handle seastar::abort_requested_exception alongside raft::request_aborted
  topology_coordinator: consistently rethrow `raft::request_aborted` for direct/global commands
2025-12-20 19:33:31 +01:00
Patryk Jędrzejczak
eb4523fd03 Merge '[Backport 2025.2] Make direct failure detector verb handler more efficient' from Scylladb[bot]
We saw that in large clusters direct failure detector may cause large task queues to be accumulated. The series address this issue and also moves the code into the correct scheduling group.

Fixes https://github.com/scylladb/scylladb/issues/27142

Backport to all version where 60f1053087 was backported to since it should improve performance in large clusters.

- (cherry picked from commit 82f80478b8)

- (cherry picked from commit 6a6bbbf1a6)

- (cherry picked from commit 86dde50c0d)

Parent PR: #27387

Closes scylladb/scylladb#27480

* https://github.com/scylladb/scylladb:
  direct_failure_detector: run direct failure detector in the gossiper scheduling group
  raft: drop invoke_on from the pinger verb handler
  direct_failure_detector: pass timeout to direct_fd_ping verb
  idl, message: make with_timeout and cancellable verb attributes composable
2025-12-19 16:48:26 +01:00
Emil Maskovsky
b333141924 topology_coordinator: handle seastar::abort_requested_exception alongside raft::request_aborted
In several exception handlers, only raft::request_aborted was being
caught and rethrown, while seastar::abort_requested_exception was
falling through to the generic catch(...) block. This caused the
exception to be incorrectly treated as a failure that triggers
rollback, instead of being recognized as an abort signal.

For example, during tablet draining, the error log showed:
"tablets draining failed with seastar::abort_requested_exception
(abort requested). Aborting the topology operation"

This change adds seastar::abort_requested_exception handling
alongside raft::request_aborted in all places where it was missing.
When rethrown, these exceptions propagate up to the main run() loop
where handle_topology_coordinator_error() recognizes them as normal
abort signals and allows the coordinator to exit gracefully without
triggering unnecessary rollback operations.

Fixes: scylladb/scylladb#27255

(cherry picked from commit 37e3dacf33)
2025-12-19 16:26:01 +01:00
Michael Litvak
9c0528fc9f view_builder: reduce log level for expected aborts during view creation
When draining the view builder, we abort ongoing operations using the
view builder's abort source, which may cause them to fail with
abort_requested_exception or raft::request_aborted exceptions.

Since these failures are expected during shutdown, reduce the log level
in add_new_view from 'error' to 'debug' for these specific exceptions
while keeping 'error' level for unexpected failures.

Closes scylladb/scylladb#26297

(cherry picked from commit 6bc41926e2)

Closes scylladb/scylladb#27538
2025-12-18 16:24:35 +01:00
Emil Maskovsky
789eb79a6a topology_coordinator: consistently rethrow raft::request_aborted for direct/global commands
Ensure all direct and global topology commands rethrow the
`raft::request_aborted` exception when aborted, typically due to
leadership changes. This makes abortion explicit to callers, enabling
proper handling such as retries or workflow termination.

This change completes the work started in PR scylladb/scylladb#23962,
covering all remaining cases where the exception was not rethrown.

Fixes: scylladb/scylladb#23589

(cherry picked from commit 943af1ef1c)
2025-12-17 16:20:41 +01:00
Jenkins Promoter
cc39e23be4 Update pgo profiles - aarch64 2025-12-15 04:40:18 +02:00
Jenkins Promoter
11b899680a Update pgo profiles - x86_64 2025-12-15 04:03:01 +02:00
Jenkins Promoter
f11c584264 Update ScyllaDB version to: 2025.2.6 2025-12-14 14:30:35 +02:00
Benny Halevy
f06269dfca utils: error_injection: wait_for_message: print injection_name and caller source_location on timeout
When waiting for the condition variable times out
we call on_internal_error, but unfortunately, the backtrace
it generates is obfuscated by
`coroutine_handle<seastar::internal::coroutine_traits_base<void>::promise_type>::resume`.

To make the log more useful, print the error injection name
and the caller's source_location in the timeout error message.

Fixes #27531

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#27532

(cherry picked from commit 5f13880a91)
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#27581
2025-12-12 14:20:30 +01:00
Anna Stuchlik
3553ae91c9 replace the Driver pages with a link to the new Drivers pages
This commit removes the now redundant driver pages from
the Scylla DB documentation. Instead, the link to the pages
where we moved the diver information is added.
Also, the links are updated across the ScyllaDB manual.

Redirections are added for all the removed pages.

Fixes https://github.com/scylladb/scylladb/issues/26871

Closes scylladb/scylladb#27277

(cherry picked from commit c5580399a8)

Closes scylladb/scylladb#27438
2025-12-12 10:30:34 +01:00
Yaron Kaikov
4c20e64f4f Add JIRA issue validation to backport PR fixes check
Extend the Fixes validation pattern to also accept JIRA issue references
(format: [A-Z]+-\d+) in addition to GitHub issue references. This allows
backport PRs to reference JIRA issues in the format 'Fixes: PROJECT-123'.

Fixes: https://github.com/scylladb/scylladb/issues/27571

Closes scylladb/scylladb#27572

(cherry picked from commit 3dfa5ebd7f)

Closes scylladb/scylladb#27597
2025-12-12 09:36:32 +02:00
Gleb Natapov
9ca1439b78 direct_failure_detector: run direct failure detector in the gossiper scheduling group
When direct failure detector was introduces the idea was that it will
run on the same connection raft group0 verbs are running, but in
60f1053087 raft verbs were moved to run on the gossiper connection
while DIRECT_FD_PING was left where it was. This patch move it to
gossiper connection as well and fix the pinger code to run in gossiper
scheduling group.

(cherry picked from commit 86dde50c0d)
2025-12-09 16:55:00 +02:00
Gleb Natapov
1ee9d1cca5 raft: drop invoke_on from the pinger verb handler
Currently raft direct pinger verb jumps to shard 0 to check if group0 is
alive before replying. The verb runs relatively often, so it is not very
efficient. The patch distributes group0 liveness information (as it
changes) to all shard instead, so that the handler itself does not need
to jump to shard 0.

(cherry picked from commit 6a6bbbf1a6)
2025-12-09 16:55:00 +02:00
Gleb Natapov
d6ec1ab5b9 direct_failure_detector: pass timeout to direct_fd_ping verb
Currently direct_fd_ping runs without timeout, but the verb is not
waited forever, the wait is canceled after a timeout, this timeout
simply is not passed to the rpc. It may create a situation where the
rpc callback can runs on a destination but it is no longer waited on.
Change the code to pass timeout to rpc as well and return earlier from
the rpc handler if the timeout is reached by the time the callback is
called. This is backwards compatible since timeout is passed as
optional.

(cherry picked from commit 82f80478b8)
2025-12-09 14:53:32 +02:00
Benny Halevy
61dd81539b idl, message: make with_timeout and cancellable verb attributes composable
And define `send_message_timeout_cancellable` in rpc_protocol_impl.hh
using the newly introduced rpc_handler entry point
in seastar that accepts both timeout and cancellable params.

Note that the interface to the user still uses abort_source
while internally the funtion allocates a seastar::rpc::cancellable
object.  It is possible to provide an interface that will accept
a rpc::cancellable& from the caller, but the existing messaging api
uses abort_source.  Changing it may be considered in the future.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 0b97806771)
2025-12-09 14:53:32 +02:00
Tomasz Grabiec
104cad1424 Merge '[Backport 2025.2] address_map: Use more efficient and reliable replication method' from Scylladb[bot]
Primary issue with the old method is that each update is a separate
cross-shard call, and all later updates queue behind it. If one of the
shards has high latency for such calls, the queue may accumulate and
system will appear unresponsive for mapping changes on non-zero shards.

This happened in the field when one of the shards was overloaded with
sstables and compaction work, which caused frequent stalls which
delayed polling for ~100ms. A queue of 3k address updates
accumulated, because we update mapping on each change of gossip
states. This made bootstrap impossible because nodes couldn't
learn about the IP mapping for the bootstrapping node and streaming
failed.

To protect against that, use a more efficient method of replication
which requires a single cross-shard call to replicate all prior
updates.

It is also more reliable, if replication fails transiently for some
reason, we don't give up and fail all later updates.

Fixes #26865

- (cherry picked from commit ed8d127457)

- (cherry picked from commit 4a85ea8eb2)

- (cherry picked from commit f83c4ffc68)

Parent PR: #26941

Closes scylladb/scylladb#27187

* github.com:scylladb/scylladb:
  address_map: Use barrier() to wait for replication
  address_map: Use more efficient and reliable replication method
  utils: Introduce helper for replicated data structures
  utils: add "fatal" version of utils::on_internal_error()
scylla-2025.2.5-candidate-20251209122041 scylla-2025.2.5
2025-12-05 20:37:14 +01:00
Tomasz Grabiec
6c942a87c7 address_map: Use barrier() to wait for replication
More efficient than 100 pings.

There was one ping in test which was done "so this shard notices the
clock advance". It's not necessary, since obsering completed SMP
call implies that local shard sees the clock advancement done within in.

(cherry picked from commit f83c4ffc68)
2025-12-05 13:25:14 +01:00
Tomasz Grabiec
db61dfb837 address_map: Use more efficient and reliable replication method
Primary issue with the old method is that each update is a separate
cross-shard call, and all later updated queue behind it. If one of the
shards has high latency for such calls, the queue may accumulate and
system will appear unresponsive for mapping changes on non-zero shards.

This happened in the field when one of the shards was overloaded with
sstables and compaction work, which caused frequent stalls which
delayed polling for ~100ms. A queue of 3k address updates
accumulated. This made bootstrap impossible, since nodes couldn't
learn about the IP mapping for the bootstrapping node and streaming
failed.

To protect against that, use a more efficient method of replication
which requires a single cross-shard call to replicate all prior
updates.

It is also more reliable, if replication fails transiently for some
reason, we don't give up and fail all later updates.

Fixes #26865
Fixes #26835

(cherry picked from commit 4a85ea8eb2)
2025-12-05 13:25:14 +01:00
Tomasz Grabiec
3cc75afbbe utils: Introduce helper for replicated data structures
Key goals:
  - efficient (batching updates)
  - reliable (no lost updates)

Will be used in data structures maintained on one designed owning
shard and replicated to other shards.

(cherry picked from commit ed8d127457)
2025-12-05 13:25:14 +01:00
Nadav Har'El
9d27db5e98 utils: add "fatal" version of utils::on_internal_error()
utils::on_internal_error() is a wrapper for Seastar's on_internal_error()
which does not require a logger parameter - because it always uses one
logger ("on_internal_error"). Not needing a unique logger is especially
important when using on_internal_error() in a header file, where we
can't define a logger.

Seastar also has a another similar function, on_fatal_internal_error(),
for which we forgot to implement a "utils" version (without a logger
parameter). This patch fixes that oversight.

In the next patch, we need to use on_fatal_internal_error() in a header
file, so the "utils" version will be useful. We will need the fatal
version because we will encounter an unexpected situation during server
destruction, and if we let the regular on_internal_error() just throw
an exception, we'll be left in an undefined state.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
(cherry picked from commit 33476c7b06)
2025-12-05 13:25:14 +01:00
Avi Kivity
79aa95bf76 database: fix overflow when computing data distribution over shards
We store the per-shard chunk count in a uint64_t vector
global_offset, and then convert the counts to offsets with
a prefix sum:

```c++
        // [1, 2, 3, 0] --> [0, 1, 3, 6]
        std::exclusive_scan(global_offset.begin(), global_offset.end(), global_offset.begin(), 0, std::plus());
```

However, std::exclusive_scan takes the accumulator type from the
initial value, 0, which is an int, instead of from the range being
iterated, which is of uint64_t.

As a result, the prefix sum is computed as a 32-bit integer value. If
it exceeds 0x8000'0000, it becomes negative. It is then extended to
64 bits and stored. The result is a huge 64-bit number. Later on
we try to find an sstable with this chunk and fail, crashing on
an assertion.

An example of the failure can be seen here: https://godbolt.org/z/6M8aEbo57

The fix is simple: the initial value is passed as uint64_t instead of int.

Fixes https://github.com/scylladb/scylladb/issues/27417

Closes scylladb/scylladb#27418

(cherry picked from commit 9696ee64d0)
2025-12-04 20:19:39 +02:00
Jenkins Promoter
284bd9fc01 Update ScyllaDB version to: 2025.2.5 2025-12-04 15:49:32 +02:00
Pavel Emelyanov
60a70389a3 Update seastar submodule (SIGABRT on assertion)
* seastar 450e36d5d...7c86e59c7 (1):
  > util: make SEASTAR_ASSERT() failure generate SIGABRT

Fixes #27127

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#27404
2025-12-04 13:01:07 +03:00
Calle Wilund
7338a42cc1 commitlog::read_log_file: Check for eof position on all data reads
Fixes #24346

When reading, we check for each entry and each chunk, if advancing there
will hit EOF of the segment. However, IFF the last chunk being read has
the last entry _exactly_ matching the chunk size, and the chunk ending
at _exactly_ segment size (preset size, typically 32Mb), we did not check
the position, and instead complained about not being able to read.

This has literally _never_ happened in actual commitlog (that was replayed
at least), but has apparently happened more and more in hints replay.

Fix is simple, just check the file position against size when advancing
said position, i.e. when reading (skipping already does).

v2:

* Added unit test

Closes scylladb/scylladb#27236

(cherry picked from commit 59c87025d1)

Closes scylladb/scylladb#27340
2025-12-03 12:23:06 +03:00
Aleksandra Martyniuk
43963c47e6 replica: database: change type of tables_metadata::_ks_cf_to_uuid
If there is a lot of tables, a node reports oversized allocation
in _ks_cf_to_uuid of type flat_hash_map.

Change the type to std::unordered_map to prevent oversized allocations.

Fixes: https://github.com/scylladb/scylladb/issues/26787.

Closes scylladb/scylladb#27165

(cherry picked from commit 19a7d8e248)

Closes scylladb/scylladb#27195
2025-12-03 12:22:46 +03:00
Ernest Zaslavsky
7671134209 streaming:: add more logging
Start logging all missed streaming options like `scope` and `skip_reshape` flags

Fixes: https://github.com/scylladb/scylladb/issues/27299

Closes scylladb/scylladb#27311

(cherry picked from commit 1d5f60baac)

Closes scylladb/scylladb#27338
2025-12-02 12:17:48 +01:00
Jenkins Promoter
af570f75f1 Update pgo profiles - aarch64 2025-12-01 04:40:02 +02:00
Jenkins Promoter
3e6ef72872 Update pgo profiles - x86_64 2025-11-30 21:06:59 -05:00
Patryk Jędrzejczak
4a28929d40 Merge '[Backport 2025.2] locator/node: include _excluded in missing places' from Scylladb[bot]
We currently ignore the `_excluded` field in `node::clone()` and the verbose
formatter of `locator::node`. The first one is a bug that can have
unpredictable consequences on the system. The second one can be a minor
inconvenience during debugging.

We fix both places in this PR.

Fixes https://scylladb.atlassian.net/browse/SCYLLADB-72

This PR is a bugfix that should be backported to all supported branches.

- (cherry picked from commit 4160ae94c1)

- (cherry picked from commit 287c9eea65)

Parent PR: #27265

Closes scylladb/scylladb#27289

* https://github.com/scylladb/scylladb:
  locator/node: include _excluded in verbose formatter
  locator/node: preserve _excluded in clone()
2025-11-27 12:31:06 +01:00
Avi Kivity
af451b7997 Merge '[Backport 2025.2] fix notification about expiring erm held for to long' from Scylladb[bot]
Commit 6e4803a750 broke notification about expired erms held for too long since it resets the tracker without calling its destructor (where notification is triggered). Fix the assign operator to call the destructor like it should.

Fixes https://github.com/scylladb/scylladb/issues/27141

- (cherry picked from commit 9f97c376f1)

- (cherry picked from commit 5dcdaa6f66)

Parent PR: #27140

Closes scylladb/scylladb#27274

* github.com:scylladb/scylladb:
  test: test that expired erm that held for too long triggers notification
  token_metadata: fix notification about expiring erm held for to long
2025-11-27 12:24:56 +02:00
Patryk Jędrzejczak
0b2d303c00 locator/node: include _excluded in verbose formatter
It can be helpful during debugging.

(cherry picked from commit 287c9eea65)
2025-11-26 23:04:12 +00:00
Patryk Jędrzejczak
d8b476e39f locator/node: preserve _excluded in clone()
We currently ignore the `_excluded` field in `clone()`. Losing
information about exclusion can have unpredictable consequences. One
observed effect (that led to finding this issue) is that the
`/storage_service/nodes/excluded` API endpoint sometimes misses excluded
nodes.

(cherry picked from commit 4160ae94c1)
2025-11-26 23:04:12 +00:00
Gleb Natapov
18dcbc1783 test: test that expired erm that held for too long triggers notification
(cherry picked from commit 5dcdaa6f66)
2025-11-26 15:08:07 +00:00
Gleb Natapov
9ff81cabed token_metadata: fix notification about expiring erm held for to long
Commit 6e4803a750 broke notification about expired erms held for too
long since it resets the tracker without calling its destructor (where
notification is triggered). Fix assign operator to call destructor.

(cherry picked from commit 9f97c376f1)
2025-11-26 15:08:07 +00:00
Ernest Zaslavsky
c1d83f47b3 streaming: fix loop break condition in tablet_sstable_streamer::stream
Correct the loop termination logic that previously caused
certain SSTables to be prematurely excluded, resulting in
lost mutations. This change ensures all relevant SSTables
are properly streamed and their mutations preserved.

(cherry picked from commit dedc8bdf71)

Closes scylladb/scylladb#27150
Fixes: #26979

Parent PR: #26980
Unfortunatelly the pytest based test cannot be ported back because of changes made to the testing harness and scylla-tools
2025-11-25 11:58:12 +03:00
Avi Kivity
b619fe2882 tools: toolchain: prepare: replace 'reg' with 'skopeo'
The prepare scripts uses 'reg' to verify we're not going to
overwrite an existing image. The 'reg' command is not
available in Fedora 43. Use 'skopeo' instead. Skopeo
is part of the podman ecosystem so hopefully will live longer.

Fixes #27178.

Closes scylladb/scylladb#27179

(cherry picked from commit d6ef5967ef)

Closes scylladb/scylladb#27196
2025-11-24 19:57:38 +02:00
Raphael S. Carvalho
6ecee4deba replica: Fail timed-out single-key read on cleaned up tablet replica
Consider the following:
1) single-key read starts, blocks on replica e.g. waiting for memory.
2) the same replica is migrated away
3) single-key read expires, coordinator abandons it, releases erm.
4) migration advances to cleanup stage, barrier doesn't wait on
   timed-out read
5) compaction group of the replica is deallocated on cleanup
6) that single-key resumes, but doesn't find sstable set (post cleanup)
7) with abort-on-internal-error turned on, node crashes

It's fine for abandoned (= timed out) reads to fail, since the
coordinator is gone.
For active reads (non timed out), the barrier will wait for them
since their coordinator holds erm.
This solution consists of failing reads which underlying tablet
replica has been cleaned up, by just converting internal error
to plain exception.

Fixes #26229.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#27078

(cherry picked from commit 74ecedfb5c)

Closes scylladb/scylladb#27152
scylla-2025.2.4 scylla-2025.2.4-candidate-20251123021921
2025-11-21 17:46:21 +03:00
Patryk Jędrzejczak
5e237c0d0e test: test_raft_recovery_stuck: ensure mutual visibility before using driver
Not waiting for nodes to see each other as alive can cause the driver to
fail the request sent in `wait_for_upgrade_state()`.

scylladb/scylladb#19771 has already replaced concurrent restarts with
`ManagerClient.rolling_restart()`, but it has missed this single place,
probably because we do concurrent starts here.

Fixes #27055

Closes scylladb/scylladb#27075

(cherry picked from commit e35ba974ce)

Closes scylladb/scylladb#27108
2025-11-20 10:45:08 +02:00
Botond Dénes
ad3d182e90 Merge '[Backport 2025.2] Automatic cleanup improvements' from Scylladb[bot]
This series allows an operator to reset 'cleanup needed' flag if he already cleaned up the node, so that automatic cleanup will not do it again. We also change 'nodetool cleanup' back to run cleanup on one node only (and reset 'cleanup needed' flag in the end), but the new '--global' option allows to run cleanup on all nodes that needed it simultaneously.

Fixes https://github.com/scylladb/scylladb/issues/26866

Backport to all supported version since automatic cleanup behaviour  as it is now may create unexpected by the operator load during cluster resizing.

- (cherry picked from commit e872f9cb4e)

- (cherry picked from commit 0f0ab11311)

Parent PR: #26868

Closes scylladb/scylladb#27091

* github.com:scylladb/scylladb:
  cleanup: introduce "nodetool cluster cleanup" command  to run cleanup on all dirty nodes in the cluster
  cleanup: Add RESTful API to allow reset cleanup needed flag
2025-11-20 10:44:35 +02:00
Botond Dénes
a7856c3d52 Merge '[Backport 2025.2] service/qos: Fall back to default scheduling group when using maintenance socket' from Scylladb[bot]
The service level controller relies on `auth::service` to collect
information about roles and the relation between them and the service
levels (those attached to them). Unfortunately, the service level
controller is initialized way earlier than `auth::service` and so we
had to prevent potential invalid queries of user service levels
(cf. 46193f5e79).

Unfortunately, that came at a price: it made the maintenance socket
incompatible with the current implementation of the service level
controller. The maintenance socket starts early, before the
`auth::service` is fully initialized and registered, and is exposed
almost immediately. If the user attempts to connect to Scylla within
this time window, via the maintenance socket, one of the things that
will happen is choosing the right service level for the connection.
Since the `auth::service` is not registered, Scylla with fail an
assertion and crash.

A similar scenario occurs when using maintenance mode. The maintenance
socket is how the user communicates with the database, and we're not
prepared for that either.

To avoid unnecessary crashes, we add new branches if the passed user is
absent or if it corresponds to the anonymous role. Since the role
corresponding to a connection via the maintenance socket is the anonymous
role, that solves the problem.

Some accesses to `auth::service` are not affected and we do not modify
those.

Fixes scylladb/scylladb#26816

Backport: yes. This is a fix of a regression.

- (cherry picked from commit c0f7622d12)

- (cherry picked from commit 222eab45f8)

- (cherry picked from commit 394207fd69)

- (cherry picked from commit b357c8278f)

Parent PR: #26856

Closes scylladb/scylladb#27034

* github.com:scylladb/scylladb:
  test/cluster/test_maintenance_mode.py: Wait for initialization
  test: Disable maintenance mode correctly in test_maintenance_mode.py
  test: Fix keyspace in test_maintenance_mode.py
  service/qos: Do not crash Scylla if auth_integration absent
2025-11-20 10:43:18 +02:00
Gleb Natapov
86d6e759bc cleanup: introduce "nodetool cluster cleanup" command to run cleanup on all dirty nodes in the cluster
97ab3f6622 changed "nodetool cleanup" (without arguments) to run
cleanup on all dirty nodes in the cluster. This was somewhat unexpected,
so this patch changes it back to run cleanup on the target node only (and
reset "cleanup needed" flag afterwards) and it adds "nodetool cluster
cleanup" command that runs the cleanup on all dirty nodes in the
cluster.

(cherry picked from commit 0f0ab11311)
2025-11-19 10:35:39 +02:00
Gleb Natapov
80d92d68e0 cleanup: Add RESTful API to allow reset cleanup needed flag
Cleaning up a node using per keyspace/table interface does not reset cleanup
needed flag in the topology. The assumption was that running cleanup on
already clean node does nothing and completes quickly. But due to
https://github.com/scylladb/scylladb/issues/12215 (which is closed as
WONTFIX) this is not the case. This patch provides the ability to reset
the flag in the topology if operator cleaned up the node manually
already.

(cherry picked from commit e872f9cb4e)
2025-11-19 10:14:38 +02:00
Avi Kivity
6206b57008 Merge '[Backport 2025.2] Synchronize tablet split and load-and-stream' from Scylladb[bot]
Load-and-stream is broken when running concurrently to the finalization step of tablet split.

Consider this:
1) split starts
2) split finalization executes barrier and succeed
3) load-and-stream runs now, starts writing sstable (pre-split)
4) split finalization publishes changes to tablet metadata
5) load-and-stream finishes writing sstable
6) sstable cannot be loaded since it spans two tablets

two possible fixes (maybe both):

1) load-and-stream awaits for topology to quiesce
2) perform split compaction on sstable that spans both sibling tablets

This patch implements # 1. By awaiting for topology to quiesce,
we guarantee that load-and-stream only starts when there's no
chance coordinator is handling some topology operation like
split finalization.

Fixes https://github.com/scylladb/scylladb/issues/26455.

- (cherry picked from commit 3abc66da5a)

- (cherry picked from commit 4654cdc6fd)

Parent PR: #26456

Closes scylladb/scylladb#26647

* github.com:scylladb/scylladb:
  sstables_loader: Don't bypass synchronization with busy topology
  test: Add reproducer for l-a-s and split synchronization issue
  sstables_loader: Synchronize tablet split and load-and-stream
2025-11-17 17:15:37 +02:00