Commit Graph

178 Commits

Author SHA1 Message Date
Botond Dénes
d845de84aa Merge '[Backport 2025.2] truncate: change check for write during truncate into a log warning' from Scylladb[bot]
TRUNCATE TABLE performs a memtable flush and then discards the sstables of the table being truncated. It collects the highest replay position for both of these. When the highest replay position of the discarded sstables is higher than the highest replay position of the flushed memtable, that means that we have had writes during truncate which have been flushed to disk independently of the truncate process. We check for this and trigger an on_internal_error() which throws an exception, informing the user that writing data concurrently with TRUNCATE TABLE is not advised.

The problem with this is that truncate is also called from DROP KEYSPACE and DROP TABLE. These are raft operations and exceptions thrown by them are caught by the (...) exception handler in the raft applier fiber, which then exits leaving the node without the ability to execute subsequent raft commands.

This commit changes the on_internal_error() into a warning log entry. It also outputs to keyspace/table names, and the offending replay positions which caused the check to fail.

This PR also adds a test which validates that TRUNCATE works correctly with concurrent writes. More specifically, it checks that:
- all data written before TRUNCATE starts is deleted
- none of the data after TRUNCATE completes is deleted

Fixes: #25173
Fixes: #25013

Backport is needed in versions which check for truncate with concurrent writes using `on_internal_error()`: 2025.3 2025.2 2025.1

- (cherry picked from commit 268ec72dc9)

- (cherry picked from commit 33488ba943)

Parent PR: #25174

Closes scylladb/scylladb#25349

* github.com:scylladb/scylladb:
  truncate: add test for truncate with concurrent writes
  truncate: change check for write during truncate into a log warning
2025-08-08 11:44:41 +03:00
Botond Dénes
cccf726b54 Merge '[Backport 2025.2] test: introduce upgrade tests to test.py, add a SSTable dict compression upgrade test' from Michał Chojnowski
This PR adds an upgrade test for SSTable compression with shared dictionaries, and adds some bits to pylib and test.py to support that.

In the series, we:

1. Mount $XDG_CACHE_DIR into dbuild.
2. Add a pylib function which downloads and installs a released ScyllaDB package into a subdirectory of $XDG_CACHE_DIR/scylladb/test.py, and returns the path to bin/scylla.
3. Add new methods and params to the cluster manager, which let the test start nodes with historical Scylla executables, and switch executables during the test.
4. Add a test which uses the above to run an upgrade test between the released package and the current build.
5. Add --run-internet-dependent-tests to test.py which lets the user of test.py skip this test (and potentially other internet-dependent tests in the future).

(The patch modifying wait_for_cql_and_get_hosts is a part of the new test — the new test needs it to test how particular nodes in a mixed-version cluster react to some CQL queries.)

This is a follow-up to https://github.com/scylladb/scylladb/pull/23025, split into a separate PR because the potential addition of upgrade tests to test.py deserved a separate thread.

Needs backport to 2025.2, because that's where the tested feature is introduced.

Fixes https://github.com/scylladb/scylladb/issues/24110

- (cherry picked from commit 63218bb094)
- (cherry picked from commit cc7432888e)
- (cherry picked from commit 34098fbd1f)
- (cherry picked from commit 2ef0db0a6b)
- (cherry picked from commit 1ff7e09edc)
- (cherry picked from commit 5da19ff6a6)
- (cherry picked from commit d3cb873532)
- (cherry picked from commit dd878505ca)

Parent PR: https://github.com/scylladb/scylladb/pull/23538

Closes scylladb/scylladb#25158

* github.com:scylladb/scylladb:
  test: add test_sstable_compression_dictionaries_upgrade.py
  test.py: add --run-internet-dependent-tests
  pylib/manager_client: add server_switch_executable
  test/pylib: in add_server, give a way to specify the executable and version-specific config
  pylib: pass scylla_env environment variables to the topology suite
  test/pylib: add get_scylla_2025_1_executable()
  pylib/scylla_cluster: give a way to pass executable-specific options to nodes
  dbuild: mount "$XDG_CACHE_HOME/scylladb"
2025-08-07 06:26:25 +03:00
Raphael S. Carvalho
be94db3ace replica: Fix take_storage_snapshot() running concurrently to merge completion
Some background:
When merge happens, a background fiber wakes up to merge compaction
groups of sibling tablets into main one. It cannot happen when
rebuilding the storage group list, since token metadata update is
not preemptable. So a storage group, post merge, has the main
compaction group and two other groups to be merged into the main.
When the merge happens, those two groups are empty and will be
freed.

Consider this scenario:
1) merge happens, from 2 to 1 tablet
2) produces a single storage group, containing main and two
other compaction groups to be merged into main.
3) take_storage_snapshot(), triggered by migration post merge,
gets a list of pointer to all compaction groups.
4) t__s__s() iterates first on main group, yields.
5) background fiber wakes up, moves the data into main
and frees the two groups
6) t__s__s() advances to other groups that are now freed,
since step 5.
7) segmentation fault

In addition to memory corruption, there's also a potential for
data to escape the iteration in take_storage_snapshot(), since
data can be moved across compaction groups in background, all
belonging to the same storage group. That could result in
data loss.

Readers should all operate on storage group level since it can
provide a view on all the data owned by a tablet replica.
The movement of sstable from group A to B is atomic, but
iteration first on A, then later on B, might miss data that
was moved from B to A, before the iteration reached B.
By switching to storage group in the interface that retrieves
groups by token range, we guarantee that all data of a given
replica can be found regardless of which compaction group they
sit on.

Fixes #23162.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#24058

(cherry picked from commit 28056344ba)

Closes scylladb/scylladb#25338
2025-08-06 09:41:56 +03:00
Ferenc Szili
932223414b truncate: add test for truncate with concurrent writes
test_validate_truncate_with_concurrent_writes checks if truncate deletes
all the data written before the truncate starts, and does not delete any
data after truncate completes.

(cherry picked from commit 33488ba943)
2025-08-06 00:51:43 +00:00
Patryk Jędrzejczak
beb28a6155 test: delay setting recovery_leader in the recovery procedure tests
In the previous commit, we made it possible to set `recovery_leader`
on each node just before restarting it. Here, we change all the
tests of the Raft-based recovery procedure to use and test this option.

(cherry picked from commit 9e45e1159b)
2025-08-05 10:59:06 +00:00
Piotr Dulikowski
d9075e6160 test: sl: verify that legacy auth is not queried in sl to raft upgrade
Adjust `test_service_levels_upgrade`: right before upgrade to topology
on raft, enable an error injection which triggers when the standard role
manager is about to query the legacy auth tables in the
system_auth keyspace. The preceding commit which fixes
scylladb/scylladb#24963 makes sure that the legacy tables are not
queried during upgrade to topology on raft, so the error injection does
not trigger and does not cause a problem; without that commit, the test
fails.

(cherry picked from commit 3a082d314c)
2025-07-31 15:13:24 +00:00
Pavel Emelyanov
95b906bea9 Merge '[Backport 2025.2] storage_service: cancel all write requests after stopping transports' from Scylladb[bot]
When a node shuts down, in storage service, after storage_proxy RPCs are stopped, some write handlers within storage_proxy may still be waiting for background writes to complete. These handlers hold appropriate ERMs to block schema changes before the write finishes. After the RPCs are stopped, these writes cannot receive the replies anymore.

If, at the same time, there are RPC commands executing `barrier_and_drain`, they may get stuck waiting for these ERM holders to finish, potentially blocking node shutdown until the writes time out.

This change introduces cancellation of all outstanding write handlers from storage_service after the storage proxy RPCs were stopped.

Fixes scylladb/scylladb#23665

Backport: since this fixes an issue that frequently causes issues in CI, backport to 2025.1, 2025.2, and 2025.3.

- (cherry picked from commit bc934827bc)

- (cherry picked from commit e0dc73f52a)

Parent PR: #24714

Closes scylladb/scylladb#25169

* github.com:scylladb/scylladb:
  storage_service: Cancel all write requests on storage_proxy shutdown
  test: Add test for unfinished writes during shutdown and topology change
2025-07-28 09:25:15 +03:00
Pavel Emelyanov
8622a07bdd Merge '[Backport 2025.2] streaming: Avoid deadlock by running view checks in a separate scheduling group' from Scylladb[bot]
This issue happens with removenode, when RBNO is disabled, so range
streamer is used.

The deadlock happens in a scenario like this:
1. Start 3 nodes: {A, B, C}, RF=2
2. Node A is lost
3. removenode A
4. Both B and C gain ownership of ranges.
5. Streaming sessions are started with crossed directions: B->C, C->B

Readers created by sender side exhaust streaming semaphore on B and C.
Receiver side attempts to obtain a permit indirectly by calling
check_needs_view_update_path(), which reads local tables. That read is
blocked and times-out, causing streaming to fail. The streaming writer
is already using a tracking-only permit.

Even if we didn't deadlock, and the streaming semaphore was simply exhausted
by other receiving sessions (via tracking-only permit), the query may still time-out due to starvation.

To avoid that, run the query under a different scheduling group, which
translates to the system semaphore instead of the maintenance
semaphore, to break the dependency. The gossip group was chosen
because it shouldn't be contended and this change should not interfere
with it much.

Fixes #24807
Fixes #24925

- (cherry picked from commit ee2fa58bd6)

- (cherry picked from commit dff2b01237)

Parent PR: #24929

Closes scylladb/scylladb#25055

* github.com:scylladb/scylladb:
  streaming: Avoid deadlock by running view checks in a separate scheduling group
  service: migration_manager: Run group0 barrier in gossip scheduling group
2025-07-28 09:24:53 +03:00
Tomasz Grabiec
3991e4de28 streaming: Avoid deadlock by running view checks in a separate scheduling group
This issue happens with removenode, when RBNO is disabled, so range
streamer is used.

The deadlock happens in a scenario like this:
1. Start 3 nodes: {A, B, C}, RF=2
2. Node A is lost
3. removenode A
4. Both B and C gain ownership of ranges.
5. Streaming sessions are started with crossed directions: B->C, C->B

Readers created by sender side exhaust streaming semaphore on B and C.
Receiver side attempts to obtain a permit indirectly by calling
check_needs_view_update_path(), which reads local tables. That read is
blocked and times-out, causing streaming to fail. The streaming writer
is already using a tracking-only permit.

To avoid that, run the query under a different scheduling group, which
translates to the system semaphore instead of the maintenance
semaphore, to break the dependency. The gossip group was chosen
because it shouldn't be contended and this change should not interfere
with it much.

Fixes: #24807
(cherry picked from commit dff2b01237)
2025-07-27 22:52:56 +02:00
Sergey Zolotukhin
f15df0bcce storage_service: Cancel all write requests on storage_proxy shutdown
During a graceful node shutdown, RPC listeners are stopped in `storage_service::drain_on_shutdown`
as one of the first steps. However, even after RPCs are shut down, some write handlers in
`storage_proxy` may still be waiting for background writes to complete. These handlers retain the ERM.
Since the RPC subsystem is no longer active, replies cannot be received, and if any RPC commands are
concurrently executing `barrier_and_drain`, they may get stuck waiting for those writes. This can block
the messaging server shutdown and delay the entire shutdown process until the write timeout occurs.

This change introduces the cancellation of all outstanding write handlers in `storage_proxy`
during shutdown to prevent unnecessary delays.

Fixes scylladb/scylladb#23665

(cherry picked from commit e0dc73f52a)
2025-07-24 13:02:56 +00:00
Sergey Zolotukhin
487012e972 test: Add test for unfinished writes during shutdown and topology change
This test reproduces an issue where a topology change and an ongoing write query
during query coordinator shutdown can cause the node to get stuck.

When a node receives a write request, it creates a write handler that holds
a copy of the current table's ERM (Effective Replication Map). The ERM ensures
that no topology or schema changes occur while the request is being processed.

After the query coordinator receives the required number of replica write ACKs
to satisfy the consistency level (CL), it sends a reply to the client. However,
the write response handler remains alive until all replicas respond — the remaining
writes are handled in the background.

During shutdown, when all network connections are closed, these responses can no longer
be received. As a result, the write response handler is only destroyed once the write
timeout is reached.

This becomes problematic because the ERM held by the handler blocks topology or schema
change commands from executing. Since shutdown waits for these commands to complete,
this can lead to unnecessary delays in node shutdown and restarts, and occasional
test case failures.

Test for: scylladb/scylladb#23665

(cherry picked from commit bc934827bc)
2025-07-24 13:02:56 +00:00
Michał Chojnowski
7b5a4cadd7 test: add test_sstable_compression_dictionaries_upgrade.py
(cherry picked from commit dd878505ca)
2025-07-23 19:28:35 +02:00
Michał Chojnowski
1446b4e0ef test.py: add --run-internet-dependent-tests
Later, we will add upgrade tests, which need to download the previous
release of Scylla from the internet.

Internet access is a major dependency, so we want to make those tests
opt-in for now.

(cherry picked from commit d3cb873532)
2025-07-23 19:28:35 +02:00
Tomasz Grabiec
fa1b97f0c5 Merge '[Backport 2025.2] Improve background disposal of tablet_metadata' from Scylladb[bot]
As seen in #23284, when the tablet_metadata contains many tables, even empty ones,
we're seeing a long queue of seastar tasks coming from the individual destruction of
`tablet_map_ptr = foreign_ptr<lw_shared_ptr<const tablet_map>>`.

This change improves `tablet_metadata::clear_gently` to destroy the `tablet_map_ptr` objects
on their owner shard by sorting them into vectors, per- owner shard.

Also, background call to clear_gently was added to `~token_metadata`, as it is destroyed
arbitrarily when automatic token_metadata_ptr variables go out of scope, so that the
contained tablet_metadata would be cleared gently.

Finally, a unit test was added to reproduce the `Too long queue accumulated for gossip` symptom
and verify that it is gone with this change.

Fixes #24814
Refs #23284

This change is not marked as fixing the issue since we still need to verify that there is no impact on query performance, reactor stalls, or large allocations, with a large number of tablet-based tables.

* Since the issue exists in 2025.1, requesting backport to 2025.1 and upwards

- (cherry picked from commit 3acca0aa63)

- (cherry picked from commit 493a2303da)

- (cherry picked from commit e0a19b981a)

- (cherry picked from commit 2b2cfaba6e)

- (cherry picked from commit 2c0bafb934)

- (cherry picked from commit 4a3d14a031)

- (cherry picked from commit 6e4803a750)

Parent PR: #24618

Closes scylladb/scylladb#24863

* github.com:scylladb/scylladb:
  token_metadata_impl: clear_gently: release version tracker early
  test: cluster: test_tablets_merge: add test_tablet_split_merge_with_many_tables
  token_metadata: clear_and_destroy_impl when destroyed
  token_metadata: keep a reference to shared_token_metadata
  token_metadata: move make_token_metadata_ptr into shared_token_metadata class
  replica: database: get and expose a mutable locator::shared_token_metadata
  locator: tablets: tablet_metadata: clear_gently: optimize foreign ptr destruction
2025-07-23 17:00:44 +02:00
Benny Halevy
179e2b3bf1 test: cluster: test_tablets_merge: add test_tablet_split_merge_with_many_tables
Reproduces #23284

Currently skipped in release mode since it requires
the `short_tablet_stats_refresh_interval` interval.
Ref #24641

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 4a3d14a031)
2025-07-21 09:49:02 +03:00
Michael Litvak
23dbe8952b test: cdc: add test_cdc_with_alter
Add a test that tests adding and dropping a column to a table with CDC
enabled while writing to it.

(cherry picked from commit 86dfa6324f)
2025-07-20 09:07:29 +02:00
Patryk Jędrzejczak
7bb43d812e test: test_zero_token_nodes_multidc: properly handle reads with CL=ONE
The test could fail with RF={DC1: 2, DC2: 0} and CL=ONE when:
- both writes succeeded with the same replica responding first,
- one of the following reads succeeded with the other replica
  responding before it applied mutations from any of the writes.

We fix the test by not expecting reads with CL=ONE to return a row.

We also harden the test by inserting different rows for every pair
(CL, coordinator), where one of the two coordinators is a normal
node from DC1, and the other one is a zero-token node from DC2.
This change makes sure that, for example, every write really
inserts a row.

Fixes scylladb/scylladb#22967

The fix addresses CI flakiness and only changes the test, so it
should be backported.

Closes scylladb/scylladb#23518

(cherry picked from commit 21edec1ace)

Closes scylladb/scylladb#24984
2025-07-15 15:50:21 +02:00
Botond Dénes
9482f45d13 test/cluster/test_read_repair: write 100 rows in trace test
This test asserts that a read repair really happened. To ensure this
happens it writes a single partition after enabling the database_apply
error injection point. For some reason, the write is sometimes reordered
with the error injection and the write will get replicated to both nodes
and no read repair will happen, failing the test.
To make the test less sensitive to such rare reordering, add a
clustering column to the table and write a 100 rows. The chance of *all*
100 of them being reordered with the error injection should be low
enough that it doesn't happen again (famous last words).

Fixes: #24330

Closes scylladb/scylladb#24403

(cherry picked from commit 495f607e73)

Closes scylladb/scylladb#24973
2025-07-15 13:27:31 +03:00
Michael Litvak
15517ba529 tablets: stop storage group on deallocation
When a tablet transitions to a post-cleanup stage on the leaving replica
we deallocate its storage group. Before the storage can be deallocated
and destroyed, we must make sure it's cleaned up and stopped properly.

Normally this happens during the tablet cleanup stage, when
table::cleanup_table is called, so by the time we transition to the next
stage the storage group is already stopped.

However, it's possible that tablet cleanup did not run in some scenario:
1. The topology coordinator runs tablet cleanup on the leaving replica.
2. The leaving replica is restarted.
3. When the leaving replica starts, still in `cleanup` stage, it
   allocates a storage group for the tablet.
4. The topology coordinator moves to the next stage.
5. The leaving replica deallocates the storage group, but it was not
   stopped.

To address this scenario, we always stop the storage group when
deallocating it. Usually it will be already stopped and complete
immediately, and otherwise it will be stopped in the background.

Fixes scylladb/scylladb#24857
Fixes scylladb/scylladb#24828

Closes scylladb/scylladb#24896

(cherry picked from commit fa24fd7cc3)

Closes scylladb/scylladb#24908
2025-07-15 13:25:38 +03:00
Piotr Dulikowski
e4dde34f52 Merge '[Backport 2025.2] main: don't start maintenance auth service if not enabled' from Scylladb[bot]
In f96d30c2b5
we introduced the maintenance service, which is an additional
instance of auth::service. But this service has a somewhat
confusing 2-level startup mechanism: it's initialized with
sharded<Service>::start and then auth::service::start
(different method with the same name to confuse even more).

When maintenance_socket was disabled (default setting), the code
did only the first part of the startup. This registered a config
observer but didn't create a permission_cache instance.
As a result, a crash on SIGHUP when config is reloaded can occur.

Fixes: https://github.com/scylladb/scylladb/issues/24528
Backport: all not eol versions since 6.0 and 2025.1

- (cherry picked from commit 97c60b8153)

- (cherry picked from commit dd01852341)

Parent PR: #24527

Closes scylladb/scylladb#24570

* github.com:scylladb/scylladb:
  test: add test for live updates of permissions cache config
  main: don't start maintenance auth service if not enabled
2025-07-09 09:47:57 +02:00
Michael Litvak
a26d8f72b6 test: test_batchlog_manager: batchlog replay includes cdc
Add a new test that verifies that when replaying batch mutations from
the batchlog, the mutations include cdc augmentation if needed.

This is done in order to verify that it works currently as expected and
doesn't break in the future.

(cherry picked from commit d7af26a437)
2025-07-08 06:25:03 +00:00
Michael Litvak
9012357a4b test: test_batchlog_manager: test batch replay when a node is down
Add a test of the batchlog manager replay loop applying failed batches
while some replica is down.

The test reproduces an issue where the batchlog manager tries to replay
a failed batch, doesn't get a response from some replica, and becomes
stuck.

It verifies that the batchlog manager can eventually recover from this
situation and continue applying failed batches.

(cherry picked from commit a9b476e057)
2025-07-08 06:25:03 +00:00
Marcin Maliszkiewicz
8e783cc23a test: add test for live updates of permissions cache config
(cherry picked from commit dd01852341)
2025-07-07 10:07:02 +02:00
Tomasz Grabiec
440985387e Merge '[Backport 2025.2] repair: postpone repair until topology is not busy ' from Scylladb[bot]
Currently, repair_service::repair_tablets starts repair if there
is no ongoing tablet operations. The check does not consider global
topology operations, like tablet resize finalization.

Hence, if:
- topology is in the tablet_resize_finalization state;
- repair starts (as there is no tablet transitions) and holds the erm;
- resize finalization finishes;

then the repair sees a topology state different than the actual -
it does not see that the storage groups were already split.
Repair code does not handle this case and it results with
on_internal_error.

Start repair when topology is not busy. The check isn't atomic,
as it's done on a shard 0. Thus, we compare the topology versions
to ensure that the business check is valid.

Fixes: https://github.com/scylladb/scylladb/issues/24195.

Needs backport to all branches since they are affected

- (cherry picked from commit df152d9824)

- (cherry picked from commit 83c9af9670)

Parent PR: #24202

Closes scylladb/scylladb#24781

* github.com:scylladb/scylladb:
  test: add test for repair and resize finalization
  repair: postpone repair until topology is not busy
2025-07-02 13:18:22 +02:00
Botond Dénes
9dd8dc5357 Merge '[Backport 2025.2] Fix for cassandra role gets recreated after DROP ROLE' from Scylladb[bot]
This patchset fixes regression introduced by 7e749cd848 when we started re-creating default superuser role and password from the config, even if new custom superuser was created by the user.

Now we'll check, first with CL LOCAL_ONE if there is a need to create default superuser role or password, confirm
it with CL QUORUM and only then atomically create role or password.

If server is started without cluster quorum we'll skip creating role or password.

Fixes https://github.com/scylladb/scylladb/issues/24469
Backport: all versions since 2024.2

- (cherry picked from commit 68fc4c6d61)

- (cherry picked from commit c96c5bfef5)

- (cherry picked from commit 2e2ba84e94)

- (cherry picked from commit f85d73d405)

- (cherry picked from commit d9ec746c6d)

- (cherry picked from commit a3bb679f49)

- (cherry picked from commit 67a4bfc152)

- (cherry picked from commit 0ffddce636)

- (cherry picked from commit 5e7ac34822)

Parent PR: #24451

Closes scylladb/scylladb#24694

* github.com:scylladb/scylladb:
  test: auth_cluster: add test for password reset procedure
  auth: cache roles table scan during startup
  test: auth_cluster: add test for replacing default superuser
  test: pylib: add ability to specify default authenticator during server_start
  auth: split auth-v2 logic for adding default superuser password
  auth: split auth-v2 logic for adding default superuser role
  auth: ldap: fix waiting for underlying role manager
  auth: wait for default role creation before starting authorizer and authenticator
2025-07-02 10:40:56 +03:00
Aleksandra Martyniuk
cbce0ed911 test: add test for repair and resize finalization
Add test that checks whether repair does not start if there is an
ongoing resize finalization.

(cherry picked from commit 83c9af9670)
2025-07-01 20:26:21 +00:00
Patryk Jędrzejczak
e3952fbd35 test: test_raft_recovery_user_data: disable hinted handoff
The test is currently flaky, writes can fail with "Too many in flight
hints: 10485936". See scylladb/scylladb#23565 for more details.

We suspect that scylladb/scylladb#23565 is caused by an infrastructure
issue - slow disks on some machines we run CI jobs on.

Since the test fails often and investigation doesn't seem to be easy,
we first deflake the test in this patch by disabling hinted handoff.

For replacing nodes, we provide `cfg` because there should have been
`cfg` in the first place. The test was correct anyway because:
- `tablets_mode_for_new_keyspaces` is set to `true` by default in
  test/cluster/suite.yaml,
- `endpoint_snitch` is set to `GossipingPropertyFileSnitch` by default
  if the property file is provided in `ScyllaServer.__init__`.

Ref scylladb/scylladb#23565

We should backport this patch to 2025.2 because this test is also flaky
on CI jobs using 2025.2. Older branches don't have this test.

Closes scylladb/scylladb#24364

(cherry picked from commit 8756c233e0)

Fixes #24756

Closes scylladb/scylladb#24757
2025-07-01 19:14:22 +02:00
Michał Chojnowski
50736e9740 test_sstable_compression_dictionaries_basic.py: fix a flaky check
test_dict_memory_limit trains new dictionaries and checks (via metrics)
that the old dictionaries are appropriately cleaned up.
The problem is that the cleanup is asynchronous (because the lifetimes
are handled by foreign_ptr, which sends the destructor call
to the owner shard asynchronously), so the metrics might be
checked a few milliseconds before the old dictionary is cleaned up.

The dict lifetimes are lazy on purpose, the right thing to do is
to just let the test retry the check.

Fixes scylladb/scylladb#24516

Closes scylladb/scylladb#24526

(cherry picked from commit cace55aaaf)

Closes scylladb/scylladb#24653
2025-07-01 12:30:25 +03:00
Avi Kivity
ee733c4d38 Merge '[Backport 2025.2] generic_server: fix connections semaphore config observer' from Scylladb[bot]
In ed3e4f33fd we introduced new connection throttling feature which is controlled by uninitialized_connections_semaphore_cpu_concurrency config. But live updating of it was broken, this patch fixes it.

When the temporary value from observer() is destroyed, it disconnects from updateable_value, so observation stops right away. We need to retain the observer.

Backport: to 2025.2 where this feature was added
Fixes: https://github.com/scylladb/scylladb/issues/24557

- (cherry picked from commit c6a25b9140)

- (cherry picked from commit 45392ac29e)

- (cherry picked from commit 68ead01397)

Parent PR: #24484

Closes scylladb/scylladb#24679

* github.com:scylladb/scylladb:
  test: add test for live updates of generic server config
  utils: don't allow do discard updateable_value observer
  generic_server: fix connections semaphore config observer
2025-07-01 12:29:53 +03:00
Abhinav Jha
160c937efe group0: modify start_operation logic to account for synchronize phase race condition
In the present scenario, the bootstrapping node undergoes synchronize phase after
initialization of group0, then enters post_raft phase and becomes fully ready for
group0 operations. The topology coordinator is agnostic of this and issues stream
ranges command as soon as the node successfully completes `join_group0`. Although for
a node booting into an already upgraded cluster, the time duration for which, node
remains in synchronize phase is negligible but this race condition causes trouble in a
small percentage of cases, since the stream ranges operation fails and node fails to bootstrap.

This commit addresses this issue and updates the error throw logic to account for this
edge case and lets the node wait (with timeouts) for synchronize phase to get over instead of throwing
error.

A regression test is also added to confirm the working of this code change. The test adds a
wait in synchronize phase for newly joining node and releases only after the program counter
reaches the synchronize case in the `start_operation` function. Hence it indicates that in the
updated code, the start_operation will wait for the node to get done with the
synchronize phase instead of throwing error.

This PR fixes a bug. Hence we need to backport it.

Fixes: scylladb/scylladb#23536

Closes scylladb/scylladb#23829

(cherry picked from commit 5ff693eff6)

Closes scylladb/scylladb#24628
2025-07-01 10:10:55 +02:00
Marcin Maliszkiewicz
19b1922362 test: auth_cluster: add test for password reset procedure
(cherry picked from commit aef531077b)
2025-06-30 20:50:15 +02:00
Marcin Maliszkiewicz
5346d959ff test: auth_cluster: add test for replacing default superuser
This test demonstrates creating custom superuser guide:
https://opensource.docs.scylladb.com/stable/operating-scylla/security/create-superuser.html

(cherry picked from commit d9223b61a2)
2025-06-30 20:50:15 +02:00
Marcin Maliszkiewicz
a54cc8291c test: add test for live updates of generic server config
Affected config: uninitialized_connections_semaphore_cpu_concurrency

(cherry picked from commit 68ead01397)
2025-06-27 16:01:43 +02:00
Raphael S. Carvalho
fa420f8644 replica: Fix truncate assert failure
Truncate doesn't really go well with concurrent writes. The fix (#23560) exposed
a preexisting fragility which I missed.

1) truncate gets RP mark X, truncated_at = second T
2) new sstable written during snapshot or later, also at second T (difference of MS)
3) discard_sstables() get RP Y > saved RP X, since creation time of sstable
with RP Y is equal to truncated_at = second T.

So the problem is that truncate is using a clock of second granularity for
filtering out sstables written later, and after we got low mark and truncate time,
it can happen that a sstable is flushed later within the same second, but at a
different millisecond.
By switching to a millisecond clock (db_clock), we allow sstables written later
within the same second from being filtered out. It's not perfect but
extremely unlikely a new write lands and get flushed in the same
millisecond we recorded truncated_at timepoint. In practice, truncate
will not be used concurrently to writes, so this should be enough for
our tests performing such concurrent actions.
We're moving away from gc_clock which is our cheap lowres_clock, but
time is only retrieved when creating sstable objects, which frequency of
creation is low enough for not having significant consequences, and also
db_clock should be cheap enough since it's usually syscall-less.

Fixes #23771.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#24426

(cherry picked from commit 2d716f3ffe)

Closes scylladb/scylladb#24435
2025-06-24 10:02:06 +03:00
Andrzej Jackowski
60bc1c339c test: wait for normal state propagation in test_auth_v2_migration
By default, cluster tests have skip_wait_for_gossip_to_settle=0 and
ring_delay_ms=0. In tests with gossip topology, it may lead to a race,
where nodes see different state of each other.

In case of test_auth_v2_migration, there are three nodes. If the first
node already knows that the third node is NORMAL, and the second node
does not, the system_auth tables can return incomplete results.

To avoid such a race, this commit adds a check that all nodes see other
nodes as NORMAL before any writes are done.

Refs: #24163

Closes scylladb/scylladb#24185

(cherry picked from commit 555d897a15)

Closes scylladb/scylladb#24520
2025-06-24 10:01:42 +03:00
Michael Litvak
305f827888 test/cluster/test_tablets: test restart during tablet cleanup
Add a test that reproduces issue scylladb/scylladb#23481.

The test migrates a tablet from one node to another, and while the
tablet is in some stage of cleanup - either before or right after,
depending on the parameter - the leaving replica, on which the tablet is
cleaned, is restarted.

This is interesting because when the leaving replica starts and loads
its state, the tablet could be in different stages of cleanup - the
SSTables may still exist or they may have been cleaned up already, and
we want to make sure the state is loaded correctly.

(cherry picked from commit bd88ca92c8)
2025-06-17 13:59:10 +00:00
Botond Dénes
a63b22eec6 Merge '[Backport 2025.2] tablets: fix missing data after tablet merge ' from Scylladb[bot]
Consider the following scenario:

1) let's assume tablet 0 has range [1, 5] (pre merge)
2) tablet merge happens, tablet 0 has now range [1, 10]
3) tablet_sstable_set isn't refreshed, so holds a stale state, thinks tablet 0 still has range [1, 5]
4) during a full scan, forward service will intersect the full range with tablet ranges and consume one tablet at a time
5) replica service is asked to consume range [1, 10] of tablet 0 (post merge)

We have two possible outcomes:

With cache bypass:

1) cache reader is bypassed
2) sstable reader is created on range [1, 10]
3) unrefreshed tablet_sstable_set holds stale state, but select correctly all sstables intersecting with range [1, 10]

With cache:

1) cache reader is created
2) finds partition with token 5 is cached
3) sstable reader is created on range [1, 4] (later would fast forward to range [6, 10]; also belongs to tablet 0)
4) incremental selector consumes the pre-merge sstable spanning range [1, 5]
4.1) since the partitioned_sstable_set pre-merge contains only that sstable, EOS is reached
4.2) since EOS is reached, the fast forward to range [6, 10] is not allowed.
So with the set refreshed, sstable set is aligned with tablet ranges, and no premature EOS is signalled, otherwise preventing fast forward to from happening and all data from being properly captured in the read.

This change fixes the bug and triggers a mutation source refresh whenever the number of tablets for the table has changed, not only when we have incoming tablets.

Additionally, includes a fix for range reads that span more than one tablet, which can happen during split execution.

Fixes: https://github.com/scylladb/scylladb/issues/23313

This change needs to be backported to all supported versions which implement tablet merge.

- (cherry picked from commit d0329ca370)

- (cherry picked from commit 1f9f724441)

- (cherry picked from commit 53df911145)

Parent PR: #24287

Closes scylladb/scylladb#24339

* github.com:scylladb/scylladb:
  replica: Fix range reads spanning sibling tablets
  test: add reproducer and test for mutation source refresh after merge
  tablets: trigger mutation source refresh on tablet count change
2025-06-17 08:35:14 +03:00
Ferenc Szili
ba192c1a29 test: add reproducer and test for mutation source refresh after merge
This change adds a reproducer and test for the fix where the local mutation
source is not always refreshed after a tablet merge.

(cherry picked from commit 1f9f724441)
2025-06-15 09:14:37 -03:00
Michał Chojnowski
9c28b812ca db/config: add an option that disables dict-aware sstable compressors in DDL statements
For reasons, we want to be able to disallow dictionary-aware compressors
in chosen deployments.

This patch adds a knob for that. When the knob is disabled,
dictionary-aware compressors will be rejected in the validation
stage of CREATE and ALTER statements.

Closes scylladb/scylladb#24355

(cherry picked from commit 7d26d3c7cb)

Closes scylladb/scylladb#24454
2025-06-13 14:03:32 +03:00
Michael Litvak
d792916e8e test_cdc_generation_clearing: wait for generations to propagate
In test_cdc_generation_clearing we trigger events that update CDC
generations, verify the generations are updated as expected, and verify
the system topology and CDC generations are consistent on all nodes.

Before checking that all nodes are consistent and have the same CDC
generations, we need to consider that the changes are propagated through
raft and take some time to propagate to all nodes.

Currently, we wait for the change to be applied only on the first server
which runs the CDC generation publisher fiber and read the CDC
generations from this single node. The consistency check that follows
could fail if the change was not propagated to some other node yet.

To fix that, before checking consistency with all nodes, we execute a
read barrier on all nodes so they all see the same state as the leader.

Fixes scylladb/scylladb#24407

Closes scylladb/scylladb#24433

(cherry picked from commit 8aeb404893)

Closes scylladb/scylladb#24450
2025-06-10 15:50:40 +03:00
Robert Bindar
b62264e1d9 Add nodetool refresh --scope option
This change adds the --scope option to nodetool refresh.
Like in the case of nodetool restore, you can pass either of:
* node - On the local node.
* rack - On the local rack.
* dc - In the datacenter (DC) where the local node lives.
* all (default) - Everywhere across the cluster.
as scope.

The feature is based on the existing load_and_stream paths, so it
requires passing --load-and-stream to the refresh command.
Also, it is not compatible with the --primary-replica-only option.

Signed-off-by: Robert Bindar <robert.bindar@scylladb.com>

Closes scylladb/scylladb#23861

(cherry picked from commit c570941692)
2025-06-04 11:59:17 +03:00
Robert Bindar
36cc0f8e7e Refactor out code from test_restore_with_streaming_scopes
part 5: check_data_is_back

Signed-off-by: Robert Bindar <robert.bindar@scylladb.com>
(cherry picked from commit 548a1ec20a)
2025-06-04 11:54:07 +03:00
Robert Bindar
a885c87547 Refactor out code from test_restore_with_streaming_scopes
part 4: compute_scope

Signed-off-by: Robert Bindar <robert.bindar@scylladb.com>
(cherry picked from commit 29309ae533)
2025-06-04 11:54:01 +03:00
Robert Bindar
371fc05943 Refactor out code from test_restore_with_streaming_scopes
part 3: create_dataset

Signed-off-by: Robert Bindar <robert.bindar@scylladb.com>
(cherry picked from commit a0f0580a9c)
2025-06-04 11:53:51 +03:00
Robert Bindar
4366cd5a81 Refactor out code from test_restore_with_streaming_scopes
part 2: take_snapshot

Signed-off-by: Robert Bindar <robert.bindar@scylladb.com>
(cherry picked from commit 5171ca385a)
2025-06-04 11:53:43 +03:00
Robert Bindar
38ee119112 Refactor out code from test_restore_with_streaming_scopes
part 1: create_cluster

Signed-off-by: Robert Bindar <robert.bindar@scylladb.com>
(cherry picked from commit f09bb20ac4)
2025-06-04 11:53:32 +03:00
Michael Litvak
5aca2c134d test_cdc_generation_publishing: fix to read monotonically
The test test_multiple_unpublished_cdc_generations reads the CDC
generation timestamps to verify they are published in the correct order.
To do so it issues reads in a loop with a short sleep period and checks
the differences between consecutive reads, assuming they are monotonic.

However the assumption that the reads are monotonic is not valid,
because the reads are issued with consistency_level=ONE, thus we may read
timestamps {A,B} from some node, then read timestamps {A} from another
node that didn't apply the write of the new timestamp B yet. This will
trigger the assert in the test and fail.

To ensure the reads are monotonic we change the test to use consistency
level ALL for the reads.

Fixes scylladb/scylladb#24262

Closes scylladb/scylladb#24272

(cherry picked from commit 3a1be33143)

Closes scylladb/scylladb#24336
2025-06-02 14:42:57 +03:00
Wojciech Mitros
dec10d348e test: actually wait for tablets to distribute across nodes
In test_tablet_mv_replica_pairing_during_replace, after we create
the tables, we want to wait for their tablets to distribute evenly
across nodes and we have a wait_for for that.
But we don't await this wait_for, so it's a no-op. This patch fixes
it by adding the missing await.

Refs scylladb/scylladb#23982
Refs scylladb/scylladb#23997

Closes scylladb/scylladb#24250

(cherry picked from commit 5074daf1b7)

Closes scylladb/scylladb#24311
2025-05-29 16:44:51 +02:00
Wojciech Mitros
847504ad25 test_mv_tablets_replace: wait for tablet replicas to balance before working on them
In the test test_tablet_mv_replica_pairing_during_replace we stop 2 out of 4 servers while using RF=2.
Even though in the test we use exactly 4 tablets (1 for each replica of a base table and view), intially,
the tablets may not be split evenly between all nodes. Because of this, even when we chose a server that
hosts the view and a different server that hosts the base table, we sometimes stoped all replicas of the
base or the view table because the node with the base table replica may also be a view replica.

After some time, the tablets should be distributed across all nodes. When that happens, there will be
no common nodes with a base and view replica, so the test scenario will continue as planned.

In this patch, we add this waiting period after creating the base and view, and continue the test only
when all 4 tablets are on distinct nodes.

Fixes https://github.com/scylladb/scylladb/issues/23982
Fixes https://github.com/scylladb/scylladb/issues/23997

Closes scylladb/scylladb#24111

(cherry picked from commit bceb64fb5a)

Closes scylladb/scylladb#24126
2025-05-16 11:51:07 +03:00
Pavel Emelyanov
854587c10c Merge '[Backport 2025.2] test/cluster: Adjust tests to RF-rack-valid keyspaces' from Scylladb[bot]
In this PR, we're adjusting most of the cluster tests so that they pass
with the `rf_rack_valid_keyspaces` configuration option enabled. In most
cases, the changes are straightforward and require little to no additional
insight into what the tests are doing or verifying. In some, however, doing
that does require a deeper understanding of the tests we're modifying.
The justification for those changes and their correctness is included in
the commit messages corresponding to them.

Note that this PR does not cover all of the cluster tests. There are few
remaining ones, but they require a bit more effort, so we delegate that
work to a separate PR.

I tested all of the modified tests locally with `rf_rack_valid_keyspaces`
set to true, and they all passed.

Fixes scylladb/scylladb#23959

Backport: we want to backport these changes to 2025.1 since that's the version where we introduced RF-rack-valid keyspaces in. Although the tests are not, by default, run with `rf_rack_valid_keyspaces` enabled yet, that will most likely change in the near future and we'll also want to backport those changes too. The reason for this is that we want to verify that Scylla works correctly even with that constraint.

- (cherry picked from commit dbb8835fdf)

- (cherry picked from commit 9281bff0e3)

- (cherry picked from commit 5b83304b38)

- (cherry picked from commit 73b22d4f6b)

- (cherry picked from commit 2882b7e48a)

- (cherry picked from commit 4c46551c6b)

- (cherry picked from commit 92f7d5bf10)

- (cherry picked from commit 5d1bb8ebc5)

- (cherry picked from commit d3c0cd6d9d)

- (cherry picked from commit 04567c28a3)

- (cherry picked from commit c8c28dae92)

- (cherry picked from commit c4b32c38a3)

- (cherry picked from commit ee96f8dcfc)

Parent PR: #23661

Closes scylladb/scylladb#24121

* github.com:scylladb/scylladb:
  test/cluster/suite.yaml: Enable rf_rack_valid_keyspaces in suite
  test/cluster: Disable rf_rack_valid_keyspaces in problematic tests
  test/cluster/test_tablets: Divide rack into two to adjust tests to RF-rack-validity
  test/cluster/test_tablets: Adjust test_tablet_rf_change to RF-rack-validity
  test/cluster/test_tablet_repair_scheduler.py: Adjust to RF-rack-validity
  test/pylib/repair.py: Assign nodes to multiple racks in create_table_insert_data_for_repair
  test/cluster/test_zero_token_nodes_topology_ops: Adjust to RF-rack-validity
  test/cluster/test_zero_token_nodes_no_replication.py: Adjust to RF-rack-validity
  test/cluster/test_zero_token_nodes_multidc.py: Adjust to RF-rack-validity
  test/cluster/test_not_enough_token_owners.py: Adjust to RF-rack-validity
  test/cluster/test_multidc.py: Adjust to RF-rack-validity
  test/cluster/object_store/test_backup.py: Adjust to RF-rack-validity
  test/cluster: Adjust simple tests to RF-rack-validity
2025-05-16 11:50:45 +03:00