Commit Graph

221 Commits

Author SHA1 Message Date
Avi Kivity
1e0b015c8b Merge 'cql3: Represent create_statement using managed_bytes' from Dawid Mędrek
When describing a table, we need to do it carefully: if some
columns were dropped, we must specify that explicitly by

```
ALTER TABLE {table} DROP {column} USING TIMESTAMP ...
```

in the result of the DESCRIBE statement. Failing to do so
could lead to data resurrection.

However, if a table has been altered many, many times,
we might end up with a huge create statement. Constructing
it could, in turn, trigger an oversized allocation.
Some tests ran into that very problem in fact.

In this commit, we want to mitigate the problem: instead of
allocating a contiguous chunk of memory for the create
statement, we use `bytes_ostream` and `managed_bytes` to
possibly keep data scattered in memory. It makes handling
`cql3::description` less convenient in the code, but since
the struct is pretty much immediately serialized after
creating it, it's a very good trade-off.

A reproducer is intentionally not provided by this commit:
it's easy to test the change, but adding and dropping
a huge number of columns would take a really long amount
of time, so we need to omit it.

Fixes scylladb/scylladb#24018

Backport: all of the supported versions are affected, so we want to backport the changes there.

Closes scylladb/scylladb#24151

* github.com:scylladb/scylladb:
  cql3/description: Serialize only rvalues of description
  cql3: Represent create_statement using managed_string
  cql3/statements/describe_statement.cc: Don't copy descriptions
  cql3: Use managed_bytes instead of bytes in DESCRIBE
  utils/managed_string.hh: Introduce managed_string and fragmented_ostringstream
2025-07-01 21:59:38 +03:00
Tomasz Grabiec
97679002ee Merge 'Co-locate tablets of different tables' from Michael Litvak
Add the option to co-locate tablets of different tables. For example, a base table and its CDC table, or a local index.

main changes and ideas:
* "table group" - a set of one or more tables that should be co-located. (Example: base table and CDC table). A group consists of one base table and zero or more children tables.
* new column `base_table` in `system.tablets`: when creating a new table, it can be set to point to a base table, which the new table's tablets will be co-located with. when it's set, the tablet map information should be retrieved from the base table map. the child map doesn't contain per-tablet information.
* co-located tables always have the same tablet count and the same tablet replicas. each tablet operation - migration, resize, repair - is applied on all tablets in a synchronized manner by the topology coordinator.
* resize decision for a group is made by combining the per-table hints and comparing the average tablet size (over all tablets in the group) with the target tablet size.
* the tablets load balancer works with the base table as a representative of the group. it represents a single migration unit with some `group_size` that is taken into account.
* view tablets are co-located with base tablets when the partition keys match.

Fixes https://github.com/scylladb/scylladb/issues/17043

backport is not needed. this is preliminary work for support of MVs and CDC with tablets.

Closes scylladb/scylladb#22906

* github.com:scylladb/scylladb:
  tablets: validate no clustering row mutations on co-located tables
  raft_group0_client: extend validate_change to mixed_change type
  docs: topology-over-raft: document co-located tables
  tablet-mon.py: visual indication for co-located tablets
  tablet-mon.py: handle co-located tablets
  test/boost/view_schema_test.cc: fix race in wait_until_built
  boost/tablets_test: test load balancing and resize of co-located tablets
  test/tablets: test tablets colocation
  tablets: co-locate view tablets with base when the partition keys match
  test/pylib/tablets: common get_tablet_count api
  test_mv_tablets: use get_tablet_replicas from common tablets api
  test/pylib/tablets: fix test api to read tablet replicas from base table
  tablets: allocator: create co-located tables in a single operation
  alternator: prepare all new tables in a single announcement
  migration_manager: add notification for creating multiple tables
  tablets: read_tablet_transition_stage: read from base table
  storage service: allow repair request only on base tables
  tablets: keyspace_rf_change: apply on base table
  storage service: generate tablet migration updates on base tables
  tablets: replace all_tables method
  tablets: split when all co-located tablets are ready
  tablets: load balancer: sizing plan for table groups
  tablets: load balancer: handle co-located tablets
  tablets: allocate co-located tablets
  tablets: handle migration of co-located tablets
  storage service: add repair colocated tablets rpc
  tablets: save and read tablet metadata of co-located tables
  tablets: represent co-located tables in tablet metadata
  tablets: add base_table column to system.tablets
  docs: update system.tablets schema
2025-07-01 16:02:30 +02:00
Tomasz Grabiec
6290b70d53 Merge 'repair: postpone repair until topology is not busy ' from Aleksandra Martyniuk
Currently, repair_service::repair_tablets starts repair if there
is no ongoing tablet operations. The check does not consider global
topology operations, like tablet resize finalization.

Hence, if:
- topology is in the tablet_resize_finalization state;
- repair starts (as there is no tablet transitions) and holds the erm;
- resize finalization finishes;

then the repair sees a topology state different than the actual -
it does not see that the storage groups were already split.
Repair code does not handle this case and it results with
on_internal_error.

Start repair when topology is not busy. The check isn't atomic,
as it's done on a shard 0. Thus, we compare the topology versions
to ensure that the business check is valid.

Fixes: https://github.com/scylladb/scylladb/issues/24195.

Needs backport to all branches since they are affected

Closes scylladb/scylladb#24202

* github.com:scylladb/scylladb:
  test: add test for repair and resize finalization
  repair: postpone repair until topology is not busy
2025-07-01 16:02:22 +02:00
Dawid Mędrek
ac9062644f cql3: Represent create_statement using managed_string
When describing a table, we need to do it carefully: if some
columns were dropped, we must specify that explicitly by

```
ALTER TABLE {table} DROP {column} USING TIMESTAMP ...
```

in the result of the DESCRIBE statement. Failing to do so
could lead to data resurrection.

However, if a table has been altered many, many times,
we might end up with a huge create statement. Constructing
it could, in turn, trigger an oversized allocation.
Some tests ran into that very problem in fact.

In this commit, we want to mitigate the problem: instead of
allocating a contiguous chunk of memory for the create
statement, we use `fragmented_ostringstream` and `managed_string`
to possibly keep data scattered in memory. It makes handling
`cql3::description` less convenient in the code, but since
the struct is pretty much immediately serialized after
creating it, it's a very good trade-off.

We provide a reproducer. It consistently passes with this commit,
while having about 50% chance of failure before it (based on my
own experiments). Playing with the parameters of the test
doesn't seem to improve that chance, so let's keep it as-is.

Fixes scylladb/scylladb#24018
2025-07-01 12:58:02 +02:00
Michael Litvak
65ed0548d6 test/tablets: test tablets colocation
Add tests with co-located tablets, testing migration and other relevant
operations.
2025-07-01 13:20:19 +03:00
Michael Litvak
e01aae7871 test/pylib/tablets: common get_tablet_count api
Introduce a common get_tablet_count test api instead of it being
duplicated in few tests, and fix it to read the tablet count from the
base table.
2025-07-01 13:20:19 +03:00
Michael Litvak
e719da3739 test_mv_tablets: use get_tablet_replicas from common tablets api
Replace the duplicated get_tablet_replicas method in test_mv_tablets
with the common method from the tablets api, to reduce code duplication
and use the correct method that reads the tablet replicas from the base
table.
2025-07-01 13:20:19 +03:00
Pavel Emelyanov
23d86ede72 Merge 'audit: introduce debug level logs on happy path' from Dario Mirovic
Audit component defines `audit` logger which it uses only for `error` and `info` logs,
regarding `audit` module initialization and errors during audit log writing.
This change introduces `debug` level logs on the happy path of audit log writes.

Fixes: https://github.com/scylladb/scylladb/issues/23773

No backport needed - this is a small quality-of-life improvement.

Closes scylladb/scylladb#24658

* github.com:scylladb/scylladb:
  audit: change audit test logger level to `debug`
  audit: introduce debug level logs on happy path
2025-06-27 20:10:54 +03:00
Dario Mirovic
ec6249b581 audit: change audit test logger level to debug
Audit module tests should show the `debug` level messages.
This change makes audit_test.py `audit` module log level to `debug`.

Closes scylladb/scylladb#23773
2025-06-27 16:27:33 +02:00
Botond Dénes
495f607e73 test/cluster/test_read_repair: write 100 rows in trace test
This test asserts that a read repair really happened. To ensure this
happens it writes a single partition after enabling the database_apply
error injection point. For some reason, the write is sometimes reordered
with the error injection and the write will get replicated to both nodes
and no read repair will happen, failing the test.
To make the test less sensitive to such rare reordering, add a
clustering column to the table and write a 100 rows. The chance of *all*
100 of them being reordered with the error injection should be low
enough that it doesn't happen again (famous last words).

Fixes: #24330

Closes scylladb/scylladb#24403
2025-06-27 16:23:08 +03:00
Piotr Dulikowski
2f7ed8b1d4 Merge 'Fix for cassandra role gets recreated after DROP ROLE' from Marcin Maliszkiewicz
This patchset fixes regression introduced by 7e749cd848 when we started re-creating default superuser role and password from the config, even if new custom superuser was created by the user.

Now we'll check, first with CL LOCAL_ONE if there is a need to create default superuser role or password, confirm
it with CL QUORUM and only then atomically create role or password.

If server is started without cluster quorum we'll skip creating role or password.

Fixes https://github.com/scylladb/scylladb/issues/24469
Backport: all versions since 2024.2

Closes scylladb/scylladb#24451

* github.com:scylladb/scylladb:
  test: auth_cluster: add test for password reset procedure
  auth: cache roles table scan during startup
  test: auth_cluster: add test for replacing default superuser
  test: pylib: add ability to specify default authenticator during server_start
  test: pylib: allow rolling restart without waiting for cql
  auth: split auth-v2 logic for adding default superuser password
  auth: split auth-v2 logic for adding default superuser role
  auth: ldap: fix waiting for underlying role manager
  auth: wait for default role creation before starting authorizer and authenticator
2025-06-26 14:36:25 +02:00
Marcin Maliszkiewicz
5e7ac34822 test: auth_cluster: add test for password reset procedure 2025-06-26 12:28:08 +02:00
Marcin Maliszkiewicz
67a4bfc152 test: auth_cluster: add test for replacing default superuser
This test demonstrates creating custom superuser guide:
https://opensource.docs.scylladb.com/stable/operating-scylla/security/create-superuser.html
2025-06-26 12:28:08 +02:00
Piotr Dulikowski
62efe6616a Merge 'mapreduce: add tablet-aware dispatching algorithm' from Andrzej Jackowski
The primary motivation for this change is to reduce the time during which the Effective Replication Map (ERM) is retained by the mapreduce service. This ensures that long aggregate queries do not block topology operations. As ScyllaDB is generally transitioning towards tablets, and using tablets simplifies work dispatching, the decision was made to design the new algorithm specifically for tablets. The goal of the algorithm is to divide the work in such a way that each `tablet_replica` (that is <host, shard> pair) processes two tablets at a time.

The new algorithm can be summarized as follows:
 1. Prepare a tablet_replica -> partition_range mapping where the values     cover the entire space.
 2. For each tablet_replica, in parallel, take two partition ranges and dispatch them to the node hosting the replica. The ERM is released and re-acquired in each iteration, allowing the destination (i.e., tablet_replica) to change for each
artition range (in such cases, the partition range is assigned to the appropriate tablet_replica).

In step 1, the main difference compared to the old algorithm (dispatch_to_vnodes) is that partition ranges are assigned to a tablet_replica rather than just the host.

In step 2, the main difference is that the work is divided into smaller batches, and the ERM is released and re-acquired for each batch.

In the current implementation, each node can correctly handle every partition range, even if the mapreduce supercoordinator does not retain the ERM and the range is absent locally. This is because mapreduce_service::execute_on_this_shard creates a new pager that coordinates the partition range read, including obtaining its own ERM. However, every partition range that is absent locally is handled by shard 0. Therefore, proper routing of partition ranges is necessary to avoid shard 0 overload. This is why, in step 2, the ERM is retained during each batch processing, and the tablet_replica is refreshed for each processed range.

Additionally, shard_id is added to mapreduce request. When shard_id is set, the entire partition range is handled by the specified shard. As the new tablet-aware mapreduce algorithm balances the workload across shards, shard_id ensure that the balance is preserved, even during events such as tablet splits.

This patch series:
 - Refactors a bit mapreduce service, to facilitate having two algorithm versions (one for vnodes and one for tablets).
 - Implements tablet-aware dispatching algorithm.
 - Adds shard_id to mapreduce request and uses the information to handle requests entirely by selected shard.
 - Adds test_long_query_timeout_erm to verify the new functionality.

Fixes: scylladb#21831

No backport, as it is rather new feature than a bugfix.

Closes scylladb/scylladb#24383

* github.com:scylladb/scylladb:
  mapreduce: add missing comma and space in mapreduce_request operator<<
  mapreduce: add shard_id_hint to mapreduce request
  test: add test_long_query_timeout_erm
  mapreduce: add tablet-aware dispatching algorithm
  storage_proxy: make storage_proxy::is_alive public
  mapreduce: remove _shared_token_metadata from mapreduce_service
  mapreduce: move dispatching logic to dispatch_to_vnodes
  mapreduce: remove underscores from variable names
  mapreduce: move req_with_modified_pr handling to a new function
  mapreduce: change next_vnode lambda to get_next_partition_range function
2025-06-26 12:25:39 +02:00
Piotr Dulikowski
23f0d275c8 Merge 'generic_server: fix connections semaphore config observer' from Marcin Maliszkiewicz
In ed3e4f33fd we introduced new connection throttling feature which is controlled by uninitialized_connections_semaphore_cpu_concurrency config. But live updating of it was broken, this patch fixes it.

When the temporary value from observer() is destroyed, it disconnects from updateable_value, so observation stops right away. We need to retain the observer.

Backport: to 2025.2 where this feature was added
Fixes: https://github.com/scylladb/scylladb/issues/24557

Closes scylladb/scylladb#24484

* github.com:scylladb/scylladb:
  test: add test for live updates of generic server config
  utils: don't allow do discard updateable_value observer
  generic_server: fix connections semaphore config observer
2025-06-26 12:25:38 +02:00
Andrzej Jackowski
5f31011111 test: add test_long_query_timeout_erm
This test verifies the effectiveness of the mechanism for releasing ERM
introduced in this patch series. In test scenario, during processing of
a query in mapreduce service, reads are intentionally blocked by
an injected error. However, when table uses tablets, ERM is now often
released by the mapreduce service, so the topology is not blocked to the
end of the request. As a result, it is possible to add a new node
before the query finishes.

Refs. scylladb#21831
2025-06-25 19:22:48 +02:00
Sergey Zolotukhin
0d7de90523 Fix regexp in check_node_log_for_failed_mutations
The regexp that was added in https://github.com/scylladb/scylladb/pull/23658 does not work as expected:
`TRACE`, `INFO` and `DEBUG` level messages are not ignored.
This patch corrects the pattern to ensure those log levels are excluded.

Fixes scylladb/scylladb#23688

Closes scylladb/scylladb#23889
2025-06-25 12:00:16 +03:00
Michał Chojnowski
cace55aaaf test_sstable_compression_dictionaries_basic.py: fix a flaky check
test_dict_memory_limit trains new dictionaries and checks (via metrics)
that the old dictionaries are appropriately cleaned up.
The problem is that the cleanup is asynchronous (because the lifetimes
are handled by foreign_ptr, which sends the destructor call
to the owner shard asynchronously), so the metrics might be
checked a few milliseconds before the old dictionary is cleaned up.

The dict lifetimes are lazy on purpose, the right thing to do is
to just let the test retry the check.

Fixes scylladb/scylladb#24516

Closes scylladb/scylladb#24526
2025-06-25 11:30:28 +03:00
Nadav Har'El
16c1365332 test,alternator: test server-side load balancing with zero-token node
In issue #6527 it was suggested that a zero-token node (a.k.a coordinator-
only node, or data-less node) could serve as a topology-aware Alternator
load balancer - requests could be sent to it and they will be forwarded to
the right node.

This feature was implemented, but we never tested that it actually works
for Alternator requests. So this patch tests this by starting a 5-node
cluster with 4 regular nodes and one zero-token node, and testing that
requests to the zero-token node work as expected.

It is important to know that this feature does indeed work as expected,
and also to have a regression test for it so the feature doesn't break
in the future.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#23114
2025-06-25 11:13:15 +03:00
Abhinav Jha
5ff693eff6 group0: modify start_operation logic to account for synchronize phase race condition
In the present scenario, the bootstrapping node undergoes synchronize phase after
initialization of group0, then enters post_raft phase and becomes fully ready for
group0 operations. The topology coordinator is agnostic of this and issues stream
ranges command as soon as the node successfully completes `join_group0`. Although for
a node booting into an already upgraded cluster, the time duration for which, node
remains in synchronize phase is negligible but this race condition causes trouble in a
small percentage of cases, since the stream ranges operation fails and node fails to bootstrap.

This commit addresses this issue and updates the error throw logic to account for this
edge case and lets the node wait (with timeouts) for synchronize phase to get over instead of throwing
error.

A regression test is also added to confirm the working of this code change. The test adds a
wait in synchronize phase for newly joining node and releases only after the program counter
reaches the synchronize case in the `start_operation` function. Hence it indicates that in the
updated code, the start_operation will wait for the node to get done with the
synchronize phase instead of throwing error.

This PR fixes a bug. Hence we need to backport it.

Fixes: scylladb/scylladb#23536

Closes scylladb/scylladb#23829
2025-06-24 10:04:39 +02:00
Marcin Maliszkiewicz
68ead01397 test: add test for live updates of generic server config
Affected config: uninitialized_connections_semaphore_cpu_concurrency
2025-06-23 17:56:26 +02:00
Patryk Jędrzejczak
6489308ebc Merge 'Introduce a queue of global topology requests.' from Gleb Natapov
Currently only one global topology request (such as truncate, cdc repair, cleanup and alter table) can be pending. If one is already pending others will be rejected with an error. This is not very user friendly, so this series introduces a queue of global requests which allows queuing many global topology requests simultaneously.

Fixes: #16822

No need to backport since this is a new feature.

Closes scylladb/scylladb#24293

* https://github.com/scylladb/scylladb:
  topology coordinator: simplify truncate handling in case request queue feature is disable
  topology coordinator: fix indentation after the previous patch
  topology coordinator: allow running multiple global commands in parallel
  topology coordinator: Implement global topology request queue
  topology coordinator: Do not cancel global requests in cancel_all_requests
  topology coordinator: store request type for each global command
  topology request: make it possible to hold global request types in request_type field
  topology coordinator: move alter table global request parameters into topology_request table
  topology coordinator: move cleanup global command to report completion through topology_request table
  topology coordinator: no need to create updates vector explicitly
  topology coordinator: use topology_request_tracking_mutation_builder::done() instead of open code it
  topology coordinator: handle error during new_cdc_generation command processing
  topology coordinator: remove unneeded semicolon
  topology coordinator: fix indentation after the last commit
  topology coordinator: move new_cdc_generation topology request to use topology_request table for completion
  gms/feature_service: add TOPOLOGY_GLOBAL_REQUEST_QUEUE feature flag
2025-06-23 16:08:09 +03:00
Avi Kivity
c89ab90554 Merge 'main: don't start maintenance auth service if not enabled' from Marcin Maliszkiewicz
In f96d30c2b5
we introduced the maintenance service, which is an additional
instance of auth::service. But this service has a somewhat
confusing 2-level startup mechanism: it's initialized with
sharded<Service>::start and then auth::service::start
(different method with the same name to confuse even more).

When maintenance_socket was disabled (default setting), the code
did only the first part of the startup. This registered a config
observer but didn't create a permission_cache instance.
As a result, a crash on SIGHUP when config is reloaded can occur.

Fixes: https://github.com/scylladb/scylladb/issues/24528
Backport: all not eol versions since 6.0 and 2025.1

Closes scylladb/scylladb#24527

* github.com:scylladb/scylladb:
  test: add test for live updates of permissions cache config
  main: don't start maintenance auth service if not enabled
2025-06-18 20:28:53 +03:00
Marcin Maliszkiewicz
dd01852341 test: add test for live updates of permissions cache config 2025-06-18 11:27:08 +02:00
Botond Dénes
da1a3dd640 Merge 'test: introduce upgrade tests to test.py, add a SSTable dict compression upgrade test' from Michał Chojnowski
This PR adds an upgrade test for SSTable compression with shared dictionaries, and adds some bits to pylib and test.py to support that.

In the series, we:
1. Mount `$XDG_CACHE_DIR` into dbuild.
2. Add a pylib function which downloads and installs a released ScyllaDB package into a subdirectory of `$XDG_CACHE_DIR/scylladb/test.py`, and returns the path to `bin/scylla`.
3. Add new methods and params to the cluster manager, which let the test start nodes with historical Scylla executables, and switch executables during the test.
4. Add a test which uses the above to run an upgrade test between the released package and the current build.
5. Add `--run-internet-dependent-tests` to `test.py` which lets the user of `test.py` skip this test (and potentially other internet-dependent tests in the future).

(The patch modifying `wait_for_cql_and_get_hosts` is a part of the new test — the new test needs it to test how particular nodes in a mixed-version cluster react to some CQL queries.)

This is a follow-up to #23025, split into a separate PR because the potential addition of upgrade tests to `test.py` deserved a separate thread.

Needs backport to 2025.2, because that's where the tested feature is introduced.

Fixes #24110

Closes scylladb/scylladb#23538

* github.com:scylladb/scylladb:
  test: add test_sstable_compression_dictionaries_upgrade.py
  test.py: add --run-internet-dependent-tests
  pylib/manager_client: add server_switch_executable
  test/pylib: in add_server, give a way to specify the executable and version-specific config
  pylib: pass scylla_env environment variables to the topology suite
  test/pylib: add get_scylla_2025_1_executable()
  pylib/scylla_cluster: give a way to pass executable-specific options to nodes
  dbuild: mount "$XDG_CACHE_HOME/scylladb"
2025-06-18 12:21:21 +03:00
Pavel Emelyanov
9aaa33c15a Merge 'main.cc: fix group0 shutdown order' from Petr Gusev
Applier fiber needs local storage, so before shutting down local storage we need to make sure that group0 is stopped.

We also improve the logs for the case when `gate_closed_exception` is thrown while a mutation is being written.

Fixes [scylladb/scylladb#24401](https://github.com/scylladb/scylladb/issues/24401)

Backport: no backport -- not safe and the problem is minor.

Closes scylladb/scylladb#24418

* github.com:scylladb/scylladb:
  storage_service: test_group0_apply_while_node_is_being_shutdown
  main.cc: fix group0 shutdown order
  storage_proxy: log gate_closed_exception
2025-06-16 09:32:34 +03:00
Piotr Dulikowski
238fc24800 Merge 'test: dtest: move audit_test.py to test.py' from Andrzej Jackowski
Copied the entire audit_test.py from scylladb/scylla-dtest, to remove the entire file from scylla-dtest after this patch series is merged. The motivation is to move entire audit testing to from dtests, to make it easier to maintain and more reliable.

After audit_test.py was moved from dtests to test.py, some issues that require fixing arose due to differences between the frameworks.

No backport, moving audit_test.py to test.py is a new testing effort.

Closes scylladb/scylladb#24231

* github.com:scylladb/scylladb:
  test: audit: filter out LOGIN and USE audit logs
  test: audit: remove require mark
  test: audit: wait until raft state is applied in test_permissions
  test: audit: fix problems in audit_test.py
  test: dtest: add dict support to populate in scylla_cluster.py
  test: dtest: copied get_node_ip from dtests to scylla_cluster.py
  test: dtest: copy run_rest_api from dtests to cluster.py
  test: dtest: copy run_in_parallel from dtests to data.py
  test: audit: copy unmodified audit_test.py from dtests
2025-06-12 09:03:45 +02:00
Tomasz Grabiec
eabc1fa6ff Merge 'tablets: deallocate storage state on end_migration' from Michael Litvak
When a tablet is migrated and cleaned up, deallocate the tablet storage
group state on `end_migration` stage, instead of `cleanup` stage:

* When the stage is updated from `cleanup` to `end_migration`, the
  storage group is removed on the leaving replica.
* When the table is initialized, if the tablet stage is `end_migration`
  then we don't allocate a storage group for it. This happens for
  example if the leaving replica is restarted during tablet migration.
  If it's initialized in `cleanup` stage then we allocate a storage
  group, and it will be deallocated when transitioning to
  `end_migration`.

This guarantees that the storage group is always deallocated on the
leaving replica by `end_migration`, and that it is always allocated if
the tablet wasn't cleaned up fully yet.

It is a similar case also for the pending replica when the migration is
aborted. We deallocate the state on `revert_migration` which is the
stage following `cleanup_target`.

Previously the storage group would be allocated when the tablet is
initialized on any of the tablet replicas - also on the leaving replica,
and when the tablet stage is `cleanup` or `end_migration`, and
deallocated during `cleanup`.

This fixes the following issue:

1. A migrating tablet enters cleanup stage
2. the tablet is cleaned up successfuly
3. The leaving replica is restarted, and allocates storage group
4. tablet cleanup is not called because it's already cleaned up
5. the storage group remains allocated on the leaving replica after the
   migration is completed - it's not cleaned up properly.

Fixes https://github.com/scylladb/scylladb/issues/23481

backport to all relevant releases since it's a bug that results in a crash

Closes scylladb/scylladb#24393

* github.com:scylladb/scylladb:
  test/cluster/test_tablets: test restart during tablet cleanup
  test: tablets: add get_tablet_info helper
  tablets: deallocate storage state on end_migration
2025-06-11 17:37:02 +02:00
Aleksandra Martyniuk
83c9af9670 test: add test for repair and resize finalization
Add test that checks whether repair does not start if there is an
ongoing resize finalization.
2025-06-11 16:17:39 +02:00
Gleb Natapov
a9e99d1d3c topology coordinator: allow running multiple global commands in parallel
Now that we have a global request queue do not check that there is
global request before adding another one. Amend truncation test that
expects it explicitly and add another one that checks that two truncates
can be submitted in parallel.
2025-06-11 11:29:33 +03:00
Andrzej Jackowski
e23d79cb62 test: audit: filter out LOGIN and USE audit logs
LOGIN entries can appear at many points during testing, for example,
when a driver creates a new session. Similarly, `USE ks` statements
can appear unexpectedly, especially when the python-driver calls
`set_keyspace_async` for new connections.

To avoid test checks failures,
this commit filters out LOGIN and USE entries in tests that are
not intended to verify these two types of audit logs.
2025-06-11 09:43:51 +02:00
Andrzej Jackowski
876eaf459b test: audit: remove require mark
After moving audit tests to dtests, require marks are no longer
needed because the tests and the code are in the same repository.
2025-06-11 09:43:51 +02:00
Marcin Maliszkiewicz
111cccf8ba test: audit: wait until raft state is applied in test_permissions
Otherwise test is flaky, expecting permissions to be
enforced before they get applied.
2025-06-11 09:43:51 +02:00
Andrzej Jackowski
6c6234979c test: audit: fix problems in audit_test.py
After audit_test.py was moved from dtests to test.py, the
following issues arose due to differences between the frameworks:
  - Some imports were unnecessary or broken
  - The @pytest.mark.dtest_full decorator was no longer needed
  - The `issue_open` attribute in `xmark` is not supported
  - Support for sending SIGHUP is encapsulated
    by `server_update_config` in test.py`
  - A workaround for scylladb#24473 was required

Moreover, suite.yaml was changed to start running audit_test.py
in dev mode.

Ref. scylladb#24473

Co-authored-by: Marcin Maliszkiewicz <marcinmal@scylladb.com>
2025-06-11 09:43:44 +02:00
Petr Gusev
b1050944a3 storage_service: test_group0_apply_while_node_is_being_shutdown 2025-06-10 17:25:03 +02:00
Michael Litvak
bd88ca92c8 test/cluster/test_tablets: test restart during tablet cleanup
Add a test that reproduces issue scylladb/scylladb#23481.

The test migrates a tablet from one node to another, and while the
tablet is in some stage of cleanup - either before or right after,
depending on the parameter - the leaving replica, on which the tablet is
cleaned, is restarted.

This is interesting because when the leaving replica starts and loads
its state, the tablet could be in different stages of cleanup - the
SSTables may still exist or they may have been cleaned up already, and
we want to make sure the state is loaded correctly.
2025-06-09 17:27:45 +03:00
Michael Litvak
8aeb404893 test_cdc_generation_clearing: wait for generations to propagate
In test_cdc_generation_clearing we trigger events that update CDC
generations, verify the generations are updated as expected, and verify
the system topology and CDC generations are consistent on all nodes.

Before checking that all nodes are consistent and have the same CDC
generations, we need to consider that the changes are propagated through
raft and take some time to propagate to all nodes.

Currently, we wait for the change to be applied only on the first server
which runs the CDC generation publisher fiber and read the CDC
generations from this single node. The consistency check that follows
could fail if the change was not propagated to some other node yet.

To fix that, before checking consistency with all nodes, we execute a
read barrier on all nodes so they all see the same state as the leader.

Fixes scylladb/scylladb#24407

Closes scylladb/scylladb#24433
2025-06-09 12:59:04 +02:00
Michał Chojnowski
7d26d3c7cb db/config: add an option that disables dict-aware sstable compressors in DDL statements
For reasons, we want to be able to disallow dictionary-aware compressors
in chosen deployments.

This patch adds a knob for that. When the knob is disabled,
dictionary-aware compressors will be rejected in the validation
stage of CREATE and ALTER statements.

Closes scylladb/scylladb#24355
2025-06-09 13:30:40 +03:00
Raphael S. Carvalho
2d716f3ffe replica: Fix truncate assert failure
Truncate doesn't really go well with concurrent writes. The fix (#23560) exposed
a preexisting fragility which I missed.

1) truncate gets RP mark X, truncated_at = second T
2) new sstable written during snapshot or later, also at second T (difference of MS)
3) discard_sstables() get RP Y > saved RP X, since creation time of sstable
with RP Y is equal to truncated_at = second T.

So the problem is that truncate is using a clock of second granularity for
filtering out sstables written later, and after we got low mark and truncate time,
it can happen that a sstable is flushed later within the same second, but at a
different millisecond.
By switching to a millisecond clock (db_clock), we allow sstables written later
within the same second from being filtered out. It's not perfect but
extremely unlikely a new write lands and get flushed in the same
millisecond we recorded truncated_at timepoint. In practice, truncate
will not be used concurrently to writes, so this should be enough for
our tests performing such concurrent actions.
We're moving away from gc_clock which is our cheap lowres_clock, but
time is only retrieved when creating sstable objects, which frequency of
creation is low enough for not having significant consequences, and also
db_clock should be cheap enough since it's usually syscall-less.

Fixes #23771.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#24426
2025-06-08 15:59:15 +03:00
Andrzej Jackowski
e6eb741e95 test: dtest: add dict support to populate in scylla_cluster.py
Co-authored-by: Evgeniy Naydanov <evgeniy.naydanov@scylladb.com>
2025-06-05 08:20:09 +02:00
Andrzej Jackowski
e3f052d6fb test: dtest: copied get_node_ip from dtests to scylla_cluster.py
Co-authored-by: Marcin Maliszkiewicz <marcinmal@scylladb.com>
2025-06-05 08:20:09 +02:00
Andrzej Jackowski
40e71ad1e6 test: dtest: copy run_rest_api from dtests to cluster.py
Co-authored-by: Marcin Maliszkiewicz <marcinmal@scylladb.com>
2025-06-05 08:20:09 +02:00
Andrzej Jackowski
3da86f04a5 test: dtest: copy run_in_parallel from dtests to data.py
Co-authored-by: Marcin Maliszkiewicz <marcinmal@scylladb.com>
2025-06-05 08:19:54 +02:00
Andrzej Jackowski
a1b1d810f9 test: audit: copy unmodified audit_test.py from dtests
Copied the entire audit_test.py from scylladb/scylla-dtest, to remove
the entire file from scylla-dtest after this patch series is merged.
The motivation is to move entire audit testing to from dtests,
to make it easier to maintain and more reliable.

Changed suite.yaml, to prevent audit_test.py from running because
audit_test.py needs improvement before it starts passing.

Co-authored-by: Marcin Maliszkiewicz <marcinmal@scylladb.com>
2025-06-05 08:19:44 +02:00
Pavel Emelyanov
24f430c6d2 Merge 'test.py: dtest: port next_gating tests from auth_roles_test.py' from Evgeniy Naydanov
Copy `auth_roles_test.py` from scylla-dtest test suite, remove all not next_gating tests from it, and make it works with `test.py`

As a part of the porting process, copy missed utility functions from scylla-dtest, remove unused imports and markers.

Enable the test in `suite.yaml` (run in dev mode only.)

Closes scylladb/scylladb#24343

* github.com:scylladb/scylladb:
  test.py: dtest: make auth_roles_test.py run using test.py
  test.py: dtest: add wait_for_any_log() to tools/log_utils.py
  test.py: dtest: add part of tools/assertions.py
  test.py: dtest: pickup latest code for retrying.py from dtest
  test.py: dtest: copy unmodified auth_roles_test.py
2025-06-03 18:54:47 +03:00
Patryk Jędrzejczak
8756c233e0 test: test_raft_recovery_user_data: disable hinted handoff
The test is currently flaky, writes can fail with "Too many in flight
hints: 10485936". See scylladb/scylladb#23565 for more details.

We suspect that scylladb/scylladb#23565 is caused by an infrastructure
issue - slow disks on some machines we run CI jobs on.

Since the test fails often and investigation doesn't seem to be easy,
we first deflake the test in this patch by disabling hinted handoff.

For replacing nodes, we provide `cfg` because there should have been
`cfg` in the first place. The test was correct anyway because:
- `tablets_mode_for_new_keyspaces` is set to `true` by default in
  test/cluster/suite.yaml,
- `endpoint_snitch` is set to `GossipingPropertyFileSnitch` by default
  if the property file is provided in `ScyllaServer.__init__`.

Ref scylladb/scylladb#23565

We should backport this patch to 2025.2 because this test is also flaky
on CI jobs using 2025.2. Older branches don't have this test.

Closes scylladb/scylladb#24364
2025-06-03 17:48:42 +02:00
Michał Chojnowski
dd878505ca test: add test_sstable_compression_dictionaries_upgrade.py 2025-06-02 15:49:29 +02:00
Michał Chojnowski
d3cb873532 test.py: add --run-internet-dependent-tests
Later, we will add upgrade tests, which need to download the previous
release of Scylla from the internet.

Internet access is a major dependency, so we want to make those tests
opt-in for now.
2025-06-02 15:49:29 +02:00
Evgeniy Naydanov
e780164a67 test.py: dtest: make auth_roles_test.py run using test.py
As a part of the porting process, remove unused imports and
markers, remove non-next_gating tests, and code for old
ScyllaDB versions.

Enable the test in suite.yaml (run in dev mode only)
2025-06-02 05:14:41 +00:00
Evgeniy Naydanov
145c2fed97 test.py: dtest: add wait_for_any_log() to tools/log_utils.py
Copy wait_for_any_log() function from dtest tools/log_utils.py
with few modifications:

 - Add type hints;
 - Change timeout for node.watch_log_for() calls from 0 to 0.1
   because dtest shim's implementation uses asyncio.timeout()
   and 0 means not "one time" but "never run";
 - Use set() instead of list() for `ret` variable;
 - Remove redundant `found` variable.
 - Remove `remaining` variable and use shallow copies to make
   the code more correct.  As a side effect this makes the
   TimeoutError message more correct too;
 - Use f-string formatting for TimeoutError message;
2025-06-02 05:14:41 +00:00