Commit Graph

10937 Commits

Author SHA1 Message Date
Avi Kivity
c331796d28 Merge 'Support Min < Precision for approx_exponential_histogram' from Amnon Heiman
This series closes a gap in the approx_exponential_histogram implementation to
cover integer values starting from small Min values.

While the original implementation was focused on durations, where this limitation
was not an issue, over time, there has been a growing need for histograms that
cover smaller values, such as the number of SSTables or the number of items in a
batch.

The reason for the original limitation is inherent to the exponential histogram
math. The previous code required Min to be at least Precision to avoid negative
bit shifts in the exponential calculations.

After this series, approx_exponential_histogram allows Min to be smaller than
Precision by scaling values during indexing. The value is shifted left by
log2 Precision minus log2 Min or zero whichever is larger, and the existing
exponential math is applied. Bucket limits are then scaled back to the original
units. This keeps insertion and retrieval O(1) without runtime branching, at the
cost of repeated bucket limits for some values in the Min to Precision range.

Additional tests cover the new behavior.
Relates to #2785

** New feature, no need to backport. **

Closes scylladb/scylladb#28371

* github.com:scylladb/scylladb:
  estimated_histogram_test.cc: add to_metrics_histogram test
  histogram_metrics_helper.hh: Support Min < Precision
  estimated_histogram_test.cc: Add tests for approx_exponential_histogram with Min<Precision
  estimated_histogram.hh: support Min less than Precision histograms
2026-03-04 12:43:26 +02:00
Szymon Malewski
4c4673e8f9 test: vector_similarity: Fix similarity value checks
`isclose` function checks if returned similarity floats are close enough to expected value, but it doesn't `assert` by itself.
Several tests missed that `assert`, effectively always passing.
With this patch similarity values checks are wrapped in helper function `assert_similarity` with predefined tolerance.

Fixes https://scylladb.atlassian.net/browse/SCYLLADB-877

Closes scylladb/scylladb#28748
2026-03-04 09:53:32 +01:00
Marcin Maliszkiewicz
c7d3f80863 Merge 'auth: do not create default 'cassandra:cassandra' superuser' from Dario Mirovic
This patch series removes creation of default 'cassandra:cassandra' superuser on system start.

Disable creation of a superuser with default 'cassandra:cassandra' credentials to improve security. The current flow requires clients to create another superuser and then drop the default `cassandra:cassandra' role. For those who do, there is a time window where the default credentials exist. For those who do not, that role stays. We want to improve security by forcing the client to either use config to specify default values for default superuser name and password or use cqlsh over maintenance socket connection to explicitly create/alter a superuser role.

The patch series:
- Enable role modification over the maintenance socket
- Stop using default 'cassandra' value for default superuser, skipping creation instead

Design document: https://scylladb.atlassian.net/wiki/spaces/RND/pages/165773327/Drop+default+cassandra+superuser

Fixes scylladb/scylla-enterprise#5657

This is an improvement. It does not need a backport.

Closes scylladb/scylladb#27215

* github.com:scylladb/scylladb:
  config: enable maintenance socket in workdir by default
  docs: auth: do not specify password with -p option
  docs: update documentation related to default superuser
  test: maintenance socket role management
  test: cluster: add logs to test_maintenance_socket.py
  test: pylib: fix connect_driver handling when adding and starting server
  auth: do not create default 'cassandra:cassandra' superuser
  auth: remove redundant DEFAULT_USER_NAME from password authenticator
  auth: enable role management operations via maintenance socket
  client_state: add has_superuser method
  client_state: add _bypass_auth_checks flag
  auth: let maintenance_socket_role_manager know if node is in maintenance mode
  auth: remove class registrator usage
  auth: instantiate auth service with factory functors
  auth: add service constructor with factory functors
  auth: add transitional.hh file
  service: qos: handle special scheduling group case for maintenance socket
  service: qos: use _auth_integration as condition for using _auth_integration
2026-03-04 09:43:57 +01:00
Avi Kivity
85bd6d0114 Merge 'Add multiple-shard persistent metadata storage for strongly consistent tables' from Wojciech Mitros
In this series we introduce new system tables and use them for storing the raft metadata
for strongly consistent tables. In contrast to the previously used raft group0 tables, the
new tables can store data on any shard. The tables also allow specifying the shard where
each partition should reside, which enables the tablets of strongly consistent tables to have
their raft group metadata co-located on the same shard as the tablet replica.

The new tables have almost the same schemas as the raft group0 tables. However, they
have an additional column in their partition keys. The additional column is the shard
that specifies where the data should be located. While a tablet and its corresponding
raft group server resides on some shard, it now writes and reads all requests to the
metadata tables using its shard in addition to the group_id.

The extra partition key column is used by the new partitioner and sharder which allow
this special shard routing. The partitioner encodes the shard in the token and the
sharder decodes the shard from the token. This approach for routing avoids any
additional lookups (for the tablet mapping) during operations on the new tables
and it also doesn't require keeping any state. It also doesn't interact negatively
with resharding - as long as tablets (and their corresponding raft metadata) occupy
some shard, we do not allow starting the node with a shard count lower than the
id of this shard. When increasing the shard count, the routing does not change,
similarly to how tablet allocation doesn't change.

To use the new tables, a new implementation of `raft::persistence` is added. Currently,
it's almost an exact copy of the `raft_sys_table_storage` which just uses the new tables,
but in the future we can modify it with changes specific to metadata (or mutation)
storage for strongly consistent tables. The new storage is used in the `groups_manager`,
which combined with the removal of some `this_shard_id() == 0` checks, allows strongly
consistent tables to be used on all shards.

This approach for making sure that the reads/writes to the new tables end up on the correct shards
won in the balance of complexity/usability/performance against a few other approaches we've considered.
They include:
1. Making the Raft server read/write directly to the database, skipping the sharder, on its shard, while using
the default partitioner/sharder. This approach could let us avoid changing the schema and there should be
no problems for reads and writes performed by the Raft server. However, in this approach we would input
data in tables conflicting with the placement determined by the sharder. As a result, any read going through
the sharder could miss the rows it was supposed to read. Even when reading all shards to find a specific value,
there is a risk of polluting the cache - the rows loaded on incorrect shards may persist in the cache for an unknown
amount of time. The cache may also mistakenly remember that a row is missing, even though it's actually present,
just on an incorrect shard.
Some of the issues with this approach could be worked around using another sharder which always returns
this_shard_id() when asked about a shard. It's not clear how such a sharder would implement a method like
`token_for_next_shard`, and how much simpler it would be compared to the current "identity" sharder.
2. Using a sharder depending on the current allocation of tablets on the node. This approach relies on the
knowledge of group_id -> shard mapping at any point in time in the cluster. For this approach we'd also
need to either add a custom partitioner which encodes the group_id in the token, or we'd need to track the
token(group_id) -> shard mapping. This approach has the benefit over the one used in the series of keeping
the partition key as just group_id. However, it requires more logic, and the access to the live state of the node
in the sharder, and it's not static - the same token may be sharded differently depending on the state of the
node - it shouldn't occur in practice, but if we changed the state of the node before adjusting the table data,
we would be unable to access/fix the stale data without artificially also changing the state of the node.
3. Using metadata tables co-located to the strongly consistent tables. This approach could simplify the
metadata migrations in the future, however it would require additional schema management of all co-located
metadata tables, and it's not even obvious what could be used as the partition key in these tables - some
metadata is per-raft-group, so we couldn't reuse the partition key of the strongly consistent table for it. And
finding and remembering a partition key that is routed to a specific shard is not a simple task. Finally, splits
and merges will most likely need special handling for metadata anyway, so we wouldn't even make use of
co-located table's splits and merges.

Fixes [SCYLLADB-361](https://scylladb.atlassian.net/browse/SCYLLADB-361)

[SCYLLADB-361]: https://scylladb.atlassian.net/browse/SCYLLADB-361?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ

Closes scylladb/scylladb#28509

* github.com:scylladb/scylladb:
  docs: add strong consistency doc
  test/cluster: add tests for strongly-consistent tables' metadata persistence
  raft: enable multi-shard raft groups for strongly consistent tablets
  test/raft: add unit tests for raft_groups_storage
  raft: add raft_groups_storage persistence class
  db: add system tables for strongly consistent tables' raft groups
  dht: add fixed_shard_partitioner and fixed_shard_sharder
  raft: add group_id -> shard mapping to raft_group_registry
  schema: add with_sharder overload accepting static_sharder reference
2026-03-04 08:55:43 +02:00
Piotr Dulikowski
2fb981413a Merge 'vector_search: test: fix HTTPS client test flakiness' from Karol Nowacki
The default 100ms timeout for client readiness in tests is too
aggressive. In some test environments, this is not enough time for
client creation, which involves address resolution and TLS certificate
reading, leading to flaky tests.

This commit increases the default client creation timeout to 10 seconds.
This makes the tests more robust, especially in slower execution
environments, and prevents similar flakiness in other test cases.

Fixes: VECTOR-547, SCYLLADB-802, SCYLLADB-825, SCYLLADB-826

Backport to 2025.4 and 2026.1, as the same problem occurs on these branches and can potentially make the CI flaky there as well.

Closes scylladb/scylladb#28846

* github.com:scylladb/scylladb:
  vector_search: test: include ANN error in assertion
  vector_search: test: fix HTTPS client test flakiness
2026-03-04 08:55:43 +02:00
Botond Dénes
f156bcddab Merge 'test: decrease strain in test_startup_response' from Marcin Maliszkiewicz
For 2025.3 and 2025.4 this test runs order of magnitude
slower in debug mode. Potentially due to passwords::check
running in alien thread and overwhelming the CPU (this is
fixed in newer versions).

Decreasing the number of connections in test makes it fast
again, without breaking reproducibility.

As additional measure we double the timeout.

The fix is now cherry-picked to master as sometimes
test fails there too.

(cherry picked from commit 1f1fc2c2ac)

Fixes https://scylladb.atlassian.net/browse/SCYLLADB-795

backport: 2026.1, already on other stable branches

Closes scylladb/scylladb#28848

* github.com:scylladb/scylladb:
  test: add more logs to test_startup_no_auth_response
  test: decrease strain in test_startup_response
2026-03-04 08:55:43 +02:00
Dario Mirovic
3db74aaf5f test: maintenance socket role management
Introduce a test that cover:
- Server startup without credentials config seeding with no roles created
- Await maintenance socket role management to be enabled
- `CREATE ROLE`, `ALTER ROLE`, and `DROP ROLE` statement execution success

All the tests in the test_maintenance_socket.py module take 2-3 seconds
to execute.

Explicitly shut down Cluster objects to prevent 'RuntimeError: cannot
schedule new futures after shutdown'.

Refs SCYLLADB-409
2026-03-03 23:57:50 +01:00
Dario Mirovic
f74fe22386 test: cluster: add logs to test_maintenance_socket.py
Add logs to test_maintenance_socket.py test test_maintenance_socket.
This approach offers additional visibility in case of test failure.
Such logs will be added to new tests in a follow up patch in this
patch series.

Refs SCYLLADB-409
2026-03-03 23:42:25 +01:00
Dario Mirovic
0e5ddec2a8 test: pylib: fix connect_driver handling when adding and starting server
When connect_driver=False, the expected server up state should be
capped to HOST_ID_QUERIED. This is to avoid waiting for CQL readiness,
which requires a superuser to be present.

This logic was only in ScyllaCluster.server_start. ManagerClient.server_add
with start=True and connect_driver=False would still wait for CQL and hang
if no superuser is present. The workaround was to call
ManagerClient.server_add(start=False, connect_driver=False) followed by
ManagerClient.server_start(connect_driver=False).

This patch moves the capping from ScyllaCluster.server_start to
ManagerClient.server_add and ManagerClient.server_start, where connect_driver
is processed. ScyllaCluster only receives the already resolved
expected_server_up_state value.

Refs SCYLLADB-409
2026-03-03 23:42:25 +01:00
Dario Mirovic
fd17dcbec8 auth: do not create default 'cassandra:cassandra' superuser
Changes the behavior of default superuser creation.
Previously, without configuration 'cassandra:cassandra' credentials
were used. Now default superuser creation is skipped if not configured.

The two ways to create default superuser are:
- Config file - auth_superuser_name and auth_superuser_salted_password fields
- Maintenance socket - connect over maintenance socket and CREATE/ALTER ROLE ...

Behavior changes:

Old behavior:
- No config - 'cassandra:cassandra' created
- auth_superuser_name only - <name>:cassandra created
- auth_superuser_salted_password only - 'cassandra:<password>' created
- Both specified - '<name>:<password>' created

New behavior:
- No config - no default superuser
    - Requires maintenance socket setup
- auth_superuser_name only - '<name>:' created WITHOUT password
    - Requires maintenance socket setup
- auth_superuser_salted_password only - no default superuser
- Both specified - '<name>:<password>' created

Fixes SCYLLADB-409
2026-03-03 23:42:25 +01:00
Dario Mirovic
3bef493a35 auth: remove class registrator usage
This patch removes class registrator usage in auth module.
It is not used after switching to factory functor initialization
of auth service.

Several role manager, authenticator, and authorizer name variables
are returned as well, and hardcoded inside qualified_java_name method,
since that is the only place they are ever used.

Refs SCYLLADB-409
2026-03-03 22:31:35 +01:00
Dario Mirovic
eab24ff3b0 auth: instantiate auth service with factory functors
Auth service is instantiated with the constructor that accepts
service_config, which then uses class registrator to instantiate
authorizer, authenticator, and role manager.

This patch switches to instantiating auth service via the constructor
that accepts factory functors. This is a step towards removing
usage of class registrator.

Refs SCYLLADB-409
2026-03-03 22:31:35 +01:00
Karol Nowacki
45477d9c6b vector_search: test: include ANN error in assertion
When the test fails, the assertion message does not include
the error from the ANN request.

This change enhances the assertion to include the specific ANN error,
making it easier to diagnose test failures.
2026-03-03 14:19:20 +01:00
Karol Nowacki
ab6c222fc4 vector_search: test: fix HTTPS client test flakiness
The default 100ms timeout for client readiness in tests is too
aggressive. In some test environments, this is not enough time for
client creation, which involves address resolution and TLS certificate
reading, leading to flaky tests.

This commit increases the default client creation timeout to 10 seconds.
This makes the tests more robust, especially in slower execution
environments, and prevents similar flakiness in other test cases.

Fixes: VECTOR-547, SCYLLADB-802
2026-03-03 14:19:20 +01:00
Artsiom Mishuta
5c84a76b28 test.py: setup pytest logger
This commit introduces pure pytest logging into a file

Previously, test.py used pytest as a script(not a framework) and just captured pytest stdout and logged this data by itself

This commit sets up the log files format that additionaly display Python processName, threadName adn taskName because test.py test cases use them, and now it is so hard to investigate issues that are connected with parallelism inside test case themselve

In addition, commit splits the logging of different pytest workers(xdist) into different files. If pytest workers have ho failed test - log file for these workers will be deleted

There is also additional logging for failures that will contain a separate file per test failure and contain the error itself (stacktrace) and all capture logs from stdout, stderr during the test run. With --save-log-on-success it will be a separate file per test on pass as well

All this new functionality works with the new xdit scheduler (--test-py-init=True)

Fixes SCYLLADB-713

Closes scylladb/scylladb#28705
2026-03-03 11:49:01 +01:00
Dimitrios Symonidis
80b74d7df2 tablet options: Add max_tablet_count tablet option to enforce tablet count upper bounds
Introduced a new max_tablet_count tablet option that caps the maximum number of tablets a table can have. This feature is designed primarily for backup and restore workflows.
During backup, when load balancing is disabled for snapshot consistency, the current tablet count is recorded in the backup manifest.
During restore, max_tablet_count is set to this recorded value, ensuring the restored table's tablet count never exceeds the original snapshot's tablet distribution.
This guarantee enables efficient file-based SSTable streaming during restore, as each SSTable remains fully contained within a single tablet boundary.

Closes scylladb/scylladb#28450
2026-03-03 11:19:24 +03:00
Karol Nowacki
30487e8854 index: fix vector index with filtering target column
The secondary index mechanism is currently used to determine the target column.
This mechanism works incorrectly for vector indexes with filtering because
it returns the last specified column as the target (vectors) column.
However, the syntax for a vector index requires the first column to be the target:
```
CREATE CUSTOM INDEX ON t(vectors, users) USING 'vector_index';
```

This discrepancy eventually leads to the following exception when performing an
ANN search on a vector index with filtering columns:
````
ANN ordering by vector requires the column to be indexed using 'vector_index'
````

This commit fixes the issue by introducing dedicated logic for vector indexes
to correctly identify the target(vectors) column.

Fixes: SCYLLADB-635

Closes scylladb/scylladb#28740
2026-03-02 18:47:58 +02:00
Botond Dénes
ab532882db tools/scylla-sstable: introduce scylla sstable split
Split input sstable(s) into multiple output sstables based on the provided
token boundaries. The input sstable(s) are divided according to the specified
split tokens, creating one output sstable per token range.

Fixes: SCYLLADB-10

Closes scylladb/scylladb#28741
2026-03-02 15:19:17 +01:00
Marcin Maliszkiewicz
d95939d69a test: add more logs to test_startup_no_auth_response
When test fails with assert connections_observed
we would like to know if it was unable to connect
or execute query in attempt_good_connection
2026-03-02 14:53:46 +01:00
Marcin Maliszkiewicz
91126eb2fb test: decrease strain in test_startup_response
For 2025.3 and 2025.4 this test runs order of magnitude
slower in debug mode. Potentially due to passwords::check
running in alien thread and overwhelming the CPU (this is
fixed in newer versions).

Decreasing the number of connections in test makes it fast
again, without breaking reproducibility.

As additional measure we double the timeout.

The fix is now cherry-picked to master as sometimes
test fails there too.

(cherry picked from commit 1f1fc2c2ac)
2026-03-02 14:46:51 +01:00
Marcin Maliszkiewicz
6bf706ef1b Merge 'scylla-sstable: query: handle nested UDTs' from Botond Dénes
The query (and in certain modes the write) operations uses virtual table facility inside `cql_test_env`. The schema of the sstable is created as a table in `cql_test_env`. This involves registering all UDTs with the keyspace, so they are available for lookups.
This was done with a flat loop over all column types, but this is not enough. UDTs might be nested in other types, like collections. One has to do a traversal of the type tree and register every UDT on the way.
This PR changes the flat loop to a recursive traversal of the type tree. The query operation now works with UDTs, no matter how deeply nested they are.

Backport: Implements missing functionality of a tool, no backport.

Closes scylladb/scylladb#28798

* github.com:scylladb/scylladb:
  tools/scylla-sstable: create_table_in_cql_env(): register UDTs recursively
  tools/scylla-sstable: generalize dump_if_user_type
  tools/scylla-sstable: move dump_if_user_type() definition
2026-03-02 14:14:43 +01:00
Marcin Maliszkiewicz
a83ee6cf66 Merge 'db/batchlog_manager: re-add v1 support for mixed clusters' from Botond Dénes
3f7ee3ce5d introduced system.batchlog_v2, with a schema designed to speed up batchlog replays and make post-replay cleanups much more effective.
It did not introduce a cluster feature for the new table, because it is node local table, so the cluster can switch to the new table gradually, one node at a time.
However, https://github.com/scylladb/scylladb/issues/27886 showed that the switching causes timeouts during upgrades, in mixed clusters. Furthermore, switching to the new table unconditionally  on upgrades nodes, means that on rollback, the batches saved into the v2 table are lost.
This PR introduces re-introduces v1 (`system.batchlog`) support and guards the use of the v2 table with a cluster feature, so mixed clusters keep using v1 and thus be rollback-compatible.
The re-introduced v1 support doesn't support post-replay cleanups for simplicity. The cleanup in v1 was never particularly effective anyway and we ended up disabling it for heavy batchlog users, so I don't think the lack of support for cleanup is a problem.

Fixes: https://github.com/scylladb/scylladb/issues/27886

Needs backport to 2026.1, to fix upgrades for clusters using batches

Closes scylladb/scylladb#28736

* github.com:scylladb/scylladb:
  test/boost/batchlog_manager_test: add tests for v1 batchlog
  test/boost/batchlog_manager_test: make prepare_batches() work with both v1 and v2
  test/boost/batchlog_manager_test: fix indentation
  test/boost/batchlog_manager_test: extract prepare_batches() method
  test/lib/cql_assertions: is_rows(): add dump parameter
  tools/scylla-sstable: extract query result printers
  tools/scylla-sstable: add std::ostream& arg to query result printers
  repair/row_level: repair_flush_hints_batchlog_handler(): add all_replayed to finish log
  db/batchlog_manager: re-add v1 support
  db/batchlog_manager: return all_replayed from process_batch()
  db/batchlog_manager: process_bath() fix indentation
  db/batchlog_manager: make batch() a standalone function
  db/batchlog_manager: make structs stats public
  db/batchlog_manager: allocate limiter on the stack
  db/batchlog_manager: add feature_service dependency
  gms/feature_service: add batchlog_v2 feature
2026-03-02 12:09:10 +01:00
Patryk Jędrzejczak
ba7f314cdc test: test_full_shutdown_during_replace: retry replace after the replacing node is removed from gossip
The test is currently flaky with `reuse_ip = True`. The issue is that the
test retries replace before the first replace is rolled back and the
first replacing node is removed from gossip. The second replacing node
can see the entry of the first replacing node in gossip. This entry has
a newer generation than the entry of the node being replaced, and both
replacing nodes have the same IP as the node being replaced. Therefore,
the second replacing node incorrectly considers this entry as the entry
of the node being replaced. This entry is missing rack and DC, so the
second replace fails with
```
ERROR 2026-02-24 21:19:03,420 [shard 0:main] init - Startup failed:
std::runtime_error (Cannot replace node
8762a9d2-3b30-4e66-83a1-98d16c5dd007/127.61.127.1 with a node on
a different data center or rack.
Current location=UNKNOWN_DC/UNKNOWN_RACK, new location=dc1/rack2)
```

Fixes SCYLLADB-805

Closes scylladb/scylladb#28829
2026-03-02 10:26:57 +02:00
Michael Litvak
8c4bc33e51 test: remove test_view_building_with_tablet_move
remove the test since it's not relevant anymore, it's not testing what
it's supposed to test and it's unstable.

the purpose of the test was to reproduce an issue in the legacy view
builder where a view starts to build at token T2 and then all tokens
[T1, end) with T1<T2 migrate to another node while it's still building,
exposing an issue when the view builder wraparounds the token ring.

this is not relevant anymore because now view building with tablets is
done via the view building coordinator for tablets, and all views start
to build from the first token with no wraparound.

besides, the test is unstable due to relying too much on specific
timing, which was useful for investigating and fixing the original issue
but not anymore.

Fixes SCYLLADB-842

Closes scylladb/scylladb#28842
2026-03-02 07:42:08 +01:00
Marcin Maliszkiewicz
8c2da76fde test/cqlpy: remove xfail from test_constant_function_parameter
The issue was fixed by commit cc03f5c89d
("cql3: support literals and bind variables in selectors"), so the
xfail marker is no longer needed.

Closes scylladb/scylladb#28776
2026-03-01 20:03:42 +02:00
Botond Dénes
1f09fcfb26 Merge 'Use standard ks/cf/data creation methods in test_restore_with_streaming_scopes' from Pavel Emelyanov
The test uses create_dataset helper duplicating the existing code that does the same. This PR patches basic tests to use standard facilities.

Also the PR simplifies the 3-level nested loops used to combine several sets of restoration parameters by using itertools.product facility.

Continuation of #28600.

Cleaning tests, not backporting

Closes scylladb/scylladb#28608

* github.com:scylladb/scylladb:
  test/object_store: Use itertools.product() for deeply nested loops
  test/object_store: Replace dataset creation usage with standard methods
  test/object_store: Shift indentation right for test_restore_with_streaming_scopes
2026-02-27 16:15:55 +02:00
Avi Kivity
450a09b152 test: tools: restrict embedded perf tests from taking over host
The perf-simple-query tests were not restricted on CPU count,
so on a 96-CPU machine, they would run on 96 CPUs, and time out
in debug mode.

All restrict memory usage and add --overprovisioned so that
pinning is disabled. Apply that to all tests.

Closes scylladb/scylladb#28821
2026-02-27 16:06:22 +02:00
Botond Dénes
d3a3921487 Merge 'Re-use and improve the take_snapshot() helper in backup tests' from Pavel Emelyanov
The helper is very simple yet generic -- it takes a snapshot of a keyspace on all servers and collects the resulting sstables from workdirs. Re-using it in all test cases saves some lines of code. Also, the method is "sequential", making it "parallel" reduces the waiting time a bit.

Will help generalizing existing backup/restore tests to support clustered snapshot/backup/restore API (see #28525) later.

Cleaning up tests, not backporting.

Closes scylladb/scylladb#28660

* github.com:scylladb/scylladb:
  test/backup: Run keyspace flush and snapshot taking API in parallel
  test/backup: Re-use take_snapshot() helper in do_abort_restore()
  test/backup: Move take_snapshot() helper up
2026-02-27 15:58:18 +02:00
Patryk Jędrzejczak
9a9202c909 Merge 'Remove gossiper topology code' from Gleb Natapov
The PR removes most of the code that assumes that group0 and raft topology is not enabled. It also makes sure that joining a cluster in no raft mode or upgrading a node in a cluster that not yet uses raft topology to this version will fail.

Refs #15422

No backport needed since this removes functionality.

Closes scylladb/scylladb#28514

* https://github.com/scylladb/scylladb:
  group0: fix indentation after previous patch
  raft_group0: simplify get_group0_upgrade_state function since no upgrade can happen any more
  raft_group0: move service::group0_upgrade_state to use fmt::formatter instead of iostream
  raft_group0: remove unused code from raft_group0
  node_ops: remove topology over node ops code
  topology: fix indentation after the previous patch
  topology: drop topology_change_enabled parameter from raft_group0 code
  storage_service: remove unused handle_state_* functions
  gossiper: drop wait_for_gossip_to_settle and deprecate correspondent option
  storage_service: fix indentation after the last patch
  storage_service: remove gossiper bootstrapping code
  storage_service: drop get_group_server_if_raft_topolgy_enabled
  storage_service: drop is_topology_coordinator_enabled and its uses
  storage_service: drop run_with_api_lock_in_gossiper_mode_only
  topology: remove code that assumes raft_topology_change_enabled() may return false
  test: schema_change_test: make test_schema_digest_does_not_change_with_disabled_features tests run in raft mode
  test: schema_change_test: drop schema tests relevant for no raft mode only
  topology: remove upgrade to raft topology code
  group0: remove upgrade to group0 code
  group0: refuse to boot if a cluster is still is not in a raft topology mode
  storage_service: refuse to join a cluster in legacy mode
2026-02-27 14:43:41 +01:00
Botond Dénes
9521a51e4c Merge 'generic_server: scale connection concurrency semaphore by listener count' from Marcin Maliszkiewicz
The concurrency semaphore gates uninitialized connections across all
do_accepts loops, but was initialized to a fixed value regardless of
how many listeners exist. With multiple listeners competing for the
same units, each effectively gets less than the configured concurrency.

Initialize the semaphore to concurrency - 1 and signal 1 per listen()
call, so total capacity is concurrency - 1 + nr_listeners. This
guarantees each listener's accept loop can have at least one unit
available.

It mainly fixes problem when setting uninitialized_connections_semaphore_cpu_concurrency
config value to 1 would result in not being able to process connections, as only 1 out of 2
listeners got the semaphore.

Fixes https://scylladb.atlassian.net/browse/SCYLLADB-762
Backport: no, it's a minor problem

Closes scylladb/scylladb#28747

* github.com:scylladb/scylladb:
  test: add test_uninitialized_conns_semaphore
  generic_server: fix waiters count in shed log
  generic_server: scale connection concurrency semaphore by listener count
2026-02-27 15:06:50 +02:00
Łukasz Paszkowski
bb57b0f3b7 compaction_manager: fix maybe_wait_for_sstable_count_reduction() hanging forever
The futurization refactoring in 9d3755f276 ("replica: Futurize
retrieval of sstable sets in compaction_group_view") changed
maybe_wait_for_sstable_count_reduction() from a single predicated
wait:
```
    co_await cstate.compaction_done.wait([..] {
        return num_runs_for_compaction() <= threshold
            || !can_perform_regular_compaction(t);
    });
```
to a while loop with a predicated wait:
```
    while (can_perform_regular_compaction(t)
           && co_await num_runs_for_compaction() > threshold) {
        co_await cstate.compaction_done.wait([this, &t] {
            return !can_perform_regular_compaction(t);
        });
    }
```

This was necessary because num_runs_for_compaction() became a
coroutine (returns future<size_t>) and can no longer be called
inside a condition_variable predicate (which must be synchronous).

However, the inner wait's predicate — !can_perform_regular_compaction(t)
— only returns true when compaction is disabled or the table is being
removed. During normal operation, every signal() from compaction_done
wakes the waiter, the predicate returns false, and the waiter
immediately goes back to sleep without ever re-checking the outer
while loop's num_runs_for_compaction() condition.

This causes memtable flushes to hang forever in
maybe_wait_for_sstable_count_reduction() whenever the sstable run
count exceeds the threshold, because completed compactions signal
compaction_done but the signal is swallowed by the predicate.

Fix by replacing the predicated wait with a bare wait(), so that
any signal (including from completed compactions) causes the outer
while loop to re-evaluate num_runs_for_compaction().

Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-610

Closes scylladb/scylladb#28801
2026-02-26 20:13:50 +02:00
Marcin Maliszkiewicz
a03ebe1a29 Merge 'cql: implement a new per-row TTL feature' from Nadav Har'El
This series implements a new per-row TTL feature for CQL. The per-row TTL feature was requested in issue #13000. It is a feature that does not exist in Cassandra, and was inspired by DynamoDB's TTL feature - and under the hood uses the same implementation that we used in Alternator to implement this DynamoDB feature.

The new per-row TTL feature is completely separate from CQL's existing per-write (and per-cell) TTL, and both will be available to users.

In the per-row TTL feature, one column in the table is designated as the "TTL" column, and its value for a row is the expiration time for that row. The TTL column can be designated at table creation time, e.g.:

```cql
CREATE TABLE tab (
    id int PRIMARY KEY,
    t text,
    expiration timestamp TTL
);
```

Or after the table already exists with:

```cql
ALTER TABLE tab TTL expiration
```

Expiration can also be disabled, with:

```cql
ALTER TABLE tab TTL NULL
```

The new per-row TTL feature has two features that users have been asking for:

1. A user can change the value of just the TTL column - without rewriting the entire row - to change the expiration time of the entire row.
2. When an expired row is finally deleted, a CDC event about this deletion appears in the CDC log (if CDC is enabled), including - if a preimage is enabled - the content of the deleted row.

To achieve the second goal (CDC events), a row is not guaranteed to disappear at exactly its expiration time (as CQL's original TTL feature guarantees). Rather, the row is deleted some time later, depending on `alternator_ttl_period_in_seconds`; Until the actual deletion, the row is still readable (and even writable). But we are guaranteed that when the row is finally deleted, the CDC event will come too.

The implementation uses the same background thread used by Alternator to periodically scan for expired items and delete them.

The expiration thread keeps the same metrics as it did for Alternator:
* `scylla_expiration_scan_passes`
* `scylla_expiration_scan_table`
* `scylla_expiration_items_deleted`
* `scylla_expiration_secondary_ranges_scanned`

The series begins with a few small preparation patches, followed by the main part of the feature (which isn't big, since we are just enabling the pre-existing Alternator expiration machinary for CQL) and finally 30 tests (single-node and multi-node tests) and documentation.

This series is a new feature, so traditionally would not be backported. However, I wouldn't be surprised if we will be requested to backport it so that customers will not need to wait for a new major release.

Fixes #13000

Closes scylladb/scylladb#28320

* github.com:scylladb/scylladb:
  test/cqlpy: verify that a column can't be both STATIC and PRIMARY KEY
  docs/cql: document the new CQL per-row TTL feature
  test/cluster: tests for the new CQL per-row TTL feature
  test/cqlpy: tests for the new CQL per-row TTL feature
  test: set low alternator_ttl_period_in_seconds in CQL tests
  cql ttl: fix ALTER TABLE to disable TTL if column is dropped
  cql ttl: add setting/unsetting of TTL column to ALTER TABLE
  cql ttl: add TTL column support to CREATE TABLE and DESC TABLE
  ttl: add CQL support to Alternator's TTL expiration service
  alternator ttl: move TTL_TAG_KEY to a header file
  alternator ttl: remove unnecessary check of feature flag
  cql: add "cql_row_ttl" cluster feature
  alternator: fix error message if UpdateTimeToLive is not supported
2026-02-26 15:29:12 +01:00
Yaniv Michael Kaul
ead9961783 cql: vector: fix vector dimension type
Switch vector dimension handling to fixed-width `uint32_t` type,
update parsing/validation, and add boundary tests.

The dimension is parsed as `unsigned long` at first which is guaranteed
to be **at least** 32-bit long, which is safe to downcast to `uint32_t`.

Move `MAX_VECTOR_DIMENSION` from `cql3_type::raw_vector` to `cql3_type`
to ensure public visibility for checks outside the class.

Add tests to verify the type boundaries.

Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-223

Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>
Co-authored-by: Dawid Pawlik <dawid.pawlik@scylladb.com>

Closes scylladb/scylladb#28762
2026-02-26 14:46:53 +02:00
Michael Litvak
4a60ee28a2 test/cqlpy/test_materialized_view.py: increase view build timeout
The test test_build_view_with_large_row creates a materialized view and
expects the view to be built with a timeout of 5 seconds. It was
observed to fail because the timeout is too short on slow machines.

Increase the timeout to 60 seconds to make the test less flaky on slow
machines. Similarly for the other tests in the file that have a timeout
for view build, increase the timeout to 60 seconds to be consistent and
safer.

Fixes SCYLLADB-769

Closes scylladb/scylladb#28817
2026-02-26 11:27:51 +02:00
Marcin Maliszkiewicz
30f18a91fd Merge 'dtest: wait_for speedup' from Dario Mirovic
Audit tests have been slow. They rely on wait_for function. This function first sleeps for the duration of the time step specified, and then calls the given function. The audit tests need 0.02-0.03 seconds for the given function, but the operation lasts around 1.02-1.03 seconds, since step is 1 second.

This patch modifies wait_for dtest function so it first executes the given function, and afterwards calls time.sleep(step). This reduces time needed for the given function from 1.03 to 0.03 seconds.

Total audit tests suite speedup is 3x. On the developer machine the time is reduced from 13+ minutes to 4 minutes.

This patch also improves performance of some alternator tests that use the same wait_for dtest function.

`wait_for` in dtest framework has default time step reduced to make the environment more responsive and test execution faster.

Refs SCYLLADB-573

This is a performance improvement of testing framework. No need to backport.

Closes scylladb/scylladb#28590

* github.com:scylladb/scylladb:
  dtest: shorten default sleep step in wait_for
  dtest: wait_for speedup
2026-02-26 09:33:38 +01:00
Amnon Heiman
5db971c2f9 estimated_histogram_test.cc: add to_metrics_histogram test
Add a test that exercises to_metrics_histogram when Min is smaller than
Precision.

The test verifies duplicate integer bounds are collapsed,
counts remain cumulative, and native histogram metadata is still present
with the expected schema and min id.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2026-02-26 09:00:52 +02:00
Amnon Heiman
af6371c11f estimated_histogram_test.cc: Add tests for approx_exponential_histogram with Min<Precision
Add three test cases to verify the hybrid linear/exponential bucketing:
- test_histogram_min_1_bucket_limits: Validates bucket lower limits
- test_histogram_min_1_basic: Tests value insertion and bucket distribution
- test_histogram_min_1_statistics: Tests min(), max(), quantile(), and mean()
- test_histogram_min_2_precision_4: Test min == 2 and precision 4.

These tests cover the new Min<Precision mode with Precision=4, verifying both the
linear range and exponential range.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2026-02-26 08:53:06 +02:00
Nadav Har'El
1d265e7d6d test/cqlpy: verify that a column can't be both STATIC and PRIMARY KEY
While adding the new syntax "TTL" to CREATE TABLE, I noticed that the
parser actually allows a column to be defined as "STATIC PRIMARY KEY".

So I add here a small test to verify that this is not really allowed:
The syntax "c int STATIC PRIMARY KEY" is accepted, but then rejected
by a later check. The syntax "c int PRIMARY KEY STATIC" is rejected
as a syntax error.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2026-02-25 14:59:45 +02:00
Nadav Har'El
23ad0be034 test/cluster: tests for the new CQL per-row TTL feature
The previous patch added single-node functional tests (in test/cqlpy)
for everything which was possible to test on a single node. In this
patch we add four tests that we couldn't test on a single node, using
the test/cluster test framework:

1. Test that the TTL expiration work - both the scanning threads and
   the actual deletion work on all nodes - happens on the "streaming"
   scheduling group.

2. Test that even if one of the cluster's nodes is down, still all
   the items get expired - another node "takes over" the dead node's
   work.

3. Test that rolling upgrade works as designed for the CQL per-row TTL
   feature: Before every single node in the cluster is upgraded to
   support this feature, a TTL column cannot be enabled on a table.
   And as soon as the last node of the cluster is upgraded, the TTL
   feature begins to work completely (you don't need to reboot all
   the nodes again).

4. Test that expiration works correctly on a multi-DC setup. The test
   doesn't check the efficiency of this process - i.e., that today each
   DC scans part of the data, reading with LOCAL_QUORUM, and writing
   the deletions across the entire cluster. Rather, the test only
   verifies the correctness - that expired rows do get deleted -
   for the usual case the data across the DCs is consistent.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2026-02-25 14:59:44 +02:00
Nadav Har'El
7a1351c6cf test/cqlpy: tests for the new CQL per-row TTL feature
This patch contains 27 functional tests (in the test/cqlpy framework)
for the new CQL per-row TTL feature. The tests cover the TTL column
configuration statements (CREATE TABLE, ALTER TABLE) as well as the
actual item expiration or non-expiration depending on the value of
the expiration-time column - and also CDC events generated on expiration
and the metrics generated by the expiration process.

These tests were written together with the code, as in "test-driven
development", so they aim to cover every corner case considered during
the development, and they reproduce every bug and misstep seen during
the development process. As a result, they hopefully achieve very high
code coverage - but since we don't have a working code-coverage tool,
I can't report any specific code coverage numbers.

These tests check everything which we can check on single-node cluster.
The next patch will add additional multi-node tests for things we can't
check here with a single node - such as the scheduling group used by the
distributed work, the effect of dead nodes on the TTL functionality, and
the process of rolling upgrade.

The tests in this patch do NOT try to stress the background expiration
scanning threads, or to check how they handle topology changes, large
amounts of data or clusters spanning multiple DCs. These tests also don't
test the performance impact of these scanning threads. Because the
expiration scanning thread is identical to the one already used by
Alternator TTL, we assume that many of these aspects were already tested
for Alternator TTL and did not change when the same implementation is
used for the new CQL feature.

All new tests pass on ScyllaDB. Because the per-row TTL feature is
a new ScyllaDB feature that does not exist on Cassandra, all these
tests are skipped on Cassandra.

Because some of these tests involve waiting for expiration, they can't
be very quick. Still, because we set alternator_ttl_period_in_seconds
to 0.5 seconds in the test framework, all 27 tests running sequentially
finish in roughly 6 seconds total, which we consider acceptable.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2026-02-25 14:59:44 +02:00
Nadav Har'El
154cecda71 test: set low alternator_ttl_period_in_seconds in CQL tests
In test/alternator/run we set alternator_ttl_period_in_seconds to a very
low number (0.5 seconds) to allow TTL tests to expire items very quickly
and finish quickly.

Until now, we didn't need to do this for CQL tests, because they weren't
using this Alternator-only feature. Now that CQL uses the same expiration
feature with its original configuration parameter, we need to set it in
CQL tests too.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2026-02-25 14:59:43 +02:00
Wojciech Mitros
97dc88d6b6 test/cluster: add tests for strongly-consistent tables' metadata persistence
In this patch we add various tests for checking how strongly consistent
tables work while allowing their tablets to reside on non-0 shards and
while using the new persistent storage for their raft metadata.
The tests verify that:
- strongly consistent tables' tablets can be allocated on different shards
and we can write/read from them
- the raft metadata is persistent across restarts even with disruptions
- the sharder correctly routes metadata queries to specified shards
- we can correctly perform multi-shard reads from the metadata tables
- we can read using just the group_id (without shard) using ALLOW FILTERING

For the tests we add logging to the sharder and partitioner and we add
some extra logs for observability.
2026-02-25 12:34:58 +01:00
Wojciech Mitros
ffe32e8e4d test/raft: add unit tests for raft_groups_storage
Most functions of the new storage for raft groups for strongly
consistent tables are the same as for the system raft table
storage, so we reuse the tests for them to test the new storage.

We add additional tests for checking the new raft groups partitioner
and sharder, and for verifying that writes using storages for different
shards do not affect the data read on different shards.

We also add a test for checking the snapshot_descriptor present after
the storage bootstrap - for both system and strongly consistent storages
we check that the storage contains the initial descriptor.
2026-02-25 12:34:58 +01:00
Botond Dénes
99244179f7 Merge 'CQL transport: Add histogram-based request/response size tracking' from Amnon Heiman
This series closes a gap in how CQL request and response sizes are reported.

Previously, request_size and response_size were tracked as simple counters,
providing only cumulative totals per shard. This made it difficult to understand
the distribution of message sizes and identify potential issues with very large
or very small requests.

After this series, the CQL transport reports detailed histogram metrics showing
the distribution of request and response sizes. These histograms are tracked
per-instance, per-type (per ops), and per-scheduling-group, providing
much better visibility into CQL traffic patterns.

The histograms are collected for QUERY, EXECUTE, and BATCH operations, which are
the primary data path operations where message size distribution is most relevant.
This data can help identify:
- Clients sending unexpectedly large requests
- Operations with oversized result sets
- Scheduling group differences in traffic patterns

To support this, the series extends the approx_exponential_histogram template to
handle accurate sum, adds a bytes_histogram type alias optimized for byte-range measurements (1KB to 1GB).

The existing per-shard counter metrics are maintained for backward compatibility.
Metrics example:
```
scylla_transport_cql_request_bytes{kind="BATCH",scheduling_group_name="sl:default",shard="0"} 129808
scylla_transport_cql_request_bytes{kind="EXECUTE",scheduling_group_name="sl:default",shard="0"} 227409
scylla_transport_cql_request_bytes{kind="PREPARE",scheduling_group_name="sl:default",shard="0"} 631
scylla_transport_cql_request_bytes{kind="QUERY",scheduling_group_name="sl:default",shard="0"} 2809
scylla_transport_cql_request_bytes{kind="QUERY",scheduling_group_name="sl:driver",shard="0"} 4079
scylla_transport_cql_request_bytes{kind="REGISTER",scheduling_group_name="sl:default",shard="0"} 98
scylla_transport_cql_request_bytes{kind="STARTUP",scheduling_group_name="sl:driver",shard="0"} 432
scylla_transport_cql_request_histogram_bytes_sum{kind="QUERY",scheduling_group_name="sl:driver"} 4079
scylla_transport_cql_request_histogram_bytes_count{kind="QUERY",scheduling_group_name="sl:driver"} 57
scylla_transport_cql_request_histogram_bytes_bucket{kind="QUERY",le="1024.000000",scheduling_group_name="sl:driver"} 57
scylla_transport_cql_request_histogram_bytes_bucket{kind="QUERY",le="2048.000000",scheduling_group_name="sl:driver"} 57
scylla_transport_cql_request_histogram_bytes_bucket{kind="QUERY",le="4096.000000",scheduling_group_name="sl:driver"} 57
scylla_transport_cql_request_histogram_bytes_bucket{kind="QUERY",le="8192.000000",scheduling_group_name="sl:driver"} 57
scylla_transport_cql_request_histogram_bytes_bucket{kind="QUERY",le="16384.000000",scheduling_group_name="sl:driver"} 57
scylla_transport_cql_request_histogram_bytes_bucket{kind="QUERY",le="32768.000000",scheduling_group_name="sl:driver"} 57
scylla_transport_cql_request_histogram_bytes_bucket{kind="QUERY",le="65536.000000",scheduling_group_name="sl:driver"} 57
scylla_transport_cql_request_histogram_bytes_bucket{kind="QUERY",le="131072.000000",scheduling_group_name="sl:driver"} 57
scylla_transport_cql_request_histogram_bytes_bucket{kind="QUERY",le="262144.000000",scheduling_group_name="sl:driver"} 57
scylla_transport_cql_request_histogram_bytes_bucket{kind="QUERY",le="524288.000000",scheduling_group_name="sl:driver"} 57
scylla_transport_cql_request_histogram_bytes_bucket{kind="QUERY",le="1048576.000000",scheduling_group_name="sl:driver"} 57
scylla_transport_cql_request_histogram_bytes_bucket{kind="QUERY",le="2097152.000000",scheduling_group_name="sl:driver"} 57
scylla_transport_cql_request_histogram_bytes_bucket{kind="QUERY",le="4194304.000000",scheduling_group_name="sl:driver"} 57
scylla_transport_cql_request_histogram_bytes_bucket{kind="QUERY",le="8388608.000000",scheduling_group_name="sl:driver"} 57
scylla_transport_cql_request_histogram_bytes_bucket{kind="QUERY",le="16777216.000000",scheduling_group_name="sl:driver"} 57
scylla_transport_cql_request_histogram_bytes_bucket{kind="QUERY",le="33554432.000000",scheduling_group_name="sl:driver"} 57
scylla_transport_cql_request_histogram_bytes_bucket{kind="QUERY",le="67108864.000000",scheduling_group_name="sl:driver"} 57
scylla_transport_cql_request_histogram_bytes_bucket{kind="QUERY",le="134217728.000000",scheduling_group_name="sl:driver"} 57
scylla_transport_cql_request_histogram_bytes_bucket{kind="QUERY",le="268435456.000000",scheduling_group_name="sl:driver"} 57
scylla_transport_cql_request_histogram_bytes_bucket{kind="QUERY",le="536870912.000000",scheduling_group_name="sl:driver"} 57
scylla_transport_cql_request_histogram_bytes_bucket{kind="QUERY",le="1073741824.000000",scheduling_group_name="sl:driver"} 57
```
**The field sees it as an important issue**

Fixes #14850

Closes scylladb/scylladb#28419

* github.com:scylladb/scylladb:
  test/boost/estimated_histogram_test.cc: Switch to real Sum
  transport/server: to bytes_histogram
  approx_exponential_histogram: Add sum() method for accurate value tracking
  utils/estimated_histogram.hh: Add bytes_histogram
2026-02-25 13:05:18 +02:00
Andrei Chekun
729bad77b1 test.py: add possibility to run downloaded Scylla binary
Add possibility to run Scylla binary that is stored or download the
relocatable package with Scylla.

Closes scylladb/scylladb#28787
2026-02-25 10:23:19 +02:00
Łukasz Paszkowski
9ade0b23da reader_concurrency_semaphore: set _ex in on_preemptive_abort()
When a permit is preemptively aborted, store the corresponding
exception in permit's member: `reader_permit::impl::_ex`.

This makes preemptively-aborted permits consistently report aborted()
and prevents them from being treated as eligible for inactive
registration in `register_inactive_read()`, avoiding assertion
failures on unexpected permit state.

Closes scylladb/scylladb#28591
2026-02-25 10:20:06 +02:00
Botond Dénes
56cc7bbeec Merge 'Allow "global" snapshot using topology coordinator + add tablet metadata to manifest' from Calle Wilund
Refs: SCYLLADB-193

Adds a "snapshot_table" topology operation and associated data structure/table columns to support dispatching a snapshot operation as a topo coordinator op.

Logic is similar, and thus broken out and semi-shared with, truncation.

Also adds optional tablet metadata to manifest, listing all tablets present in a given snapshot, as well as
tablet sstable ownership, repair status, and token ranges.

As per description in SCYLLADB-193, the alternative snapshot mechanism is in
a separate namespace under 'tablets', which while dubious is the desired destination.

The API is accessed via `nodetool cluster snapshot`, which more or less mirrors `nodetool snapshot`, but using topo op.

TTL is added to message propagation as a separate patch here, since it is not (yet) used from API (or nodetool).
Requires a syntax for both API and command line.

Closes scylladb/scylladb#28525

* github.com:scylladb/scylladb:
  topology::snapshot: Add expiry (ttl) to RPC/topo op
  test_snapshot_with_tablets: Extend test to check manifest content
  table::manifest: Add tablet info to manifest.json
  test::test_snapshot_with_tablets: Add small test for topo coordinated snapshot
  scylla-nodetool: Add "cluster snapshot" command
  api::storage_service: Add tablets/snapshots command for cluster level snapshot
  db::snapshot-ctl: Add method to do snapshot using topo coordinator
  storage_proxy: Add snapshot_keyspace method
  topology_coordinator: Add handler for snapshot_tables
  storage_proxy: Add handler for SNAPSHOT_WITH_TABLETS
  messaging_service: Add SNAPSHOT_WITH_TABLETS verb
  feature_service: Add SNAPSHOT_AS_TOPOLOGY_OPERATION feature
  topology_mutation: Add setter for snapshot part of row
  system_keyspace::topology_requests_entry: Add snapshot info to table
  topology_state_machine: Add snapshot_tables operation
  topology_coordinator: Break out logic from handle_truncate_table
  storage_proxy: Break out logic from request_truncate_with_tablets
  test/object_store: Remove create_ks_and_cf() helper
  test/object_store: Replace create_ks_and_cf() usage with standard methods
  test/object_store: Shift indentation right for test cases
2026-02-25 10:17:53 +02:00
Botond Dénes
166e245097 Merge 'test.py: Topology test pytest integration' from Andrei Chekun
Migrate cluster tests directory to be handled by pytest. This is the next step in process of unification of the tests and migration to the pytest.
With this PR cluster test will be executed with the full path to the file instead of `suite/test` paradigm.

Backport is not needed because it framework enhancement.

Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-46

Closes scylladb/scylladb#27618

* github.com:scylladb/scylladb:
  test.py: remove setsid from the framework
  test.py: rename suite.yaml to test_config.yaml
  test.py: add cluster tests to be executed by pytest
  test.py: add random seed for topology tests reproducibility
  test.py: add explicit default values to pytest options
  test.py: replace SCYLLA env var with build_mode fixture
2026-02-25 10:17:20 +02:00
Botond Dénes
9dff9752b4 Merge 'Fix regression in Alternator TTL with tablets and node going down' from Nadav Har'El
Recently we suffered a regression on how Alternator TTL behaves when a node goes down when tablets are used.

Usually, expiration of data in a particular tablet are handled by this tablet's "primary replica". However, if that node is down, we want another node to perform these expiration until the primary replica goes back online. We created a function `tablet_map::get_secondary_replica()` to select that "other node". We don't care too much what the "secondary replica" means, but we do care that it's different from the primary replica - if it's the same the expiration of that tablet will never be done.

It turns out that recently, in commits 817fdad and d88036d, the implementation of get_primary_replica() changed without a corresponding change to get_secondary_replica(). After those changes, the two functions are mismatched, and sometimes return the same node for both primary and secondary replica.

Unfortunately, although we had a dtest for the handling of a dead node in Alternator TTL, it failed to reproduce this bug, so this regression was missed - nothing else besides Alternator TTL ever used the get_secondary_replica() function.

So this series, in addition to fixing the bug, we add two tests that reproduce this bug (fail before the fix, pass with the fix):

1. A unit test that checks that get_secondary_replica() always returns a different node from get_primary_replica()
2. A cluster test based on the original dtest, which does reproduce this bug in Alternator TTL where some of the data was never expired (but only failed in release build, for an unknown reason).

Fixes SCYLLADB-777.

Closes scylladb/scylladb#28771

* github.com:scylladb/scylladb:
  test: add unit test for tablet_map::get_secondary_replica()
  test, alternator: add test for TTL expiration with a node down
  locator: fix get_secondary_replica() to match get_primary_replica()
2026-02-25 10:13:55 +02:00
Gleb Natapov
cd76604c79 raft_group0: remove unused code from raft_group0
Also do not pass raft_replace_info into setup_group0 since it is not
used there for a long time now.
2026-02-25 10:08:32 +02:00