Commit Graph

240 Commits

Author SHA1 Message Date
Pavel Emelyanov
6ae72ed134 test: Reuse S3 fixtures facilities in cqlpy/test_tools.py
Creating endpoint conf can be made with the s3_server method
Getting boto3 resource from s3_server itself is also possible

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#27380
2025-12-03 16:32:54 +02:00
Botond Dénes
357f91de52 Revert "Merge 'db/config: enable ms sstable format by default' from Michał Chojnowski"
This reverts commit b0643f8959, reversing
changes made to e8b0f8faa9.

The change forgot to update
sstables_manager::get_highest_supported_format(), which results in
/system/highest_supported_sstable_version still returning me, confusing
and breaking tests.

Fixes: scylladb/scylla-dtest#6435

Closes scylladb/scylladb#27379
2025-12-02 14:38:56 +02:00
Dawid Mędrek
b76af2d07f cql3: Improve errors when manipulating default service level
Before this commit, any attempt to create, alter, attach, or drop
the default service level would result in a syntax error whose
error message was unclear:

```
cqlsh> attach service level default to cassandra;
SyntaxException: line 1:21 no viable alternative at input 'default'
```

The error stems from the grammar not being able to parse `default`
as a correct service level name. To fix that, we cover it manually.
This way, the grammar accepts it and we can process it in Scylla.

The reason why we'd like to cover the default service level is that
it's an actual service level that the user should reference. Getting
a syntax error is not what should happen. Hence this fix.

We validate the input and if the given role is really the default
service level, we reject the query and provide an informative error
message.

Two validation tests are provided.

Fixes scylladb/scylladb#26699

Closes scylladb/scylladb#27162
2025-11-28 15:32:37 +03:00
Botond Dénes
584f4e467e tools/scylla-sstable: introduce the dump-schema command
There is a limited number of ways to obtain the schema of a table:
1) Use DESCRIBE TABLE in cqlsh
2) Find the schema definition in the code (for system tables)
3) Ask support/user to provide schema
4) Piece together the schema definition from the system tables

Option (1) is the most convenient but requires access to live cluster.
(2) is limited to system tables only.
When investigating issues for customers, we have to rely on (3) and this
often adds communication round-trips and delays. (4) requires knowledge
of ScyllaDB internals and access to system tables.

The new dump-schema commands provides a convenient way to obtain the
schema of tables, given that there is access to either an sstable or the
system tables. It can dump the schema of system tables without either.

Closes scylladb/scylladb#26433
2025-11-25 20:32:36 +03:00
Karol Nowacki
ca62effdd2 vector_search: Restrict vector index tests to tablets only
Vector indexes are going to be supported only for tablets (see VECTOR-322).
As a result, tests using vector indexes will be failing when run with vnodes.

This change ensures tests using vector indexes run exclusively with tablets.

Fixes: VECTOR-49

Closes scylladb/scylladb#26843
2025-11-25 09:26:16 +02:00
Aleksandra Martyniuk
76174d1f7a cql3: reject ALTER KEYSPACE if rf of datacenter with tablets is omitted
In ALTER KEYSPACE, when a datacenter name is omitted, its replication
factor is implicitly set to zero with vnodes, while with tablets,
it remains unchanged.

ALTER KEYSPACE should behave the same way for tablets as it does
for vnodes. However, this can be dangerous as we may mistakenly
drop the whole datacenter.

Reject ALTER KEYSPACE if it changes replication factor, but omits
a datacenter that currently contains tablet replicas.

Fixes: https://github.com/scylladb/scylladb/issues/25549.

Closes scylladb/scylladb#25731
2025-11-24 06:36:51 +02:00
Avi Kivity
b0643f8959 Merge 'db/config: enable ms sstable format by default' from Michał Chojnowski
Trie-based sstable indexes are supposed to be (hopefully) a better default than the old BIG indexes.
Make them the new default.

If we change our mind, this change can be reverted later.

New functionality, and this is a drastic change. No backport needed.

Closes scylladb/scylladb#26377

* github.com:scylladb/scylladb:
  db/config: enable `ms` sstable format by default
  cluster/dtest/bypass_cache_test: switch from highest_supported_sstable_format to chosen_sstable_format
  api/system: add /system/chosen_sstable_version
  test/cluster/dtest: reduce num_tokens to 16
2025-11-23 13:52:57 +02:00
Michał Chojnowski
da51a30780 db/config: enable ms sstable format by default
Trie-based sstable indexes are supposed to be (hopefully)
a better default than the old BIG indexes.
Make them the new default.

If we change our mind, this change can be reverted later.
2025-11-21 12:39:46 +01:00
Botond Dénes
f89bb68fe2 Merge 'cdc: Preserve properties when reattaching log table' from Dawid Mędrek
When we enable CDC on a table, Scylla creates a log table for it.
It has default properties, but the user may change them later on.
Furthermore, it's possible to detach that log table by simply
disabling CDC on the base table:

```cql
/* Create a table with CDC enabled. The log table is created. */
CREATE TABLE ks.t (pk int PRIMARY KEY) WITH cdc = {'enabled': true};

/* Detach the log table. */
ALTER TABLE ks.t WITH cdc = {'enabled': false};

/* Modify a property of the log table. */
ALTER TABLE ks.t_scylla_cdc_log WITH bloom_filter_fp_chance = 0.13;
```

The log table can also be reattached by enabling CDC on the base table
again:

```cql
/* Reattach the log table */
ALTER TABLE ks.t WITH cdc = {'enabled': true};
```

However, because the process of reattachment goes through the same
code that created it in the first place, the properties of the log
table are rolled back to their default values. This may be confusing
to the user and, if unnoticed, also have other consequences, e.g.
affecting performance.

To prevent that, we ensure that the properties are preserved.

A reproducer test,
`test_log_table_preserves_properties_after_reattachment`, has been
provided to verify that the changes are correct.

Another test, `test_log_table_preserves_id_after_reattachment`, has
also been added because the current implementation sets properties
and the ID separately.

Fixes scylladb/scylladb#25523

Backport: not necessary. Although the behavior may be unexpected,
it's not a bug per se.

Closes scylladb/scylladb#26443

* github.com:scylladb/scylladb:
  cdc: Preserve properties when reattaching log table
  cdc: Extract creating columns in CDC log table to dedicated function
  cdc: Extract default properties of CDC log tables to dedicated function
  schema/schema_builder.hh: Add set_properties
  schema: Add getter for schema::user_properties
  schema: Remove underscores in fields of schema::user_properties
  schema: Extract user properties out of raw_schema
2025-11-21 10:06:05 +02:00
Nadav Har'El
a9cf7d08da test/cqlpy: remove USE from test, and test a USE-related bug
One of the tests in test_describe.py used "USE {test_keyspace}" which
affects the CQL session shared by all tests in an unrecoverable way (there
is no "UNUSE" statement).

As an example of what might happen if the shared CQL session is "polluted"
by a USE, issue #26334 is about a bug we have in DESC KEYSPACES when USE
is active.

So in this patch, we:

1. Fix the test to not use USE on the shared CQL session - it's easy to
   create a separate session to use the "USE" on. With this fix, the test
   no longer leaves the shared CQL session in a "USE" state.

2. Add a new xfailing test to reproduce the DESC KEYSPACES bug.
   Refs #26334

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#26345
2025-11-20 10:25:03 +02:00
Nadav Har'El
7b9428d8d7 test/cqlpy: test compression setting for auxiliary table
In the previous patch we noticed that although recently (commit adf9c42,
Refs #26610) we changed the default sstable compressor from LZ4Compressor
to LZ4WithDictsCompressor, this change was only applied to CQL, not to
Alternator.

In this patch we add tests that demonstrate that it's even worse - the
new compression only applies to CQL's *base* table - all the "auxiliary"
tables -
        * Materialized views
        * Secondary index's materialized views
        * CDC log tables

all still have the old LZ4Compressor, different from the base table's
default compressor.

The new test fails on Scylla, reproducing #26914, and passes on
Cassandra (on Cassandra, we only compare the materialized view table,
because SI and CDC is implemented differently).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2025-11-19 09:18:37 +02:00
Dawid Mędrek
0602afc085 cdc: Preserve properties when reattaching log table
When we enable CDC on a table, Scylla creates a log table for it.
It has default properties, but the user may change them later on.
Furthermore, it's possible to detach that log table by simply
disabling CDC on the base table:

```cql
/* Create a table with CDC enabled. The log table is created. */
CREATE TABLE ks.t (pk int PRIMARY KEY) WITH cdc = {'enabled': true};

/* Detach the log table. */
ALTER TABLE ks.t WITH cdc = {'enabled': false};

/* Modify a property of the log table. */
ALTER TABLE ks.t_scylla_cdc_log WITH bloom_filter_fp_chance = 0.13;
```

The log table can also be reattached by enabling CDC on the base table
again:

```cql
/* Reattach the log table */
ALTER TABLE ks.t WITH cdc = {'enabled': true};
```

However, because the process of reattachment goes through the same
code that created it in the first place, the properties of the log
table are rolled back to their default values. This may be confusing
to the user and, if unnoticed, also have other consequences, e.g.
affecting performance.

To prevent that, we ensure that the properties are preserved.

A reproducer test,
`test_log_table_preserves_properties_after_reattachment`, has been
provided to verify that the changes are correct. It fails before this
commit.

Another test, `test_log_table_preserves_id_after_reattachment`, has
also been added because the current implementation sets properties
and the ID separately.

Fixes scylladb/scylladb#25523
2025-11-17 11:56:30 +01:00
Benny Halevy
f9ce98384a scylla-sstable: correctly dump sharding_metadata
This patch fixes 2 issues at one go:

First, Currently sstables::load clears the sharding metadata
(via open_data()), and so scylla-sstable always prints
an empty array for it.

Second, printing token values would generate invalid json
as they are currently printed as binary bytes, and they
should be printed simply as numbers, as we do elsewhere,
for example, for the first and last keys.

Fixes #26982

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#26991
2025-11-14 17:55:41 +02:00
Piotr Dulikowski
7f482c39eb Merge '[schema] Speculative retry rounding fix' from Dario Mirovic
This patch series re-enables support for speculative retry values `0` and `100`. These values have been supported some time ago, before [schema: fix issue 21825: add validation for PERCENTILE values in speculative_retry configuration. #21879
](https://github.com/scylladb/scylladb/pull/21879). When that PR prevented using invalid `101PERCENTILE` values, valid `100PERCENTILE` and `0PERCENTILE` value were prevented too.

Reproduction steps from [[Bug]: drop schema and all tables after apply speculative_retry = '99.99PERCENTILE' #26369](https://github.com/scylladb/scylladb/issues/26369) are unable to reproduce the issue after the fix. A test is added to make sure the inclusive border values `0` and `100` are supported.

Documentation is updated to give more information to the users. It now states that these border values are inclusive, and also that the precision, with automatic rounding, is 1 decimal digit.

Fixes #26369

This is a bug fix. If at any time a client tries to use value >= 99.5 and < 100, the raft error will happen. Backport is needed. The code which introduced inconsistency is introduced in 2025.2, so no backporting to 2025.1.

Closes scylladb/scylladb#26909

* github.com:scylladb/scylladb:
  test: cqlpy: add test case for non-numeric PERCENTILE value
  schema: speculative_retry: update exception type for sstring ops
  docs: cql: ddl.rst: update speculative-retry-options
  test: cqlpy: add test for valid speculative_retry values
  schema: speculative_retry: allow 0 and 100 PERCENTILE values
2025-11-13 15:27:45 +01:00
Nadav Har'El
5839574294 Merge 'cql3: Fix std::bad_cast when deserializing vectors of collections' from Karol Nowacki
cql3: Fix std::bad_cast when deserializing vectors of collections

This PR fixes a bug where attempting to INSERT a vector containing collections (e.g., `vector<set<int>,1>`) would fail. On the client side, this manifested as a `ServerError: std::bad_cast`.

The cause was "type slicing" issue in the reserialize_value function. When retrieving the vector's element type, the result was being assigned by value (using auto) instead of by reference.
This "sliced" the polymorphic abstract_type object, stripping it of its actual derived type information. As a result, a subsequent dynamic_cast would fail, even if the underlying type was correct.

To prevent this entire class of bugs from happening again, I've made the polymorphic base class `abstract_type` explicitly uncopyable.

Fixes: #26704

This fix needs to be backported as these releases are affected: `2025.4` , `2025.3`.

Closes scylladb/scylladb#26740

* github.com:scylladb/scylladb:
  cql3: Make abstract_type explicitly noncopyable
  cql3: Fix std::bad_cast when deserializing vectors of collections
2025-11-13 00:24:25 +02:00
Nadav Har'El
4de88a7fdc test/cqlpy: fix run script for materialized views on tablets
Recently we enabled tablets by default, but it is necessary to
enable rf_rack_valid_keyspaces if materialized views are to be used
with tablets, and this option is *not* the default.

We did add this option in test/pylib/scylla_cluster.py which is
used by test.py, but we didn't add it to test/cqlpy/run.py, so
the test/cqlpy/run script is no longer able to run tests with
materialized views. So this patch adds the missing configuration
to run.py.

FIxes #26918

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#26919
2025-11-12 11:56:21 +03:00
Karol Nowacki
960fe3da60 cql3: Fix std::bad_cast when deserializing vectors of collections
When deserializing a vector whose elements are collections (e.g., set, list),
the operation raises a `std::bad_cast` exception.

This was caused by type slicing due to an incorrect assignment of a
polymorphic type by value instead of by reference. This resulted in a
failed `dynamic_cast` even when the underlying type was correct.
2025-11-12 09:11:56 +01:00
Nadav Har'El
b659dfcbe9 test/cqlpy: comment out Cassandra check that is no longer relevant
In the test translated from Cassandra validation/operations/alter_test.py
we had two lines in the beginning of an unrelated test that verified
that CREATE KEYSPACE is not allowed without replication parameters.
But starting recently, ScyllaDB does have defaults and does allow these
CREATE KEYSPACE. So comment out these two test lines.

We didn't notice that this test started to fail, because it was already
marked xfail, because in the main part of this test, it reproduces a
different issue!

The annoying side-affect of these no-longer-passing checks was that
because the test expected a CREATE KEYSPACE to fail, it didn't bother
to delete this keyspace when it finished, which causes test.py to
report that there's a problem because some keyspaces still exist at the
end of the test. Now that we fixed this problem, we no longer need to
list this test in test/cqlpy/suite.yaml as a test that leaves behind
undeleted keyspaces.

Fixes #26292

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#26341
2025-11-11 10:34:27 +02:00
Nikos Dragazis
94c4f651ca test/cqlpy: Test secondary index with short reads
Add a test to check that paged secondary index queries behave correctly
when pages are short. This is currently failing in Scylla, but passes in
Cassandra 5, therefore marked as "xfailing". Refer to the test's
docstring for more details.

The bug is a regression introduced by commit f6f18b1.
`test/cqlpy/run --release ...` shows that the test passes in 5.1 but
fails in 5.2 onwards.

Refs #25839.

Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>

Closes scylladb/scylladb#25843
2025-11-11 09:28:45 +02:00
Dario Mirovic
7ec9e23ee3 test: cqlpy: add test case for non-numeric PERCENTILE value
Add test case for non-numeric PERCENTILE value, which raises an error
different to the out-of-range invalid values. Regex in the test
test_invalid_percentile_speculative_retry_values is expanded.

Refs #26369
2025-11-09 13:59:36 +01:00
Dario Mirovic
5d1913a502 test: cqlpy: add test for valid speculative_retry values
test_valid_percentile_speculative_retry_values is introduced to test that
valid values for speculative_retry are properly accepted.

Some of the values are moved from the
test_invalid_percentile_speculative_retry_values test, because
the previous commit added support for them.

Refs #26369
2025-11-09 13:23:26 +01:00
Nadav Har'El
8a07b41ae4 test/cqlpy: add test confirming page_size=0 disables paging
In pull request #26384 a discussion started whether page_size=0 really
disables paging, or maybe one needs page_size=-1 to truly disable paging.

The reason for that discussion was commit 08c81427b that started to
use page_size=-1 for internal unpaged queries, and commit 76b31a3 that
incorrectly claimed that page_size>=0 means paging is enabled.

This patch introduces a test that confirms that with page_size=0, paging
is truly disabled - including the size-based (1MB) paging.

The new test is Scylla-only, because Cassandra is anyway missing the
size-based page cutoff (see CASSANDRA-11745).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#26742
2025-11-05 15:52:16 +03:00
Michael Litvak
1dbf53ca29 test: enable counters tests with tablets
Enable all counters-related tests that were disabled for tablets because
counters was not supported with tablets until now.

Some tests were parametrized to run with both vnodes and tablets, and
the tablets case was skipped, in order to not lose coverage. We change
them to run with the default configuration since now counters is
supported with both vnodes and tablets, and the implementation is the
same, so there is no benefit in running them with both configurations.
2025-11-03 16:04:37 +01:00
Michael Litvak
60ac13d75d cql3: remove warning when creating keyspace with tablets
When creating a keyspace with tablets, a warning is shown with all the
unsupported features for tablets, which is only counters currently.

Now that counters is also supported with tablets, we can remove this
warning entirely.
2025-11-03 16:04:37 +01:00
Michał Hudobski
fd521cee6f test: fix typo in vector_index test
Unfortunately in https://github.com/scylladb/scylladb/pull/26508
a typo than changes a behavior of a test was introduced.
This patch fixes the typo.

Closes scylladb/scylladb#26803
2025-10-31 18:35:02 +03:00
Avi Kivity
04a289cae6 Merge 'Auto expand to rack list' from Tomasz Grabiec
We want to move towards rack-list based replication factor for tablets being the default mode, and in the future the only supported mode. This PR is a step towards that. We auto-expand numeric RF to rack list on keyspace creation and ALTER when rf_rack_valid_keyspaces option is enabled.

The PR is mostly about adjusting tests. The main logic change is in the last patch, which modifies option post-processing in ks_prop_defs.

Fixes #26397

Closes scylladb/scylladb#26692

* github.com:scylladb/scylladb:
  cql3: ks_prop_defs: Expand numeric RF to rack list
  locator: Move rack_list to topology.hh
  alternator: Do not set RF for zero-token DCs
  alternator: Switch keyspace creation to use ks_prop_defs
  test: alternator: Adjust for rack lists
  cql3: Move validation of invalid ALTER KEYSPACE earlier, to ks_prop_defs
  test: cqlpy: Mark tests using rack lists as scylla-only
  test: Switch to rack-list based RF
  test: Generalize tests to work with both numeric RF and rack lists
  test: cluster: test_zero_token_nodes_multidc: Adjust to rack list RF
  test: Prepare for handling errors specific to rack list path
  test: cluster: dtest: alternator: Force RF=1 in test_putitem_contention
  test: Create cluster with multiple racks in multi-dc setups
  test: boost: network_topology_strategy_test: Adjust to rack-list RF
  test: tablets: Adjust to rack list
  test: cluster: test_group0_schema_versioning: Use smaller RF to respect rf-rack-validness
  test: tablets_test: Convert test_per_shard_goal_mixed_dc_rf to be rack-valid
  test: object_store: test_backup: Adjust for rack lists
  test: cluster: tablets: Do not move tablet across racks in test_tablet_transition_sanity
  test: cluster: mv: Do not move tablets across racks
  test: cluster: util: Fix docstring for parse_replication_options()
  tablets, topology_coordinator: Skip tablet draining on replace
2025-10-30 21:54:08 +02:00
Tomasz Grabiec
8e69c65124 test: cqlpy: Mark tests using rack lists as scylla-only
Those tests are intended to be also run against Cassandra, which
doesn't support rack lists.
2025-10-29 23:32:58 +01:00
Tomasz Grabiec
d2e7d6fad2 test: Generalize tests to work with both numeric RF and rack lists 2025-10-29 23:32:58 +01:00
Benny Halevy
e8b9f13061 test: Prepare for handling errors specific to rack list path 2025-10-29 23:32:58 +01:00
Michał Hudobski
46589bc64c secondary_index: disallow multiple vector indexes on the same column
We currently allow creating multiple vector indexes on one column.
This doesn't make much sense as we do not support picking one when
making ann queries.

To make this less confusing and to make our behavior similar
to Cassandra we disallow the creation of multiple vector indexes
on one column.

We also add a test that checks this behavior.

Fixes: VECTOR-254
Fixes: #26672

Closes scylladb/scylladb#26508
2025-10-29 11:55:38 +02:00
Botond Dénes
f3cec5f11a Merge 'index: Set tombstone_gc when creating underlying view' from Dawid Mędrek
Before this commit, when the underlying materialized view was created,
it didn't have the property `tombstone_gc` set to any value. We fix the
bug in this PR.

Implementation strategy:

1. Move code responsible for producing the schema
   of a secondary index to the file that handles
   `CREATE INDEX`.
2. Set the property when creating the view.
3. Add reproducer tests.

Fixes scylladb/scylladb#26542

Backport: we can discuss it.

Closes scylladb/scylladb#26543

* github.com:scylladb/scylladb:
  index: Set tombstone_gc when creating secondary index
  index: Make `create_view_for_index` method of `create_index_statement`
  index: Move code for creating MV of secondary index to cql3
  db, cql3: Move creation of underlying MV for index
2025-10-28 14:42:42 +02:00
Michał Hudobski
541b52cdbf cql: fail with a better error when null vector is passed to ann query
Currently when a null vector is passed to an ANN query we fail with a
quite confusing error ("NoHostAvailable: ('Unable to complete the
operation against any hosts', {<Host: 127.0.0.1:9042 datacenter1>:
<Error from server: code=0000 [Server error] message="to_bytes() called
on raw value that is null">})").

This patch fixes that by throwing an InvalidRequestException with an
appropriate message instead.
We also add a test case that validates this behavior.

Fixes: VECTOR-257

Closes scylladb/scylladb#26510
2025-10-27 16:09:08 +02:00
Avi Kivity
b843d8bc8b Merge 'scylla-sstable: add cql support to write operation' from Botond Dénes
In theory, scylla-sstable write is an awesome and flexible tool to generate sstables with arbitrary content. This is convenient for tests and could come clutch in a disaster scenario, where certain system table's content need to be manually re-created, system tables that are not writable directly via CQL.
In practice, in its current form this operation is so convoluted to use that even its own author shuns it. This is because the JSON specification of the sstable content is the same as that of the scylla-sstable dump-data: containing every single piece of information on the mutation content. Where this is an advantage for dump-data, allowing users to inspect the data in its entirety -- it is a huge disadvantage for write, because of all these details have to be filled in, down to the last timestamp, to generate an sstable. On top of that, the tool doesn't even support any of the more advanced data types, like collections, UDF and counters.
This PR proposes a new way of generating sstables: based on the success of scylla-sstable query, it introduces CQL support for scylla-sstable write. The content of the sstable can now be specified via standard INSERT, UPDATE and DELETE statements, which are applied to a memtable, then flushed into the sstable.
To avoid boundless memory consumption, the memtable is flushed every time it reaches 1MiB in size, consequently the command can generate multiple output sstables.

The new CQL input-format is made default, this is safe as nobody is using this command anyway. Hopefully this PR will change that.

Fixes: https://github.com/scylladb/scylladb/issues/26506

New feature, no backport.

Closes scylladb/scylladb#26515

* github.com:scylladb/scylladb:
  test/cqlpy/test_tools.py: add test for scylla-sstable write --input-format=cql
  replica/mutation_dump: add support for virtual tables
  tools/scylla-sstable: print_query_results_json(): handle empty value buffer
  tools/scylla-sstable: add cql support to write operation
  tools/scylla-sstable: write_operation(): fix indentation
  tools/scylla-sstable: write_operation(): prepare for a new input-format
  tools/scylla-sstable: generalize query_operation_validate_query()
  tools/scylla-sstable: move query_operation_validate_query()
  tools/scylla-sstable: extract schema transformation from query operation
  replica/table: add virtual write hook to the other apply() overload too
2025-10-24 23:32:40 +03:00
Avi Kivity
997b52440e Merge 'replica/mutation_dump: include empty/dead partitions in the scan results' from Botond Dénes
`select * from mutation_fragment()` queries don't return partitions which are completely empty or only contain tombstones which are all garbage collectible. This is because the underlying `mutation_dump` mechanism has a separate query to discover partitions for scans. This query is a regular mutation scan, which is subject to query compaction and garbage collection. Disable the query compaction for mutation queries executed on behalf of mutation fragment queries, so *all* data is visible in the result, even that which is fully garbage collectible.

Fixes scylladb/scylladb#23707.

Scans for mutation-fragment are very rare, so a backport is not necessary. We can backport on-demand.

Closes scylladb/scylladb#26227

* github.com:scylladb/scylladb:
  replica/mutation_dump: multi_range_partition_generator: disable garbage-collection
  replica: add tombstone_gc_enabled parameter to mutation query methods
  mutation/mutation_compactor: remove _can_gc member
  tombstone_gc: add tombstone_gc_state factory methods for gc_all and no_gc
2025-10-24 23:26:16 +03:00
Tomasz Grabiec
ba692d1805 schema_tables: Keep "replication" column backwards-compatible by expanding rack lists to numeric RF
In 380f243986 we added support for rack
lists in replication options. Drivers which are not prepared to parse
that (as of now, all of them), will not create metadata object for
that keyspace. This breaks, for example, the "copy to/from" cqlsh
command. Potentially other things too.

To fix that, keep the "replication" column in the old format, and
store numeric RF there, which corresponds to the number of
replicas. Accurate options in the new format are put in
"replication_v2".

We set replication_v2 in the schema only when it differs from the old
"replication" so that the new column is not set during upgrade,
otherwise downgrade would fail. Partition tombstone is added to ensure
that pre-alter replication_v2 value is deleted on alters which change
replication to a value which is the same as the post-alter
"replication" value.

Fixes #26415

Closes scylladb/scylladb#26429
2025-10-21 09:11:25 +03:00
Dawid Mędrek
7e201eea1a index: Set tombstone_gc when creating secondary index
Before this commit, when the underlying materialized view was created,
it didn't have the property `tombstone_gc` set to any value. That
was a bug and we fix it now.

Two reproducer tests is added for validation. They reproduce the problem
and don't pass before this commit.

Fixes scylladb/scylladb#26542
2025-10-20 14:04:45 +02:00
Botond Dénes
5d70450917 replica/mutation_dump: multi_range_partition_generator: disable garbage-collection
Make use of the freshly introduced facility to disable
garbage-collection on a per-query basis for range scans. This is needed
so partitions that only contain garbage-collectible data are not missing
from the partition-list. When using SELECT * FROM MUTATION_FRAGMENTS(),
the user is expecting to see *all* data, even that which is dead and
garbage-collectible.

Include a test which reproduces the issue.
2025-10-16 10:40:28 +03:00
Nadav Har'El
afc5379148 secondary index: translate test_indexing_paging_and_aggregation to Python
The Boost test test_indexing_paging_and_aggregation is one of the slowest
boost tests. But it's hard to understand why it needs to be so slow - the
C++ test code is opaque, and uncommented. The test didn't need to be in
C++ - it only uses CQL, not any internal interfaces - but it was written
in 2019, a year before test/cqlpy was created.

So before we can make this test faster, this patch translates it to
Python and adds significant amount of comments. The new Python test is
functionally identical to the old C++ test - it is not (yet) made
smaller or faster. The new test takes a whopping 9 seconds to run on
my laptop (in dev build mode). We'll reduce that in the next patch.

As usual, the cqlpy test can also be tested on Cassandra, and
unsurprisingly, it passes.

Refs #16134 (which asks to translate more MV and SI tests to Python).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2025-10-15 17:50:37 +03:00
Botond Dénes
46af0127e9 test/cqlpy/test_tools.py: add test for scylla-sstable write --input-format=cql
Comprehensive test for the new CQL input format.
2025-10-13 18:10:40 +03:00
Botond Dénes
180bf647f7 replica/mutation_dump: add support for virtual tables
Not supported currently as such tables have no memtables, cache or
sstables, so any select * from mutation_fragments() query will return
empty result.
Detect virtual tables and add return their content with a distinct
'virtual-table' mutation_source designation.
2025-10-13 18:10:40 +03:00
Botond Dénes
e404dd7cf0 tools/scylla-sstable: add cql support to write operation
Add new --input-format command line argument. Possible values are json
(current) and cql (new -- added in this patch).
When --input-format=cql (new default), the input-file is expected to
contain CQL INSERT, UPDATE or DELETE statements, separated by semicolon.
The input file can contain any number of statements, in any order. The
statements will be executed and applied to a memtable, which is then
flushed to create an sstable with the content generated from the
statement. The memtable's size is capped at 1MiB, if it reaches this
size, it is flushed and recreated. Consequently, multiple sstables can
be created from a single scylla-sstable write --input-format=cql
operation.
2025-10-13 18:10:40 +03:00
Botond Dénes
f03cec9574 tools/scylla-sstable: generalize query_operation_validate_query()
Make error messages more generic, so they are not specific to select.
Make it a template on the type of cql statement for the final check. To
avoid templating the whole thing, the function is split into two.
Parametrize the name of the allowed statement types in said check.
Prepares the method to be shared between query operation and write
operation (future change).
While at it, also change query param type to std::string_view to avoid
some copies.
2025-10-13 17:35:50 +03:00
Andrzej Jackowski
b62135f767 test: add test_desc_* for driver service level
Driver service level is a special service level that is created
automatically by the system. Therefore, it requires special handling
in DESC SCHEMA WITH INTERNALS and those test verifies the special
behavior.

Refs: scylladb/scylladb#24411
2025-10-08 08:25:07 +02:00
Andrzej Jackowski
c59a7db1c9 service_level_controller: automatically create sl:driver
This commit:
  - Increases the number of allowed scheduling groups to allow the
    creation of `sl:driver`.
  - Adds the `DRIVER_SERVICE_LEVEL` feature, which prevents creating
    `sl:driver` until all nodes have increased the number of
    scheduling groups.
  - Starts using `get_create_driver_service_level_mutations`
    to unconditionally create `sl:driver` on
    `raft_initialize_discovery_leader`. The purpose of this code
    path is ensuring existence of `sl:driver` in new system and tests.
  - Starts using `migrate_to_driver_service_level` to create `sl:driver`
    if it is not already present. The creation of `sl:driver` is
    managed by `topology_coordinator`, similar to other system keyspace
    updates, such as the `view_builder` migration. The purpose of this
    code path is handling upgrades.
  - Modifies related tests to pass after `sl:driver` is added.

Later in this patch series, `sl:driver` will be used by
`transport/server` to handle selected traffic, such as the driver's
schema and topology fetches.

Refs: scylladb/scylladb#24411
2025-10-08 08:24:43 +02:00
Andrzej Jackowski
7d2db37831 test: add MAX_USER_SERVICE_LEVELS
Previously, tests used the hardcoded value 7 for the maximum number of
user service levels. This commit introduces a named variable that can
be shared across tests to avoid cases where this magic number goes
out of sync.
2025-10-08 08:24:17 +02:00
Tomasz Grabiec
5fc617ecf5 test: Add tests for rack list RF 2025-10-02 19:45:00 +02:00
Michał Hudobski
ae4d4908ba configure: increase SCHEDULING_GROUPS_COUNT to 20
We would like to have an additional service level
available for users of the Vector Store service,
which would allow us to de/prioritize vector
operations as needed. To allow that, we increase
the number of scheduling groups from 19 to 20
and adjust the related test accordingly.

Closes scylladb/scylladb#26316
2025-09-30 12:41:28 +03:00
Nadav Har'El
38002718a9 cqlpy: improve testing for "duration" column type
We had very rudimentary tests for the "duration" CQL type in the cqlpy
framework - just for reproducing issue #8001. But we left two
alternative formats, and a lot of corner cases, untested.
So this patch aims to add the missing tests - to exhaustively cover
the "duration" literal formats and their capabilities.

Some of the examples tested in the new test are inspired by Cassandra's
unit test test/unit/org/apache/cassandra/cql3/DurationTest.java and the
corner cases that this file covers. However, the new tests are not direct
translation of that file because DurationTest.java was not a CQL test -
it was a unit test of Cassandra's internal "Duration" type, so could not
be directly translated into a CQL-based test.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#25092
2025-09-30 12:25:02 +03:00
Avi Kivity
4d9271df98 Merge 'sstables: introduce sstable version ms' from Michał Chojnowski
This is yet another part in the BTI index project.

Overarching issue: https://github.com/scylladb/scylladb/issues/19191
Previous part: https://github.com/scylladb/scylladb/pull/25626
Next parts: make `ms` the default. Then, general tweaks and improvements. Later, potentially a full `da` format implementation.

This patch series introduces a new, Scylla-only sstable format version `ms`, which is like `me`, but with the index components (Summary.db and Index.db) replaced with BTI index components (Partitions.db and Rows.db), as they are in Cassandra 5.0's `da` format version.

(Eventually we want to just implement `da`, but there are several other changes (unrelated to the index files) between `me` and `da`. By adding this `ms` as an intermediate step we can adapt the new index formats without dragging all the other changes into the mix (and raising the risk of regressions, which is already high)).

The high-level structure of the PR is:
1. Introduce new component types — `Partitions` and `Rows`.
2. Teach `class sstable` to open them when they exist.
3. Teach the sstable writer how to write index data to them.
4. Teach `class sstable` and unit tests how to deal with sstables that have no `Index` or `Summary` (but have `Partitions` and `Rows` instead).
5. Introduce the new sstable version `ms`, specify that it has `Partitions` and `Rows` instead of `Index` and `Summary`.
6. Prepare unit tests for the appearance of `ms`.
7. Enable `ms` in unit tests.
8. Make `ms` enablable via db::config (with a silent fall back to `me` until the new `MS_SSTABLE_FORMAT` cluster feature is enabled).
9. Prepare integration tests for the appearance of `ms`.
10. Enable both `ms` and `me` in tests where we want both versions to be tested.

This series doesn't make `ms` the default yet, because that requires teaching Scylla Manager and a few dtests about the new format first. It can be enabled by setting `sstable_format: ms` in the config.

Per a review request, here is an example from `perf_fast_forward`, demonstrating some motivation for a new format. (Although not the main one. The main motivations are getting rid of restrictions on the RAM:disk ratio, and index read throughput for datasets with tiny partitions). The dataset was populated with `build/release/scylla perf-fast-forward --smp=1 --sstable-format=$VERSION --data-directory=data.$VERSION --column-index-size-in-kb=1 --populate --random-seed=0`.
This test involves a partition with 1000000 clustering rows (with 32-bit keys and 100-byte values) and ~500 index blocks, and queries a few particular rows from the partition. Since the branching factor for the BIG promoted index is 2 (it's a binary search), the lookup involves ~11.2 sequential page reads per row. The BTI format has a more reasonable branching factor, so it involves ~2.3 page reads per row.

`build/release/scylla perf-fast-forward --smp=1 --data-directory=perf_fast_forward_data/me --run-tests=large-partition-select-few-rows`:
```
offset  stride  rows     iterations    avg aio    aio      (KiB)
500000  1       1                70       18.0     18        128
500001  1       1               647       19.0     19        132
0       1000000 1               748       15.0     15        116
0       500000  2               372       29.0     29        284
0       250000  4               227       56.0     56        504
0       125000  8               116      106.0    106        928
0       62500   16               67      195.0    195       1732
```
`build/release/scylla perf-fast-forward --smp=1 --data-directory=perf_fast_forward_data/ms --run-tests=large-partition-select-few-rows`:
```
offset  stride  rows     iterations    avg aio    aio      (KiB)
500000  1       1                51        5.1      5         20
500001  1       1                64        5.3      5         20
0       1000000 1               679        4.0      4         16
0       500000  2               492        8.0      8         88
0       250000  4               804       16.0     16        232
0       125000  8               409       31.0     31        516
0       62500   16               97       54.0     54       1056
```

Index file size comparison for the default `perf_fast_forward` tables with `--random-seed=0`:
Large partition table (dominated by intra-partition index): 2.4 MB with `me`, 732 kB with `ms`.
For the small partitions table (dominated by inter-partition index): 11 MB with `me`, 8.4 MB with `ms`.

External tests:
I ran SCT test `longevity-mv-si-4days-streaming-test` test on 6 nodes with 30 shards each for 8 hours. No anomalies were observed.

New functionality, no backport needed.

Closes scylladb/scylladb#26215

* github.com:scylladb/scylladb:
  test/boost/bloom_filter_test: add test_rebuild_from_temporary_hashes
  test/cluster: add test_bti_index.py
  test: prepare bypass_cache_test.py for `ms` sstables
  sstables/trie/bti_index_reader: add a failure injection in advance_lower_and_check_if_present
  test/cqlpy/test_sstable_validation.py: prepare the test for `ms` sstables
  tools/scylla-sstable: add `--sstable-version=?` to `scylla sstable write`
  db/config: expose "ms" format to the users via database config
  test: in Python tests, prepare some sstable filename regexes for `ms`
  sstables: add `ms` to `all_sstable_versions`
  test/boost/sstable_3_x_test: add `ms` sstables to multi-version tests
  test/lib/index_reader_assertions: skip some row index checks for BTI indexes
  test/boost/sstable_inexact_index_test: explicitly use a `me` sstable
  test/boost/sstable_datafile_test: skip test_broken_promoted_index_is_skipped for `ms` sstables
  test/resource: add `ms` sample sstable files for relevant tests
  test/boost/sstable_compaction_test: prepare for `ms` sstables.
  test/boost/index_reader_test: prepare for `ms` sstables
  test/boost/bloom_filter_tests: prepare for `ms` sstables
  test/boost/sstable_datafile_test: prepare for `ms` sstables
  test/boost/sstable_test: prepare for `ms` sstables.
  sstables: introduce `ms` sstable format version
  tools/scylla-sstable: default to "preferred" sstable version, not "highest"
  sstables/mx/reader: use the same hashed_key for the bloom filter and the index reader
  sstables/trie/bti_index_reader: allow the caller to passing a precalculated murmur hash
  sstables/trie/bti_partition_index_writer: in add(), get the key hash from the caller
  sstables/mx: make Index and Summary components optional
  sstables: open Partitions.db early when it's needed to populate key range for sharding metadata
  sstables: adapt sstable::set_first_and_last_keys to sstables without Summary
  sstables: implement an alternative way to rebuild bloom filters for sstables without Index
  utils/bloom_filter: add `add(const hashed_key&)`
  sstables: adapt estimated_keys_for_range to sstables without Summary
  sstables: make `sstable::estimated_keys_for_range` asynchronous
  sstables/sstable: compute get_estimated_key_count() from Statistics instead of Summary
  replica/database: add table::estimated_partitions_in_range()
  sstables/mx: implement sstable::has_partition_key using a regular read
  sstables: use BTI index for queries, when present and enabled
  sstables/mx/writer: populate BTI index files
  sstables: create and open BTI index files, when enabled
  sstables: introduce Partition and Rows component types
  sstables/mx/writer: make `_pi_write_m.partition_tombstone` a `sstables::deletion_time`
2025-09-30 09:40:02 +03:00
Michał Chojnowski
182c8ce87b test/cqlpy/test_sstable_validation.py: prepare the test for ms sstables
BIG sstables and BTI sstables use different code paths for
validating the Data file against the index. So we want to test
both types of indexes, not just the default one.

This patch changes the test so that it explicitly tests both
`me` and `ms` instead of only testing the default format.

Note that we disable some tests for BTI indexes:
the tests which check that validation detects mismatches
between the row index ("promoted index") and the Data file.
This is because currently iteration over the row
index in BTI isn't implemented at the moment,
so for BTI the validation behaves as if there was no row indexes.
2025-09-29 22:15:25 +02:00