Commit Graph

10055 Commits

Author SHA1 Message Date
Michał Hudobski
7646dde25b select_statement: add a warning about unsupported paging for vs queries
Currently we do not support paging for vector search queries.
When we get such a query with paging enabled we ignore the paging
and return the entire result. This behavior can be confusing for users,
as there is no warning about paging not working with vector search.
This patch fixes that by adding a warning to the result of ANN queries
with paging enabled.

Closes scylladb/scylladb#26384
2025-11-13 18:47:05 +02:00
Piotr Dulikowski
7f482c39eb Merge '[schema] Speculative retry rounding fix' from Dario Mirovic
This patch series re-enables support for speculative retry values `0` and `100`. These values were supported until [schema: fix issue 21825: add validation for PERCENTILE values in speculative_retry configuration. #21879](https://github.com/scylladb/scylladb/pull/21879). When that PR started rejecting invalid `101PERCENTILE` values, the valid `100PERCENTILE` and `0PERCENTILE` values were rejected too.

The reproduction steps from [[Bug]: drop schema and all tables after apply speculative_retry = '99.99PERCENTILE' #26369](https://github.com/scylladb/scylladb/issues/26369) no longer reproduce the issue after the fix. A test is added to make sure the inclusive border values `0` and `100` are supported.

Documentation is updated to give more information to the users. It now states that these border values are inclusive, and also that the precision, with automatic rounding, is 1 decimal digit.
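
As an illustration of the documented behavior, here is a minimal sketch. It is not ScyllaDB's actual parser: the helper name `parse_percentile` is hypothetical, and checking the range before rounding is an assumption. It only shows inclusive `0` and `100` bounds with rounding to 1 decimal digit.

```cpp
#include <cmath>
#include <cstddef>
#include <exception>
#include <optional>
#include <string>

// Hypothetical helper (not ScyllaDB code): parses "<number>PERCENTILE".
// Returns the value rounded to 1 decimal digit, or std::nullopt if the
// text is malformed or the number is outside the inclusive range [0, 100].
std::optional<double> parse_percentile(const std::string& text) {
    static const std::string suffix = "PERCENTILE";
    if (text.size() <= suffix.size() ||
        text.compare(text.size() - suffix.size(), suffix.size(), suffix) != 0) {
        return std::nullopt;
    }
    const std::string number = text.substr(0, text.size() - suffix.size());
    double value = 0.0;
    try {
        std::size_t consumed = 0;
        value = std::stod(number, &consumed);
        if (consumed != number.size()) {
            return std::nullopt;  // trailing garbage after the number
        }
    } catch (const std::exception&) {
        return std::nullopt;      // non-numeric or out of double range
    }
    // Written so that NaN is rejected as well.
    if (!(value >= 0.0 && value <= 100.0)) {
        return std::nullopt;
    }
    return std::round(value * 10.0) / 10.0;
}
```

Under these assumptions, `parse_percentile("100PERCENTILE")` and `parse_percentile("0PERCENTILE")` are accepted, while `parse_percentile("101PERCENTILE")` is rejected.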

Fixes #26369

This is a bug fix. If a client uses a value >= 99.5 and < 100, a raft error occurs. A backport is needed. The code that introduced the inconsistency first appeared in 2025.2, so no backport to 2025.1 is needed.

Closes scylladb/scylladb#26909

* github.com:scylladb/scylladb:
  test: cqlpy: add test case for non-numeric PERCENTILE value
  schema: speculative_retry: update exception type for sstring ops
  docs: cql: ddl.rst: update speculative-retry-options
  test: cqlpy: add test for valid speculative_retry values
  schema: speculative_retry: allow 0 and 100 PERCENTILE values
2025-11-13 15:27:45 +01:00
Piotr Dulikowski
2e5eb92f21 Merge 'cdc: use CDC schema that is compatible with the base schema' from Michael Litvak
When generating CDC log mutations for some base mutation, use a CDC schema that is compatible with the base schema.

The compatible CDC schema has for every base column a corresponding CDC column with the same name. If using a non-compatible schema, we may encounter a situation, especially during ALTER, that we have a mutation with a base column set with some value, but the CDC schema doesn't have a column by that name. This would cause the user request to fail with an error.

We add to the schema object a schema_ptr that for CDC-enabled tables points to the schema object of the CDC table that is compatible with the schema. It is set by the schema merge algorithm when creating the schema for a table that is created or altered. We use the fact that a base table and its CDC table are created and altered in the same group0 operation, and this way we can find and set the cdc schema for a base table.
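
The compatibility rule and the new pointer can be pictured with a small sketch. The types below are simplified stand-ins, not ScyllaDB's `schema` or `schema_ptr`; `toy_schema` and `cdc_schema_is_compatible` are hypothetical names used only for illustration.

```cpp
#include <algorithm>
#include <memory>
#include <string>
#include <vector>

// Simplified stand-in for a schema object (assumption, not ScyllaDB's type).
struct toy_schema {
    std::vector<std::string> columns;
    // For CDC-enabled base tables: the CDC log schema; null otherwise.
    std::shared_ptr<const toy_schema> cdc_schema;
};

// True when every base column has a CDC column with the same name.
bool cdc_schema_is_compatible(const toy_schema& base) {
    if (!base.cdc_schema) {
        return false;
    }
    const auto& cdc_cols = base.cdc_schema->columns;
    return std::all_of(base.columns.begin(), base.columns.end(),
        [&](const std::string& name) {
            return std::find(cdc_cols.begin(), cdc_cols.end(), name) != cdc_cols.end();
        });
}
```

A base schema that gained a column during ALTER while still pointing at an older CDC schema would fail this check, which is the situation the series avoids.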

When transporting the base schema as a frozen schema between shards, we transport with it the frozen cdc schema as well.

The patch starts with a series of refactoring commits that make extending the frozen schema easier and clean up some duplication in the frozen schema code. We combine the two types `frozen_schema_with_base_info` and `view_schema_and_base_info` into a single type `extended_frozen_schema` that holds a frozen schema with additional data that is not part of the schema mutations but needs to be transported with it to unfreeze it: base_info, and the frozen cdc schema, which is added in a later commit.

Fixes https://github.com/scylladb/scylladb/issues/26405

backport not needed - enhancement

Closes scylladb/scylladb#24960

* github.com:scylladb/scylladb:
  test: cdc: test cdc compatible schema
  cdc: use compatible cdc schema
  db: schema_applier: create schema with pointer to CDC schema
  db: schema_applier: extract cdc tables
  schema: add pointer to CDC schema
  schema_registry: remove base_info from global_schema_ptr
  schema_registry: use extended_frozen_schema in schema load
  schema_registry: replace frozen_schema+base_info with extended_frozen_schema
  frozen_schema: extract info from schema_ptr in the constructor
  frozen_schema: rename frozen_schema_with_base_info to extended_frozen_schema
2025-11-13 10:11:54 +01:00
Pavel Emelyanov
f47f2db710 Merge 'Support local primary-replica-only for native restore' from Robert Bindar
This PR extends the restore API so that it accepts primary_replica_only as parameter and it combines the concepts of primary-replica-only with scoped streaming so that with:
- `scope=all primary_replica_only=true` The restoring node will stream to the global primary replica only
- `scope=dc primary_replica_only=true` The restoring node will stream to the local primary replica only.
- `scope=rack primary_replica_only=true` The restoring node will stream only to the primary replica from within its own rack (with rf=#racks, the restoring node will stream only to itself)
- `scope=node primary_replica_only=true` is not allowed, the restoring node will always stream only to itself so the primary_replica_only parameter wouldn't make sense.
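
The combinations above can be summarized as a small lookup. This is only a sketch of the rules as listed; `stream_scope` and `primary_replica_target` are hypothetical names, not the restore API's actual types.

```cpp
#include <optional>
#include <string>

// Hypothetical enum mirroring the scope values listed above.
enum class stream_scope { all, dc, rack, node };

// Describes where the restoring node streams when primary_replica_only=true,
// or returns std::nullopt for the combination that is not allowed.
std::optional<std::string> primary_replica_target(stream_scope scope) {
    switch (scope) {
    case stream_scope::all:
        return "global primary replica";
    case stream_scope::dc:
        return "primary replica in the local datacenter";
    case stream_scope::rack:
        return "primary replica in the local rack";
    case stream_scope::node:
        return std::nullopt;  // rejected: the node only streams to itself
    }
    return std::nullopt;
}
```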

The PR also adjusts the `nodetool refresh` restriction on running restore with both primary_replica_only and scope, it adds primary_replica_only to `nodetool restore` and it adds cluster tests for primary replica within scope.

Fixes #26584

Closes scylladb/scylladb#26609

* github.com:scylladb/scylladb:
  Add cluster tests for checking scoped primary_replica_only streaming
  Improve choice distribution for primary replica
  Refactor cluster/object_store/test_backup
  nodetool restore: add primary-replica-only option
  nodetool refresh: Enable scope={all,dc,rack} with primary_replica_only
  Enable scoped primary replica only streaming
  Support primary_replica_only for native restore API
2025-11-13 12:11:18 +03:00
Tomasz Grabiec
10b893dc27 Merge 'load_stats: fix bug in migrate_tablet_size()' from Ferenc Szili
`topology_coordinator::migrate_tablet_size()` was introduced in 10f07fb95a. It has a bug where the has_tablet_size() lambda always returns false because of a bad comparison of iterators after a table and tablet search:

```
if (auto table_i = tables.find(gid.table); table_i != tables.find(gid.table)) {
    if (auto size_i = table_i->second.find(trange); size_i != table_i->second.find(trange)) {
```

This change also fixes a problem where the `migrate_tablet_size()` would crash with a `std::out_of_range` if the pending node was not present in load_stats.

This change fixes these two problems and moves the functionality into a separate method of `load_stats`. It also adds tests for the new method.
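
For comparison, a lookup that compares each iterator against `end()` looks like the sketch below. The container layout and the names `tablet_sizes` and `find_tablet_size` are assumptions for illustration, not the actual `load_stats` definitions. Using `find()` throughout also avoids the `std::out_of_range` that `at()` throws for a missing key.

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <string>

// Stand-in types (assumptions): table id -> tablet range id -> size in bytes.
using table_id = std::string;
using tablet_range = int;
using tablet_sizes = std::map<table_id, std::map<tablet_range, std::uint64_t>>;

// Returns the tablet size if both the table and the tablet are present.
std::optional<std::uint64_t> find_tablet_size(const tablet_sizes& tables,
                                              const table_id& table,
                                              tablet_range range) {
    // Compare against end(), not against a second find() of the same key.
    if (auto table_i = tables.find(table); table_i != tables.end()) {
        if (auto size_i = table_i->second.find(range); size_i != table_i->second.end()) {
            return size_i->second;
        }
    }
    return std::nullopt;
}
```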

A version containing this bug has not been released yet, so no backport is needed.

Closes scylladb/scylladb#26946

* github.com:scylladb/scylladb:
  load_stats: add test for migrate_tablet_size()
  load_stats: fix problem with tablet size migration
2025-11-12 23:48:37 +01:00
Nadav Har'El
5839574294 Merge 'cql3: Fix std::bad_cast when deserializing vectors of collections' from Karol Nowacki
cql3: Fix std::bad_cast when deserializing vectors of collections

This PR fixes a bug where attempting to INSERT a vector containing collections (e.g., `vector<set<int>,1>`) would fail. On the client side, this manifested as a `ServerError: std::bad_cast`.

The cause was a "type slicing" issue in the `reserialize_value` function. When retrieving the vector's element type, the result was assigned by value (using `auto`) instead of by reference.
This "sliced" the polymorphic abstract_type object, stripping it of its actual derived type information. As a result, a subsequent dynamic_cast would fail, even if the underlying type was correct.

To prevent this entire class of bugs from happening again, I've made the polymorphic base class `abstract_type` explicitly uncopyable.
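
The effect of that change can be sketched with toy types. `base_type`, `set_type` and `vector_type` below are hypothetical stand-ins, not the real `abstract_type` hierarchy. With the copy operations deleted, binding by value no longer compiles, so only the reference form remains.

```cpp
#include <memory>
#include <type_traits>

// Toy polymorphic base (stand-in for a class like abstract_type).
struct base_type {
    base_type() = default;
    base_type(const base_type&) = delete;             // no copies, so no slicing
    base_type& operator=(const base_type&) = delete;
    virtual ~base_type() = default;
};

struct set_type : base_type {};

struct vector_type {
    std::shared_ptr<const base_type> elements;
    const base_type& elements_type() const { return *elements; }
};

static_assert(!std::is_copy_constructible_v<base_type>);

bool elements_are_sets(const vector_type& v) {
    // auto t = v.elements_type();      // would copy the base: now a compile error
    const auto& t = v.elements_type();  // keeps the dynamic type intact
    return dynamic_cast<const set_type*>(&t) != nullptr;
}
```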

Fixes: #26704

This fix needs to be backported as these releases are affected: `2025.4` , `2025.3`.

Closes scylladb/scylladb#26740

* github.com:scylladb/scylladb:
  cql3: Make abstract_type explicitly noncopyable
  cql3: Fix std::bad_cast when deserializing vectors of collections
2025-11-13 00:24:25 +02:00
Nadav Har'El
4de88a7fdc test/cqlpy: fix run script for materialized views on tablets
Recently we enabled tablets by default, but it is necessary to
enable rf_rack_valid_keyspaces if materialized views are to be used
with tablets, and this option is *not* the default.

We did add this option in test/pylib/scylla_cluster.py which is
used by test.py, but we didn't add it to test/cqlpy/run.py, so
the test/cqlpy/run script is no longer able to run tests with
materialized views. So this patch adds the missing configuration
to run.py.

Fixes #26918

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#26919
2025-11-12 11:56:21 +03:00
Karol Nowacki
960fe3da60 cql3: Fix std::bad_cast when deserializing vectors of collections
When deserializing a vector whose elements are collections (e.g., set, list),
the operation raises a `std::bad_cast` exception.

This was caused by type slicing due to an incorrect assignment of a
polymorphic type by value instead of by reference. This resulted in a
failed `dynamic_cast` even when the underlying type was correct.
2025-11-12 09:11:56 +01:00
Ferenc Szili
fcbc239413 load_stats: add test for migrate_tablet_size()
This change adds tests which validate the functionality of
load_stats::migrate_tablet_size()
2025-11-11 14:28:31 +01:00
Benny Halevy
a290505239 utils: stall_free: add dispose_gently
dispose_gently consumes the object moved to it,
clearing it gently before it's destroyed.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#26356
2025-11-11 12:20:18 +02:00
Nadav Har'El
b659dfcbe9 test/cqlpy: comment out Cassandra check that is no longer relevant
In the test translated from Cassandra validation/operations/alter_test.py
we had two lines in the beginning of an unrelated test that verified
that CREATE KEYSPACE is not allowed without replication parameters.
But recently ScyllaDB gained defaults and now allows these
CREATE KEYSPACE statements. So comment out these two test lines.

We didn't notice that this test started to fail, because it was already
marked xfail, because in the main part of this test, it reproduces a
different issue!

The annoying side effect of these no-longer-passing checks was that
because the test expected a CREATE KEYSPACE to fail, it didn't bother
to delete this keyspace when it finished, which causes test.py to
report that there's a problem because some keyspaces still exist at the
end of the test. Now that we fixed this problem, we no longer need to
list this test in test/cqlpy/suite.yaml as a test that leaves behind
undeleted keyspaces.

Fixes #26292

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#26341
2025-11-11 10:34:27 +02:00
Botond Dénes
042303f0c9 Merge 'Alternator: enable tablets by default - depending on tablets_mode_for_new_keyspaces' from Nadav Har'El
Before this series, Alternator's CreateTable operation defaults to creating a table replicated with vnodes, not tablets. The reasons for this default included missing support for LWT, Materialized Views, Alternator TTL and Alternator Streams if tablets are used. But today, all of these (except the still-experimental Alternator Streams) are now fully available with tablets, so we are finally ready to switch Alternator to use tablets by default in new tables.

We will use the same configuration parameter that CQL uses, tablets_mode_for_new_keyspaces, to determine whether new keyspaces use tablets by default. If set to `enabled`, tablets are used by default on new tables. If set to `disabled`, tablets will not be used by default (i.e., vnodes will be used, as before). A third value, `enforced` is similar to `enabled` but forbids overriding the default to vnodes when creating a table.

As before, the user can set a tag during the CreateTable operation to override the default choice of tablets or vnodes (unless in `enforced` mode). This tag is now named `system:initial_tablets` - whereas before this patch it was called `experimental:initial_tablets`. The rules stay the same as with the earlier, experimental:initial_tablets tag: when supplied with a numeric value, the table will use tablets. When supplied with something else (like a string "none"), the table will use vnodes.
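
The decision rules described above can be sketched as follows. This is an illustration only, not Alternator's implementation: the names are hypothetical, and treating "numeric" as a non-empty string of decimal digits is an assumption.

```cpp
#include <algorithm>
#include <cctype>
#include <optional>
#include <string>

enum class tablets_mode { disabled, enabled, enforced };
enum class replication { vnodes, tablets };

// Assumption: "numeric" means a non-empty string of decimal digits.
bool is_numeric(const std::string& s) {
    return !s.empty() && std::all_of(s.begin(), s.end(),
        [](unsigned char c) { return std::isdigit(c) != 0; });
}

// Picks the replication kind for a new table from the config mode and the
// optional system:initial_tablets tag value. Returns std::nullopt when the
// request must be rejected (vnodes requested explicitly in enforced mode).
std::optional<replication> choose_replication(
        tablets_mode mode, const std::optional<std::string>& initial_tablets_tag) {
    if (!initial_tablets_tag) {
        // No tag: follow tablets_mode_for_new_keyspaces.
        return mode == tablets_mode::disabled ? replication::vnodes
                                              : replication::tablets;
    }
    if (is_numeric(*initial_tablets_tag)) {
        return replication::tablets;
    }
    if (mode == tablets_mode::enforced) {
        return std::nullopt;
    }
    return replication::vnodes;
}
```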

Fixes https://github.com/scylladb/scylladb/issues/22463

Backport to 2025.4, it's important not to delay phasing out vnodes.

Closes scylladb/scylladb#26836

* github.com:scylladb/scylladb:
  test,alternator: use 3-rack clusters in tests
  alternator: improve error in tablets_mode_for_new_keyspaces=enforced
  config: make tablets_mode_for_new_keyspaces live-updatable
  alternator: improve comment about non-hidden system tags
  alternator: Fix test_ttl_expiration_streams()
  alternator: Fix test_scan_paging_missing_limit()
  alternator: Don't require vnodes for TTL tests
  alternator: Remove obsolete test from test_table.py
  alternator: Fix tag name to request vnodes
  alternator: Fix test name clash in test_tablets.py
  alternator: test_tablets.py handles new policy reg. tablets
  alternator: Update doc regarding tablets support
  alternator: Support `tablets_mode_for_new_keyspaces` config flag
  Fix incorrect hint for tablets_mode_for_new_keyspaces
  Fix comment for tablets_mode_for_new_keyspaces
2025-11-11 09:45:29 +02:00
Nikos Dragazis
94c4f651ca test/cqlpy: Test secondary index with short reads
Add a test to check that paged secondary index queries behave correctly
when pages are short. This is currently failing in Scylla, but passes in
Cassandra 5, therefore marked as "xfailing". Refer to the test's
docstring for more details.

The bug is a regression introduced by commit f6f18b1.
`test/cqlpy/run --release ...` shows that the test passes in 5.1 but
fails in 5.2 onwards.

Refs #25839.

Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>

Closes scylladb/scylladb#25843
2025-11-11 09:28:45 +02:00
Robert Bindar
a04ebb829c Add cluster tests for checking scoped primary_replica_only streaming
This commit adds tests checking various scenarios of restoring
via load and stream with primary_replica_only and a scope specified.

The tests check that, in a few topologies, a mutation is replicated
the correct number of times given primary_replica_only and that
streaming happens according to the scope rule passed.

Signed-off-by: Robert Bindar <robert.bindar@scylladb.com>
2025-11-11 09:18:01 +02:00
Robert Bindar
d4e43bd34c Refactor cluster/object_store/test_backup
This PR splits the support code from test_backup.py
into multiple functions so less duplicated code is
produced by new tests using it. It also makes it a bit
easier to understand.

Signed-off-by: Robert Bindar <robert.bindar@scylladb.com>
2025-11-11 09:18:01 +02:00
Robert Bindar
83aee954b4 nodetool refresh: Enable scope={all,dc,rack} with primary_replica_only
Until now, passing a scope together with the primary_replica_only
option was not allowed. This patch enables it
because the concepts are now combined so that:
- scope=all primary_replica_only=true gets the global primary replica
- scope=dc primary_replica_only=true gets the local primary replica
- scope=rack primary_replica_only=true is like a noop, it gets the only
  replica in the rack (rf=#racks)
- scope=node primary_replica_only=true is not allowed

Fixes #26584

Signed-off-by: Robert Bindar <robert.bindar@scylladb.com>
2025-11-11 09:18:01 +02:00
Pavel Emelyanov
decf86b146 Merge 'Make AWS & Azure KMS boost testing use fixture + include Azure in pytests' from Calle Wilund
* Adds test fixture for AWS KMS
* Adds test fixture for Azure KMS
* Adds key provider proxy for Azure to pytests (ported dtests)
* Make test gather for boost tests handle suites
* Fix GCP test snafu

Fixes #26781
Fixes #26780
Fixes #26776
Fixes #26775

Closes scylladb/scylladb#26785

* github.com:scylladb/scylladb:
  gcp_object_storage_test: Re-enable parallelism.
  test::pylib: Add azure (mock) testing to EAR matrix
  test::boost::encryption_at_rest: Remove redundant azure test indent
  test::boost::encryption_at_rest: Move azure tests to use fixture
  test::lib: Add azure mock/real server fixture
  test::pylib::boost: Fix test gather to handle test suites
  utils::gcp::object_storage: Fix typo in semaphore init
  test::boost::encryption_at_rest_test: Remove redundant indent
  test::boost::test_encryption_at_rest: Move to AWS KMS fixture for kms test
  test::boost::test_encryption_at_rest: Reorder tests and helpers
  ent::encryption: Make text helper routines take std::string
  test::pylib::dockerized_service: Handle docker/podman bind error message
  test::lib::aws_kms_fixture: Add a fixture object to run mock AWS KMS
  test::lib::gcs_fixture: Only set port if running docker image + more retry
2025-11-10 14:35:05 +03:00
Yauheni Khatsianevich
d3e62b15db fix(test): minor typo fix, removing redundant param from logging
Closes scylladb/scylladb#26901
2025-11-10 08:42:11 +03:00
Dario Mirovic
7ec9e23ee3 test: cqlpy: add test case for non-numeric PERCENTILE value
Add test case for non-numeric PERCENTILE value, which raises an error
different to the out-of-range invalid values. Regex in the test
test_invalid_percentile_speculative_retry_values is expanded.

Refs #26369
2025-11-09 13:59:36 +01:00
Dario Mirovic
5d1913a502 test: cqlpy: add test for valid speculative_retry values
test_valid_percentile_speculative_retry_values is introduced to test that
valid values for speculative_retry are properly accepted.

Some of the values are moved from the
test_invalid_percentile_speculative_retry_values test, because
the previous commit added support for them.

Refs #26369
2025-11-09 13:23:26 +01:00
Nadav Har'El
65ed678109 test,alternator: use 3-rack clusters in tests
With tablets enabled, we can't create an Alternator table on a three-
node cluster with a single rack, since Scylla refuses RF=3 with just
one rack and we get the error:

    An error occurred (InternalServerError) when calling the CreateTable
    operation: ... Replication factor 3 exceeds the number of racks (1) in
    dc datacenter1

So in test/cluster/test_alternator.py we need to use the incantation
"auto_rack_dc='dc1'" every time that we create a three-node cluster.

Before this patch, several tests in test/cluster/test_alternator.py
failed on this error, with this patch all of them pass.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2025-11-09 12:52:29 +02:00
Nadav Har'El
c03081eb12 alternator: improve error in tablets_mode_for_new_keyspaces=enforced
When in tablets_mode_for_new_keyspaces=enforced mode, Alternator is
supposed to fail when CreateTable asks explicitly for vnodes. Before
this patch, this error was an ugly "Internal Server Error" (an
exception thrown from deep inside the implementation). This patch
checks for this case in the right place, to generate a proper
ValidationException with a proper error message.

We also enable the test test_tablets_tag_vs_config which should have
caught this error, but didn't because it was marked xfail because
tablets_mode_for_new_keyspaces had not been live-updatable. Now that
it is, we can enable the test. I also improved the test to be slightly
faster (no need to change the configuration so many times) and also
check the ordinary case - where the schema chooses neither
vnodes nor tablets explicitly and we should just use the default.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2025-11-09 12:52:29 +02:00
Piotr Szymaniak
eeb3a40afb alternator: Fix test_ttl_expiration_streams()
The test is now aware of the new name of the
`system:initial_tablets` tag.
2025-11-09 12:52:29 +02:00
Piotr Szymaniak
a659698c6d alternator: Fix test_scan_paging_missing_limit()
With tablets, the test began failing. The failure was correlated with
the number of initial tablets, which when kept at default, equals
4 tablets per shard in release build and 2 tablets per shard in dev
build.

In this patch we split the test into two - one with more data in
the table to check the original purpose of this test - that Scan
doesn't return the entire table in one page if "Limit" is missing.
The other test reproduces issue #10327 - that when the table is
small, Scan's page size isn't strictly limited to 1MB as it is in
DynamoDB.

Experimentally, 8000 KB of data (compared to 6000 KB before this patch)
is enough when we have up to 4 initial tablets per shard (so 8 initial
tablets on a two-shard node as we typically run in tests).

Original patch by Piotr Szymaniak <piotr.szymaniak@scylladb.com>
modified by Nadav Har'El <nyh@scylladb.com>
2025-11-09 12:52:29 +02:00
Piotr Szymaniak
345747775b alternator: Don't require vnodes for TTL tests
Since #23662 Alternator supports TTL with tablets too. Let's clear some
leftovers causing Alternator to test TTL with vnodes instead of with
what is default for Alternator (tablets or vnodes).
2025-11-09 12:52:29 +02:00
Piotr Szymaniak
274d0b6d62 alternator: Remove obsolete test from test_table.py
Since Alternator is capable of running with tablets according to the
flag in config, remove the obsolete test that is making sure
that Alternator runs with vnodes.
2025-11-09 12:52:29 +02:00
Piotr Szymaniak
63897370cb alternator: Fix tag name to request vnodes
The tag was recently renamed from `experimental:initial_tablets` to
`system:initial_tablets`. This commit fixes both the tests as well as
the exceptions sent to the user instructing how to create table with
vnodes.
2025-11-09 12:52:29 +02:00
Piotr Szymaniak
c7de7e76f4 alternator: Fix test name clash in test_tablets.py
2025-11-09 12:52:28 +02:00
Piotr Szymaniak
7466325028 alternator: test_tablets.py handles new policy reg. tablets
Adjust the tests so they are in line with the config flag
`tablets_mode_for_new_keyspaces` that Alternator learned to honour.
2025-11-09 12:52:28 +02:00
Botond Dénes
cdba3bebda Merge 'Generalize directory checks in database_test's snapshot test cases' from Pavel Emelyanov
Those test cases use lister::scan_dir() to validate the contents of snapshot directory of a table against this table's base directory. This PR generalizes the listing code making it shorter.

Also, the snapshot_skip_flush_works case is missing the check for "schema.cql" file. Nothing is wrong with it, but the test is more accurate if checking it.

Also, the snapshot_with_quarantine_works case checks whether one set of names is a subset of another using lengthy code. Using std::includes improves the test readability a lot.

Also, the PR replaces lister::scan_dir() with directory_lister. The former is going to be removed some day (see also #26586)

Improving existing working test, no backport is needed.

Closes scylladb/scylladb#26693

* github.com:scylladb/scylladb:
  database_test: Simplify snapshot_with_quarantine_works() test
  database_test: Improve snapshot_skip_flush_works test
  database_test: Simplify snapshot_works() tests
  database_test: Use collect_files() to remove files
  database_test: Use collect_files() to count files in directory
  database_test: Introduce collect_files() helper
2025-11-07 16:04:02 +02:00
Michał Chojnowski
b82c2aec96 sstables/trie: fix an assertion violation in bti_partition_index_writer_impl::write_last_key
_last_key is a multi-fragment buffer.

Some prefix of _last_key (up to _last_key_mismatch) is
unneeded because it's already a part of the trie.
Some suffix of _last_key (after needed_prefix) is unneeded
because _last_key can be differentiated from its neighbors even without it.

The job of write_last_key() is to find the middle fragments
(containing the range `[_last_key_mismatch, needed_prefix)`),
trim the first and last of the middle fragments appropriately,
and feed them to the trie writer.

But there's an error in the current logic,
in the case where `_last_key_mismatch` falls on a fragment boundary.
To describe it with an example, if the key is fragmented like
`aaa|bbb|ccc`, `_last_key_mismatch == 3`, and `needed_prefix == 7`,
then the intended output to the trie writer is `bbb|c`,
but the actual output is `|bbb|c`. (I.e. the first fragment is empty).

Technically the trie writer could handle empty fragments,
but it has an assertion against them, because they are a questionable thing.

Fix that.
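
The intended slicing can be sketched independently of the trie writer. The function below is an illustration, not the patched code: `slice_fragments` is a hypothetical name and `std::string_view` stands in for the real buffer fragments. It emits only non-empty pieces of the half-open byte range `[begin, end)`.

```cpp
#include <algorithm>
#include <cstddef>
#include <string_view>
#include <vector>

// Returns the parts of a fragmented key that overlap the half-open byte
// range [begin, end), skipping fragments that contribute zero bytes.
std::vector<std::string_view> slice_fragments(
        const std::vector<std::string_view>& fragments,
        std::size_t begin, std::size_t end) {
    std::vector<std::string_view> out;
    std::size_t offset = 0;  // absolute offset of the current fragment's start
    for (std::string_view frag : fragments) {
        const std::size_t frag_begin = offset;
        const std::size_t frag_end = offset + frag.size();
        offset = frag_end;
        const std::size_t lo = std::max(begin, frag_begin);
        const std::size_t hi = std::min(end, frag_end);
        if (lo < hi) {  // strict: never emit an empty fragment
            out.push_back(frag.substr(lo - frag_begin, hi - lo));
        }
    }
    return out;
}
```

With the fragments `aaa|bbb|ccc`, `begin == 3` and `end == 7`, this yields `bbb|c`; the first fragment is skipped because its overlap with the range is empty.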

We also extend bti_index_test so that it's able to hit the assert
violation (before the patch). The reason why it wasn't able to do that
before the patch is that the violation requires decorated keys to differ
on the _first_ byte of a partition key column, but the keys generated
by the test only differed on the last byte of the column.
(Because the test was using sequential integers to make the values more
human-readable during debugging). So we modify the key generation
to use random values that can differ on any position.

Fixes scylladb/scylladb#26819

Closes scylladb/scylladb#26839
2025-11-07 11:25:07 +02:00
Abhinav Jha
ab0e0eab90 raft topology: skip non-idempotent steps in decommission path to avoid problems during races
Currently, there are issues with the execution of the left_token_ring
transition state in the decommission path. When concurrent mutations race,
we can enter left_token_ring more than once, and on the second entry we
try to barrier the decommissioned node, which is no longer possible at
that point. That is what causes the errors.

This PR resolves the issue by adding a check at the start of
left_token_ring that tests whether the first topology state update, which
marks the request as done, has completed. If it has, it is confirmed that
the flow is entering left_token_ring for the second time, and the steps
preceding the request status update should be skipped. In that case, all
the other steps are skipped and the topology node status update (which
threw an error in the previous attempt) is executed directly. The node's
removal status from group0 is also checked, and the remove operation is
retried if it failed last time.

Although these changes target the decommission operation's
behavior in the `left_token_ring` transition state, the PR doesn't
interfere with the core logic, so it should not derail any rollback-specific
logic. The changes just prevent some non-idempotent operations from
recurring in case of failures. The rest of the core logic remains intact.

A test is also added to confirm this behavior.

Fixes: scylladb/scylladb#20865

Backport is not needed, since this is not a super critical bug fix.

Closes scylladb/scylladb#26717
2025-11-07 10:07:49 +01:00
Asias He
dbeca7c14d repair: Add metric for time spent on tablet repair
It is useful to check time spent on tablet repair. It can be used to
compare incremental repair and non-incremental repair. The time does not
include the time waiting for the tablet scheduler to schedule the tablet
repair task.

Fixes #26505

Closes scylladb/scylladb#26502
2025-11-06 10:00:20 +03:00
Calle Wilund
b0061e8c6a gcp_object_storage_test: Re-enable parallelism.
Re-enable parallel execution to get better logs.
Note, this is somewhat wasteful, as we won't re-use test fixture here,
but in the end, it is probably an improvement.
2025-11-05 15:07:26 +00:00
Wojciech Mitros
0a22ac3c9e mv: don't mark the view as built if the reader produced no partitions
When we build a materialized view we read the entire base table from start to
end to generate all required view updates. If a view is created while another view
is being built on the same base table, this is optimized - we start generating
view updates for the new view from the base table rows that we're currently
reading, and we read the missed initial range again after the previous view
finishes building.
The view building progress is only updated after generating view updates for
some read partitions. However, there are scenarios where we'll generate no
view updates for the entire read range. If this was not handled we could
end up in an infinite view building loop like we did in https://github.com/scylladb/scylladb/issues/17293
To handle this, we mark the view as built if the reader generated no partitions.
However, this is not always the correct conclusion. Another scenario where
the reader won't encounter any partitions is when view building is interrupted,
and then we perform a reshard. In this scenario, we set the reader for all
shards to the last unbuilt token for an existing partition before the reshard.
However, this partition may not exist on a shard after reshard, and if there
are also no partitions with higher tokens, the reader will generate no partitions
even though it hasn't finished view building.
Additionally, we already have a check that prevents infinite view building loops
without taking the partitions generated by the reader into account. At the end
of stream, before looping back to the start, we advance current_key to the end
of the built range and check for built views in that range. This handles the case
where the entire range is empty - the conditions for a built view are:
1. the "next_token" is no greater than "first_token" (the view building process
looped back, so we've built all tokens above "first_token")
2. the "current_token" is no less than "first_token" (after looping back, we've
built all tokens below "first_token")

If the range is empty, we'll pass these conditions on an empty range after advancing
"current_key" to the end because:
1. after looping back, "next_token" will be set to `dht::minimum_token`
2. "current_key" will be set to `dht::ring_position::max()`

In this patch we remove the check for partitions generated by the reader. This fixes
the issue with resharding and it does not resurrect the issue with infinite view building
that the check was introduced for.
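
The two conditions for a built view can be written as a single predicate. This is a sketch with assumed types: `token` is a plain integer stand-in for ring positions and `view_is_built` is a hypothetical name, not the view builder's actual code.

```cpp
#include <cstdint>
#include <limits>

// Stand-in for a ring token (assumption: a signed 64-bit value).
using token = std::int64_t;

// A view counts as built when the build has looped back past first_token
// (condition 1) and has covered everything below it (condition 2).
bool view_is_built(token first_token, token next_token, token current_token) {
    return next_token <= first_token && current_token >= first_token;
}
```

In the empty-range case, `next_token` is the minimum token and the current position is the maximum, so the predicate holds for any `first_token` without consulting how many partitions the reader produced.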

Fixes https://github.com/scylladb/scylladb/issues/26523

Closes scylladb/scylladb#26635
2025-11-05 17:02:32 +02:00
Nadav Har'El
8a07b41ae4 test/cqlpy: add test confirming page_size=0 disables paging
In pull request #26384 a discussion started whether page_size=0 really
disables paging, or maybe one needs page_size=-1 to truly disable paging.

The reason for that discussion was commit 08c81427b that started to
use page_size=-1 for internal unpaged queries, and commit 76b31a3 that
incorrectly claimed that page_size>=0 means paging is enabled.

This patch introduces a test that confirms that with page_size=0, paging
is truly disabled - including the size-based (1MB) paging.

The new test is Scylla-only, because Cassandra is anyway missing the
size-based page cutoff (see CASSANDRA-11745).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#26742
2025-11-05 15:52:16 +03:00
Tomasz Grabiec
f8879d797d tablet_allocator: Avoid load balancer failure when replacing the last node in a rack
Introduced in 9ebdeb2

The problem is specific to node replacing and rack-list RF. The
culprit is in the part of the load balancer which determines rack's
shard count. If we're replacing the last node, the rack will contain
no normal nodes, and shards_per_rack will have no entry for the rack,
on which the table still has replicas. This throws std::out_of_range
and fails the tablet draining stage, so the node replace fails.

No backport because the problem exists only on master.

Fixes #26768

Closes scylladb/scylladb#26783
2025-11-05 15:49:51 +03:00
Pavel Emelyanov
05d711f221 database_test: Simplify snapshot_with_quarantine_works() test
The test collects Data files from table dir, then _all_ files from
snapshot dir and then checks whether the former is the subset of the
latter. Using std::includes over two sets makes the code much shorter.
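
A minimal sketch of such a subset check is shown below. The helper name `is_subset` is hypothetical; the sketch relies on `std::set` keeping its elements sorted, which `std::includes` requires of both ranges.

```cpp
#include <algorithm>
#include <set>
#include <string>

// True when every name in `subset` also appears in `superset`.
// std::set iterates in sorted order, as std::includes requires.
bool is_subset(const std::set<std::string>& subset,
               const std::set<std::string>& superset) {
    return std::includes(superset.begin(), superset.end(),
                         subset.begin(), subset.end());
}
```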

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-11-05 15:35:28 +03:00
Pavel Emelyanov
c8492b3562 database_test: Improve snapshot_skip_flush_works test
It has two inaccuracies.

First, when checking the contents of table directory, it uses
pre-populated expected list with "manifest.json" in it. Weird.

Second, when checking the contents of snapshot directory it doesn't
check if the "schema.cql" is there. It's always there, but if something
breaks in the future it may go unnoticed.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-11-05 15:35:26 +03:00
Pavel Emelyanov
5a25d74b12 database_test: Simplify snapshot_works() tests
No functional changes here, just make use of the new lister to shorten
the code. A small side effect -- if the test fails because the contents of
the directories change, it will print the exact difference in the logs, not
just that N files are missing/present.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-11-05 15:34:25 +03:00
Pavel Emelyanov
365044cdbb database_test: Use collect_files() to remove files
Some test cases remove files from table directory to perform some checks
over the taken snapshots. Using collect_files() helper makes the code
easier to read.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-11-05 15:34:24 +03:00
Pavel Emelyanov
e1f326d133 database_test: Use collect_files() to count files in directory
Some test cases want to see that there is more than one file in a
directory, so they can just re-use the new helper. Much shorter this
way.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-11-05 15:32:58 +03:00
Pavel Emelyanov
60d1f78239 database_test: Introduce collect_files() helper
It returns a set of files in a given directory. Will be used by all next
patches.

Implemented using directory_lister, not lister::scan_dir, in order to
help remove the latter in the future.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-11-05 15:32:58 +03:00
Calle Wilund
6c6105e72e test::pylib: Add azure (mock) testing to EAR matrix
Fixes #26782

Adds a provider proxy for azure, using the existing mock server,
now as a fixture.
2025-11-05 10:22:23 +00:00
Calle Wilund
b8a6b6dba9 test::boost::encryption_at_rest: Remove redundant azure test indent
2025-11-05 10:22:23 +00:00
Calle Wilund
10e591bd6b test::boost::encryption_at_rest: Move azure tests to use fixture
Fixes #26781

Makes the test independent of wrapping scripts. Note: retains the
split into "real" and "mock" tests. For other tests, we either all
mock, or allow the environment to select mock or real. Here we have
them combined. More expensive, but on the other hand more thorough.
2025-11-05 10:22:22 +00:00
Calle Wilund
1d37873cba test::lib: Add azure mock/real server fixture
Wraps the real/mock azure server for test in a fixture.
Note: retains the current test setup which explicitly runs
some tests with "real" azure, if available, and some always mock.
2025-11-05 10:22:22 +00:00
Calle Wilund
10041419dc test::pylib::boost: Fix test gather to handle test suites
Fixes #26775
2025-11-05 10:22:22 +00:00
Calle Wilund
2edf6cf325 test::boost::encryption_at_rest_test: Remove redundant indent
Removes empty scope and reindents kms test using fixtures.
2025-11-05 10:22:22 +00:00
Calle Wilund
286a655bc0 test::boost::test_encryption_at_rest: Move to AWS KMS fixture for kms test
Fixes #26780

Uses fake/real CI endpoint for AWS KMS tests, and moves these into a
suite for sharing the mock server.
2025-11-05 10:22:22 +00:00