Commit Graph

10533 Commits

Author SHA1 Message Date
Piotr Smaron
d4c28690e1 db: fail reads and writes with local consistencty level to a DC with RF=0
When read or write operations are performed on a DC with RF=0 with LOCAL_QUORUM
or LOCAL_ONE consistency level, Cassandra throws `Unavailable` exception.
Scylla allowed such read operations and failed write operations with a cryptic:
"broken promise" error. This occured because the initial availability
check passed (quorum of 0 requires 0 replicas), but execution failed
later when no replicas existed to process the mutation.

This patch adds an explicit RF=0 validation for LOCAL_ONE and LOCAL_QUORUM that
throws before attempting operation execution.

The change also requires `test_query_dc_with_rf_0_does_not_crash_db` to be
upgraded. This testcase was asserting somewhat similar scenario, but wasn't
taking into account the whole matrix of combinations:
- scenarios: successful vs unsuccesful operation outcome
- local consistency levels: LOCAL_QUORUM & LOCAL_ONE
- operations: SELECT (read) & INSERT (write)

and so it's been extended to cover both the pre-existing and the current issues
and the whole matrix of combinations.

Fixes: scylladb/scylladb#27893
2026-01-22 12:49:45 +01:00
Sergey Zolotukhin
799d837295 test: disable test_start_bootstrapped_with_invalid_seed
The test intermittently fails when an invalid DNS name is resolved,
likely due to ISP DNS error hijacking (see scylladb/scylladb#28153).

Disable this test to unblock CI.

Fixes scylladb/scylladb#28153

Closes scylladb/scylladb#28162
2026-01-15 10:25:45 +01:00
Tomasz Grabiec
eef798d84f Merge 'Distribute data evenly among primary replicas during restore' from Robert Bindar
Most likely 817fdad uncovered the fact that our choice of primary replica was resonating with tablet allocation and we were ending up picking the same replica as primary within a scope instead of rotating primaryship among all replicas in the scope.
This created situations where for instance, restoring into a 9 nodes with primary_replica_only=true would put all data into 3 nodes, leaving the other 6 unused. The balancing of the dataset was performed by the subsequent repair step.

This PR fixes this by changing the formula for picking up the primary replica out of a set of eligible replicas from within the passed scope.
The PR also extends the testing scenarios in `test_backup.py` so we get to run restore for a set of topologies, for all combinations of scope, primary_replica_only and min_tablet_counts.
Most of the work was done by @bhalevy [here](https://github.com/scylladb/scylladb/compare/master...bhalevy:scylla:load-balance-primary-replica), this PR just splitted it and did touchups here and there.

Fixes #27281

Closes scylladb/scylladb#27397

* github.com:scylladb/scylladb:
  test: reduce dataset and number of test cases or debug builds
  test: bump repair timeout up, it's sometimes not enough in CI
  test: refactor test_refresh.py to match test_restore_with_streaming_scopes.
  test: extend test_restore_with_streaming_scopes
  test: Adjust test_restore_primary_replica_different_dc_scope_all
  test: Refactor restoring code in test_backup to match SM pattern
  test: add check_mutation_replicas calls after fresh creation of dataset
  test: extend create_dataset to accept consistency_level
  test: refactor check_mutation_replicas so it's more readable
  test: make create_dataset async and refactor so it's configurable
  test: use defaultdict in collect_mutations
  test: add log marks to facilitate reusing server for restore
  locator: tablets: Distribute data evenly among primary replicas during restore
2026-01-14 18:57:55 +01:00
Avi Kivity
bd08b6e5b2 Merge 'Unify configuration of object storage endpoints (take 2)' from Pavel Emelyanov
To configure S3 storage, one needs to do

```
object_storage_endpoints:
  - name: s3.us-east-1.amazonaws.com
    port: 443
    https: true
    aws_region: us-east-1
```

and for GCS it's

```
object_storage_endpoints:
  - name: https://storage.googleapis.com:433
    type: gs
    credentials_file: <gcp account credentials json file>
```

This PR updates the S3 part to look like

```
object_storage_endpoints:
  - name: https://s3.us-east-1.amazonaws.com:443
    aws_region: us-east-1
```

fixes: #26570

This is 2nd attempt, previous one (#27360) was reverted because it reported endpoint configs in new format via API and CQL always, even if the endpoint was configured in the old way. This "broke" scylla manager and some dtests. This version has this bug fixed, and endpoints are reported in the same format as they were configured with.

About correctness of the changes.

No modifications to existing tests are made here, so old format is respected correctly (as far as it's covered by tests). To prove the new format works the the test_get_object_store_endpoints is extended to validate both options. Some preparations to this test to make this happen come on their own with the PR #28111  to show that they are valid and pass before changing the core code.

Enhancing the way configuration is made, likely no need to backport.

Closes scylladb/scylladb#28112

* github.com:scylladb/scylladb:
  test: Validate S3 endpoints new format works
  docs: Update docs according to new endpoints config option format
  object_storage: Create s3 client with "extended" endpoint name
  s3/storage: Tune config updating
  sstable: Shuffle args for s3_client_wrapper
  test: Rename badconf variable into objconf
  test: Split the object_store/test_get_object_store_endpoints test
2026-01-14 18:29:03 +02:00
Yaniv Michael Kaul
d919aacc69 storage_proxy: mark write_timeouts metric for counter write timeouts
When a counter write times out (due to rpc::timeout_error or timed_out_error),
the code was throwing mutation_write_timeout_exception but not marking the
write_timeouts metric. This resulted in counter write timeouts not being
counted in the scylla_storage_proxy_coordinator_write_timeouts metric.

Regular writes go through mutate_internal -> mutate_end, which catches
mutation_write_timeout_exception and marks the metric. However, counter
writes use a separate code path (mutate_counters) that has its own
exception handling but was missing the metric update.

This fix adds get_stats().write_timeouts.mark() before throwing the
timeout exception in the counter write path, consistent with how the
CAS path handles cas_write_timeouts.

Refs: https://scylladb.atlassian.net/browse/SCYLLADB-245

Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>

Closes scylladb/scylladb#28019
2026-01-14 17:50:46 +02:00
Gleb Natapov
bee5f63cb6 topology coordinator: complete pending operation for a replaced node
A replaced node may have pending operation on it. The replace operation
will move the node into the 'left' state and the request will never be
completed. More over the code does not expect left node to have a
request. It will try to process the request and will crash because the
node for the request will not be found.

The patch checks is the replaced node has peening request and completes
it with failure. It also changes topology loading code to skip requests
for nodes that are in a left state. This is not strictly needed, but
makes the code more robust.

Fixes #27990

Closes scylladb/scylladb#28009
2026-01-14 13:11:27 +01:00
Patryk Jędrzejczak
6b5923c64e test: test_group0_schema_versioning: wait for schema sync in system.local
`test_schema_versioning_with_recovery` is currently flaky. It performs
a write with CL=ALL and then checks if the schema version is the same on
all nodes by calling `verify_table_versions_synced`. All nodes are expected
to sync their schema before handling the replica write. The node in
RECOVERY mode should do it through a schema pull, and other nodes should do
it through a group 0 read barrier.

The problem is in `verify_local_schema_versions_synced` that compares the
schema versions in `system.local`. The node in RECOVERY mode updates the
schema version in `system.local` after it acknowledges the replica write
as completed. Hence, the check can fail.

We fix the problem by making the function wait until the schema versions
match.

Note that RECOVERY mode is about to be retired together with the whole
gossip-based topology in 2026.2. So, this test is about to be deleted.
However, we still want to fix it, so that it doesn't bother us in older
branches.

Fixes #23803

Closes scylladb/scylladb#28114
2026-01-14 09:55:45 +01:00
Botond Dénes
122b7847e5 Merge 'index: Accept view properties in CREATE INDEX' from Dawid Mędrek
Problem
-------
Secondary indexes are implemented via materialized views under the
hood. The way an index behaves is determined by the configuration
of the view. Currently, it can be modified by performing the CQL
statement `ALTER MATERIALIZED VIEW` on it. However, that raises some
concerns.

Consider, for instance, the following scenario:

1. The user creates a secondary index on a table.
2. In parallel, the user performs writes to the base table.
3. The user modifies the underlying materialized view, e.g. by setting
   the `synchronous_updates` to `true` [1].

Some of the writes that happened before step 3 used the default value
of the property (which is `false`). That had an actual consequence
on what happened later on: the view updates were performed
asynchronously. Only after step 3 had finished did it change.

Unfortunately, as of now, there is no way to avoid a situation like
that. Whenever the user wants to configure a secondary index they're
creating, they need to do it in another schema change. Since it's
not always possible to control how the database is manipulated in
the meantime, it leads to problems like the one described.

That's not all, though. The fact that it's not possible to configure
secondary indexes is inconsistent with other schema entities. When
it comes to tables or materialized views, the user always have a means
to set some or even all of the properties during their creation.

Solution
--------
The solution to this problem is extending the `CREATE INDEX` CQL
statement by view properties. The syntax is of form:

```
> CREATE INDEX <index name>
> .. ON <keyspace>.<table> (<columns>)
> .. WITH <properties>
```

where `<properties>` corresponds to both index-specific and view
properties [2, 3]. View properties can only be used with indexes
implemented with materialized views; for example, it will be impossible
to create a vector index when specifying any view property (see
examples below).

When a view property is provided, it will be applied when creating the
underlying materialized view. The behavior should be similar to how
other CQL statements responsible for creating schema entities work.

High-level implementation strategy
----------------------------------
1. Make auxiliary changes.
2. Introduce data structures representing the new set of index
   properties: both index-specific and those corresponding to the
   underlying view.
3. Extend `CREATE INDEX` to accept view properties.
4. Extend `DESCRIBE INDEX` and other `DESCRIBE` statements to include
   view properties in their output.

User documentation is also updated at the steps to reflect the
corresponding changes.

Implementation considerations
-----------------------------
There are a number of schema properties that are now obsolete. They're
accepted by other CQL statements, but they have no effect. They
include:

* `index_interval`
* `replicate_on_write`
* `populate_io_cache_on_flush`
* `read_repair_chance`
* `dclocal_read_repair_chance`

If the user tries to create a secondary index specifying any of those
keywords, the statement will fail with an appropriate error (see
examples below).

Unlike materialized views, we forbid specifying the clustering order
when creating a secondary index [4]. This limitation may be lifted
later on, but it's a detail that may or may not prove troublesome. It's
better to postpone covering it to when we have a better perspective on
the consequences it would bring.

Examples
--------
Good examples
```
> CREATE INDEX idx ON ks.t (v);
> CREATE INDEX idx ON ks.t (v) WITH comment = 'ok view property';
> CREATE INDEX idx ON ks.t (v)
  .. WITH comment = 'multiple view properties are ok'
  .. AND synchronous_updates = true;
> CREATE INDEX idx ON ks.t (v)
  .. WITH comment = 'default value ok'
  .. AND synchronous_updates = false;
```

Bad examples
```
> CREATE INDEX idx ON ks.t (v) WITH replicate_on_write = true;

SyntaxException: Unknown property 'replicate_on_write'

> CREATE INDEX idx ON ks.t (v)
  .. WITH OPTIONS = {'option1': 'value1'}
  .. AND comment = 'some text';

InvalidRequest: Error from server: code=2200 [Invalid query]
  message="Cannot specify options for a non-CUSTOM index"

> CREATE CUSTOM INDEX idx ON ks.t (v)
  .. WITH OPTIONS = {'option1': 'value1'}
  .. AND comment = 'some text';

InvalidRequest: Error from server: code=2200 [Invalid query]
  message="CUSTOM index requires specifying the index class"

> CREATE CUSTOM INDEX idx ON ks.t (v)
  .. USING 'vector_index'
  .. WITH OPTIONS = {'option1': 'value1'}
  .. AND comment = 'some text';

InvalidRequest: Error from server: code=2200 [Invalid query]
  message="You cannot use view properties with a vector index"

> CREATE INDEX idx ON ks.t (v) WITH CLUSTERING ORDER BY (v ASC);

InvalidRequest: Error from server: code=2200 [Invalid query]
  message="Indexes do not allow for specifying the clustering order"
```

and so on. For more examples, see the relevant tests.

References:
[1] https://docs.scylladb.com/manual/branch-2025.4/cql/cql-extensions.html#synchronous-materialized-views
[2] https://docs.scylladb.com/manual/branch-2025.4/cql/secondary-indexes.html#create-index
[3] https://docs.scylladb.com/manual/branch-2025.4/cql/mv.html#mv-options
[4] https://docs.scylladb.com/manual/branch-2025.4/cql/dml/select.html#ordering-clause

Fixes scylladb/scylladb#16454

Backport: not needed. This is an enhancement.

Closes scylladb/scylladb#24977

* github.com:scylladb/scylladb:
  cql3: Extend DESC INDEX by view properties
  cql3: Forbid using CLUSTERING ORDER BY when creating index
  cql3: Extend CREATE INDEX by MV properties
  cql3/statements/create_index_statement: Allow for view options
  cql3/statements/create_index_statement: Rename member
  cql3/statements/index_prop_defs: Re-introduce index_prop_defs
  cql3/statements/property_definitions: Add extract_property()
  cql3/statements/index_prop_defs.cc: Add namespace
  cql3/statements/index_prop_defs.hh: Rename type
  cql3/statements/view_prop_defs.cc: Move validation logic into file
  cql3/statements: Introduce view_prop_defs.{hh,cc}
  cql3/statements/create_view_statement.cc: Move validation of ID
  schema/schema.hh: Do not include index_prop_defs.hh
2026-01-14 09:54:27 +02:00
Avi Kivity
489d1a0fbc Merge 'replica: don't throw exceptions for read timeout' from Botond Dénes
Read timeouts are a common occurence and they typically occur when the replica is overloaded. So throwing exceptions for read timeouts is very harmful. Be careful not to thow exceptions while propagating them up the future chain. Add a test to enfore and detect regressions.

Fixes: scylladb/scylladb#25062

Improvement, normally not a backport candidate, but we may decide to backport if customer(s) are found to suffer from this.

Closes scylladb/scylladb#25068

* github.com:scylladb/scylladb:
  reader_permit: remove check_abort()
  test/boost/database_test: add test for read timeout exceptions
  sstables/mx/reader: don't throw exceptions on the read-path
  readers/multishard: don't throw exceptions on the read-path
  replica/table: don't throw exceptions on the read-path
  multishard_mutation_query: fix indentation
  multishard_mutation_query: don't throw exceptions on the read-path
  service/storage_proxy: don't throw exceptions on the full-scan path
  cql3/query_processor: don't throw exceptions on the read-path
  reader_permit: add get_abort_exception()
2026-01-13 16:17:41 +02:00
Avi Kivity
c6dfae5661 treewide: #include Seastar headers with angle brackets
Seastar is an external library from the point of view of
ScyllaDB, so should be included with angle brackets.

Closes scylladb/scylladb#27947
2026-01-13 14:56:15 +02:00
Tomasz Grabiec
63b9a7e2b5 test: pylib: log_browsing: Grep logs without considering newly appended lines
At the end of the test case, the framework greps logs for errors and
backtraces. The servers are still running at this point. Some test
cases enable debug-level logging. If servers manage to produce new
lines between the python script processes them, the grep will never
return.

Protect against this by grepping over a file snapshot.

Fixes #28086

Closes scylladb/scylladb#28088
2026-01-13 14:41:02 +02:00
Pavel Emelyanov
9ffd22491f test: Validate S3 endpoints new format works
Extend the test_get_object_store_endpoints() test to configure S3
endpoints in full-url format and check that they are rendered properly
via API/CQL.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2026-01-13 13:24:18 +03:00
Pavel Emelyanov
83e88d206c test: Rename badconf variable into objconf
It's not actually a "bad" config, it's just some config the test works
with.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2026-01-13 13:23:20 +03:00
Pavel Emelyanov
9c627bc44a test: Split the object_store/test_get_object_store_endpoints test
It tests two things -- the way object storage config is represented via
API and CQL (from sytem.config) and that updating config affects CREATE
KEYSPACE CQL (with keyspace storage options)

It's better to split the test, as its former part is going to be
extented to validate old/new config formats (see #26570)

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2026-01-13 13:23:03 +03:00
Robert Bindar
dfcabb5fa4 test: reduce dataset and number of test cases or debug builds
Signed-off-by: Robert Bindar <robert.bindar@scylladb.com>
2026-01-13 11:46:51 +02:00
Robert Bindar
ca3c57e821 test: bump repair timeout up, it's sometimes not enough in CI
Signed-off-by: Robert Bindar <robert.bindar@scylladb.com>
2026-01-13 11:46:49 +02:00
Robert Bindar
6f5e58e718 test: refactor test_refresh.py to match test_restore_with_streaming_scopes.
Signed-off-by: Robert Bindar <robert.bindar@scylladb.com>
2026-01-13 11:46:48 +02:00
Robert Bindar
6e636a4231 test: extend test_restore_with_streaming_scopes
to test restoring with a different min_tablet_count
than the schema was originally created with.

Signed-off-by: Robert Bindar <robert.bindar@scylladb.com>
2026-01-13 11:46:46 +02:00
Robert Bindar
92cd1ddec3 test: Adjust test_restore_primary_replica_different_dc_scope_all
to match the new topology arhitecture

Signed-off-by: Robert Bindar <robert.bindar@scylladb.com>
2026-01-13 11:46:44 +02:00
Robert Bindar
db13ece9a0 test: Refactor restoring code in test_backup to match SM pattern
This patch refactors the restoring code in cluster/test_backup.py
so it matches better the way SM works.
The patch also refactors test_restore_with_streaming_scopes so to
facilitate running restore scenarios under all supported scopes
with or w/o primary_replica_only enabled by reusing the servers
and backups for a topology. This allows us to test a lot more scenarios
without making the test impossibly slow.

split from bhalevy/load-balance-primary-replica

Signed-off-by: Robert Bindar <robert.bindar@scylladb.com>
2026-01-13 11:46:43 +02:00
Robert Bindar
ba01589f53 test: add check_mutation_replicas calls after fresh creation of dataset
to validate that mutation assertions are sane

split from bhalevy/load-balance-primary-replica

Signed-off-by: Robert Bindar <robert.bindar@scylladb.com>
2026-01-13 11:46:41 +02:00
Robert Bindar
b835d32cb0 test: extend create_dataset to accept consistency_level
Signed-off-by: Robert Bindar <robert.bindar@scylladb.com>
2026-01-13 11:46:35 +02:00
Robert Bindar
e7d44356d9 test: refactor check_mutation_replicas so it's more readable
split from bhalevy/load-balance-primary-replica

Signed-off-by: Robert Bindar <robert.bindar@scylladb.com>
2026-01-13 11:46:31 +02:00
Robert Bindar
733b4dbbb7 test: make create_dataset async and refactor so it's configurable
with num_keys and min_tablet_count

split from bhalevy/load-balance-primary-replica

Signed-off-by: Robert Bindar <robert.bindar@scylladb.com>
2026-01-13 11:46:20 +02:00
Robert Bindar
f2c8949e4a test: use defaultdict in collect_mutations
split from bhalevy/load-balance-primary-replica

Signed-off-by: Robert Bindar <robert.bindar@scylladb.com>
2026-01-13 11:45:03 +02:00
Robert Bindar
45faeba97d test: add log marks to facilitate reusing server for restore
split from bhalevy/load-balance-primary-replica

Signed-off-by: Robert Bindar <robert.bindar@scylladb.com>
2026-01-13 11:44:48 +02:00
Nadav Har'El
2a831ad373 Merge 'Address CodeQL Errors' from Botond Dénes
Address all errors reported by CodeQL as reported on https://github.com/scylladb/scylladb/security/quality.
This is a mixed bag, with some harmless issues, while others are severe problems which will result in the code breaking (if it is even run). I suspect some of the more severe problems were found in dead code that is not used at all -- hence nobody noticed.
Still, these issues are good to fix, so we can reduce noise in the reports and improve the maintainability of the code.

Code cleanup, no backport

Closes scylladb/scylladb#27838

* github.com:scylladb/scylladb:
  pgo/pgo.py: don't mutate input params
  test/pylib/coverage_utils.py: profdata_to_lcov: don't mutate defaulted param
  test/cluster/dtest/tools/misc.py: add type annotations to list_to_hashed_dict()
  idl-compiler.py: raise TypeError instead of raw str
  test/pylib/lcov_utils.py: don't call set when iterating over it
  configure.py: move away from .format(**locals())
  test/cluster/object_store/conftest.py: add missing call to parent constructor
  idl-compiler.py: add missing call to parent class constructor
  tools/scyllatop/fake.py: pass correct number of args to _add_metric
2026-01-13 11:43:57 +02:00
Nadav Har'El
609b283d98 test/cqlpy: add another reproducer for known issue
This patch adds a second reproducer for issue #25839, which is about
scanning a secondary index which returns partial results. The new test
uses count(*) without requesting the row themselves, but still has the
same problem of counting only part of the rows. This is the problem that
a user reported in issue #28026.

Unlike the previous test, this test works correctly on older versions
of Scylla - by using larger data, like on Cassandra - without changing
a configuration variable that did not yet exist. So with this test we
can confirm that this bug is a Scylla 5.2 regression:

  test/cqlpy/run --release 5.1 test_secondary_index.py::test_short_count

passes, while

  test/cqlpy/run --release 5.2 test_secondary_index.py::test_short_count

fails. It also fails on master, so the new test is marked "xfail".

Refs #25839
Refs #28026

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#28108
2026-01-13 11:15:27 +02:00
Botond Dénes
7b562bb185 Merge 'system.clients: Address SSL refactor review comments' from Piotr Smaron and Copilot
Addresses outstanding review comments from PR #22961 where SSL field
collection was refactored into generic_server::connection base class.
This patch consists of minor cosmetic enhancements for increased
readability, mainly, with some minor fixups explained in specific
commits.

Cosmetic changes, no need to backport.

Closes scylladb/scylladb#27575

* github.com:scylladb/scylladb:
  test_ssl: fix indentation
  generic_server: improve logging broken TLS connection
  test_ssl: improve timeout and readability
  alternator/server: update SSL comment
2026-01-13 11:00:26 +02:00
Botond Dénes
354c805e6a reader_permit: remove check_abort()
This method can cause performance regressions if used in the wrong place
-- namely if it is used to abort reads by throwing the abort exception.
Exceptions should be propagated during reads without throwing them,
otherwise they cause extra CPU load, making a bad situation worse.
Remove this method, so it doesn't accidentally get more users, migrate
remaining users to get_abort_exception().
2026-01-13 10:47:57 +02:00
Botond Dénes
a0ddac655d test/boost/database_test: add test for read timeout exceptions
Read timeouts shouldn't trigger exceptions thrown, exceptions should be
solely propagated via futures, otherwise they put extra strain on the
system at the worst possible time: when it is overload already enough
that reads started to time out.
The test covers both single partition reads and full scans, with two
scenarios:
* timeout while the read is queued
* timeout when the read is already ongoing
2026-01-13 10:47:57 +02:00
Michał Hudobski
c8aa49b196 vector search, paging: add test for paging warnings
We add a test that validates that indexed queries
do not throw a warning related to vector search paging

Fixes: SCYLLADB-248

Closes scylladb/scylladb#28077
2026-01-13 10:33:36 +02:00
Nadav Har'El
34191d8fd4 alternator: fix signature checking of headers with multiple spaces
We have a test in test_compressed_response.py that reproduces a bug
where in Alternator's signature checking code, if a header had multiple
consecutive spaces its signature isn't checked correctly.

This patch fixes this and that xfailing test begins to pass.

But it turns out that the handling of multiple consecutive spaces in
headers when calculating the authentication signature is just one example
of "header canonization" that the AWS Signature V4 specification requires
us to do. There are additional types of header canonization that Alternator
must do, and this patch also adds new tests in test_authorization.py for
checking *all* the types of canonization.

Fortunately, for all other types of canonizations, we already handled
them correctly - Alternator already lowercases header names, sorts them
alphabetically and removes leading and trailing spaces before calculating
the signature. So most of the new tests added pass also without this patch,
and only one of them, test_canonization_middle_whitespace, needs this
patch to pass. As usual, all the new tests also pass on DynamoDB.

Fixes #27775

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#28102
2026-01-13 10:29:13 +02:00
Andrei Chekun
3f14d45d6e test.py: fix the link for the failed_test directory
With new UI Jenkins escaping the HTML tags during rendering to prevent
XSS. This will show just link without custom name as a string that can
be copied and then pasted to navigate to the failed directory.

Closes scylladb/scylladb#28062
2026-01-13 10:22:38 +02:00
Yaniv Kaul
4e3aa53f8b test/rest_api/test_gossiper.py: fix for Variable defined multiple times
To fix the problem, we need to remove the first, redundant definition of
test_gossiper_unreachable_endpoints (lines 19-24). The second definition
(lines 25-40) should be retained as it has more substantial test logic.
No other code changes or imports are needed, as the test logic is
preserved fully in the retained definition.

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>

Closes scylladb/scylladb#27632
2026-01-13 10:17:29 +02:00
Avi Kivity
66aee0fb5e alternator: add optional listeners for proxy protocol v2
Following 954f2cbd2f, which added proxy protocol v2 listeners
for CQL, we do the same for alternator. We add two optional ports
for plain and TLS-wrapped HTTP.

We test each new port, that the old ports still work, and that
mixing up a port with no proxy protocol and a connection with proxy
protocol (or the opposite) fails. The latter serves to show
that the testing strategy is valid and doesn't just pass whatever
happens. We also verify that the correct addresses (and TLS mode)
show up in system.clients.

Closes scylladb/scylladb#27889
2026-01-13 09:59:24 +02:00
Nadav Har'El
e7df03127b alternator: support "deflate" encoding in request compression
Currently Alternator supports compressed requests in the gzip format
with "Content-Encoding: gzip". We did not support any other compression
formats.

It turns out that DynamoDB also supports the "deflate" encoding.
The "deflate" format is just a small variant of gzip and also supported
by the same zlib library that we already use, so it is very easy
to add support for it as well. So this patch adds it.

Beyond compatibility with DynamoDB, another benefit of this patch is
symmetry with our response compression support (PR #27454), where
we supported both gzip and deflate compression of responses - so
we should support the same for requests.

This patch also adds tests for Content-Encoding: deflate, which pass
on DynamoDB (proving that "deflate" is indeed supported there).
On Alternator the new tests failed before this patch and pass with
this patch.

Refs #27243 (which asks to support more compression formats).

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#27917
2026-01-13 09:58:12 +02:00
Nadav Har'El
a1f198d453 test/cqlpy: translate Cassandra's test InsertInvalidateSizedRecordsTest
This is a translation of Cassandra's CQL unit test source file
validation/operations/InsertInvalidateSizedRecordsTest.java into our
cqlpy framework.

This is one of the tests added to Cassandra as part of the vector
search work, but actually has nothing to do with vector search -
it checks what happens when key columns of different types exceeed
their maximum size (64KB).

Unfortunately, each one of the tests added here *fail* on ScyllaDB,
providing more reproducers for two already known issues (which
already had plenty of reproducers...):

Refs  #8627 Cleanly reject updates with indexed values where value > 64k
Refs #12247 Better error reporting for oversized keys during INSERT

One of the tests also fails on Cassandra, due to CASSANDRA-19270.
It is not clear to me how this unit test actually passed on Cassandra,
I can only guess that the Python driver somehow makes the request
differently than what the Java unit tests use to make requests to
Cassandra.

One of the tests in the original Cassandra source file I did not
translate, readingEmptyStringsForDifferentTypes, because it tests
cqlsh, not pure CQL.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#27944
2026-01-13 08:59:36 +02:00
Botond Dénes
4ec3d76b87 test/pylib/coverage_utils.py: profdata_to_lcov: don't mutate defaulted param
It is considered a dangerous practice as it creates a side-effect for
later calls to the same function.
Create a new variable instead and mutate that. Also remove the unused
update_known_ids parameter, which defaults to True and no caller changes
it. Passing False to this param also seem to have no effect. Instead of
trying to guess what the desired effect of passing False is and fixing
it, just remove this unused param.

Found by CodeQL "Modification of parameter with default".
2026-01-13 08:33:17 +02:00
Botond Dénes
157fe2b80e test/cluster/dtest/tools/misc.py: add type annotations to list_to_hashed_dict()
To hopefully shut up CodeQL "Iterable can be either a string or a sequence".
This change makes the code more readable anyway, so it is more than just
a gratuitous change to make some code-scanner happy.
2026-01-13 08:33:17 +02:00
Botond Dénes
3c6e9637a0 test/pylib/lcov_utils.py: don't call set when iterating over it
Probably a typo. Found by CodeQL "Non-callable called".
2026-01-13 08:33:17 +02:00
Botond Dénes
d2db84714e test/cluster/object_store/conftest.py: add missing call to parent constructor
Replace manual init of parent fields.
Found by CodeQL: "Missing call to superclass `__init__` during object
initialization".

The secret_key is not initialized to server.secret_key, instead of
server.access_key. This probably fixes a (benign) bug.
2026-01-13 08:33:17 +02:00
Botond Dénes
725e99e263 Merge 'test.py: Fix boost logs' from Andrei Chekun
Write the boost logs into stdout in HRF format and in XML to the file. The XML file will be used for parsing and providing the error information in the summary section of the fail.

Fixes: https://github.com/scylladb/scylladb/issues/28045

Framework enhancements, no need to backport.

Closes scylladb/scylladb#28107

* github.com:scylladb/scylladb:
  test.py: remove XML log from fail summary
  test.py: fix truncated boost output to stdout file
2026-01-13 06:19:05 +02:00
Andrei Chekun
dfa6a61721 test.py: remove XML log from fail summary
Remove XML log from fail summary.
Add text from the first error in the XML file to the fail summary
2026-01-12 14:26:58 +01:00
Andrei Chekun
d96a50481a test.py: fix truncated boost output to stdout file
Change the behavior of the catching the boost log output. With this
change boost will output it's logging to stdour with HRF format and to
the tempfile in XML format. This will help for easier debuggint when all
messages will be in the output file and still in the fail summary.
2026-01-12 14:15:23 +01:00
Botond Dénes
6bcc18e5c6 erge 'test.py: integrate python tests to be executed with pytest runner' from Andrei Chekun
This will move responsibility for running tests with pytest in the same manner as it was done with boost tests. From this commit, test.py is not responsible anymore for running python tests and relies completely on pytest.
This is another step for unification of test execution.
Convert skip_mode function to `pytest.mark` to be able to use to annotate the whole module instead of each test explicitly.

NOTE: this is a breaking change. From this commit, several directories with tests will require a path to the file to launch the test. Affected directories
test/alternator
test/broadcast_tables
test/cql
test/cqlpy
test/rest_api

Changes only in framework, so no backport.

This PR will increase the amount of the tests by 30 test, due to the fact that how test.py and pytest discover tests. test.py count a file as a test, and when skip used in suite.yaml it will exclude the tests from discovery completely.
While the pytest count test funstion as a test and uses skip_mode mark and will discover the tests, but it will skip them during execution, hence the difference
test.py output before PR:
```bash
> ./test.py --mode=release  rest_api/test_compaction_task rest_api/test_task_manager --list --no-gather-metrics

```
test.py output in this PR:
```bash
> ./test.py --mode=release  test/rest_api/test_compaction_task.py test/rest_api/test_task_manager.py --list

rest_api/test_compaction_task.py::test_global_major_keyspace_compaction_task.release.1
rest_api/test_compaction_task.py::test_major_keyspace_compaction_task.release.1
rest_api/test_compaction_task.py::test_cleanup_keyspace_compaction_task.release.1
rest_api/test_compaction_task.py::test_offstrategy_keyspace_compaction_task.release.1
rest_api/test_compaction_task.py::test_rewrite_sstables_keyspace_compaction_task.release.1
rest_api/test_compaction_task.py::test_reshaping_compaction_task.release.1
rest_api/test_compaction_task.py::test_resharding_compaction_task.release.1
rest_api/test_compaction_task.py::test_regular_compaction_task.release.1
rest_api/test_compaction_task.py::test_compaction_task_abort.release.1
rest_api/test_compaction_task.py::test_major_keyspace_compaction_task_async.release.1
rest_api/test_compaction_task.py::test_cleanup_keyspace_compaction_task_async.release.1
rest_api/test_compaction_task.py::test_offstrategy_keyspace_compaction_task_async.release.1
rest_api/test_compaction_task.py::test_rewrite_sstables_keyspace_compaction_task_async.release.1
rest_api/test_compaction_task.py::test_compaction_progress[major_keyspace_compaction_task_impl_run_fail].release.1
rest_api/test_compaction_task.py::test_compaction_progress[shard_major_keyspace_compaction_task_impl_run_fail].release.1
rest_api/test_compaction_task.py::test_compaction_progress[table_major_keyspace_compaction_task_impl_run_fail].release.1
rest_api/test_task_manager.py::test_task_manager_modules.release.1
rest_api/test_task_manager.py::test_task_manager_tasks.release.1
rest_api/test_task_manager.py::test_task_manager_status_running.release.1
rest_api/test_task_manager.py::test_task_manager_status_done.release.1
rest_api/test_task_manager.py::test_task_manager_status_failed.release.1
rest_api/test_task_manager.py::test_task_manager_not_abortable.release.1
rest_api/test_task_manager.py::test_task_manager_wait.release.1
rest_api/test_task_manager.py::test_task_manager_ttl.release.1
rest_api/test_task_manager.py::test_task_manager_user_ttl.release.1
rest_api/test_task_manager.py::test_task_manager_sequence_number.release.1
rest_api/test_task_manager.py::test_task_manager_recursive_status.release.1
rest_api/test_task_manager.py::test_module_not_exists.release.1
rest_api/test_task_manager.py::test_task_folding.release.1
rest_api/test_task_manager.py::test_abort_on_unregistered_task.release.1
```

Fixes: https://github.com/scylladb/scylladb/issues/27716

Closes scylladb/scylladb#26395

* github.com:scylladb/scylladb:
  test.py: fix test_vector_similarity.py
  docs: add directories excluded from test.py
  test.py: prevent file descriptors leaking
  test.py: capture print inside the test
  test.py: do not print header for collection with test.py
  test.py: remove not supported functionality
  test.py: switch of execution of several test directories by test.py runner
  test.py: integrate python tests to be executed with pytest runner
  test.py: fix test/vector_search_validator to be able to run with pytest
  test.py: prepare base class for migration
  test.py: move environment preparation to one method
  test.py: introduce new environment variable TESTPY_PREPARED_ENVIRONMENT
2026-01-12 14:17:19 +02:00
Botond Dénes
04b8f72946 Merge 'repair: Implement auto repair for tablet repair' from Asias He
repair: Implement auto repair for tablet repair

This patch implements the basic auto repair support for tablet repair.

It was decided to add no per table configuration for the initial
implementation, so two scylla yaml config options are introduced to set
the default auto repair configs for all the tablet tables.

- auto_repair_enabled_default

Set true to enable auto repair for tablet tables by default. The value
will be overridden by the per keyspace or per table configuration which
is not implemented yet.

- auto_repair_threshold_default_in_seconds

Set the default time in seconds for the auto repair threshold for tablet
tables. If the time since last repair is bigger than the configured
time, the tablet is eligible for auto repair. The value will be
overridden by the per keyspace or per table configuration which is not
implemented yet.

The following metrcis are added:

- auto_repair_needs_repair_nr

The number of tablets with auto repair enabled that needs repair

- auto_repair_enabled_nr

The number of tablets with auto repair enabled

The metrics are useful to tell if auto repair is falling behind.

In the future, more auto repair scheduling will be added, e.g.,
scheduling based on the repaired and unrepaired sstable set size,
tombstone ratio and so on, in addition to the time based scheduling.

Fixes SCYLLADB-99

New feature. No backport.

Closes scylladb/scylladb#27534

* github.com:scylladb/scylladb:
  topology_coordinator: Add metrics for tablet repair
  repair: Implement auto repair for tablet repair
2026-01-12 14:16:01 +02:00
Petr Gusev
889d7782ed treewide: use coroutine::maybe_yield in coroutines
It's more efficient since coroutine::maybe_yield returns
a lightweight struct (awaitable), not the future.

Closes scylladb/scylladb#28101
2026-01-12 10:38:47 +01:00
Alex
e430065c92 db: views: serialize create/drop view operations via shard 0
Create and drop view operations are currently performed on all shards, and their execution is not fully serialized. On slower processors this can lead to interleavings that leave stale entries in `system.scylla_views_build`

A problematic sequence looks like this:

* `on_create_view()` runs on shard 0 → entries for shard 0 and shard 1 are created
* `on_drop_view()` runs on shard 0 → entry for shard 0 is removed
* `on_create_view()` runs on shard 1 → entries for shard 0 and shard 1 are created again
* `on_drop_view()` runs on shard 1 → entry for shard 1 is removed, while the shard 0 entry remains

This results in a leftover row in `system.scylla_views_builds_in_progress`, causing `view_build_test.cc` to get stuck indefinitely in an eventual state and eventually be terminated by CI.

This patch fixes the issue by fully serializing all view create and drop operations through shard 0. Shard 0 becomes the single execution point and notifies other shards to perform their work in order. Requests originating.

new process:
  - view_builder::on_create_view(...) runs only on shard 0 and kicks off dispatch_create_view(...) in the background.
  - dispatch_create_view(...) (shard 0) first checks should_ignore_tablet_keyspace(...) and returns early if needed.
  - dispatch_create_view(...) calls handle_seed_view_build_progress(...) on shard 0. That:
      - writes the global “build progress” row across all shards via _sys_ks.register_view_for_building_for_all_shards(...).
  - After seeding, dispatch_create_view(...) broadcasts to all shards with container().invoke_on_all(...).
  - Each shard runs handle_create_view_local(...), which:
      - waits for pending base writes/streams, flushes the base,
      - resets the reader to the current token and adds the new view,
      - handles errors and triggers _build_step to continue processing.

  Drop view

  - view_builder::on_drop_view(...) runs only on shard 0 and kicks off dispatch_drop_view(...) in the background.
  - dispatch_drop_view(...) (shard 0) first checks should_ignore_tablet_keyspace(...) and returns early if needed.
  - It broadcasts handle_drop_view_local(...) to all shards with invoke_on_all(...).
  - Each shard runs handle_drop_view_local(...), which:
      - removes the view from local build state (_base_to_build_step and _built_views) by scanning existing steps,
      - ignores missing keyspace cases.
  - After all shards finish local cleanup, shard 0 runs handle_drop_view_global_cleanup(...), which:
      - removes global build progress, built‑view state, and view build status in system tables,

  Shutdown

  - drain() waits on _view_notification_sem before _sem so in‑flight dispatches finish before bookkeeping is halted.

In addition, the test is adjusted to remove the long eventual wait (596.52s / 30 iterations) and instead rely on the default wait of 17 iterations (~4.37 minutes), eliminating unnecessary delays while preserving correctness.

Fixes: https://github.com/scylladb/scylladb/issues/27898
Backport: not required as the problem happens on master

Closes scylladb/scylladb#27929
2026-01-12 09:23:22 +02:00
Michał Hudobski
92c988514c vector_search: allow all where clauses in vector search queries
To prepare for implementation of filtering we skip validation
of where clauses in vector search queries. All queries that would
be blocked by the lack of ALLOW FILTERING now will pass through.

Fixes: VECTOR-410

Closes scylladb/scylladb#27758
2026-01-11 12:56:44 +02:00