When read or write operations are performed on a DC with RF=0 with LOCAL_QUORUM
or LOCAL_ONE consistency level, Cassandra throws `Unavailable` exception.
Scylla allowed such read operations and failed write operations with a cryptic:
"broken promise" error. This occured because the initial availability
check passed (quorum of 0 requires 0 replicas), but execution failed
later when no replicas existed to process the mutation.
This patch adds an explicit RF=0 validation for LOCAL_ONE and LOCAL_QUORUM that
throws before attempting operation execution.
The change also requires `test_query_dc_with_rf_0_does_not_crash_db` to be
upgraded. This testcase was asserting somewhat similar scenario, but wasn't
taking into account the whole matrix of combinations:
- scenarios: successful vs unsuccesful operation outcome
- local consistency levels: LOCAL_QUORUM & LOCAL_ONE
- operations: SELECT (read) & INSERT (write)
and so it's been extended to cover both the pre-existing and the current issues
and the whole matrix of combinations.
Fixes: scylladb/scylladb#27893
Most likely 817fdad uncovered the fact that our choice of primary replica was resonating with tablet allocation and we were ending up picking the same replica as primary within a scope instead of rotating primaryship among all replicas in the scope.
This created situations where for instance, restoring into a 9 nodes with primary_replica_only=true would put all data into 3 nodes, leaving the other 6 unused. The balancing of the dataset was performed by the subsequent repair step.
This PR fixes this by changing the formula for picking up the primary replica out of a set of eligible replicas from within the passed scope.
The PR also extends the testing scenarios in `test_backup.py` so we get to run restore for a set of topologies, for all combinations of scope, primary_replica_only and min_tablet_counts.
Most of the work was done by @bhalevy [here](https://github.com/scylladb/scylladb/compare/master...bhalevy:scylla:load-balance-primary-replica), this PR just splitted it and did touchups here and there.
Fixes#27281Closesscylladb/scylladb#27397
* github.com:scylladb/scylladb:
test: reduce dataset and number of test cases or debug builds
test: bump repair timeout up, it's sometimes not enough in CI
test: refactor test_refresh.py to match test_restore_with_streaming_scopes.
test: extend test_restore_with_streaming_scopes
test: Adjust test_restore_primary_replica_different_dc_scope_all
test: Refactor restoring code in test_backup to match SM pattern
test: add check_mutation_replicas calls after fresh creation of dataset
test: extend create_dataset to accept consistency_level
test: refactor check_mutation_replicas so it's more readable
test: make create_dataset async and refactor so it's configurable
test: use defaultdict in collect_mutations
test: add log marks to facilitate reusing server for restore
locator: tablets: Distribute data evenly among primary replicas during restore
To configure S3 storage, one needs to do
```
object_storage_endpoints:
- name: s3.us-east-1.amazonaws.com
port: 443
https: true
aws_region: us-east-1
```
and for GCS it's
```
object_storage_endpoints:
- name: https://storage.googleapis.com:433
type: gs
credentials_file: <gcp account credentials json file>
```
This PR updates the S3 part to look like
```
object_storage_endpoints:
- name: https://s3.us-east-1.amazonaws.com:443
aws_region: us-east-1
```
fixes: #26570
This is 2nd attempt, previous one (#27360) was reverted because it reported endpoint configs in new format via API and CQL always, even if the endpoint was configured in the old way. This "broke" scylla manager and some dtests. This version has this bug fixed, and endpoints are reported in the same format as they were configured with.
About correctness of the changes.
No modifications to existing tests are made here, so old format is respected correctly (as far as it's covered by tests). To prove the new format works the the test_get_object_store_endpoints is extended to validate both options. Some preparations to this test to make this happen come on their own with the PR #28111 to show that they are valid and pass before changing the core code.
Enhancing the way configuration is made, likely no need to backport.
Closesscylladb/scylladb#28112
* github.com:scylladb/scylladb:
test: Validate S3 endpoints new format works
docs: Update docs according to new endpoints config option format
object_storage: Create s3 client with "extended" endpoint name
s3/storage: Tune config updating
sstable: Shuffle args for s3_client_wrapper
test: Rename badconf variable into objconf
test: Split the object_store/test_get_object_store_endpoints test
When a counter write times out (due to rpc::timeout_error or timed_out_error),
the code was throwing mutation_write_timeout_exception but not marking the
write_timeouts metric. This resulted in counter write timeouts not being
counted in the scylla_storage_proxy_coordinator_write_timeouts metric.
Regular writes go through mutate_internal -> mutate_end, which catches
mutation_write_timeout_exception and marks the metric. However, counter
writes use a separate code path (mutate_counters) that has its own
exception handling but was missing the metric update.
This fix adds get_stats().write_timeouts.mark() before throwing the
timeout exception in the counter write path, consistent with how the
CAS path handles cas_write_timeouts.
Refs: https://scylladb.atlassian.net/browse/SCYLLADB-245
Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>
Closesscylladb/scylladb#28019
A replaced node may have pending operation on it. The replace operation
will move the node into the 'left' state and the request will never be
completed. More over the code does not expect left node to have a
request. It will try to process the request and will crash because the
node for the request will not be found.
The patch checks is the replaced node has peening request and completes
it with failure. It also changes topology loading code to skip requests
for nodes that are in a left state. This is not strictly needed, but
makes the code more robust.
Fixes#27990Closesscylladb/scylladb#28009
`test_schema_versioning_with_recovery` is currently flaky. It performs
a write with CL=ALL and then checks if the schema version is the same on
all nodes by calling `verify_table_versions_synced`. All nodes are expected
to sync their schema before handling the replica write. The node in
RECOVERY mode should do it through a schema pull, and other nodes should do
it through a group 0 read barrier.
The problem is in `verify_local_schema_versions_synced` that compares the
schema versions in `system.local`. The node in RECOVERY mode updates the
schema version in `system.local` after it acknowledges the replica write
as completed. Hence, the check can fail.
We fix the problem by making the function wait until the schema versions
match.
Note that RECOVERY mode is about to be retired together with the whole
gossip-based topology in 2026.2. So, this test is about to be deleted.
However, we still want to fix it, so that it doesn't bother us in older
branches.
Fixes#23803Closesscylladb/scylladb#28114
Problem
-------
Secondary indexes are implemented via materialized views under the
hood. The way an index behaves is determined by the configuration
of the view. Currently, it can be modified by performing the CQL
statement `ALTER MATERIALIZED VIEW` on it. However, that raises some
concerns.
Consider, for instance, the following scenario:
1. The user creates a secondary index on a table.
2. In parallel, the user performs writes to the base table.
3. The user modifies the underlying materialized view, e.g. by setting
the `synchronous_updates` to `true` [1].
Some of the writes that happened before step 3 used the default value
of the property (which is `false`). That had an actual consequence
on what happened later on: the view updates were performed
asynchronously. Only after step 3 had finished did it change.
Unfortunately, as of now, there is no way to avoid a situation like
that. Whenever the user wants to configure a secondary index they're
creating, they need to do it in another schema change. Since it's
not always possible to control how the database is manipulated in
the meantime, it leads to problems like the one described.
That's not all, though. The fact that it's not possible to configure
secondary indexes is inconsistent with other schema entities. When
it comes to tables or materialized views, the user always have a means
to set some or even all of the properties during their creation.
Solution
--------
The solution to this problem is extending the `CREATE INDEX` CQL
statement by view properties. The syntax is of form:
```
> CREATE INDEX <index name>
> .. ON <keyspace>.<table> (<columns>)
> .. WITH <properties>
```
where `<properties>` corresponds to both index-specific and view
properties [2, 3]. View properties can only be used with indexes
implemented with materialized views; for example, it will be impossible
to create a vector index when specifying any view property (see
examples below).
When a view property is provided, it will be applied when creating the
underlying materialized view. The behavior should be similar to how
other CQL statements responsible for creating schema entities work.
High-level implementation strategy
----------------------------------
1. Make auxiliary changes.
2. Introduce data structures representing the new set of index
properties: both index-specific and those corresponding to the
underlying view.
3. Extend `CREATE INDEX` to accept view properties.
4. Extend `DESCRIBE INDEX` and other `DESCRIBE` statements to include
view properties in their output.
User documentation is also updated at the steps to reflect the
corresponding changes.
Implementation considerations
-----------------------------
There are a number of schema properties that are now obsolete. They're
accepted by other CQL statements, but they have no effect. They
include:
* `index_interval`
* `replicate_on_write`
* `populate_io_cache_on_flush`
* `read_repair_chance`
* `dclocal_read_repair_chance`
If the user tries to create a secondary index specifying any of those
keywords, the statement will fail with an appropriate error (see
examples below).
Unlike materialized views, we forbid specifying the clustering order
when creating a secondary index [4]. This limitation may be lifted
later on, but it's a detail that may or may not prove troublesome. It's
better to postpone covering it to when we have a better perspective on
the consequences it would bring.
Examples
--------
Good examples
```
> CREATE INDEX idx ON ks.t (v);
> CREATE INDEX idx ON ks.t (v) WITH comment = 'ok view property';
> CREATE INDEX idx ON ks.t (v)
.. WITH comment = 'multiple view properties are ok'
.. AND synchronous_updates = true;
> CREATE INDEX idx ON ks.t (v)
.. WITH comment = 'default value ok'
.. AND synchronous_updates = false;
```
Bad examples
```
> CREATE INDEX idx ON ks.t (v) WITH replicate_on_write = true;
SyntaxException: Unknown property 'replicate_on_write'
> CREATE INDEX idx ON ks.t (v)
.. WITH OPTIONS = {'option1': 'value1'}
.. AND comment = 'some text';
InvalidRequest: Error from server: code=2200 [Invalid query]
message="Cannot specify options for a non-CUSTOM index"
> CREATE CUSTOM INDEX idx ON ks.t (v)
.. WITH OPTIONS = {'option1': 'value1'}
.. AND comment = 'some text';
InvalidRequest: Error from server: code=2200 [Invalid query]
message="CUSTOM index requires specifying the index class"
> CREATE CUSTOM INDEX idx ON ks.t (v)
.. USING 'vector_index'
.. WITH OPTIONS = {'option1': 'value1'}
.. AND comment = 'some text';
InvalidRequest: Error from server: code=2200 [Invalid query]
message="You cannot use view properties with a vector index"
> CREATE INDEX idx ON ks.t (v) WITH CLUSTERING ORDER BY (v ASC);
InvalidRequest: Error from server: code=2200 [Invalid query]
message="Indexes do not allow for specifying the clustering order"
```
and so on. For more examples, see the relevant tests.
References:
[1] https://docs.scylladb.com/manual/branch-2025.4/cql/cql-extensions.html#synchronous-materialized-views
[2] https://docs.scylladb.com/manual/branch-2025.4/cql/secondary-indexes.html#create-index
[3] https://docs.scylladb.com/manual/branch-2025.4/cql/mv.html#mv-options
[4] https://docs.scylladb.com/manual/branch-2025.4/cql/dml/select.html#ordering-clauseFixesscylladb/scylladb#16454
Backport: not needed. This is an enhancement.
Closesscylladb/scylladb#24977
* github.com:scylladb/scylladb:
cql3: Extend DESC INDEX by view properties
cql3: Forbid using CLUSTERING ORDER BY when creating index
cql3: Extend CREATE INDEX by MV properties
cql3/statements/create_index_statement: Allow for view options
cql3/statements/create_index_statement: Rename member
cql3/statements/index_prop_defs: Re-introduce index_prop_defs
cql3/statements/property_definitions: Add extract_property()
cql3/statements/index_prop_defs.cc: Add namespace
cql3/statements/index_prop_defs.hh: Rename type
cql3/statements/view_prop_defs.cc: Move validation logic into file
cql3/statements: Introduce view_prop_defs.{hh,cc}
cql3/statements/create_view_statement.cc: Move validation of ID
schema/schema.hh: Do not include index_prop_defs.hh
Read timeouts are a common occurence and they typically occur when the replica is overloaded. So throwing exceptions for read timeouts is very harmful. Be careful not to thow exceptions while propagating them up the future chain. Add a test to enfore and detect regressions.
Fixes: scylladb/scylladb#25062
Improvement, normally not a backport candidate, but we may decide to backport if customer(s) are found to suffer from this.
Closesscylladb/scylladb#25068
* github.com:scylladb/scylladb:
reader_permit: remove check_abort()
test/boost/database_test: add test for read timeout exceptions
sstables/mx/reader: don't throw exceptions on the read-path
readers/multishard: don't throw exceptions on the read-path
replica/table: don't throw exceptions on the read-path
multishard_mutation_query: fix indentation
multishard_mutation_query: don't throw exceptions on the read-path
service/storage_proxy: don't throw exceptions on the full-scan path
cql3/query_processor: don't throw exceptions on the read-path
reader_permit: add get_abort_exception()
At the end of the test case, the framework greps logs for errors and
backtraces. The servers are still running at this point. Some test
cases enable debug-level logging. If servers manage to produce new
lines between the python script processes them, the grep will never
return.
Protect against this by grepping over a file snapshot.
Fixes#28086Closesscylladb/scylladb#28088
Extend the test_get_object_store_endpoints() test to configure S3
endpoints in full-url format and check that they are rendered properly
via API/CQL.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
It tests two things -- the way object storage config is represented via
API and CQL (from sytem.config) and that updating config affects CREATE
KEYSPACE CQL (with keyspace storage options)
It's better to split the test, as its former part is going to be
extented to validate old/new config formats (see #26570)
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
to test restoring with a different min_tablet_count
than the schema was originally created with.
Signed-off-by: Robert Bindar <robert.bindar@scylladb.com>
This patch refactors the restoring code in cluster/test_backup.py
so it matches better the way SM works.
The patch also refactors test_restore_with_streaming_scopes so to
facilitate running restore scenarios under all supported scopes
with or w/o primary_replica_only enabled by reusing the servers
and backups for a topology. This allows us to test a lot more scenarios
without making the test impossibly slow.
split from bhalevy/load-balance-primary-replica
Signed-off-by: Robert Bindar <robert.bindar@scylladb.com>
to validate that mutation assertions are sane
split from bhalevy/load-balance-primary-replica
Signed-off-by: Robert Bindar <robert.bindar@scylladb.com>
Address all errors reported by CodeQL as reported on https://github.com/scylladb/scylladb/security/quality.
This is a mixed bag, with some harmless issues, while others are severe problems which will result in the code breaking (if it is even run). I suspect some of the more severe problems were found in dead code that is not used at all -- hence nobody noticed.
Still, these issues are good to fix, so we can reduce noise in the reports and improve the maintainability of the code.
Code cleanup, no backport
Closesscylladb/scylladb#27838
* github.com:scylladb/scylladb:
pgo/pgo.py: don't mutate input params
test/pylib/coverage_utils.py: profdata_to_lcov: don't mutate defaulted param
test/cluster/dtest/tools/misc.py: add type annotations to list_to_hashed_dict()
idl-compiler.py: raise TypeError instead of raw str
test/pylib/lcov_utils.py: don't call set when iterating over it
configure.py: move away from .format(**locals())
test/cluster/object_store/conftest.py: add missing call to parent constructor
idl-compiler.py: add missing call to parent class constructor
tools/scyllatop/fake.py: pass correct number of args to _add_metric
This patch adds a second reproducer for issue #25839, which is about
scanning a secondary index which returns partial results. The new test
uses count(*) without requesting the row themselves, but still has the
same problem of counting only part of the rows. This is the problem that
a user reported in issue #28026.
Unlike the previous test, this test works correctly on older versions
of Scylla - by using larger data, like on Cassandra - without changing
a configuration variable that did not yet exist. So with this test we
can confirm that this bug is a Scylla 5.2 regression:
test/cqlpy/run --release 5.1 test_secondary_index.py::test_short_count
passes, while
test/cqlpy/run --release 5.2 test_secondary_index.py::test_short_count
fails. It also fails on master, so the new test is marked "xfail".
Refs #25839
Refs #28026
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closesscylladb/scylladb#28108
Addresses outstanding review comments from PR #22961 where SSL field
collection was refactored into generic_server::connection base class.
This patch consists of minor cosmetic enhancements for increased
readability, mainly, with some minor fixups explained in specific
commits.
Cosmetic changes, no need to backport.
Closesscylladb/scylladb#27575
* github.com:scylladb/scylladb:
test_ssl: fix indentation
generic_server: improve logging broken TLS connection
test_ssl: improve timeout and readability
alternator/server: update SSL comment
This method can cause performance regressions if used in the wrong place
-- namely if it is used to abort reads by throwing the abort exception.
Exceptions should be propagated during reads without throwing them,
otherwise they cause extra CPU load, making a bad situation worse.
Remove this method, so it doesn't accidentally get more users, migrate
remaining users to get_abort_exception().
Read timeouts shouldn't trigger exceptions thrown, exceptions should be
solely propagated via futures, otherwise they put extra strain on the
system at the worst possible time: when it is overload already enough
that reads started to time out.
The test covers both single partition reads and full scans, with two
scenarios:
* timeout while the read is queued
* timeout when the read is already ongoing
We add a test that validates that indexed queries
do not throw a warning related to vector search paging
Fixes: SCYLLADB-248
Closesscylladb/scylladb#28077
We have a test in test_compressed_response.py that reproduces a bug
where in Alternator's signature checking code, if a header had multiple
consecutive spaces its signature isn't checked correctly.
This patch fixes this and that xfailing test begins to pass.
But it turns out that the handling of multiple consecutive spaces in
headers when calculating the authentication signature is just one example
of "header canonization" that the AWS Signature V4 specification requires
us to do. There are additional types of header canonization that Alternator
must do, and this patch also adds new tests in test_authorization.py for
checking *all* the types of canonization.
Fortunately, for all other types of canonizations, we already handled
them correctly - Alternator already lowercases header names, sorts them
alphabetically and removes leading and trailing spaces before calculating
the signature. So most of the new tests added pass also without this patch,
and only one of them, test_canonization_middle_whitespace, needs this
patch to pass. As usual, all the new tests also pass on DynamoDB.
Fixes#27775
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closesscylladb/scylladb#28102
With new UI Jenkins escaping the HTML tags during rendering to prevent
XSS. This will show just link without custom name as a string that can
be copied and then pasted to navigate to the failed directory.
Closesscylladb/scylladb#28062
To fix the problem, we need to remove the first, redundant definition of
test_gossiper_unreachable_endpoints (lines 19-24). The second definition
(lines 25-40) should be retained as it has more substantial test logic.
No other code changes or imports are needed, as the test logic is
preserved fully in the retained definition.
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Closesscylladb/scylladb#27632
Following 954f2cbd2f, which added proxy protocol v2 listeners
for CQL, we do the same for alternator. We add two optional ports
for plain and TLS-wrapped HTTP.
We test each new port, that the old ports still work, and that
mixing up a port with no proxy protocol and a connection with proxy
protocol (or the opposite) fails. The latter serves to show
that the testing strategy is valid and doesn't just pass whatever
happens. We also verify that the correct addresses (and TLS mode)
show up in system.clients.
Closesscylladb/scylladb#27889
Currently Alternator supports compressed requests in the gzip format
with "Content-Encoding: gzip". We did not support any other compression
formats.
It turns out that DynamoDB also supports the "deflate" encoding.
The "deflate" format is just a small variant of gzip and also supported
by the same zlib library that we already use, so it is very easy
to add support for it as well. So this patch adds it.
Beyond compatibility with DynamoDB, another benefit of this patch is
symmetry with our response compression support (PR #27454), where
we supported both gzip and deflate compression of responses - so
we should support the same for requests.
This patch also adds tests for Content-Encoding: deflate, which pass
on DynamoDB (proving that "deflate" is indeed supported there).
On Alternator the new tests failed before this patch and pass with
this patch.
Refs #27243 (which asks to support more compression formats).
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closesscylladb/scylladb#27917
This is a translation of Cassandra's CQL unit test source file
validation/operations/InsertInvalidateSizedRecordsTest.java into our
cqlpy framework.
This is one of the tests added to Cassandra as part of the vector
search work, but actually has nothing to do with vector search -
it checks what happens when key columns of different types exceeed
their maximum size (64KB).
Unfortunately, each one of the tests added here *fail* on ScyllaDB,
providing more reproducers for two already known issues (which
already had plenty of reproducers...):
Refs #8627 Cleanly reject updates with indexed values where value > 64k
Refs #12247 Better error reporting for oversized keys during INSERT
One of the tests also fails on Cassandra, due to CASSANDRA-19270.
It is not clear to me how this unit test actually passed on Cassandra,
I can only guess that the Python driver somehow makes the request
differently than what the Java unit tests use to make requests to
Cassandra.
One of the tests in the original Cassandra source file I did not
translate, readingEmptyStringsForDifferentTypes, because it tests
cqlsh, not pure CQL.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closesscylladb/scylladb#27944
It is considered a dangerous practice as it creates a side-effect for
later calls to the same function.
Create a new variable instead and mutate that. Also remove the unused
update_known_ids parameter, which defaults to True and no caller changes
it. Passing False to this param also seem to have no effect. Instead of
trying to guess what the desired effect of passing False is and fixing
it, just remove this unused param.
Found by CodeQL "Modification of parameter with default".
To hopefully shut up CodeQL "Iterable can be either a string or a sequence".
This change makes the code more readable anyway, so it is more than just
a gratuitous change to make some code-scanner happy.
Replace manual init of parent fields.
Found by CodeQL: "Missing call to superclass `__init__` during object
initialization".
The secret_key is not initialized to server.secret_key, instead of
server.access_key. This probably fixes a (benign) bug.
Write the boost logs into stdout in HRF format and in XML to the file. The XML file will be used for parsing and providing the error information in the summary section of the fail.
Fixes: https://github.com/scylladb/scylladb/issues/28045
Framework enhancements, no need to backport.
Closesscylladb/scylladb#28107
* github.com:scylladb/scylladb:
test.py: remove XML log from fail summary
test.py: fix truncated boost output to stdout file
Change the behavior of the catching the boost log output. With this
change boost will output it's logging to stdour with HRF format and to
the tempfile in XML format. This will help for easier debuggint when all
messages will be in the output file and still in the fail summary.
This will move responsibility for running tests with pytest in the same manner as it was done with boost tests. From this commit, test.py is not responsible anymore for running python tests and relies completely on pytest.
This is another step for unification of test execution.
Convert skip_mode function to `pytest.mark` to be able to use to annotate the whole module instead of each test explicitly.
NOTE: this is a breaking change. From this commit, several directories with tests will require a path to the file to launch the test. Affected directories
test/alternator
test/broadcast_tables
test/cql
test/cqlpy
test/rest_api
Changes only in framework, so no backport.
This PR will increase the amount of the tests by 30 test, due to the fact that how test.py and pytest discover tests. test.py count a file as a test, and when skip used in suite.yaml it will exclude the tests from discovery completely.
While the pytest count test funstion as a test and uses skip_mode mark and will discover the tests, but it will skip them during execution, hence the difference
test.py output before PR:
```bash
> ./test.py --mode=release rest_api/test_compaction_task rest_api/test_task_manager --list --no-gather-metrics
```
test.py output in this PR:
```bash
> ./test.py --mode=release test/rest_api/test_compaction_task.py test/rest_api/test_task_manager.py --list
rest_api/test_compaction_task.py::test_global_major_keyspace_compaction_task.release.1
rest_api/test_compaction_task.py::test_major_keyspace_compaction_task.release.1
rest_api/test_compaction_task.py::test_cleanup_keyspace_compaction_task.release.1
rest_api/test_compaction_task.py::test_offstrategy_keyspace_compaction_task.release.1
rest_api/test_compaction_task.py::test_rewrite_sstables_keyspace_compaction_task.release.1
rest_api/test_compaction_task.py::test_reshaping_compaction_task.release.1
rest_api/test_compaction_task.py::test_resharding_compaction_task.release.1
rest_api/test_compaction_task.py::test_regular_compaction_task.release.1
rest_api/test_compaction_task.py::test_compaction_task_abort.release.1
rest_api/test_compaction_task.py::test_major_keyspace_compaction_task_async.release.1
rest_api/test_compaction_task.py::test_cleanup_keyspace_compaction_task_async.release.1
rest_api/test_compaction_task.py::test_offstrategy_keyspace_compaction_task_async.release.1
rest_api/test_compaction_task.py::test_rewrite_sstables_keyspace_compaction_task_async.release.1
rest_api/test_compaction_task.py::test_compaction_progress[major_keyspace_compaction_task_impl_run_fail].release.1
rest_api/test_compaction_task.py::test_compaction_progress[shard_major_keyspace_compaction_task_impl_run_fail].release.1
rest_api/test_compaction_task.py::test_compaction_progress[table_major_keyspace_compaction_task_impl_run_fail].release.1
rest_api/test_task_manager.py::test_task_manager_modules.release.1
rest_api/test_task_manager.py::test_task_manager_tasks.release.1
rest_api/test_task_manager.py::test_task_manager_status_running.release.1
rest_api/test_task_manager.py::test_task_manager_status_done.release.1
rest_api/test_task_manager.py::test_task_manager_status_failed.release.1
rest_api/test_task_manager.py::test_task_manager_not_abortable.release.1
rest_api/test_task_manager.py::test_task_manager_wait.release.1
rest_api/test_task_manager.py::test_task_manager_ttl.release.1
rest_api/test_task_manager.py::test_task_manager_user_ttl.release.1
rest_api/test_task_manager.py::test_task_manager_sequence_number.release.1
rest_api/test_task_manager.py::test_task_manager_recursive_status.release.1
rest_api/test_task_manager.py::test_module_not_exists.release.1
rest_api/test_task_manager.py::test_task_folding.release.1
rest_api/test_task_manager.py::test_abort_on_unregistered_task.release.1
```
Fixes: https://github.com/scylladb/scylladb/issues/27716Closesscylladb/scylladb#26395
* github.com:scylladb/scylladb:
test.py: fix test_vector_similarity.py
docs: add directories excluded from test.py
test.py: prevent file descriptors leaking
test.py: capture print inside the test
test.py: do not print header for collection with test.py
test.py: remove not supported functionality
test.py: switch of execution of several test directories by test.py runner
test.py: integrate python tests to be executed with pytest runner
test.py: fix test/vector_search_validator to be able to run with pytest
test.py: prepare base class for migration
test.py: move environment preparation to one method
test.py: introduce new environment variable TESTPY_PREPARED_ENVIRONMENT
repair: Implement auto repair for tablet repair
This patch implements the basic auto repair support for tablet repair.
It was decided to add no per table configuration for the initial
implementation, so two scylla yaml config options are introduced to set
the default auto repair configs for all the tablet tables.
- auto_repair_enabled_default
Set true to enable auto repair for tablet tables by default. The value
will be overridden by the per keyspace or per table configuration which
is not implemented yet.
- auto_repair_threshold_default_in_seconds
Set the default time in seconds for the auto repair threshold for tablet
tables. If the time since last repair is bigger than the configured
time, the tablet is eligible for auto repair. The value will be
overridden by the per keyspace or per table configuration which is not
implemented yet.
The following metrcis are added:
- auto_repair_needs_repair_nr
The number of tablets with auto repair enabled that needs repair
- auto_repair_enabled_nr
The number of tablets with auto repair enabled
The metrics are useful to tell if auto repair is falling behind.
In the future, more auto repair scheduling will be added, e.g.,
scheduling based on the repaired and unrepaired sstable set size,
tombstone ratio and so on, in addition to the time based scheduling.
Fixes SCYLLADB-99
New feature. No backport.
Closesscylladb/scylladb#27534
* github.com:scylladb/scylladb:
topology_coordinator: Add metrics for tablet repair
repair: Implement auto repair for tablet repair
Create and drop view operations are currently performed on all shards, and their execution is not fully serialized. On slower processors this can lead to interleavings that leave stale entries in `system.scylla_views_build`
A problematic sequence looks like this:
* `on_create_view()` runs on shard 0 → entries for shard 0 and shard 1 are created
* `on_drop_view()` runs on shard 0 → entry for shard 0 is removed
* `on_create_view()` runs on shard 1 → entries for shard 0 and shard 1 are created again
* `on_drop_view()` runs on shard 1 → entry for shard 1 is removed, while the shard 0 entry remains
This results in a leftover row in `system.scylla_views_builds_in_progress`, causing `view_build_test.cc` to get stuck indefinitely in an eventual state and eventually be terminated by CI.
This patch fixes the issue by fully serializing all view create and drop operations through shard 0. Shard 0 becomes the single execution point and notifies other shards to perform their work in order. Requests originating.
new process:
- view_builder::on_create_view(...) runs only on shard 0 and kicks off dispatch_create_view(...) in the background.
- dispatch_create_view(...) (shard 0) first checks should_ignore_tablet_keyspace(...) and returns early if needed.
- dispatch_create_view(...) calls handle_seed_view_build_progress(...) on shard 0. That:
- writes the global “build progress” row across all shards via _sys_ks.register_view_for_building_for_all_shards(...).
- After seeding, dispatch_create_view(...) broadcasts to all shards with container().invoke_on_all(...).
- Each shard runs handle_create_view_local(...), which:
- waits for pending base writes/streams, flushes the base,
- resets the reader to the current token and adds the new view,
- handles errors and triggers _build_step to continue processing.
Drop view
- view_builder::on_drop_view(...) runs only on shard 0 and kicks off dispatch_drop_view(...) in the background.
- dispatch_drop_view(...) (shard 0) first checks should_ignore_tablet_keyspace(...) and returns early if needed.
- It broadcasts handle_drop_view_local(...) to all shards with invoke_on_all(...).
- Each shard runs handle_drop_view_local(...), which:
- removes the view from local build state (_base_to_build_step and _built_views) by scanning existing steps,
- ignores missing keyspace cases.
- After all shards finish local cleanup, shard 0 runs handle_drop_view_global_cleanup(...), which:
- removes global build progress, built‑view state, and view build status in system tables,
Shutdown
- drain() waits on _view_notification_sem before _sem so in‑flight dispatches finish before bookkeeping is halted.
In addition, the test is adjusted to remove the long eventual wait (596.52s / 30 iterations) and instead rely on the default wait of 17 iterations (~4.37 minutes), eliminating unnecessary delays while preserving correctness.
Fixes: https://github.com/scylladb/scylladb/issues/27898
Backport: not required as the problem happens on master
Closesscylladb/scylladb#27929
To prepare for implementation of filtering we skip validation
of where clauses in vector search queries. All queries that would
be blocked by the lack of ALLOW FILTERING now will pass through.
Fixes: VECTOR-410
Closesscylladb/scylladb#27758