scylladb

Author	SHA1	Message	Date
Piotr Smaron	d4c28690e1	db: fail reads and writes with local consistency level to a DC with RF=0 When read or write operations are performed on a DC with RF=0 with LOCAL_QUORUM or LOCAL_ONE consistency level, Cassandra throws an `Unavailable` exception. Scylla allowed such read operations and failed write operations with a cryptic "broken promise" error. This occurred because the initial availability check passed (quorum of 0 requires 0 replicas), but execution failed later when no replicas existed to process the mutation. This patch adds an explicit RF=0 validation for LOCAL_ONE and LOCAL_QUORUM that throws before attempting operation execution. The change also requires `test_query_dc_with_rf_0_does_not_crash_db` to be upgraded. This testcase was asserting a somewhat similar scenario, but wasn't taking into account the whole matrix of combinations: - scenarios: successful vs unsuccessful operation outcome - local consistency levels: LOCAL_QUORUM & LOCAL_ONE - operations: SELECT (read) & INSERT (write) and so it's been extended to cover both the pre-existing and the current issues and the whole matrix of combinations. Fixes: scylladb/scylladb#27893	2026-01-22 12:49:45 +01:00
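The check described in the commit above can be sketched in a few lines of Python. This is only an illustration, not the ScyllaDB implementation (which is C++). The names `quorum_for`, `validate_local_rf` and the `Unavailable` class are hypothetical, and the zero-quorum rule is taken from the commit message's remark that a quorum of 0 requires 0 replicas.

```python
from enum import Enum


class ConsistencyLevel(Enum):
    LOCAL_ONE = "LOCAL_ONE"
    LOCAL_QUORUM = "LOCAL_QUORUM"
    QUORUM = "QUORUM"


class Unavailable(Exception):
    """Hypothetical stand-in for the driver-visible Unavailable error."""


# Consistency levels that are satisfied only by replicas in the local DC.
LOCAL_LEVELS = {ConsistencyLevel.LOCAL_ONE, ConsistencyLevel.LOCAL_QUORUM}


def quorum_for(rf: int) -> int:
    # Assumption from the commit message: with RF=0 the quorum is 0,
    # which is why a plain "enough live replicas?" check passes.
    return rf // 2 + 1 if rf > 0 else 0


def validate_local_rf(cl: ConsistencyLevel, local_rf: int) -> None:
    # Explicit RF=0 validation: fail early instead of failing later,
    # when no replica exists to process the request.
    if cl in LOCAL_LEVELS and local_rf == 0:
        raise Unavailable(f"{cl.value} requested but local DC has RF=0")
```

With this shape, `validate_local_rf(ConsistencyLevel.LOCAL_QUORUM, 0)` raises before any execution is attempted, while non-local levels are left to the regular availability check.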
Sergey Zolotukhin	799d837295	test: disable test_start_bootstrapped_with_invalid_seed The test intermittently fails when an invalid DNS name is resolved, likely due to ISP DNS error hijacking (see scylladb/scylladb#28153). Disable this test to unblock CI. Fixes scylladb/scylladb#28153 Closes scylladb/scylladb#28162	2026-01-15 10:25:45 +01:00
Tomasz Grabiec	eef798d84f	Merge 'Distribute data evenly among primary replicas during restore' from Robert Bindar Most likely `817fdad` uncovered the fact that our choice of primary replica was resonating with tablet allocation and we were ending up picking the same replica as primary within a scope instead of rotating primaryship among all replicas in the scope. This created situations where, for instance, restoring into a 9-node cluster with primary_replica_only=true would put all data into 3 nodes, leaving the other 6 unused. The balancing of the dataset was performed by the subsequent repair step. This PR fixes this by changing the formula for picking the primary replica out of a set of eligible replicas from within the passed scope. The PR also extends the testing scenarios in `test_backup.py` so we get to run restore for a set of topologies, for all combinations of scope, primary_replica_only and min_tablet_counts. Most of the work was done by @bhalevy [here](https://github.com/scylladb/scylladb/compare/master...bhalevy:scylla:load-balance-primary-replica), this PR just split it and did touch-ups here and there. Fixes #27281 Closes scylladb/scylladb#27397 * github.com:scylladb/scylladb: test: reduce dataset and number of test cases or debug builds test: bump repair timeout up, it's sometimes not enough in CI test: refactor test_refresh.py to match test_restore_with_streaming_scopes. 
test: extend test_restore_with_streaming_scopes test: Adjust test_restore_primary_replica_different_dc_scope_all test: Refactor restoring code in test_backup to match SM pattern test: add check_mutation_replicas calls after fresh creation of dataset test: extend create_dataset to accept consistency_level test: refactor check_mutation_replicas so it's more readable test: make create_dataset async and refactor so it's configurable test: use defaultdict in collect_mutations test: add log marks to facilitate reusing server for restore locator: tablets: Distribute data evenly among primary replicas during restore	2026-01-14 18:57:55 +01:00
Avi Kivity	bd08b6e5b2	Merge 'Unify configuration of object storage endpoints (take 2)' from Pavel Emelyanov To configure S3 storage, one needs to do ``` object_storage_endpoints: - name: s3.us-east-1.amazonaws.com port: 443 https: true aws_region: us-east-1 ``` and for GCS it's ``` object_storage_endpoints: - name: https://storage.googleapis.com:433 type: gs credentials_file: <gcp account credentials json file> ``` This PR updates the S3 part to look like ``` object_storage_endpoints: - name: https://s3.us-east-1.amazonaws.com:443 aws_region: us-east-1 ``` fixes: #26570 This is the 2nd attempt; the previous one (#27360) was reverted because it always reported endpoint configs in the new format via API and CQL, even if the endpoint was configured in the old way. This "broke" scylla manager and some dtests. This version has this bug fixed, and endpoints are reported in the same format as they were configured with. About correctness of the changes. No modifications to existing tests are made here, so the old format is respected correctly (as far as it's covered by tests). To prove the new format works, the test_get_object_store_endpoints test is extended to validate both options. Some preparations to this test to make this happen come on their own with the PR #28111 to show that they are valid and pass before changing the core code. Enhancing the way configuration is made, likely no need to backport. Closes scylladb/scylladb#28112 * github.com:scylladb/scylladb: test: Validate S3 endpoints new format works docs: Update docs according to new endpoints config option format object_storage: Create s3 client with "extended" endpoint name s3/storage: Tune config updating sstable: Shuffle args for s3_client_wrapper test: Rename badconf variable into objconf test: Split the object_store/test_get_object_store_endpoints test	2026-01-14 18:29:03 +02:00
Yaniv Michael Kaul	d919aacc69	storage_proxy: mark write_timeouts metric for counter write timeouts When a counter write times out (due to rpc::timeout_error or timed_out_error), the code was throwing mutation_write_timeout_exception but not marking the write_timeouts metric. This resulted in counter write timeouts not being counted in the scylla_storage_proxy_coordinator_write_timeouts metric. Regular writes go through mutate_internal -> mutate_end, which catches mutation_write_timeout_exception and marks the metric. However, counter writes use a separate code path (mutate_counters) that has its own exception handling but was missing the metric update. This fix adds get_stats().write_timeouts.mark() before throwing the timeout exception in the counter write path, consistent with how the CAS path handles cas_write_timeouts. Refs: https://scylladb.atlassian.net/browse/SCYLLADB-245 Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com> Closes scylladb/scylladb#28019	2026-01-14 17:50:46 +02:00
Gleb Natapov	bee5f63cb6	topology coordinator: complete pending operation for a replaced node A replaced node may have a pending operation on it. The replace operation will move the node into the 'left' state and the request will never be completed. Moreover, the code does not expect a left node to have a request. It will try to process the request and will crash because the node for the request will not be found. The patch checks if the replaced node has a pending request and completes it with failure. It also changes the topology loading code to skip requests for nodes that are in the left state. This is not strictly needed, but makes the code more robust. Fixes #27990 Closes scylladb/scylladb#28009	2026-01-14 13:11:27 +01:00
Patryk Jędrzejczak	6b5923c64e	test: test_group0_schema_versioning: wait for schema sync in system.local `test_schema_versioning_with_recovery` is currently flaky. It performs a write with CL=ALL and then checks if the schema version is the same on all nodes by calling `verify_table_versions_synced`. All nodes are expected to sync their schema before handling the replica write. The node in RECOVERY mode should do it through a schema pull, and other nodes should do it through a group 0 read barrier. The problem is in `verify_local_schema_versions_synced` that compares the schema versions in `system.local`. The node in RECOVERY mode updates the schema version in `system.local` after it acknowledges the replica write as completed. Hence, the check can fail. We fix the problem by making the function wait until the schema versions match. Note that RECOVERY mode is about to be retired together with the whole gossip-based topology in 2026.2. So, this test is about to be deleted. However, we still want to fix it, so that it doesn't bother us in older branches. Fixes #23803 Closes scylladb/scylladb#28114	2026-01-14 09:55:45 +01:00
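The "wait until the schema versions match" fix described above follows a common polling pattern. Below is a minimal synchronous sketch of it. The helper names `wait_for` and `versions_synced` and the dictionary shape are assumptions for illustration. The real test helpers are asynchronous and query `system.local` on each node.

```python
import time


def versions_synced(versions_by_node: dict) -> bool:
    # All nodes report the same schema version.
    return len(set(versions_by_node.values())) == 1


def wait_for(predicate, timeout: float = 30.0, period: float = 0.1) -> None:
    # Re-evaluate the predicate until it holds or the deadline passes,
    # instead of asserting on a single (possibly too early) observation.
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met before deadline")
        time.sleep(period)
```

A caller would pass a closure that re-reads the versions on every attempt, for example `wait_for(lambda: versions_synced(read_versions()))`, where `read_versions` is whatever function fetches the current values.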
Botond Dénes	122b7847e5	Merge 'index: Accept view properties in CREATE INDEX' from Dawid Mędrek Problem ------- Secondary indexes are implemented via materialized views under the hood. The way an index behaves is determined by the configuration of the view. Currently, it can be modified by performing the CQL statement `ALTER MATERIALIZED VIEW` on it. However, that raises some concerns. Consider, for instance, the following scenario: 1. The user creates a secondary index on a table. 2. In parallel, the user performs writes to the base table. 3. The user modifies the underlying materialized view, e.g. by setting the `synchronous_updates` to `true` [1]. Some of the writes that happened before step 3 used the default value of the property (which is `false`). That had an actual consequence on what happened later on: the view updates were performed asynchronously. Only after step 3 had finished did it change. Unfortunately, as of now, there is no way to avoid a situation like that. Whenever the user wants to configure a secondary index they're creating, they need to do it in another schema change. Since it's not always possible to control how the database is manipulated in the meantime, it leads to problems like the one described. That's not all, though. The fact that it's not possible to configure secondary indexes is inconsistent with other schema entities. When it comes to tables or materialized views, the user always has a means to set some or even all of the properties during their creation. Solution -------- The solution to this problem is extending the `CREATE INDEX` CQL statement by view properties. The syntax is of form: ``` > CREATE INDEX <index name> > .. ON <keyspace>.<table> (<columns>) > .. WITH <properties> ``` where `<properties>` corresponds to both index-specific and view properties [2, 3]. 
View properties can only be used with indexes implemented with materialized views; for example, it will be impossible to create a vector index when specifying any view property (see examples below). When a view property is provided, it will be applied when creating the underlying materialized view. The behavior should be similar to how other CQL statements responsible for creating schema entities work. High-level implementation strategy ---------------------------------- 1. Make auxiliary changes. 2. Introduce data structures representing the new set of index properties: both index-specific and those corresponding to the underlying view. 3. Extend `CREATE INDEX` to accept view properties. 4. Extend `DESCRIBE INDEX` and other `DESCRIBE` statements to include view properties in their output. User documentation is also updated at the steps to reflect the corresponding changes. Implementation considerations ----------------------------- There are a number of schema properties that are now obsolete. They're accepted by other CQL statements, but they have no effect. They include: * `index_interval` * `replicate_on_write` * `populate_io_cache_on_flush` * `read_repair_chance` * `dclocal_read_repair_chance` If the user tries to create a secondary index specifying any of those keywords, the statement will fail with an appropriate error (see examples below). Unlike materialized views, we forbid specifying the clustering order when creating a secondary index [4]. This limitation may be lifted later on, but it's a detail that may or may not prove troublesome. It's better to postpone covering it to when we have a better perspective on the consequences it would bring. Examples -------- Good examples ``` > CREATE INDEX idx ON ks.t (v); > CREATE INDEX idx ON ks.t (v) WITH comment = 'ok view property'; > CREATE INDEX idx ON ks.t (v) .. WITH comment = 'multiple view properties are ok' .. AND synchronous_updates = true; > CREATE INDEX idx ON ks.t (v) .. 
WITH comment = 'default value ok' .. AND synchronous_updates = false; ``` Bad examples ``` > CREATE INDEX idx ON ks.t (v) WITH replicate_on_write = true; SyntaxException: Unknown property 'replicate_on_write' > CREATE INDEX idx ON ks.t (v) .. WITH OPTIONS = {'option1': 'value1'} .. AND comment = 'some text'; InvalidRequest: Error from server: code=2200 [Invalid query] message="Cannot specify options for a non-CUSTOM index" > CREATE CUSTOM INDEX idx ON ks.t (v) .. WITH OPTIONS = {'option1': 'value1'} .. AND comment = 'some text'; InvalidRequest: Error from server: code=2200 [Invalid query] message="CUSTOM index requires specifying the index class" > CREATE CUSTOM INDEX idx ON ks.t (v) .. USING 'vector_index' .. WITH OPTIONS = {'option1': 'value1'} .. AND comment = 'some text'; InvalidRequest: Error from server: code=2200 [Invalid query] message="You cannot use view properties with a vector index" > CREATE INDEX idx ON ks.t (v) WITH CLUSTERING ORDER BY (v ASC); InvalidRequest: Error from server: code=2200 [Invalid query] message="Indexes do not allow for specifying the clustering order" ``` and so on. For more examples, see the relevant tests. References: [1] https://docs.scylladb.com/manual/branch-2025.4/cql/cql-extensions.html#synchronous-materialized-views [2] https://docs.scylladb.com/manual/branch-2025.4/cql/secondary-indexes.html#create-index [3] https://docs.scylladb.com/manual/branch-2025.4/cql/mv.html#mv-options [4] https://docs.scylladb.com/manual/branch-2025.4/cql/dml/select.html#ordering-clause Fixes scylladb/scylladb#16454 Backport: not needed. This is an enhancement. 
Closes scylladb/scylladb#24977 * github.com:scylladb/scylladb: cql3: Extend DESC INDEX by view properties cql3: Forbid using CLUSTERING ORDER BY when creating index cql3: Extend CREATE INDEX by MV properties cql3/statements/create_index_statement: Allow for view options cql3/statements/create_index_statement: Rename member cql3/statements/index_prop_defs: Re-introduce index_prop_defs cql3/statements/property_definitions: Add extract_property() cql3/statements/index_prop_defs.cc: Add namespace cql3/statements/index_prop_defs.hh: Rename type cql3/statements/view_prop_defs.cc: Move validation logic into file cql3/statements: Introduce view_prop_defs.{hh,cc} cql3/statements/create_view_statement.cc: Move validation of ID schema/schema.hh: Do not include index_prop_defs.hh	2026-01-14 09:54:27 +02:00
Avi Kivity	489d1a0fbc	Merge 'replica: don't throw exceptions for read timeout' from Botond Dénes Read timeouts are a common occurrence and they typically occur when the replica is overloaded. So throwing exceptions for read timeouts is very harmful. Be careful not to throw exceptions while propagating them up the future chain. Add a test to enforce this and detect regressions. Fixes: scylladb/scylladb#25062 Improvement, normally not a backport candidate, but we may decide to backport if customer(s) are found to suffer from this. Closes scylladb/scylladb#25068 * github.com:scylladb/scylladb: reader_permit: remove check_abort() test/boost/database_test: add test for read timeout exceptions sstables/mx/reader: don't throw exceptions on the read-path readers/multishard: don't throw exceptions on the read-path replica/table: don't throw exceptions on the read-path multishard_mutation_query: fix indentation multishard_mutation_query: don't throw exceptions on the read-path service/storage_proxy: don't throw exceptions on the full-scan path cql3/query_processor: don't throw exceptions on the read-path reader_permit: add get_abort_exception()	2026-01-13 16:17:41 +02:00
Avi Kivity	c6dfae5661	treewide: #include Seastar headers with angle brackets Seastar is an external library from the point of view of ScyllaDB, so should be included with angle brackets. Closes scylladb/scylladb#27947	2026-01-13 14:56:15 +02:00
Tomasz Grabiec	63b9a7e2b5	test: pylib: log_browsing: Grep logs without considering newly appended lines At the end of the test case, the framework greps logs for errors and backtraces. The servers are still running at this point. Some test cases enable debug-level logging. If servers manage to produce new lines while the Python script is processing them, the grep will never return. Protect against this by grepping over a file snapshot. Fixes #28086 Closes scylladb/scylladb#28088	2026-01-13 14:41:02 +02:00
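One way to grep "over a file snapshot", as the commit above puts it, is to record the file size once and never read past it. The sketch below shows that idea only. The function name `grep_snapshot` is hypothetical and this is not the actual `log_browsing` code.

```python
import os
import re


def grep_snapshot(path: str, pattern: str) -> list:
    regex = re.compile(pattern)
    # Snapshot: the size observed now bounds how much we will read,
    # so lines appended by a still-running server are ignored.
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        data = f.read(size)
    # Drop a trailing partial line that was being written at snapshot time.
    last_newline = data.rfind(b"\n")
    data = data[: last_newline + 1] if last_newline >= 0 else b""
    text = data.decode("utf-8", errors="replace")
    return [line for line in text.splitlines() if regex.search(line)]
```

Because the amount of input is fixed before scanning starts, the scan terminates no matter how quickly the log keeps growing.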
Pavel Emelyanov	9ffd22491f	test: Validate S3 endpoints new format works Extend the test_get_object_store_endpoints() test to configure S3 endpoints in full-url format and check that they are rendered properly via API/CQL. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-01-13 13:24:18 +03:00
Pavel Emelyanov	83e88d206c	test: Rename badconf variable into objconf It's not actually a "bad" config, it's just some config the test works with. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-01-13 13:23:20 +03:00
Pavel Emelyanov	9c627bc44a	test: Split the object_store/test_get_object_store_endpoints test It tests two things -- the way object storage config is represented via API and CQL (from system.config) and that updating config affects CREATE KEYSPACE CQL (with keyspace storage options). It's better to split the test, as its former part is going to be extended to validate old/new config formats (see #26570) Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-01-13 13:23:03 +03:00
Robert Bindar	dfcabb5fa4	test: reduce dataset and number of test cases or debug builds Signed-off-by: Robert Bindar <robert.bindar@scylladb.com>	2026-01-13 11:46:51 +02:00
Robert Bindar	ca3c57e821	test: bump repair timeout up, it's sometimes not enough in CI Signed-off-by: Robert Bindar <robert.bindar@scylladb.com>	2026-01-13 11:46:49 +02:00
Robert Bindar	6f5e58e718	test: refactor test_refresh.py to match test_restore_with_streaming_scopes. Signed-off-by: Robert Bindar <robert.bindar@scylladb.com>	2026-01-13 11:46:48 +02:00
Robert Bindar	6e636a4231	test: extend test_restore_with_streaming_scopes to test restoring with a different min_tablet_count than the schema was originally created with. Signed-off-by: Robert Bindar <robert.bindar@scylladb.com>	2026-01-13 11:46:46 +02:00
Robert Bindar	92cd1ddec3	test: Adjust test_restore_primary_replica_different_dc_scope_all to match the new topology architecture Signed-off-by: Robert Bindar <robert.bindar@scylladb.com>	2026-01-13 11:46:44 +02:00
Robert Bindar	db13ece9a0	test: Refactor restoring code in test_backup to match SM pattern This patch refactors the restoring code in cluster/test_backup.py so it better matches the way SM works. The patch also refactors test_restore_with_streaming_scopes to facilitate running restore scenarios under all supported scopes with or w/o primary_replica_only enabled by reusing the servers and backups for a topology. This allows us to test a lot more scenarios without making the test impossibly slow. split from bhalevy/load-balance-primary-replica Signed-off-by: Robert Bindar <robert.bindar@scylladb.com>	2026-01-13 11:46:43 +02:00
Robert Bindar	ba01589f53	test: add check_mutation_replicas calls after fresh creation of dataset to validate that mutation assertions are sane split from bhalevy/load-balance-primary-replica Signed-off-by: Robert Bindar <robert.bindar@scylladb.com>	2026-01-13 11:46:41 +02:00
Robert Bindar	b835d32cb0	test: extend create_dataset to accept consistency_level Signed-off-by: Robert Bindar <robert.bindar@scylladb.com>	2026-01-13 11:46:35 +02:00
Robert Bindar	e7d44356d9	test: refactor check_mutation_replicas so it's more readable split from bhalevy/load-balance-primary-replica Signed-off-by: Robert Bindar <robert.bindar@scylladb.com>	2026-01-13 11:46:31 +02:00
Robert Bindar	733b4dbbb7	test: make create_dataset async and refactor so it's configurable with num_keys and min_tablet_count split from bhalevy/load-balance-primary-replica Signed-off-by: Robert Bindar <robert.bindar@scylladb.com>	2026-01-13 11:46:20 +02:00
Robert Bindar	f2c8949e4a	test: use defaultdict in collect_mutations split from bhalevy/load-balance-primary-replica Signed-off-by: Robert Bindar <robert.bindar@scylladb.com>	2026-01-13 11:45:03 +02:00
Robert Bindar	45faeba97d	test: add log marks to facilitate reusing server for restore split from bhalevy/load-balance-primary-replica Signed-off-by: Robert Bindar <robert.bindar@scylladb.com>	2026-01-13 11:44:48 +02:00
Nadav Har'El	2a831ad373	Merge 'Address CodeQL Errors' from Botond Dénes Address all errors reported by CodeQL as reported on https://github.com/scylladb/scylladb/security/quality. This is a mixed bag, with some harmless issues, while others are severe problems which will result in the code breaking (if it is even run). I suspect some of the more severe problems were found in dead code that is not used at all -- hence nobody noticed. Still, these issues are good to fix, so we can reduce noise in the reports and improve the maintainability of the code. Code cleanup, no backport Closes scylladb/scylladb#27838 * github.com:scylladb/scylladb: pgo/pgo.py: don't mutate input params test/pylib/coverage_utils.py: profdata_to_lcov: don't mutate defaulted param test/cluster/dtest/tools/misc.py: add type annotations to list_to_hashed_dict() idl-compiler.py: raise TypeError instead of raw str test/pylib/lcov_utils.py: don't call set when iterating over it configure.py: move away from .format(**locals()) test/cluster/object_store/conftest.py: add missing call to parent constructor idl-compiler.py: add missing call to parent class constructor tools/scyllatop/fake.py: pass correct number of args to _add_metric	2026-01-13 11:43:57 +02:00
Nadav Har'El	609b283d98	test/cqlpy: add another reproducer for known issue This patch adds a second reproducer for issue #25839, which is about scanning a secondary index which returns partial results. The new test uses count(*) without requesting the rows themselves, but still has the same problem of counting only part of the rows. This is the problem that a user reported in issue #28026. Unlike the previous test, this test works correctly on older versions of Scylla - by using larger data, like on Cassandra - without changing a configuration variable that did not yet exist. So with this test we can confirm that this bug is a Scylla 5.2 regression: test/cqlpy/run --release 5.1 test_secondary_index.py::test_short_count passes, while test/cqlpy/run --release 5.2 test_secondary_index.py::test_short_count fails. It also fails on master, so the new test is marked "xfail". Refs #25839 Refs #28026 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#28108	2026-01-13 11:15:27 +02:00
Botond Dénes	7b562bb185	Merge 'system.clients: Address SSL refactor review comments' from Piotr Smaron and Copilot Addresses outstanding review comments from PR #22961 where SSL field collection was refactored into generic_server::connection base class. This patch consists of minor cosmetic enhancements for increased readability, mainly, with some minor fixups explained in specific commits. Cosmetic changes, no need to backport. Closes scylladb/scylladb#27575 * github.com:scylladb/scylladb: test_ssl: fix indentation generic_server: improve logging broken TLS connection test_ssl: improve timeout and readability alternator/server: update SSL comment	2026-01-13 11:00:26 +02:00
Botond Dénes	354c805e6a	reader_permit: remove check_abort() This method can cause performance regressions if used in the wrong place -- namely if it is used to abort reads by throwing the abort exception. Exceptions should be propagated during reads without throwing them, otherwise they cause extra CPU load, making a bad situation worse. Remove this method, so it doesn't accidentally get more users, migrate remaining users to get_abort_exception().	2026-01-13 10:47:57 +02:00
Botond Dénes	a0ddac655d	test/boost/database_test: add test for read timeout exceptions Read timeouts shouldn't cause exceptions to be thrown; exceptions should be propagated solely via futures, otherwise they put extra strain on the system at the worst possible time: when it is already overloaded enough that reads have started to time out. The test covers both single partition reads and full scans, with two scenarios: * timeout while the read is queued * timeout when the read is already ongoing	2026-01-13 10:47:57 +02:00
Michał Hudobski	c8aa49b196	vector search, paging: add test for paging warnings We add a test that validates that indexed queries do not throw a warning related to vector search paging Fixes: SCYLLADB-248 Closes scylladb/scylladb#28077	2026-01-13 10:33:36 +02:00
Nadav Har'El	34191d8fd4	alternator: fix signature checking of headers with multiple spaces We have a test in test_compressed_response.py that reproduces a bug where in Alternator's signature checking code, if a header had multiple consecutive spaces its signature isn't checked correctly. This patch fixes this and that xfailing test begins to pass. But it turns out that the handling of multiple consecutive spaces in headers when calculating the authentication signature is just one example of "header canonization" that the AWS Signature V4 specification requires us to do. There are additional types of header canonization that Alternator must do, and this patch also adds new tests in test_authorization.py for checking all the types of canonization. Fortunately, for all other types of canonizations, we already handled them correctly - Alternator already lowercases header names, sorts them alphabetically and removes leading and trailing spaces before calculating the signature. So most of the new tests added pass also without this patch, and only one of them, test_canonization_middle_whitespace, needs this patch to pass. As usual, all the new tests also pass on DynamoDB. Fixes #27775 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#28102	2026-01-13 10:29:13 +02:00
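The header canonization steps listed in the commit above (lowercase the names, sort them, trim surrounding spaces, collapse runs of spaces) can be sketched as follows. This is an illustration of those steps only, not Alternator's C++ code. The function name `canonical_headers` is hypothetical, and the `name:value\n` output layout follows the AWS Signature V4 canonical-headers format.

```python
def canonical_headers(headers: dict) -> str:
    canon = []
    for name, value in headers.items():
        # str.split() with no arguments strips leading/trailing whitespace
        # and splits on runs of whitespace, so joining with a single space
        # both trims the value and collapses consecutive spaces.
        canon.append((name.lower(), " ".join(value.split())))
    # Sort by lowercased header name.
    canon.sort()
    return "".join(f"{name}:{value}\n" for name, value in canon)
```

For example, `canonical_headers({"X-Amz-Meta": "  a   b  ", "Host": "example.com"})` yields `"host:example.com\nx-amz-meta:a b\n"`. The middle-whitespace case is the one that `test_canonization_middle_whitespace` exercises.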
Andrei Chekun	3f14d45d6e	test.py: fix the link for the failed_test directory With the new UI, Jenkins escapes HTML tags during rendering to prevent XSS. The link is now shown as a plain string, without a custom name, that can be copied and pasted to navigate to the failed directory. Closes scylladb/scylladb#28062	2026-01-13 10:22:38 +02:00
Yaniv Kaul	4e3aa53f8b	test/rest_api/test_gossiper.py: fix for Variable defined multiple times To fix the problem, we need to remove the first, redundant definition of test_gossiper_unreachable_endpoints (lines 19-24). The second definition (lines 25-40) should be retained as it has more substantial test logic. No other code changes or imports are needed, as the test logic is preserved fully in the retained definition. Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com> Closes scylladb/scylladb#27632	2026-01-13 10:17:29 +02:00
Avi Kivity	66aee0fb5e	alternator: add optional listeners for proxy protocol v2 Following `954f2cbd2f`, which added proxy protocol v2 listeners for CQL, we do the same for alternator. We add two optional ports for plain and TLS-wrapped HTTP. We test each new port, that the old ports still work, and that mixing up a port with no proxy protocol and a connection with proxy protocol (or the opposite) fails. The latter serves to show that the testing strategy is valid and doesn't just pass whatever happens. We also verify that the correct addresses (and TLS mode) show up in system.clients. Closes scylladb/scylladb#27889	2026-01-13 09:59:24 +02:00
Nadav Har'El	e7df03127b	alternator: support "deflate" encoding in request compression Currently Alternator supports compressed requests in the gzip format with "Content-Encoding: gzip". We did not support any other compression formats. It turns out that DynamoDB also supports the "deflate" encoding. The "deflate" format is just a small variant of gzip and also supported by the same zlib library that we already use, so it is very easy to add support for it as well. So this patch adds it. Beyond compatibility with DynamoDB, another benefit of this patch is symmetry with our response compression support (PR #27454), where we supported both gzip and deflate compression of responses - so we should support the same for requests. This patch also adds tests for Content-Encoding: deflate, which pass on DynamoDB (proving that "deflate" is indeed supported there). On Alternator the new tests failed before this patch and pass with this patch. Refs #27243 (which asks to support more compression formats). Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#27917	2026-01-13 09:58:12 +02:00
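Since both encodings are handled by zlib, supporting them side by side mostly comes down to selecting the right container format. The Python sketch below illustrates this with the standard `zlib` module. The function name `decode_body` is hypothetical and the code is not taken from Alternator. It assumes that HTTP "deflate" means the zlib-wrapped format.

```python
import zlib
from typing import Optional


def decode_body(body: bytes, content_encoding: Optional[str]) -> bytes:
    if not content_encoding:
        return body
    encoding = content_encoding.strip().lower()
    if encoding == "gzip":
        # wbits = 16 + MAX_WBITS tells zlib to expect a gzip header/trailer.
        return zlib.decompress(body, wbits=16 + zlib.MAX_WBITS)
    if encoding == "deflate":
        # Assumption: "deflate" bodies use the zlib container (header + Adler-32).
        return zlib.decompress(body, wbits=zlib.MAX_WBITS)
    raise ValueError(f"unsupported Content-Encoding: {content_encoding}")
```

A request compressed with `zlib.compress(payload)` and sent with `Content-Encoding: deflate` would round-trip through this function, as would one compressed with `gzip.compress(payload)` and labelled `gzip`.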
Nadav Har'El	a1f198d453	test/cqlpy: translate Cassandra's test InsertInvalidateSizedRecordsTest This is a translation of Cassandra's CQL unit test source file validation/operations/InsertInvalidateSizedRecordsTest.java into our cqlpy framework. This is one of the tests added to Cassandra as part of the vector search work, but actually has nothing to do with vector search - it checks what happens when key columns of different types exceed their maximum size (64KB). Unfortunately, each one of the tests added here fails on ScyllaDB, providing more reproducers for two already known issues (which already had plenty of reproducers...): Refs #8627 Cleanly reject updates with indexed values where value > 64k Refs #12247 Better error reporting for oversized keys during INSERT One of the tests also fails on Cassandra, due to CASSANDRA-19270. It is not clear to me how this unit test actually passed on Cassandra, I can only guess that the Python driver somehow makes the request differently than what the Java unit tests use to make requests to Cassandra. One of the tests in the original Cassandra source file I did not translate, readingEmptyStringsForDifferentTypes, because it tests cqlsh, not pure CQL. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#27944	2026-01-13 08:59:36 +02:00
Botond Dénes	4ec3d76b87	test/pylib/coverage_utils.py: profdata_to_lcov: don't mutate defaulted param It is considered a dangerous practice as it creates a side-effect for later calls to the same function. Create a new variable instead and mutate that. Also remove the unused update_known_ids parameter, which defaults to True and no caller changes it. Passing False to this param also seem to have no effect. Instead of trying to guess what the desired effect of passing False is and fixing it, just remove this unused param. Found by CodeQL "Modification of parameter with default".	2026-01-13 08:33:17 +02:00
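The side effect mentioned in the commit above arises because Python evaluates a default argument once, when the function is defined. The sketch below contrasts the two patterns. The function names and the `known_ids` parameter are illustrative and do not reproduce the real `profdata_to_lcov` signature.

```python
def add_ids_buggy(new_ids, known_ids=set()):
    # The default set is created once and shared by every call that
    # omits known_ids, so mutations leak into later calls.
    known_ids.update(new_ids)
    return known_ids


def add_ids_fixed(new_ids, known_ids=None):
    # Create a new variable and mutate that, leaving the caller's
    # argument (and the default) untouched.
    result = set(known_ids) if known_ids is not None else set()
    result.update(new_ids)
    return result
```

Calling `add_ids_buggy([1])` and then `add_ids_buggy([2])` returns a set containing both values on the second call, while the fixed variant returns only `{2}`.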
Botond Dénes	157fe2b80e	test/cluster/dtest/tools/misc.py: add type annotations to list_to_hashed_dict() To hopefully shut up CodeQL "Iterable can be either a string or a sequence". This change makes the code more readable anyway, so it is more than just a gratuitous change to make some code-scanner happy.	2026-01-13 08:33:17 +02:00
Botond Dénes	3c6e9637a0	test/pylib/lcov_utils.py: don't call set when iterating over it Probably a typo. Found by CodeQL "Non-callable called".	2026-01-13 08:33:17 +02:00
Botond Dénes	d2db84714e	test/cluster/object_store/conftest.py: add missing call to parent constructor Replace manual init of parent fields. Found by CodeQL: "Missing call to superclass `__init__` during object initialization". The secret_key is now initialized to server.secret_key, instead of server.access_key. This probably fixes a (benign) bug.	2026-01-13 08:33:17 +02:00
Botond Dénes	725e99e263	Merge 'test.py: Fix boost logs' from Andrei Chekun Write the boost logs into stdout in HRF format and in XML to the file. The XML file will be used for parsing and providing the error information in the summary section of the fail. Fixes: https://github.com/scylladb/scylladb/issues/28045 Framework enhancements, no need to backport. Closes scylladb/scylladb#28107 * github.com:scylladb/scylladb: test.py: remove XML log from fail summary test.py: fix truncated boost output to stdout file	2026-01-13 06:19:05 +02:00
Andrei Chekun	dfa6a61721	test.py: remove XML log from fail summary Remove XML log from fail summary. Add text from the first error in the XML file to the fail summary	2026-01-12 14:26:58 +01:00
Andrei Chekun	d96a50481a	test.py: fix truncated boost output to stdout file Change how the boost log output is captured. With this change boost will output its logging to stdout in HRF format and to the tempfile in XML format. This makes debugging easier, since all messages will be in the output file and still in the fail summary.	2026-01-12 14:15:23 +01:00
Botond Dénes	6bcc18e5c6	Merge 'test.py: integrate python tests to be executed with pytest runner' from Andrei Chekun This moves responsibility for running python tests to pytest, in the same manner as was done with boost tests. From this commit, test.py is no longer responsible for running python tests and relies completely on pytest. This is another step toward unification of test execution. Convert the skip_mode function to a `pytest.mark` so that it can annotate a whole module instead of each test explicitly. NOTE: this is a breaking change. From this commit, several directories with tests will require a path to the file to launch the test. Affected directories: test/alternator test/broadcast_tables test/cql test/cqlpy test/rest_api Changes only in framework, so no backport. This PR will increase the number of tests by 30, because test.py and pytest discover tests differently. test.py counts a file as a test, and when skip is used in suite.yaml it excludes the tests from discovery completely. 
By contrast, pytest counts each test function as a test and uses the skip_mode mark, so it discovers the tests but skips them during execution, hence the difference. test.py output before PR: ```bash > ./test.py --mode=release rest_api/test_compaction_task rest_api/test_task_manager --list --no-gather-metrics ``` test.py output in this PR: ```bash > ./test.py --mode=release test/rest_api/test_compaction_task.py test/rest_api/test_task_manager.py --list rest_api/test_compaction_task.py::test_global_major_keyspace_compaction_task.release.1 rest_api/test_compaction_task.py::test_major_keyspace_compaction_task.release.1 rest_api/test_compaction_task.py::test_cleanup_keyspace_compaction_task.release.1 rest_api/test_compaction_task.py::test_offstrategy_keyspace_compaction_task.release.1 rest_api/test_compaction_task.py::test_rewrite_sstables_keyspace_compaction_task.release.1 rest_api/test_compaction_task.py::test_reshaping_compaction_task.release.1 rest_api/test_compaction_task.py::test_resharding_compaction_task.release.1 rest_api/test_compaction_task.py::test_regular_compaction_task.release.1 rest_api/test_compaction_task.py::test_compaction_task_abort.release.1 rest_api/test_compaction_task.py::test_major_keyspace_compaction_task_async.release.1 rest_api/test_compaction_task.py::test_cleanup_keyspace_compaction_task_async.release.1 rest_api/test_compaction_task.py::test_offstrategy_keyspace_compaction_task_async.release.1 rest_api/test_compaction_task.py::test_rewrite_sstables_keyspace_compaction_task_async.release.1 rest_api/test_compaction_task.py::test_compaction_progress[major_keyspace_compaction_task_impl_run_fail].release.1 rest_api/test_compaction_task.py::test_compaction_progress[shard_major_keyspace_compaction_task_impl_run_fail].release.1 rest_api/test_compaction_task.py::test_compaction_progress[table_major_keyspace_compaction_task_impl_run_fail].release.1 rest_api/test_task_manager.py::test_task_manager_modules.release.1 
rest_api/test_task_manager.py::test_task_manager_tasks.release.1 rest_api/test_task_manager.py::test_task_manager_status_running.release.1 rest_api/test_task_manager.py::test_task_manager_status_done.release.1 rest_api/test_task_manager.py::test_task_manager_status_failed.release.1 rest_api/test_task_manager.py::test_task_manager_not_abortable.release.1 rest_api/test_task_manager.py::test_task_manager_wait.release.1 rest_api/test_task_manager.py::test_task_manager_ttl.release.1 rest_api/test_task_manager.py::test_task_manager_user_ttl.release.1 rest_api/test_task_manager.py::test_task_manager_sequence_number.release.1 rest_api/test_task_manager.py::test_task_manager_recursive_status.release.1 rest_api/test_task_manager.py::test_module_not_exists.release.1 rest_api/test_task_manager.py::test_task_folding.release.1 rest_api/test_task_manager.py::test_abort_on_unregistered_task.release.1 ``` Fixes: https://github.com/scylladb/scylladb/issues/27716 Closes scylladb/scylladb#26395 * github.com:scylladb/scylladb: test.py: fix test_vector_similarity.py docs: add directories excluded from test.py test.py: prevent file descriptors leaking test.py: capture print inside the test test.py: do not print header for collection with test.py test.py: remove not supported functionality test.py: switch of execution of several test directories by test.py runner test.py: integrate python tests to be executed with pytest runner test.py: fix test/vector_search_validator to be able to run with pytest test.py: prepare base class for migration test.py: move environment preparation to one method test.py: introduce new environment variable TESTPY_PREPARED_ENVIRONMENT	2026-01-12 14:17:19 +02:00
Botond Dénes	04b8f72946	Merge 'repair: Implement auto repair for tablet repair' from Asias He repair: Implement auto repair for tablet repair This patch implements the basic auto repair support for tablet repair. It was decided to add no per table configuration for the initial implementation, so two scylla yaml config options are introduced to set the default auto repair configs for all the tablet tables. - auto_repair_enabled_default Set true to enable auto repair for tablet tables by default. The value will be overridden by the per keyspace or per table configuration which is not implemented yet. - auto_repair_threshold_default_in_seconds Set the default time in seconds for the auto repair threshold for tablet tables. If the time since last repair is bigger than the configured time, the tablet is eligible for auto repair. The value will be overridden by the per keyspace or per table configuration which is not implemented yet. The following metrics are added: - auto_repair_needs_repair_nr The number of tablets with auto repair enabled that need repair - auto_repair_enabled_nr The number of tablets with auto repair enabled The metrics are useful to tell if auto repair is falling behind. In the future, more auto repair scheduling will be added, e.g., scheduling based on the repaired and unrepaired sstable set size, tombstone ratio and so on, in addition to the time based scheduling. Fixes SCYLLADB-99 New feature. No backport. Closes scylladb/scylladb#27534 * github.com:scylladb/scylladb: topology_coordinator: Add metrics for tablet repair repair: Implement auto repair for tablet repair	2026-01-12 14:16:01 +02:00
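The time-based eligibility rule and the two metrics named in commit 04b8f72946 can be sketched in Python as below. This is a toy model, not Scylla's implementation: the `Tablet` dataclass, its fields, and the function names are hypothetical, and only the rule "time since last repair is bigger than the configured threshold" and the metric names come from the commit message.

```python
from dataclasses import dataclass


@dataclass
class Tablet:
    auto_repair_enabled: bool          # hypothetical per-tablet flag
    seconds_since_last_repair: float   # hypothetical bookkeeping field


def needs_auto_repair(tablet, threshold_seconds):
    # Eligible when auto repair is enabled and the time since the last
    # repair exceeds the configured threshold.
    return tablet.auto_repair_enabled and tablet.seconds_since_last_repair > threshold_seconds


def auto_repair_metrics(tablets, threshold_seconds):
    # Mirrors the two metric names from the commit message.
    enabled = [t for t in tablets if t.auto_repair_enabled]
    return {
        "auto_repair_enabled_nr": len(enabled),
        "auto_repair_needs_repair_nr": sum(
            1 for t in enabled if needs_auto_repair(t, threshold_seconds)
        ),
    }
```

If `auto_repair_needs_repair_nr` stays close to `auto_repair_enabled_nr` over time, that would suggest auto repair is falling behind, which is the use the commit message gives for these metrics.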
Petr Gusev	889d7782ed	treewide: use coroutine::maybe_yield in coroutines It's more efficient since coroutine::maybe_yield returns a lightweight struct (awaitable), not the future. Closes scylladb/scylladb#28101	2026-01-12 10:38:47 +01:00
Alex	e430065c92	db: views: serialize create/drop view operations via shard 0 Create and drop view operations are currently performed on all shards, and their execution is not fully serialized. On slower processors this can lead to interleavings that leave stale entries in `system.scylla_views_builds_in_progress`. A problematic sequence looks like this: * `on_create_view()` runs on shard 0 → entries for shard 0 and shard 1 are created * `on_drop_view()` runs on shard 0 → entry for shard 0 is removed * `on_create_view()` runs on shard 1 → entries for shard 0 and shard 1 are created again * `on_drop_view()` runs on shard 1 → entry for shard 1 is removed, while the shard 0 entry remains This results in a leftover row in `system.scylla_views_builds_in_progress`, causing `view_build_test.cc` to get stuck indefinitely in an eventual state and eventually be terminated by CI. This patch fixes the issue by fully serializing all view create and drop operations through shard 0. Shard 0 becomes the single execution point and notifies other shards to perform their work in order. Requests originating. New process: - view_builder::on_create_view(...) runs only on shard 0 and kicks off dispatch_create_view(...) in the background. - dispatch_create_view(...) (shard 0) first checks should_ignore_tablet_keyspace(...) and returns early if needed. - dispatch_create_view(...) calls handle_seed_view_build_progress(...) on shard 0. That: - writes the global “build progress” row across all shards via _sys_ks.register_view_for_building_for_all_shards(...). - After seeding, dispatch_create_view(...) broadcasts to all shards with container().invoke_on_all(...). - Each shard runs handle_create_view_local(...), which: - waits for pending base writes/streams, flushes the base, - resets the reader to the current token and adds the new view, - handles errors and triggers _build_step to continue processing. Drop view - view_builder::on_drop_view(...) runs only on shard 0 and kicks off dispatch_drop_view(...) 
in the background. - dispatch_drop_view(...) (shard 0) first checks should_ignore_tablet_keyspace(...) and returns early if needed. - It broadcasts handle_drop_view_local(...) to all shards with invoke_on_all(...). - Each shard runs handle_drop_view_local(...), which: - removes the view from local build state (_base_to_build_step and _built_views) by scanning existing steps, - ignores missing keyspace cases. - After all shards finish local cleanup, shard 0 runs handle_drop_view_global_cleanup(...), which: - removes global build progress, built‑view state, and view build status in system tables, Shutdown - drain() waits on _view_notification_sem before _sem so in‑flight dispatches finish before bookkeeping is halted. In addition, the test is adjusted to remove the long eventual wait (596.52s / 30 iterations) and instead rely on the default wait of 17 iterations (~4.37 minutes), eliminating unnecessary delays while preserving correctness. Fixes: https://github.com/scylladb/scylladb/issues/27898 Backport: not required as the problem happens on master Closes scylladb/scylladb#27929	2026-01-12 09:23:22 +02:00
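The problematic interleaving listed in commit e430065c92 can be reproduced with a toy Python model. This is a sketch under simplifying assumptions, not the view builder code: the build-progress table is modeled as a set of shard IDs, and every function name below is hypothetical. The old handlers follow the sequence in the commit message (create on any shard registers rows for all shards, drop on a shard removes only that shard's row); the serialized variant follows the described new flow (seed once on shard 0, clean up globally after all shards finish).

```python
NUM_SHARDS = 2


def old_on_create_view(rows, shard):
    # Old behavior per the commit message: a create handled on any shard
    # (re)creates the entries for every shard.
    rows.update(range(NUM_SHARDS))


def old_on_drop_view(rows, shard):
    # Old behavior: a drop handled on a shard removes only that shard's entry.
    rows.discard(shard)


def run_old_interleaving():
    rows = set()
    old_on_create_view(rows, 0)
    old_on_drop_view(rows, 0)
    old_on_create_view(rows, 1)   # re-creates the shard 0 entry
    old_on_drop_view(rows, 1)     # removes only the shard 1 entry
    return rows


def run_serialized():
    rows = set()
    # Create: shard 0 seeds the rows for all shards exactly once,
    # then broadcasts the local work (which does not touch the rows).
    rows.update(range(NUM_SHARDS))
    # Drop: all shards do local cleanup first, then shard 0 performs
    # the global cleanup of the progress rows.
    rows.clear()
    return rows
```

In this model `run_old_interleaving()` leaves the shard 0 entry behind, matching the leftover row the commit message describes, while `run_serialized()` ends with no rows.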
Michał Hudobski	92c988514c	vector_search: allow all where clauses in vector search queries To prepare for implementation of filtering we skip validation of where clauses in vector search queries. All queries that would be blocked by the lack of ALLOW FILTERING now will pass through. Fixes: VECTOR-410 Closes scylladb/scylladb#27758	2026-01-11 12:56:44 +02:00

10533 Commits