scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-22 07:42:16 +00:00

Author	SHA1	Message	Date
Dawid Pawlik	c2d27d1a50	index: remove Chinese, Japanese, and Korean language analyzers Remove "chinese", "japanese", and "korean" from the list of accepted full-text search analyzer options. Exposing these options commits ScyllaDB to supporting them long-term — if we ever switch from one backend search engine to another, CJK analyzers are the most likely to lose out-of-the-box support, unlike the popular European languages that are broadly available across text analysis libraries. Restrict the accepted set now, while FTS is still new, to avoid a future compatibility burden. Add a test to check if the CJK language analyzer options are rejected. Fixes: VECTOR-672 Closes scylladb/scylladb#29877	2026-05-18 18:20:47 +03:00
Szymon Malewski	15493872b2	vector_search: fix decimal/varint precision loss in filter value_to_json() value_to_json() converts CQL values to JSON for vector search filters. For decimal and varint types, it used rjson::parse() on the JSON string, which parses through a double and silently loses precision for values exceeding ~15 significant digits — producing wrong filter results. Additionally, for decimal type we need an exact string representation that preserves the original (unscaled, scale) pair, because partition keys use byte-level identity: different serialized representations of the same numeric value are distinct rows, so the filter must reproduce the exact representation stored in the key. Add big_decimal::to_string_canonical() which follows the Java BigDecimal toString() spec (JDK 8+), producing a bijective string representation that uses exponential notation for extreme scales instead of expanding trailing zeros (which could cause OOM). This could replace to_string(), but doing so has wider consequences (e.g. hash/equality contract for decimal_type) described in SCYLLADB-1574. Use it in value_to_json() for decimal_type, and use rjson::from_string() for varint_type, both bypassing the lossy double parse path. Tests cover the new to_string_canonical() and the filter fix, as well as existing decimal type behavior (key representation, clustering order, toJson) that we rely on and must not break. The CQL decimal type tests (test_type_decimal.py) also pass against Cassandra. Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-1583 Refs: https://scylladb.atlassian.net/browse/SCYLLADB-1574 Closes scylladb/scylladb#29505	2026-05-18 17:07:26 +03:00
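The canonical-string rule described above can be sketched in a few lines. This is only an illustration of the Java `BigDecimal.toString()` rules applied to an `(unscaled, scale)` pair, not the actual `big_decimal::to_string_canonical()` code; the Python function below merely borrows the name, and its signature is an assumption.

```python
def to_string_canonical(unscaled: int, scale: int) -> str:
    """Render (unscaled, scale) following the Java BigDecimal.toString() rules."""
    sign = "-" if unscaled < 0 else ""
    digits = str(abs(unscaled))
    adjusted = -scale + (len(digits) - 1)  # exponent of the leading digit
    if scale >= 0 and adjusted >= -6:
        # Plain notation: place the decimal point `scale` digits from the right.
        if scale == 0:
            return sign + digits
        if len(digits) > scale:
            return f"{sign}{digits[:-scale]}.{digits[-scale:]}"
        return f"{sign}0.{'0' * (scale - len(digits))}{digits}"
    # Scientific notation: trailing zeros are never expanded, whatever the scale.
    mantissa = digits[0] + ("." + digits[1:] if len(digits) > 1 else "")
    return f"{sign}{mantissa}E{adjusted:+d}"


# The lossy path the commit removes: going through a double changes large values.
original = "12345678901234567890"
lossy = int(float(original))
```

Because distinct `(unscaled, scale)` pairs map to distinct strings (`1.0` versus `1.00`), the string can stand in for the exact serialized representation, and an extreme scale such as `(1, -1000000)` yields a short exponent form instead of a million zeros.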
Evgeniy Naydanov	39a10d6d67	test: remove dead suite subclasses and legacy execution pipeline After all test suites migrated to test_config.yaml with type: Python, the specialized suite classes (Topology, CQLApproval, Run, Tool) and the legacy execution pipeline (find_tests, run_test, TestSuite.run, Test.run) became unreachable. Remove all this dead code. Deleted files: - suite/topology.py, suite/cql_approval.py, suite/run.py, suite/tool.py Simplified: - base.py: remove run_test(), read_log(), TestSuite.run(), add_test_list(), build_test_list(), all_tests(), test_count(), SUITE_CONFIG_FILENAME, disabled/flaky test tracking, and dead Test attributes (args, core_args, valid_exit_codes, allure_dir, is_flaky, is_cancelled, etc.) - python.py: remove PythonTestSuite.run(), PythonTest.run(), _prepare_pytest_params(), pattern, test_file_ext, xmlout, server_log, scylla_env setup, and shlex import. Simplify run_ctx() to take no parameters. - runner.py: remove --scylla-log-filename option, print_scylla_log_filename fixture, SUITE_CONFIG_FILENAME import, and suite.yaml probe in TestSuiteConfig.from_pytest_node(). - __init__.py: remove re-exports of deleted classes. - test_config.yaml: Topology -> Python, Approval -> Python. - conftest files: run_ctx(options=...) -> run_ctx(). - docs/dev/testing.md: update to reflect current pytest-based architecture, log paths, and removed features. Co-Authored-By: Claude Opus 4.6 (200K context) <noreply@anthropic.com> Closes scylladb/scylladb#29613	2026-05-17 22:16:31 +03:00
Marcin Maliszkiewicz	ec8f8e3a5b	Merge 'test: make test_vector_search_with_vector_store_mock 30 times faster!' from Nadav Har'El Before this patch,
```
test/cqlpy/run test_vector_search_with_vector_store_mock.py
```
took 34 seconds. After this patch, it takes 1 second. Look at the individual patches for how the magic happened. The first patch lowers the test duration from 34 to 5 seconds, the second patch lowers it further to 1 second. Closes scylladb/scylladb#29891 * github.com:scylladb/scylladb: test/cqlpy: make test_vector_search_with_vector_store_mock faster vector-search: reset DNS timeout after changing host	2026-05-14 17:12:47 +02:00
Botond Dénes	1403f18240	Merge 'alternator: add more vector search features' from Nadav Har'El Recently (in commit `37fc1507f0`) we added vector search support for Alternator. That implementation was functional, but did not yet support all the features that we had envisioned. This patch series adds some of the missing features to Alternator's vector search. Each feature is described in more detail in its own patch. * Metrics related to vector search usage in Alternator. * `SimilarityFunction` option when creating a vector index to choose the similarity function. Defaults to `COSINE` (the existing default). Other options are `DOT_PRODUCT` and `EUCLIDEAN`. * An optimized vector type, `{"FLOAT32VECTOR": [1.0, 2.0, ..]}`, which is stored on disk efficiently as 32-bit floats, not a JSON. * A Query VectorSearch option `ReturnScores` asking to return the similarity score calculated for each returned result (the results are sorted in decreasing similarity score - the highest similarity is the best and returned first). Closes scylladb/scylladb#29554 * github.com:scylladb/scylladb: alternator: add ReturnScores option to VectorSearch vector_store_client: read and return similarity_scores alternator: add optimized vector type for vector search alternator: add SimilarityFunction option to vector index creation alternator: add vector search metrics	2026-05-14 10:41:41 +03:00
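To give a feel for why a packed 32-bit float representation is more compact than JSON text, here is a small sketch. The commit message does not describe the actual on-disk layout, so the little-endian packing below is purely an assumption for illustration.

```python
import json
import struct

vector = [1.0, 2.0, 0.5, -3.25]

# JSON text representation of the vector.
as_json = json.dumps(vector).encode()

# Packed 32-bit floats: exactly 4 bytes per element (byte order is an assumption).
as_f32 = struct.pack(f"<{len(vector)}f", *vector)

# Round trip back to Python floats.
restored = list(struct.unpack(f"<{len(as_f32) // 4}f", as_f32))
```

The packed form has a fixed cost of 4 bytes per dimension, while the JSON size depends on how many digits each number needs when printed.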
Nadav Har'El	5c065c7746	test/cqlpy: make test_vector_search_with_vector_store_mock faster The previous patch made test_vector_search_with_vector_store_mock significantly faster, but at 5 seconds for 7 tests, it was still not fast enough. It turns out that the reason why the tests were slow is that each test used a function-scoped fixture, which set up the vector store mock again and again, separately for each test. This - especially waiting for the client in Scylla to recognize the new server - took time (before the previous patch it was 5 seconds, after the patch it went down to 0.5 seconds - but still too slow). The solution is simple: 1. Create a module-scoped fixture that creates the mock and connects it to Scylla just once for all the tests in that file. 2. The function-scoped fixture just uses the module-scoped one but resets the saved responses, to avoid one test influencing the other. After this patch, the time to run this test file is down to 1 second (!). Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2026-05-13 14:57:56 +03:00
Nadav Har'El	c56361a6d7	vector_store_client: read and return similarity_scores The vector store returns for every ANN search, in addition to the keys of the matching items, two additional vectors - "distances" and "similarity_scores". The "distances" are raw distance metrics - lower scores are better matches, while "similarity_scores" are modified such that higher scores are better matches. Traditionally, search scores in systems like Cassandra and OpenSearch use the "similarity scores" approach (higher is better, results are returned in decreasing similarity order), so this is the more interesting vector of the two. But before this patch, our vector_store_client::ann() inspected only "distances". But... then, it didn't return even that to the caller :-) So in this patch, we: 1. Ignore "distances" and instead look at "similarity_scores", which is what users really want based on their experience with other vector and non-vector search engines. 2. Return the similarity score of each match together with the match. We already have this score (the vector store returns it) and we can add it to the existing primary_key structure of each result. So each result is a "struct primary_key" which has fields partition, clustering, and after this patch - similarity. Existing callers in CQL and Alternator vector search will ignore this "similarity" field in each result, and not notice it was added. But in the next patch, we'll allow Alternator's vector search to return this similarity in each result. The existing unit tests for vector_store_client.cc mocked vector-store responses with "distances", without "similarity_scores", so they no longer represent what we actually expect the vector store to do. So this patch also contains modifications for these tests, to mock and to test "similarity_scores" - not "distances". The more interesting tests, in the next patch, use the real vector store and check that we really do get a "similarity_scores" response from it.
This patch also handles a small corner case for DOT_PRODUCT, which is the only unbounded similarity function. If the similarity overflows the 32-bit float, the vector store returns a JSON "null" instead of a JSON number (since JSON doesn't support infinite numbers). Our existing vector-store client code errored out when it saw this "null", which is wrong - the request should be allowed to proceed. So in this patch when we see a "null" JSON for similarity, we return +Inf. This is usually correct because the top results really have +Inf, not -Inf, but if we ask for all items we can reach those with similarity -Inf and incorrectly assign +Inf to them (we have a test for this case in the next patch). But this problem won't happen when Limit is low, and in any case it's better than aborting the request after it had already succeeded. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2026-05-13 14:19:17 +03:00
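The null-handling rule above can be sketched as follows. Only the `similarity_scores` field name comes from the commit message; the overall response shape and the helper name `parse_similarity_scores` are assumptions made for this example, and the real client is C++.

```python
import json
import math


def parse_similarity_scores(body: str) -> list[float]:
    """Read "similarity_scores", mapping JSON null (overflow) to +Inf."""
    doc = json.loads(body)
    # JSON cannot encode infinities, so an overflowed score arrives as null.
    return [math.inf if s is None else float(s) for s in doc["similarity_scores"]]


scores = parse_similarity_scores('{"similarity_scores": [0.93, 0.5, null]}')
```

As the commit message notes, mapping every `null` to `+Inf` is a trade-off: it is right for the top results but would mislabel a `-Inf` overflow.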
Nadav Har'El	51c35c05e2	test/cqlpy: teach run-cassandra to use Docker The test/cqlpy/run-cassandra script makes it quite easy to run test/cqlpy tests against Cassandra, which is important for checking compatibility. Unfortunately, because modern Linux distributions like Fedora do not have either Cassandra or the old version of Java that it needs, the user needs to download those manually. This is fairly easy, and explained in detail in test/cqlpy/README.md, but nevertheless is a non-trivial manual step. So this patch adds an even simpler alternative, the "--docker" option which tells the script to run the official Cassandra docker image, complete with the version of Java that it prefers - the user does not need to download or install Cassandra or Java. The image is efficiently cached by Docker, so running run-cassandra again doesn't need to download it again; Moreover, trying several different versions of Cassandra only needs to download and store the shared parts (base image and Java) once. test/cqlpy/run-cassandra --docker test_file.py::test_function Runs by default the latest Cassandra 5 release. You can also use "--docker=4" to get the latest Cassandra 4 release, "--docker=3.11" to get the latest Cassandra 3.11 patch release, or "--docker=3.11.1" to get a specific patch release. In addition to the "--docker" option, this patch also introduces a second option, "--java-docker", which takes only Java from docker, but runs your locally installed Cassandra (to which you should point with the CASSANDRA environment variable, as before). This option can be useful if your host does not have a suitable version of Java, but you want to run a locally-installed or locally-modified version of Cassandra. The "--java-docker" option defaults to getting Java 11, to use other versions you can use for example "--java-docker=17". Fixes #25826. Closes scylladb/scylladb#29860	2026-05-13 11:57:18 +02:00
Yaniv Michael Kaul	c359a09189	test: add UDF/UDA keyspace isolation and UDT tests Port 3 tests from scylla-dtest user_functions_test.py: - test_udf_with_udt: UDF taking frozen UDT arg, verifies DROP TYPE blocked - test_udf_with_udt_keyspace_isolation: cross-keyspace UDT references rejected - test_aggregate_with_udt_keyspace_isolation: cross-keyspace UDT in UDA rejected All tests use Lua (Scylla's supported UDF language). Reproduces CASSANDRA-9409. Closes scylladb/scylladb#1928 Closes scylladb/scylladb#29843	2026-05-12 14:57:14 +03:00
Piotr Smaron	1018710e38	test/cqlpy: un-xfail oversized indexed value build test Issue #8627 is fixed, so test_too_large_indexed_value_build now passes and should run normally instead of XPASSing under strict xfail. Fixes: SCYLLADB-1938 Closes scylladb/scylladb#29853	2026-05-12 11:40:53 +02:00
Botond Dénes	8d6f031a4a	schema: fix DESCRIBE showing NullCompactionStrategy when compaction is disabled When a table's compaction is disabled via 'enabled': 'false', the DESCRIBE output incorrectly showed NullCompactionStrategy instead of the actual strategy. This happened because schema_properties() called compaction_strategy(), which returns compaction_strategy_type::null when compaction is disabled. Fix it by using configured_compaction_strategy(), which always returns the real strategy type - consistent with how schema_tables.cc serializes it to disk. Fixes SCYLLADB-1353 Closes scylladb/scylladb#29804	2026-05-12 12:38:25 +03:00
Pavel Emelyanov	150345cc52	Merge 'test: per-bucket isolation for S3/GCS object storage tests' from Ernest Zaslavsky This series adds per-test bucket isolation to all S3 and GCS object storage tests. Previously, every test shared a single pre-created bucket, which meant tests could interfere with each other through leftover objects and could not run concurrently across multiple `test.py` processes without risking collisions. New `create_bucket`, `delete_bucket`, and `delete_bucket_with_objects` methods on `s3::client`, following the existing `make_request` pattern. `create_bucket` handles the `BUCKET_ALREADY_OWNED_BY_YOU` error gracefully. A new `s3_test_fixture` RAII class for C++ Boost tests that creates a uniquely-named bucket on construction (derived from the Boost test name and pid) and tears down everything — objects, bucket, client — on destruction. All S3 tests in `s3_test.cc` are migrated to use it, removing manual `deferred_delete_object` and `deferred_close` boilerplate. The minio server policy is broadened to allow dynamic bucket creation/deletion. A `client::make` overload that accepts a custom `retry_strategy`, used in tests with a fast 1ms retry delay instead of exponential backoff, significantly reducing test runtime for transient errors during bucket lifecycle operations. Python-side (`test/cluster/object_store`): each pytest fixture (`object_storage`, `s3_storage`, `s3_server`) now creates a unique bucket per test function via `create_test_bucket()` and destroys it on teardown. Bucket names are sanitized from the pytest node name with a short UUID suffix for uniqueness. Object storage helpers (`S3Server`, `MinioWrapper`, `GSFront`, `GSServerImpl`, factory functions, CQL helpers, `s3_server` fixture) are extracted from `test/cluster/object_store/conftest.py` into a shared `test/pylib/object_storage.py` module, eliminating duplication across test suites. The conftest becomes a thin re-export wrapper. 
Old class names are preserved as aliases for backward compatibility.

| Test Name | new test specific retry strategy execution time (ms) | original execution time (ms) | Δ (ms) | Speedup |
|---|---:|---:|---:|---:|
| test_client_upload_file_multi_part_with_remainder_proxy | 19,261 | 61,395 | −42,134 | 3.2× |
| test_client_upload_file_multi_part_without_remainder_proxy | 16,901 | 53,688 | −36,787 | 3.2× |
| test_client_upload_file_single_part_proxy | 3,478 | 6,789 | −3,311 | 2.0× |
| test_client_multipart_copy_upload_proxy | 1,303 | 1,619 | −316 | 1.2× |
| test_client_put_get_object_proxy | 150 | 365 | −215 | 2.4× |
| test_client_readable_file_stream_proxy | 125 | 327 | −202 | 2.6× |
| test_small_object_copy_proxy | 205 | 389 | −184 | 1.9× |
| test_client_put_get_tagging_proxy | 181 | 350 | −169 | 1.9× |
| test_client_multipart_upload_proxy | 1,252 | 1,416 | −164 | 1.1× |
| test_client_list_objects_proxy | 729 | 881 | −152 | 1.2× |
| test_chunked_download_data_source_with_delays_proxy | 830 | 960 | −130 | 1.2× |
| test_client_readable_file_proxy | 148 | 279 | −131 | 1.9× |
| test_client_upload_file_multi_part_with_remainder_minio | 3,358 | 3,170 | +188 | 0.9× |
| test_client_upload_file_multi_part_without_remainder_minio | 3,131 | 2,929 | +202 | 0.9× |
| test_client_upload_file_single_part_minio | 519 | 421 | +98 | 0.8× |
| test_download_data_source_proxy | 180 | 237 | −57 | 1.3× |
| test_client_list_objects_incomplete_proxy | 590 | 641 | −51 | 1.1× |
| test_large_object_copy_proxy | 952 | 991 | −39 | 1.0× |
| test_client_multipart_upload_fallback_proxy | 148 | 185 | −37 | 1.3× |
| test_client_multipart_copy_upload_minio | 641 | 674 | −33 | 1.1× |

No backport needed: this is a test infrastructure improvement with no production code impact beyond
the new `s3::client` methods. Closes scylladb/scylladb#29508 * github.com:scylladb/scylladb: test: extract object storage helpers to test/pylib/object_storage.py test: add per-test bucket isolation to object_store fixtures s3: add client::make overload with custom retry strategy test: add s3_test_fixture and migrate tests to per-bucket isolation s3: add create_bucket and delete_bucket to client	2026-05-12 12:38:24 +03:00
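One way to derive a per-test bucket name from a pytest node name with a short UUID suffix is sketched below. The helper name `sketch_bucket_name`, the 8-character suffix, and the exact sanitization rules are assumptions for illustration; they are not taken from the series itself.

```python
import re
import uuid

MAX_BUCKET_NAME = 63  # S3 bucket names are limited to 63 characters


def sketch_bucket_name(node_name: str) -> str:
    """Build a unique, S3-safe bucket name from a pytest node name."""
    # Keep only lowercase letters, digits and hyphens.
    base = re.sub(r"[^a-z0-9-]+", "-", node_name.lower()).strip("-")
    suffix = uuid.uuid4().hex[:8]
    # Leave room for "-<suffix>" and avoid a dangling hyphen after truncation.
    base = base[: MAX_BUCKET_NAME - len(suffix) - 1].rstrip("-") or "test"
    return f"{base}-{suffix}"


name = sketch_bucket_name("test_upload[param-1]")
```

The random suffix is what lets several `test.py` processes run the same test concurrently without colliding on a bucket.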
Piotr Smaron	71542206bc	cql: return InvalidRequest for oversized partition/clustering keys When a partition key or clustering key value exceeds the 64 KiB limit (65535 bytes serialized), Scylla used to raise a generic std::runtime_error "Key size too large: N > M" from the low-level compound-key serializer. That error surfaced to clients as a CQL server error (code 0x0000, "NoHostAvailable"-looking), which is both ugly and incompatible with Cassandra - Cassandra returns a clean InvalidRequest with the message "Key length of N is longer than maximum of M". Fix this at the single chokepoint: compound_type::serialize_value in keys/compound.hh. The serializer is on every path that materializes a key - INSERT/UPDATE/DELETE/BATCH build mutations through it, and SELECT builds partition and clustering ranges through it - so a single throw replacement produces a clean InvalidRequest consistently across all paths and all key shapes (single, compound PK, composite CK). The previous approach on this PR branch patched three call sites in cql3/restrictions/statement_restrictions.cc, which only covered SELECT, duplicated the check, and placed it mid-restrictions code (flagged in review). Dropping those changes in favour of the root-cause fix here. Un-xfail the tests this fixes: - test/cqlpy/test_key_length.py: test_insert_65k_pk, test_insert_65k_ck, test_where_65k_pk, test_where_65k_ck, test_insert_65k_ck_composite, test_insert_total_compound_pk_err, test_insert_total_composite_ck_err. - test/cqlpy/cassandra_tests/.../insert_test.py: testPKInsertWithValueOver64K, testCKInsertWithValueOver64K. - test/cqlpy/cassandra_tests/.../select_test.py: testPKQueryWithValueOver64K. test_insert_65k_pk_compound stays xfail: its oversized value gets rejected by the Python driver's CQL wire-protocol encoder (see CASSANDRA-19270) before reaching the server, so the fix can't apply. Updated its reason. 
testCKQueryWithValueOver64K stays xfail with an updated reason: Cassandra silently returns empty for an oversized clustering key in WHERE, while Scylla now throws InvalidRequest - a deliberate choice mirroring the partition-key case, documented in the discussion on #10366. Add three tight-boundary tests (addressing review feedback on the previous revision) that pin MAX+1 behaviour for SELECT and INSERT of both partition and clustering keys. Update test/cluster/dtest/limits_test.py to match the new message ("Key length of \\d+ is longer than maximum of 65535"). fixes #10366 fixes #12247 Co-authored-by: Alexander Turetskiy <someone.tur@gmail.com> Closes scylladb/scylladb#23433	2026-05-11 16:56:35 +03:00
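The single-chokepoint idea can be illustrated with a small sketch: every key passes through one serializer-side check that raises a client-facing error. The limit and message wording follow the commit message; the `InvalidRequest` class and `check_key_length` helper are stand-ins invented for this example, not the C++ code in `keys/compound.hh`.

```python
MAX_KEY_LENGTH = 65535  # 64 KiB limit on a serialized key value


class InvalidRequest(Exception):
    """Stand-in for the CQL InvalidRequest error returned to clients."""


def check_key_length(serialized: bytes) -> bytes:
    """Reject oversized keys at the one place all keys are serialized."""
    if len(serialized) > MAX_KEY_LENGTH:
        raise InvalidRequest(
            f"Key length of {len(serialized)} is longer than maximum of {MAX_KEY_LENGTH}"
        )
    return serialized


try:
    check_key_length(b"x" * (MAX_KEY_LENGTH + 1))
    message = None
except InvalidRequest as exc:
    message = str(exc)
```

Placing the check in the serializer means INSERT, UPDATE, DELETE, BATCH and SELECT all get the same error without per-statement duplication.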
Piotr Smaron	959f67b345	cql: verify tuples length in multi-column IN restriction When a multi-column IN restriction contains tuples with a different number of elements than the number of restricted columns (e.g. `(b, c, d) IN ((1, 2), (2, 1, 4))`), Scylla would either produce an inconsistent error message or, for over-sized tuples, an internal type-mismatch error referencing the list literal representation. Validate each tuple's arity against the number of restricted columns while building the IN restriction and raise a clear "Expected N elements in value tuple, but got M" error in both the under- and over-sized cases. Fixes #13241 Co-authored-by: Alexander Turetskiy <someone.tur@gmail.com> Closes scylladb/scylladb#18407	2026-05-11 16:55:09 +03:00
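The arity check described above amounts to comparing each tuple's length with the number of restricted columns. A minimal sketch follows, with a hypothetical `validate_in_tuples` helper and a stand-in `InvalidRequest` class; the message text follows the commit message, but the real error may carry more detail.

```python
class InvalidRequest(Exception):
    """Stand-in for the CQL InvalidRequest error returned to clients."""


def validate_in_tuples(columns: list[str], tuples: list[tuple]) -> None:
    """Check that every IN tuple has one element per restricted column."""
    expected = len(columns)
    for value in tuples:
        if len(value) != expected:
            raise InvalidRequest(
                f"Expected {expected} elements in value tuple, but got {len(value)}"
            )


# Mirrors `(b, c, d) IN ((1, 2), (2, 1, 4))`: the first tuple is too short.
try:
    validate_in_tuples(["b", "c", "d"], [(1, 2), (2, 1, 4)])
    error = None
except InvalidRequest as exc:
    error = str(exc)
```

The same check covers the over-sized case, so both directions produce one consistent message.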
Nadav Har'El	f1b2b9bd52	Merge 'Register `fulltext_index` custom index type' from Dawid Pawlik This PR adds the `fulltext_index` custom index class, laying the groundwork for full-text search in ScyllaDB. It focuses on the CQL-facing layer - schema validation, option parsing, and metadata - without implementing the search backend itself. Users can now write:
```cql
CREATE CUSTOM INDEX ON t(content) USING 'fulltext_index' WITH OPTIONS = {'analyzer': 'english', 'positions': 'false'};
```
The implementation follows the same custom index pattern established by vector search: a `custom_index` subclass registered in the factory map, with no backing materialized view. This keeps the door open for a CDC-based indexing pipeline similar to the one vector search uses. As part of this work, the option validation helpers (`validate_enumerated_option`, `validate_positive_option`, `validate_factor_option`) were extracted from `vector_index.cc` into a shared header so both index types can reuse them. The `custom_index` base class also gained a virtual `index_type_name()` method, giving each subclass a self-describing name for error messages without hardcoding strings in shared code. The PR is split into three commits:
1. Extract shared validation utilities and add `index_type_name()` to `custom_index`
2. Implement `fulltext_index` with column type and option validation
3. Integration tests covering creation, validation, describe, and metadata

Fixes: SCYLLADB-1517 Fixes: SCYLLADB-1510 References: SCYLLADB-1516 Closes scylladb/scylladb#29658 * github.com:scylladb/scylladb: test/cqlpy: add integration tests for `fulltext_index` index: unify custom index description index: add `fulltext_index` custom index implementation index: extract option validation helpers	2026-05-11 16:16:58 +03:00
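A shared enumerated-option validator of the kind described above might look roughly like this. The function borrows the `validate_enumerated_option` name from the commit message, but its signature, the error wording, and the accepted value sets are assumptions; only `english` is shown as an analyzer because the full accepted list is not given here.

```python
class InvalidRequest(Exception):
    """Stand-in for the CQL InvalidRequest error returned to clients."""


def validate_enumerated_option(index_type: str, name: str, value: str,
                               accepted: set[str]) -> str:
    """Accept `value` only if it is one of `accepted` (case-insensitive)."""
    normalized = value.lower()
    if normalized not in accepted:
        raise InvalidRequest(
            f"Invalid value '{value}' for option '{name}' of {index_type}; "
            f"accepted values: {sorted(accepted)}"
        )
    return normalized


# Hypothetical accepted sets, for illustration only.
ANALYZERS = {"english"}
BOOLEANS = {"true", "false"}

options = {"analyzer": "english", "positions": "false"}
validated = {
    "analyzer": validate_enumerated_option("fulltext_index", "analyzer",
                                           options["analyzer"], ANALYZERS),
    "positions": validate_enumerated_option("fulltext_index", "positions",
                                            options["positions"], BOOLEANS),
}
```

Passing the index type name in as an argument matches the stated goal of keeping hardcoded strings out of the shared helper.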
Nadav Har'El	fcfad51284	Merge 'cql3/selection: require EXECUTE on UDA REDUCEFUNC at SELECT time' from Marcin Maliszkiewicz selection::used_functions() pushed the UDA, its SFUNC and its FINALFUNC, but never the REDUCEFUNC. The reducefunc is invoked by the distributed aggregation path in service::mapreduce_service, so a user could cause it to run server-side without holding EXECUTE on it as long as the query took the mapreduce path. Also push agg.state_reduction_function so select_statement::check_access requires EXECUTE on it too. Fixes https://scylladb.atlassian.net/browse/SCYLLADB-1756 Backport: no, it's a minor fix and UDFs are experimental feature in Scylla Closes scylladb/scylladb#29717 * github.com:scylladb/scylladb: test/cqlpy: add test for EXECUTE permission on UDA sub-functions cql3/selection: require EXECUTE on UDA REDUCEFUNC at SELECT time	2026-05-11 16:14:38 +03:00
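The gap described above can be modelled in a few lines: the set of functions requiring EXECUTE must include the reduce function as well. The `Aggregate` dataclass and both helpers below are illustrative stand-ins, not the `cql3::selection` code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Aggregate:
    name: str
    sfunc: str
    finalfunc: Optional[str] = None
    reducefunc: Optional[str] = None


def used_functions(agg: Aggregate) -> list[str]:
    """Every function a SELECT of this UDA may run, including REDUCEFUNC."""
    candidates = (agg.name, agg.sfunc, agg.finalfunc, agg.reducefunc)
    return [f for f in candidates if f is not None]


def missing_execute(agg: Aggregate, granted: set[str]) -> list[str]:
    """Functions the user lacks EXECUTE on; a non-empty result means reject."""
    return [f for f in used_functions(agg) if f not in granted]


agg = Aggregate("my_avg", sfunc="accumulate", finalfunc="finish", reducefunc="merge")
missing = missing_execute(agg, granted={"my_avg", "accumulate", "finish"})
```

Before the fix, the equivalent of `used_functions` omitted the reduce function, so the check passed even though the mapreduce path would still run it.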
Marcin Maliszkiewicz	fa9d15d31a	test/cqlpy: add test for EXECUTE permission on UDA sub-functions Verify that SELECT of a UDA requires EXECUTE on its SFUNC, FINALFUNC, and REDUCEFUNC individually. If any one permission is missing, the query must be rejected at planning time (even on an empty table). The test is parameterized over the three sub-functions and uses Lua on Scylla or Java on Cassandra, so it runs on both backends. The REDUCEFUNC case is skipped on Cassandra since REDUCEFUNC is a Scylla extension. Refs SCYLLADB-1756	2026-05-11 10:23:39 +02:00
Nadav Har'El	34136d3bc2	Merge 'vector_search: test: migrate CQL tests for vector search from C++/Boost to pytest' from Karol Nowacki Migrate vector search (ANN ordered select query) CQL tests from C++/Boost suite to pytest. This migration includes: - New pytest tests in `test/cqlpy/test_vector_search_with_vector_store_mock.py` - VectorStoreMock server as pytest fixture to simulate vector store responses The benefits of this migration are: - Extended test coverage to verify CQL protocol serialization and driver - Reduced overall test time (no compilation required for pytest) Fixes SCYLLADB-695 No backport needed as this is a refactoring. Closes scylladb/scylladb#29593 * github.com:scylladb/scylladb: vector_search: test: migrate paging warnings tests to Python vector_search: test: migrate local_vector_index to Python vector_search: test: migrate vector_index_with_additional_filtering_column to Python vector_search: test: migrate cql_error_contains_http_error_description to Python vector_search: test: migrate pk in restriction test to Python	2026-05-10 22:09:17 +03:00
Dawid Pawlik	b6d5ff344b	test/cqlpy: add integration tests for `fulltext_index` Add `test_fulltext_index.py` covering the `fulltext_index` custom index: - Creation on text, varchar, and ascii columns - Rejection of non-text types (int, blob, vector) - Validation of analyzer and positions options - Rejection of unsupported option keys - Case-insensitive class name lookup - DESCRIBE INDEX output with and without options - No backing materialized view in `system_schema.views` - IF NOT EXISTS idempotent behavior - Metadata correctness in `system_schema.indexes`	2026-05-08 11:30:08 +02:00
Yaniv Michael Kaul	7557c64f20	test/cqlpy: add tests for hyphenated column names Verify that double-quoted column names with hyphens (e.g. "my-col") work correctly for CREATE TABLE, INSERT, and SELECT. Also verify that unquoted hyphenated names are rejected with a syntax error.	2026-05-06 11:32:04 +03:00
Karol Nowacki	20b953ef8c	vector_search: test: migrate paging warnings tests to Python Move the paging warning related tests from C++ vector_store_client_test to Python test_vector_search_with_vector_store_mock.	2026-05-05 18:23:30 +02:00
Karol Nowacki	84787ce6a5	vector_search: test: migrate local_vector_index to Python Move the local vector index test from C++ vector_store_client_test to Python test_vector_search_with_vector_store_mock. The test creates a local vector index on ((pk1, pk2), embedding) and verifies that SELECT with partition key restriction and ANN ordering works correctly.	2026-05-05 18:23:30 +02:00
Karol Nowacki	0bb7e47090	vector_search: test: migrate vector_index_with_additional_filtering_column to Python Move the SCYLLADB-635 regression test from C++ vector_store_client_test to Python test_vector_search_with_vector_store_mock. The test creates a vector index on (embedding, ck1) and verifies that SELECT with ANN ordering works correctly when additional filtering columns are included in the index definition.	2026-05-05 18:23:30 +02:00
Karol Nowacki	5a8af3c727	vector_search: test: migrate cql_error_contains_http_error_description to Python Move the test that verifies HTTP error descriptions from the vector store are propagated through CQL InvalidRequest messages from the C++ vector_store_client_test to the Python test_vector_search_with_vector_store_mock. The test configures the mock to return HTTP 404 with 'index does not exist' and asserts the CQL SELECT raises InvalidRequest containing '404'.	2026-05-05 18:23:30 +02:00
Karol Nowacki	b672972c5f	vector_search: test: migrate pk in restriction test to Python Move vector search (ANN ordered select query) with IN restrictions on partition key from C++/Boost test suite to pytest (cqlpy). Add VectorStoreMock server as pytest fixture to simulate vector store responses.	2026-05-05 18:23:30 +02:00
Botond Dénes	eb3326b417	Merge 'test.py: migrate all bare skips to typed skip markers' from Artsiom Mishuta Should be merged after #29235. Complete the typed skip markers migration started in the plugin PR. Every bare `@pytest.mark.skip` decorator and `pytest.skip()` runtime call across the test suite is replaced with a typed equivalent, making skip reasons machine-readable in JUnit XML and Allure reports. 62 files changed across 8 commits, covering ~127 skip sites in total. Bare `pytest.skip` provides only a free-text reason string. CI dashboards (JUnit, Allure) cannot distinguish between a test skipped due to a known bug, a missing feature, a slow test, or an environment limitation. This makes it hard to track skip debt, prioritize fixes, or filter dashboards by skip category. The typed markers (`skip_bug`, `skip_not_implemented`, `skip_slow`, `skip_env`) introduced by the `skip_reason_plugin` solve this by embedding a `skip_type` field into every skip report entry.

| Type | Count | Files | Description |
|------|-------|-------|-------------|
| `skip_bug` | 24 | 16 | Skip reason references a known bug/issue |
| `skip_not_implemented` | 10 | 5 | Feature not yet implemented in Scylla |
| `skip_slow` | 4 | 3 | Test too slow for regular CI runs |
| `skip_not_implemented` (bare) | 2 | 1 | Bare `@pytest.mark.skip` with no reason (COMPACT STORAGE, #3882) |

| Type | Count | Files | Description |
|------|-------|-------|-------------|
| `skip_env` | ~85 | 34 | Feature/config/topology not available at runtime |
| `skip_bug` | 2 | 2 | Known bugs: Streams on tablets (#23838), coroutine task not found (#22501) |

- Comments: 7 comments/docstrings across 5 files updated from `pytest.skip()` to `skip()`
- Plugin hardened: `warnings.warn()` → `pytest.UsageError` for bare `@pytest.mark.skip` at collection time; bare skips are now a hard error, not a warning
- Guard tests: New `test/pylib_test/test_no_bare_skips.py` with 3 tests that
prevent regression:
  - AST scan for bare `@pytest.mark.skip` decorators
  - AST scan for bare `pytest.skip()` runtime calls
  - Real `pytest --collect-only` against all Python test directories

Runtime skip sites use the convenience wrappers from `test.pylib.skip_types`:

```python
from test.pylib.skip_types import skip_env
```

Usage:

```python
skip_env("Tablets not enabled")
```

1. test: migrate @pytest.mark.skip to @pytest.mark.skip_bug for known bugs (24 decorator sites, 16 files)
2. test: migrate @pytest.mark.skip to @pytest.mark.skip_not_implemented (10 decorator sites, 5 files)
3. test: migrate @pytest.mark.skip to @pytest.mark.skip_slow (4 decorator sites, 3 files)
4. test: migrate bare @pytest.mark.skip to skip_not_implemented (2 bare decorators, 1 file)
5. test: migrate runtime pytest.skip() to typed skip_env() (~85 sites, 34 files)
6. test: migrate runtime pytest.skip() to typed skip_bug() (2 sites, 2 files)
7. test: update comments referencing pytest.skip() to skip() (7 comments, 5 files)
8. test/pylib: reject bare pytest.mark.skip and add codebase guards (plugin hardening + 3 guard tests)

- All 60 plugin + guard tests pass (`test/pylib_test/`)
- No bare `@pytest.mark.skip` or `pytest.skip()` calls remain in the codebase
- `pytest --collect-only` succeeds across all test directories with the hardened plugin

SCYLLADB-1349 Closes scylladb/scylladb#29305 * github.com:scylladb/scylladb: test/alternator: replace bare pytest.skip() with typed skip helpers test: migrate new bare skips introduced by upstream after rebase test/pylib: reject bare pytest.mark.skip and add codebase guards test: update comments referencing pytest.skip() to skip_env() test: migrate runtime pytest.skip() to typed skip_bug() test: migrate runtime pytest.skip() to typed skip_env() test: migrate bare @pytest.mark.skip to skip_not_implemented test: migrate @pytest.mark.skip to @pytest.mark.skip_slow test: migrate @pytest.mark.skip to @pytest.mark.skip_not_implemented test: migrate @pytest.mark.skip to @pytest.mark.skip_bug for known bugs	2026-04-22 15:48:27 +03:00
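An AST-based guard of the kind mentioned above could be sketched with the standard `ast` module. This is not the code in `test/pylib_test/test_no_bare_skips.py`; the function names and the returned format are assumptions for illustration.

```python
import ast


def _is_pytest_attr(node: ast.AST, *names: str) -> bool:
    """True if `node` is exactly the attribute chain pytest.<names...>."""
    for name in reversed(names):
        if not (isinstance(node, ast.Attribute) and node.attr == name):
            return False
        node = node.value
    return isinstance(node, ast.Name) and node.id == "pytest"


def find_bare_skips(source: str) -> list[tuple[int, str]]:
    """Return (line, kind) for each untyped skip decorator or runtime call."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            for dec in node.decorator_list:
                # Handle both @pytest.mark.skip and @pytest.mark.skip(...).
                target = dec.func if isinstance(dec, ast.Call) else dec
                if _is_pytest_attr(target, "mark", "skip"):
                    hits.append((dec.lineno, "decorator"))
        elif isinstance(node, ast.Call) and _is_pytest_attr(node.func, "skip"):
            hits.append((node.lineno, "call"))
    return sorted(hits)


sample = '''
import pytest

@pytest.mark.skip
def test_a(): pass

@pytest.mark.skip(reason="x")
def test_b(): pass

@pytest.mark.skip_bug(reason="y")
def test_c(): pass

def test_d():
    pytest.skip("z")
'''
hits = find_bare_skips(sample)
```

Typed markers such as `skip_bug` are not flagged because the attribute name must match `skip` exactly. Scanning the syntax tree rather than grepping text also avoids false positives from comments and strings.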
Ernest Zaslavsky	9faaf1f09c	test: extract object storage helpers to test/pylib/object_storage.py Move S3/GCS server classes (S3Server, MinioWrapper, GSFront, GSServer), factory functions (create_s3_server, create_gs_server), CQL helpers (format_tuples, keyspace_options), bucket naming (_make_bucket_name), and the s3_server fixture from test/cluster/object_store/conftest.py into a shared module at test/pylib/object_storage.py. The conftest.py is now a thin wrapper that re-exports symbols and defines only the fixtures specific to the object_store suite (object_storage, s3_storage). All external importers are updated. Old class names (S3_Server, GSServer) are kept as aliases for backward compatibility.	2026-04-21 19:08:57 +03:00
Łukasz Paszkowski	d18eb9479f	cql/statement: Create keyspace_metadata with correct initial_tablets count In `ks_prop_defs::as_ks_metadata(...)` a default initial tablets count is set to 0 when tablets are enabled and the replication strategy is NetworkTopologyStrategy. This effectively sets _uses_tablets = false in abstract_replication_strategy for the remaining strategies when no `tablets = {...}` options are specified. As a consequence, it is possible to create vnode-based keyspaces even when tablets are enforced with `tablets_mode_for_new_keyspaces`. The patch sets the default initial tablets count to zero regardless of the chosen replication strategy. Each replication strategy then validates the options and raises a configuration exception when tablets are not supported. All tests are altered in the following way: + whenever it was correct, SimpleStrategy was replaced with NetworkTopologyStrategy + otherwise, tablets were explicitly disabled with ` AND tablets = {'enabled': false}` Fixes https://github.com/scylladb/scylladb/issues/25340 Closes scylladb/scylladb#25342	2026-04-20 17:57:38 +03:00
Wojciech Mitros	6011cb8a4c	db/view: track range tombstones in update stream during view update building The view update builder ignored range tombstone changes from the update stream when all existing mutation fragments were already consumed. The old code assumed range tombstones 'remove nothing pre-existing, so we can ignore it', but this failed to update _update_current_tombstone. Consequently, when a range delete and an insert within that range appeared in the same batch, the range tombstone was not applied to the inserted row, or was applied to a row outside the range that it covered, causing the row to incorrectly survive or be deleted in the materialized view. Fix by handling is_range_tombstone_change() fragments in the update-only branch, updating _update_current_tombstone so subsequent clustering rows correctly have the range tombstone applied to them. Fixes SCYLLADB-1555 Closes scylladb/scylladb#29483	2026-04-20 13:38:52 +02:00
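The idea of tracking the current range tombstone while walking a fragment stream can be shown with a toy model. This is a simplified sketch, not the view update builder: the tuple layout `(kind, position, timestamp)` and the function `live_rows` are assumptions, and a `None` timestamp is used here to mean "the range ends".

```python
def live_rows(fragments):
    """Return positions of rows that survive the range tombstones in an ordered fragment stream."""
    current_tombstone = None  # timestamp of the active range tombstone, None when no range is open
    live = []
    for kind, position, timestamp in fragments:
        if kind == "range_tombstone_change":
            # Always update the tracked tombstone; skipping this step is the kind of
            # omission described in the commit message above.
            current_tombstone = timestamp
        elif kind == "row":
            # A row survives if no range is open or the row is newer than the tombstone.
            if current_tombstone is None or timestamp > current_tombstone:
                live.append(position)
    return live


stream = [
    ("range_tombstone_change", 1, 10),    # range delete opens, written at timestamp 10
    ("row", 2, 5),                        # older than the tombstone: covered
    ("row", 3, 12),                       # newer than the tombstone: survives
    ("range_tombstone_change", 4, None),  # range closes
    ("row", 5, 5),                        # outside the range: survives
]
print(live_rows(stream))
```

If the `range_tombstone_change` branch were dropped, the row at position 2 would be reported as live, and a stale tombstone could never be cleared for rows after position 4.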
Wojciech Mitros	073710a661	view: apply existing range tombstones after exhausting the update reader When view_update_builder::on_results() hits the path where the update fragment reader is already exhausted, it still needs to keep tracking existing range tombstones and apply them to encountered rows. Otherwise a row covered by an existing range tombstone can appear alive while generating the view update and create a spurious view row. Update the existing tombstone state even on the exhausted-reader path and apply the effective tombstone to clustering rows before generating the row tombstone update. Add a cqlpy regression test covering the partition-delete-after-range-tombstone case. Fixes: SCYLLADB-1554 Closes scylladb/scylladb#29481	2026-04-20 13:29:05 +02:00
Artsiom Mishuta	b078cd1e72	test: migrate new bare skips introduced by upstream after rebase Migrate 3 bare skip sites that appeared in upstream/master after the initial migration: - test/cluster/test_strong_consistency.py: 2 @pytest.mark.skip → @pytest.mark.skip_bug (SCYLLADB-1056) - test/cqlpy/conftest.py: pytest.skip() → skip_env() in skip_on_scylla_vnodes fixture	2026-04-19 17:34:41 +02:00
Artsiom Mishuta	0b6b380b80	test: update comments referencing pytest.skip() to skip_env() Update 7 comments/docstrings across 5 files that still referenced pytest.skip() to reference the typed skip_env() wrapper for consistency with the migrated code.	2026-04-19 11:14:03 +02:00
Artsiom Mishuta	8a80e2c3be	test: migrate runtime pytest.skip() to typed skip_env() Migrate runtime pytest.skip() calls across 34 files to use the typed skip_env() wrapper from test.pylib.skip_types. These sites skip at runtime because a required feature, config option, library version, build mode, or runtime topology is not available. Also fixes 'raise pytest.skip(...)' in test_audit.py — skip_env() already raises internally, so the explicit raise was incorrect. Each file gains one new import: from test.pylib.skip_types import skip_env	2026-04-19 11:09:29 +02:00
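The remark about `raise pytest.skip(...)` follows from how such helpers work: the call itself raises, so nothing is ever returned for an outer `raise` to act on. The sketch below uses the standard library's `unittest.SkipTest` as a stand-in; the body of `skip_env` and the `env:` prefix are assumptions, not the real `test.pylib.skip_types` implementation.

```python
import unittest
from typing import NoReturn


def skip_env(reason: str) -> NoReturn:
    # Hypothetical stand-in for the real wrapper: it raises and never returns.
    raise unittest.SkipTest(f"env: {reason}")


def run_if_tablets(tablets_enabled: bool) -> str:
    if not tablets_enabled:
        skip_env("Tablets not enabled")  # no `raise` in front: the call itself raises
    return "ran"


print(run_if_tablets(True))
```

Annotating the helper as `NoReturn` lets type checkers treat code after the call as unreachable.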
Artsiom Mishuta	fb0974a329	test: migrate bare @pytest.mark.skip to skip_not_implemented Migrate 2 bare @pytest.mark.skip decorators (no reason string) to @pytest.mark.skip_not_implemented with an explicit reason referencing issue #3882 (COMPACT STORAGE not implemented).	2026-04-19 11:06:30 +02:00
Artsiom Mishuta	a39fb9d29a	test: migrate @pytest.mark.skip to @pytest.mark.skip_slow Migrate 4 @pytest.mark.skip decorator sites to @pytest.mark.skip_slow across 3 test files where the skip reason indicates a slow test.	2026-04-19 11:06:30 +02:00
Artsiom Mishuta	638efedc3c	test: migrate @pytest.mark.skip to @pytest.mark.skip_not_implemented Migrate 10 @pytest.mark.skip decorator sites to @pytest.mark.skip_not_implemented across 5 test files where the skip reason indicates a feature not yet implemented.	2026-04-19 11:06:30 +02:00
Artsiom Mishuta	465636bc53	test: migrate @pytest.mark.skip to @pytest.mark.skip_bug for known bugs Migrate 24 @pytest.mark.skip decorator sites to @pytest.mark.skip_bug across 16 test files where the reason references a known bug or issue.	2026-04-19 11:06:30 +02:00
Botond Dénes	facb50cbf9	Merge 'test.py: refactor test.py' from Andrei Chekun With the latest changes, a lot of the code in test.py is redundant. This PR cleans that code up. It also narrows the use of dynamic scope for fixtures to test/alternator and test/cqlpy; all other suites default to module scope. test.py will mostly be a wrapper for pytest for CI use. For now, test.py still has the important job of calculating the number of threads to start pytest with, which is not possible to do in pytest itself. No backport needed, framework enhancement only. Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-666 Closes scylladb/scylladb#28852 * github.com:scylladb/scylladb: test.py: remove testpy_test_fixture_scope test.py: add logger for 3rd party service test.py: delete dead code in test.py	2026-04-17 12:51:14 +03:00
Pawel Pery	7883f161bb	vector-store: fix creating local vector search indexes with a part of the partition key Users should be able to create a local index for Vector Search based on only a part of the partition key. This commit enables that by removing the 'full partition key only' requirement for custom local indexes. The commit updates the docs to explain that a local vector index can use only a part of the partition key, and adds a cqlpy test covering the fixed functionality. Fixes: SCYLLADB-953 Needs to be backported to 2026.1 as it is a fix for local vector indexes. Closes scylladb/scylladb#28931	2026-04-17 11:44:15 +02:00
Andrei Chekun	745debe9ec	test.py: remove testpy_test_fixture_scope With the migration to pytest this fixture is useless. Remove it and set the scope to module for most of the tests. Add a dynamic_scope function to support running alternator fixtures in session scope while Test and TestSuite are not yet deleted. This is for the migration period; later on this function should be deleted.	2026-04-16 22:08:33 +02:00
Nikos Dragazis	d38f44208a	test/cqlpy: Harden mutation_fragments tests against background flushes Several tests in test_select_from_mutation_fragments.py assume that all mutations end up in a single SSTable. This assumption can be violated by background memtable flushes triggered by commitlog disk pressure. Since the Scylla node is taken from a pool, it may carry unflushed data from prior tests that prevents closed segments from being recycled, thereby increasing the commitlog disk usage. A main source of such pressure is keyspace-level flushes from earlier tests in this module, which rotate commitlog segments without flushing system tables (e.g., `system.compaction_history`), leaving closed segments dirty. Additionally, prior tests in the same module may have left unflushed data on the shared test table (`test_table` fixture), keeping commitlog segments dirty on its behalf as well. When commitlog disk usage exceeds its threshold, the system flushes the test table to reclaim those segments, potentially splitting a running test's mutations across multiple SSTables. This was observed in CI, where test_paging failed because its data was split across two SSTables, resulting in more mutation fragments than the hardcoded expected count. This patch fixes the affected tests in two ways: 1. Where possible, tests are reworked to not assume a single SSTable: - test_paging - test_slicing_rows - test_many_partition_scan 2. Where rework is impractical, major compaction is added after writes and before validation to ensure that only one SSTable will exist: - test_smoke - test_count - test_metadata_and_value - test_slicing_range_tombstone_changes Fixes SCYLLADB-1375. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com> Closes scylladb/scylladb#29389	2026-04-15 21:46:00 +03:00
Nadav Har'El	1eb8d170dd	Merge 'vector_index: allow recreating vector indexes on the same column' from Dawid Pawlik This series allows creating multiple vector indexes on the same column so users can rebuild an index without losing query availability. The intended flow is: 1. Create a new vector index on a column that already has one. 2. Keep serving ANN queries from the old index while the new one is being built. 3. Verify the new index is ready. 4. Automatically switch to the remaining index. 5. Drop the old index. To make that deterministic, `index_version` is changed from the base table schema version to a real creation timeuuid. When multiple vector indexes exist on the same column, ANN query planning now picks the index according to the routing implemented in Vector Store (newest serving index). This keeps queries on the old index until the new one is up and ready. This patch also removes the create-time restriction that rejected a second vector index on the same column. Name collisions are still rejected as before. Test coverage is updated accordingly: - Scylla now verifies that two vector indexes can coexist on the same column. - Cassandra/SAI behavior is still covered and is still expected to reject duplicate indexes on the same column. Fixes: VECTOR-610 Closes scylladb/scylladb#29407 * github.com:scylladb/scylladb: docs: document vector index metadata and duplicate handling test/cqlpy: cover vector index duplicate creation rules vector_index: allow multiple named indexes on one column vector_index: store `index_version` as creation timeuuid	2026-04-15 14:40:15 +03:00
Nadav Har'El	986167a416	Merge 'cql3: fix authorization bypass via BATCH prepared cache poisoning' from Marcin Maliszkiewicz execute_batch_without_checking_exception_message() inserted entries into the authorized prepared cache before verifying that check_access() succeeded. A failed BATCH therefore left behind cached 'authorized' entries that later let a direct EXECUTE of the same prepared statement skip the authorization check entirely. Move the cache insertion after the access check so that entries are only cached on success. This matches the pattern already used by do_execute_prepared() for individual EXECUTE requests. Introduced in `98f5e49ea8` Fixes https://scylladb.atlassian.net/browse/SCYLLADB-1221 Backport: all supported versions Closes scylladb/scylladb#29432 * github.com:scylladb/scylladb: test/cqlpy: add reproducer for BATCH prepared auth cache bypass cql3: fix authorization bypass via BATCH prepared cache poisoning	2026-04-14 22:31:54 +03:00
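The ordering problem described above (cache insertion before the access check) can be reproduced in a small model. This is a sketch under assumptions: the `Executor` class, its `cache_before_check` flag and the permission set are invented for illustration and do not mirror the cql3 code.

```python
class Unauthorized(Exception):
    pass


class Executor:
    def __init__(self, permissions, cache_before_check):
        self._permissions = permissions          # set of (user, statement) pairs that are allowed
        self._authorized_cache = set()           # pairs already considered authorized
        self._cache_before_check = cache_before_check

    def _check_access(self, user, stmt):
        if (user, stmt) not in self._permissions:
            raise Unauthorized(f"{user} may not run {stmt}")

    def execute(self, user, stmt):
        # Direct EXECUTE: skip the check only when the pair is already cached.
        if (user, stmt) not in self._authorized_cache:
            self._check_access(user, stmt)
            self._authorized_cache.add((user, stmt))
        return "ok"

    def execute_batch(self, user, stmts):
        for stmt in stmts:
            if self._cache_before_check:
                self._authorized_cache.add((user, stmt))  # poisoned if the check below fails
                self._check_access(user, stmt)
            else:
                self._check_access(user, stmt)
                self._authorized_cache.add((user, stmt))  # cached only on success
        return "ok"


def bypass_possible(cache_before_check):
    ex = Executor(permissions=set(), cache_before_check=cache_before_check)
    try:
        ex.execute_batch("bob", ["insert_into_t"])
    except Unauthorized:
        pass
    try:
        ex.execute("bob", "insert_into_t")
        return True
    except Unauthorized:
        return False


print(bypass_possible(True), bypass_possible(False))
```

With insertion before the check, the failed BATCH leaves an entry behind and the later direct execution succeeds; with insertion after the check, both attempts are rejected.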
Piotr Dulikowski	9fc2c65d18	Merge 'cql3: implement WRITETIME() and TTL() of individual elements of map, set, and UDT' from Nadav Har'El In commit `727f68e0f5` we added the ability to SELECT: * Individual elements of a map: `SELECT map_col[key]`. * Individual elements of a set: `SELECT set_col[key]` returns key if the key exists in the set, or null if it doesn't, allowing to check if the element exists in the set. * Individual pieces of a UDT: `SELECT udt_col.field`. But at the time, we didn't provide any way to retrieve the meta-data for this value, namely its timestamp and TTL. We did not support `SELECT WRITETIME(collection[key])`, or `SELECT WRITETIME(udt.field)`. Users requested to support such SELECTs in the past (see issue #15427), and Cassandra 5.0 added support for this feature - for maps, sets and UDTs - so we also need this feature for compatibility. This feature was also requested recently by vector-search developers, who wanted to read Alternator columns - stored as map elements, not individual columns - with their WRITETIME information. The first four patches in this series add the feature (in four smaller patches instead of one big one), the fifth and sixth patches add tests (cqlpy and boost tests, respectively). The seventh patch adds documentation. All the new tests pass on Cassandra 5, failed on Scylla before the present fix, and pass with it. The fix was surprisingly difficult. Our existing implementation (from `727f68e0f5` building on earlier machinery) doesn't just "read" `map_col[key]` and allow us to return just its timestamp. Rather, the implementation reads the entire map, serializes it in some temporary format that does not include the timestamps and ttls, and then takes the subscript key, at which point we no longer have the timestamp or ttl of the element. So the fix had to cross all these layers of the implementation. 
While adding support for UDT fields in a pre-existing grammar nonterminal "subscriptExpr", we unintentionally added support for UDT fields also in LWT expressions (which used this nonterminal). LWT missing support for UDT fields was a long-time known compatibility issue (#13624) so we unintentionally fixed it :-) Actually, to completely fix it we needed another small change in the expression implementation, so the eighth patch in this series does this. Fixes #15427 Fixes #13624 Closes scylladb/scylladb#29134 * github.com:scylladb/scylladb: cql3: support UDT fields in LWT expressions cql3: document WRITETIME() and TTL() for elements of map, set or UDT test/boost: test WRITETIME() and TTL() on map collection elements test/cqlpy: test WRITETIME() and TTL() on element of map, set or UDT cql3: prepare and evaluate WRITETIME/TTL on collection elements and UDT fields cql3: parse per-element timestamps/TTLs in the selection layer cql3: add extended wire format for per-element timestamps and TTLs cql3: extend WRITETIME/TTL grammar to accept collection and UDT elements	2026-04-14 12:35:46 +02:00
Dawid Pawlik	800dec2180	test/cqlpy: cover vector index duplicate creation rules Add cqlpy tests for the current CREATE INDEX behavior of vector indexes. Cover named and unnamed duplicates, IF NOT EXISTS, coexistence of multiple named vector indexes on the same column, interactions between named and unnamed indexes, and the same-name-on-different-table case.	2026-04-14 12:21:38 +02:00
Marcin Maliszkiewicz	db5e4f2cb8	test/cqlpy: add reproducer for BATCH prepared auth cache bypass An unprivileged user could bypass authorization checks by exploiting the BATCH prepared statement cache: 1. Prepare an INSERT on a table the user has no access to 2. Execute it inside a BATCH — gets Unauthorized 3. Execute the same prepared INSERT directly — succeeds	2026-04-14 10:37:42 +02:00
Avi Kivity	0ae22a09d4	LICENSE: Update to version 1.1 Updated terms of non-commercial use (must be a never-customer).	2026-04-12 19:46:33 +03:00
Nadav Har'El	33dbb63aef	cql3: support UDT fields in LWT expressions In an earlier patch, we used the CQL grammar's "subscriptExpr" in the rule for WRITETIME() and TTL(). But since we also wanted these to support UDT fields (x.a), not just collection subscripts (x[3]), we expanded subscriptExpr to also support the field syntax. But LWT expressions already used this subscriptExpr, which meant that LWT expressions unintentionally gained support for UDT fields. Missing support for UDT fields in LWT is a long-standing known Cassandra-compatibility bug (#13624), and now our grammar finally supports the missing syntax. But supporting the syntax is not enough for correct implementation of this feature - we also need to fix the expression handling: Two bugs prevented expressions like `v.a = 0` from working in LWT IF clauses, where `v` is a column of user-defined type. The first bug was in get_lhs_receiver() in prepare_expr.cc: it lacked a handler for field_selection nodes, causing an "unexpected expression" internal error when preparing a condition like `IF v.a = 0`. The fix adds a handler that returns a column_specification whose type is taken from the prepared field_selection's type field. The second bug was in search_and_replace() in expression.cc: when recursing into a field_selection node it reconstructed it with only `structure` and `field`, silently dropping the `field_idx` and `type` fields that are set during preparation. As a result, any transformation that uses search_and_replace() on a prepared expression containing a field_selection — such as adjust_for_collection_as_maps() called from column_condition_prepare() — would zero out those fields. At evaluation time, type_of() on the field_selection returned a null data_type pointer, causing a segmentation fault when the comparison operator tried to call ->equal() through it. The fix preserves field_idx and type when reconstructing the node. Fixes #13624.	2026-04-12 14:28:01 +03:00
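The second bug described above, a tree rewrite that rebuilds a node from only some of its fields, is easy to reproduce in miniature. The sketch below is a Python analogy with assumed names (`FieldSelection`, `Column`, and both rewrite functions); it is not a translation of `expression.cc`.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Column:
    name: str


@dataclass(frozen=True)
class FieldSelection:
    structure: object
    field: str
    field_idx: Optional[int] = None  # filled in during preparation
    type: Optional[str] = None       # filled in during preparation


def search_and_replace_lossy(expr, fn):
    """Rebuilds the node from `structure` and `field` only, dropping the prepared fields."""
    replaced = fn(expr)
    if replaced is not None:
        return replaced
    if isinstance(expr, FieldSelection):
        return FieldSelection(search_and_replace_lossy(expr.structure, fn), expr.field)
    return expr


def search_and_replace(expr, fn):
    """Copies every field of the node and overrides only the rewritten child."""
    replaced = fn(expr)
    if replaced is not None:
        return replaced
    if isinstance(expr, FieldSelection):
        return replace(expr, structure=search_and_replace(expr.structure, fn))
    return expr


def rename_v_to_w(expr):
    return Column("w") if expr == Column("v") else None


prepared = FieldSelection(Column("v"), "a", field_idx=0, type="int")
print(search_and_replace_lossy(prepared, rename_v_to_w).type)  # prepared type is lost
print(search_and_replace(prepared, rename_v_to_w).type)        # prepared type is kept
```

Copying the whole node and overriding only the child that changed means fields added to the node later are preserved without touching the rewrite function.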
Nadav Har'El	ccb94618cc	test/cqlpy: test WRITETIME() and TTL() on element of map, set or UDT This patch adds many tests verifying the behavior of WRITETIME() and TTL() on individual elements of maps, sets and UDTs, serving as a regression test for issue #15427. We also add tests verifying our understanding of related issues like WRITETIME() and TTL() of entire collections and of individual elements of frozen collections. All new tests pass on Cassandra 5.0, helping to verify that our implementation is compatible with Cassandra. They also pass on ScyllaDB after the previous patch (most didn't before that patch). Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2026-04-12 14:27:40 +03:00
Dawid Pawlik	2dd8eef38c	vector_index: store `index_version` as creation timeuuid Vector indexes currently store the base table schema version in `index_version`. That value is name-based, not time-based, so it does not represent when the index was created. Store a timeuuid instead and change the relevant interfaces from `table_schema_version` to `utils::UUID`. This is a prerequisite for supporting multiple vector indexes on the same column where the oldest index must be selected deterministically via routing implemented in Vector Store. Update the cqlpy tests to check the new semantics directly: recreating the index changes `index_version`, while ALTER TABLE does not.	2026-04-10 13:05:21 +02:00
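The difference between a name-based and a time-based UUID can be seen with Python's standard `uuid` module. This is only an analogy: `uuid3` over an arbitrary string stands in for a schema-version digest, `uuid1` stands in for a creation timeuuid, and the variable names are assumptions.

```python
import uuid

# Name-based (version 3): derived purely from its input, so the same input
# always yields the same value and no creation time is encoded.
schema_like_a = uuid.uuid3(uuid.NAMESPACE_OID, "ks.tbl schema digest")
schema_like_b = uuid.uuid3(uuid.NAMESPACE_OID, "ks.tbl schema digest")

# Time-based (version 1): embeds a timestamp, so each creation yields a new value.
created_first = uuid.uuid1()
created_again = uuid.uuid1()

print(schema_like_a.version, schema_like_a == schema_like_b)
print(created_first.version, created_first == created_again)
print(created_first.time)  # count of 100 ns intervals since 1582-10-15
```

This matches the semantics the updated tests check: recreating an index produces a new `index_version`, whereas a value derived from unchanged input would stay the same.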


412 Commits