scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-23 10:00:35 +00:00

Author	SHA1	Message	Date
Avi Kivity	620df7103f	cql3: statement_restrictions: do not pass view schema back and forth For indexed queries, statement_restrictions calculates _view_schema, which is passed via get_view_schema() to indexed_select_statement(), which passes it right back to statement_restrictions via one of three functions to calculate clustering ranges. Avoid the back-and-forth and use the stored value. Using a different value would be broken. This change allows unifying the signatures of the four functions that get clustering ranges.	2026-04-19 20:57:03 +03:00
Avi Kivity	6fce090e30	cql3: statement_restrictions: pre-analyze token range restrictions Convert token range restrictions to the predicate format we introduced earlier, where we have a function to solve for the token range rather than running the analysis at runtime. Again the truth is that the function will delegate to possible_partition_token_values() which actually will do the analysis at runtime, but it's one step closer. We add a new variant element for predicate::on, since it doesn't fit the existing element (the token isn't a column).	2026-04-19 20:57:03 +03:00
Avi Kivity	941011bb4a	cql3: statement_restrictions: pre-analyze partition key columns The expression tree for partition keys is analyzed during runtime: in partition_range_from_singles() (for example), we call find_binop and get_subscripted_column() to understand the expression structure. This analysis is problematic because it has to match the analysis during prepare time; and they have to evolve in lock step. Here, we move the analysis to the prepare stage. This is done by augmenting the expression into a new predicate struct. It contains the original expression (as a fallback for paths not yet converted), as well as a solve_for function which contains a function built at prepare time that embeds all the necessary analysis. We introduce the `predicate` type which is an augmentation of boolean expressions. In addition to the expression, we remember what column the expression is on, and a function that computes what values the column can take on that would make the expression true. The field that says what column the predicate is about is typed as a variant since later on we will have predicates on non-columns (the token, or a clustering prefix). Note that currently the function engages in some run-time analysis of its own, since it calls possible_lhs_values that itself does analysis, but this is a step in the right direction.	2026-04-19 20:57:03 +03:00
Avi Kivity	c73f3ac55f	cql3: statement_restrictions: do not collect subscripted partition key columns An indexed SELECT of the form SELECT ... WHERE pk['sub'] = ? is impossible because our indexes do not support frozen maps, and partition key collections must be frozen. Stop collecting such constructs for the purpose of determining the partition range. This reduces having to deal with combinations of restrictions on the column and its entries later on. In case we start supporting indexes on frozen maps, leave an on_internal_error to remind us.	2026-04-19 20:57:03 +03:00
Avi Kivity	531f137ed3	cql3: statement_restrictions: split _partition_range_restrictions into three cases _partition_range_restrictions are a vector of expressions, one per partition key column, except that it can be empty if there is no restriction on the partition that can be translated to a read command, and if the restriction is on a token range, the first element only is used. Separate the three cases into distinct structs. After this, additional work can be done utilizing the specialization.	2026-04-19 20:57:03 +03:00
Avi Kivity	fcf7c4c90d	cql3: statement_restrictions: move value_list, value_set to header file They don't really need to be public, but will be used in intermediate storage.	2026-04-19 20:57:03 +03:00
Avi Kivity	926886fcfb	cql3: statement_restrictions: wrap get_partition_key_ranges statement_restrictions::get_partition_key_ranges() re-interprets the expressions used to specify the partition key. This means that the analysis phase (determining what those expressions are and how they are to be used) and the execution phase (using them) are in separate places. This makes it very hard to refactor while preserving correctness. As a first step in unifying the two phases, we move the selection of the strategy (using token, cartesian product, or single partition) from execution to analysis, by making the if-tree return a function to be executed at execution time, rather than running the if-tree itself at execution time.	2026-04-19 20:57:03 +03:00
Avi Kivity	eec0b20dbc	cql3: statement_restrictions: prepare statement_restrictions for capturing `this` Prevent copying/moving, that can change the address, and instead enforce using shared_ptr. Most of the code is already using shared_ptr, so the changes aren't very large. To forbid non-shared_ptr construction, the constructors are annotated with a private_tag tag class.	2026-04-19 20:57:03 +03:00
Avi Kivity	374be94faa	test: statement_restrictions: add index_selection regression test In preparation for refactoring statement_restrictions, add a simple and an exhaustive regression test, encoding the index selection algorithm into the test. We cannot change the index selection algorithm because then mixed-node clusters will alter the sorting key mid-query (if paging takes place). Because the exhaustive space has such a large stack frame, and because Address Sanitizer bloats the stack frame, increase it for debug builds.	2026-04-19 20:57:01 +03:00
Botond Dénes	6ce0968960	compaction: release GC'ed sstables incrementally during compaction Garbage collected sstables created during incremental compaction are deleted only at the end of the compaction, which increases the memory footprint. This is inefficient, especially considering that the related input sstables are released regularly during compaction. This commit implements incremental release of GC sstables after each output sstable is sealed. Unlike regular input sstables, GC sstables use a different exhaustion predicate: a GC sstable is only released when its token range no longer overlaps with any remaining input sstable. This is because GC sstables hold tombstones that may shadow data in still-alive overlapping input sstables; releasing them prematurely would cause data resurrection. Fixes #5563 Closes scylladb/scylladb#28984	2026-04-17 18:20:47 +03:00
Botond Dénes	6eb2d15f39	Merge 'Replace CAS estimated histogram with estimated_histogram_with_max' from Amnon Heiman ScyllaDB uses estimated_histogram in many places. We already have a more efficient alternative: estimated_histogram_with_max. It is both CPU- and memory-efficient, and it can be exported as Prometheus native histograms. Its main limitation (which also has benefits) is that the bucket layout is fixed at compile time, so histograms with different configurations cannot be mixed. The end goal is to replace all uses of estimated_histogram in the codebase. That migration requires a few small API adjustments, so it is done in steps. This PR replaces estimated_histogram for CAS contention. The PR includes a patch that adds functionality to the base approx_exponential_histogram, which will be used by the API. The specific histograms are defined in a single place and cover the range 1-100; this makes future changes easy. New feature, no need to backport Closes scylladb/scylladb#29017 * github.com:scylladb/scylladb: storage_proxy: migrate CAS contention histograms to estimated_histogram_with_max estimated_histogram.hh: Add bucket offset and count to approx_exponential_histogram	2026-04-17 13:12:59 +03:00
Andrzej Jackowski	e256d9f69d	test: retry get_coordinator_host() after topology coordinator stop After stopping the topology coordinator, a new topology coordinator may not yet be started when get_coordinator_host() is called. Make the function always retry via wait_for so that every caller is protected against this race. Fixes SCYLLADB-1553 Closes scylladb/scylladb#29489	2026-04-17 12:08:26 +02:00
Botond Dénes	fbcfe3f88f	test: use uuid4 for DockerizedServer container names to avoid collisions Container names were generated as {name}-{pid}-{counter}, where the counter is a per-process itertools.count. This scheme breaks across CI runs on the same host: if a prior job was killed abruptly (SIGKILL, cancellation) its containers are left running since --rm only removes containers on exit. A subsequent run whose worker inherits the same PID (common in containerized CI with small PID namespaces) and reaches the same counter value will collide with the orphaned container. Replace pid+counter with uuid.uuid4(), which generates a random UUID, making names unique across processes, hosts, and time without any shared state or leaking host identifiers. Fixes: SCYLLADB-1540 Closes scylladb/scylladb#29509	2026-04-17 11:56:51 +02:00
Botond Dénes	57f8be49e9	Merge 'Move ignore_component_digest_mismatch flag on sstables_manager' from Pavel Emelyanov The PR serves two purposes. First, it makes the flag usage be consistent across multiple ways to load sstables components. For example, the sstable::load_metadata() doesn't set it (like .load() does) thus potentially refusing to load "corrupted" components, as the flag assumes. Second, it removes the fanout of db.get_config().ignore_component_digest_mismatch() over the code. This thing is called pretty much everywhere to initialize the sstable_open_config, while the option in question is "scylla state" parameter, not "sstable opening" one. Code cleanup, not backporting Closes scylladb/scylladb#29513 * github.com:scylladb/scylladb: sstables: Remove ignore_component_digest_mismatch from sstable_open_config sstables: Move ignore_component_digest_mismatch initialization to constructor sstables: Add ignore_component_digest_mismatch to sstables_manager config	2026-04-17 12:54:17 +03:00
Avi Kivity	cad3c0de94	test: write minio log to testlog dir for Jenkins artifact collection Write the MinIO server log directly to tempdir_base (testlog/<arch>/) instead of the per-server temp directory that gets destroyed on shutdown. This preserves the log for Jenkins artifact collection, helping debug S3-related flaky test failures like the stcs_reshape_overlapping_s3_test hang (SCYLLADB-1481). Closes scylladb/scylladb#29458	2026-04-17 12:51:55 +03:00
Botond Dénes	facb50cbf9	Merge 'test.py: refactor test.py' from Andrei Chekun With the latest changes, a lot of the code in test.py is redundant. This PR just cleans up this code. Also, it narrows the use of dynamic scope for fixtures to test/alternator and test/cqlpy. All the rest will have module scope by default. test.py will be a wrapper for pytest, mostly for CI use. For now, test.py still has the important job of calculating the number of threads to start pytest with, which is not possible to do in pytest itself. No backport needed, framework enhancement only. Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-666 Closes scylladb/scylladb#28852 * github.com:scylladb/scylladb: test.py: remove testpy_test_fixture_scope test.py: add logger for 3rd party service test.py: delete dead code in test.py	2026-04-17 12:51:14 +03:00
Pawel Pery	7883f161bb	vector-store: fix creating local vector search indexes with a part of the partition key Users should be able to create a local index for Vector Search based on only a part of the partition key. This commit provides this by removing the 'full partition key only' requirement for custom local indexes. The commit updates the docs to explain that a local vector index can use only a part of the partition key. The commit adds a cqlpy test to check the fixed functionality. Fixes: SCYLLADB-953 Needs to be backported to 2026.1 as it is a fix for local vector indexes. Closes scylladb/scylladb#28931	2026-04-17 11:44:15 +02:00
Karol Nowacki	c643f321af	vector_search: decrease default connection timeout to 3s Decrease the default connection timeout to 3s to better align with the default CQL query timeout of 10s. The previous timeout allowed only one failover request in high availability scenario before hitting the CQL query timeout. By decreasing the timeout to 3s, we can perform up to three failover requests within the CQL query timeout, which significantly improves the chances of successfully completing the query in high availability scenarios. Fixes: SCYLLADB-95	2026-04-17 12:26:39 +03:00
Karol Nowacki	9269ca9cf7	vector_search: add unreachable node detection time config Add option `vector_store_unreachable_node_detection_time_in_ms` to control parameters related to detecting unreachable vector store nodes. This parameter is used to set the TCP connect timeout, keepalive parameters, and TCP_USER_TIMEOUT. By configuring these parameters, we can detect unreachable vector store nodes faster and trigger failover mechanisms in a timely manner.	2026-04-17 12:26:38 +03:00
Piotr Smaron	686029f52c	audit: disable caching for the audit log table The audit table had caching enabled by default, which provides no value since audit data is write-heavy and rarely read back through the cache. This wastes cache space that could be used for more important user data. Disable caching by setting keys and rows_per_partition to NONE and enabled to false, consistent with get_disabled_caching_options() and other system tables such as system.batchlog, system.large_partitions, and CDC log tables. Closes scylladb/scylladb#29506	2026-04-17 11:17:10 +02:00
Piotr Dulikowski	37fc1507f0	Merge 'Alternator: Add vector search support' from Nadav Har'El This series adds support for vector search in Alternator based on the existing implementation in CQL. The series adds APIs for `CreateTable` and `UpdateTable` to add or remove vector indexes to Alternator tables, `DescribeTable` to list them and check the indexing status, and `Query` to perform a vector search - which contacts the vector store for the actual ANN (approximate nearest neighbor) search. Correct functionality of these features depends on some features of the vector store that were already done (see https://github.com/scylladb/vector-store/pull/394). This initial implementation is fully functional, and can already be useful, but we do not yet support all the features we hope to eventually support. Here are things that we have not done yet, and plan to do later in follow-up pull requests: 1. Support a new optimized vector type ("V") - in addition to the "list of numbers" type supported in this version. 2. Allow choosing a different similarity function when creating an index, by SimilarityFunction in VectorIndex definition. 3. Allow choosing quantization (f32/f16/bf16/i8/b1) to ask the vector index to compress stored vectors. 4. Support oversampling and rescoring, defined per-index and per-query. 5. Support HNSW tuning parameters: maximum_node_connections, construction_beam_width, search_beam_width. 6. Support pre-filtering over key columns, which are available at the vector store, by sending the filter to the vector store (translated from DynamoDB filter syntax to the vector store's filter syntax). A decision still needs to be made if this will use KeyConditionExpression or FilterExpression. This version supports only post-filtering (with `FilterExpression`). 7. Support projecting non-key attributes into the index (Projection=INCLUDE and Projection=ALL), and then 1. pre-filtering using these attributes, and 2. 
efficiently return these attributes (using Select=ALL_PROJECTED_ATTRIBUTES, which today returns just the key columns). 8. Optimize the performance of `Query`, which today is inefficient for Select=ALL_ATTRIBUTES because it serially retrieves the matching items one at a time. 9. Returning the similarity scores with the items (the design proposes ReturnVectorSearchSimilarity). 10. Add more vector-search-specific metrics, beyond the metric we already have counting Query requests. For example separate latency and request-count metrics for vector-search Queries (distinct from GSI/LSI queries), and a metric accumulating the total Limit (K) across all vector search queries. 11. Consider how (and if at all) we want to run the tests in test/alternator/test_vector.py that need the vector store in the CI. Currently they are skipped in CI and only run manually (with `test/alternator/run --vs test_vector`). 12. UpdateTable 'Update' operation to modify index parameters. Only some can be modified, e.g., Oversampling. 13. Support for "local index" (separate index for each partition). 14. Make sure that vector search and Streams can be enabled concurrently on the same table - both need CDC but we need to verify that one doesn't confuse the other or disables options that the other needs. We can only do this after we have Alternator Streams running on tablets (since vector store requires tablets). Testing the new Alternator vector search end-to-end requires running both Scylla and the vector store together. We will have such end-to-end tests in the vector store repository (see https://github.com/scylladb/vector-store/pull/392), but we also add in this pull request many end-to-end tests written in Python, that can be run with the command "test/alternator/run --vs test_vector.py". The "--vs" option tells the run script to run both Scylla and the vector store (currently assumed to be in `.../vector-store/target/release/vector-store`). 
About 65% of the tests in this pull request check supported syntax and error paths so they can run without the vector store, while about 35% of the tests do perform actual Query operations and require the vector store to be running. Currently, the tests that do require the vector store will not get run by CI, but can be easily re-run manually with `test/alternator/run --vs test_vector.py`. In total, this series includes 78 functional tests in 2200 lines of Python code. This series also includes documentation for the new Alternator feature and the new APIs introduced. You can see a more detailed design document here: https://docs.google.com/document/d/1cxLI7n-AgV5hhH1DTyU_Es8_f-t8Acql-1f58eQjZLY/edit Two patches in this series split the huge alternator/executor.cc, after this series continued to grow it and it reached a whopping 7,000 lines. These patches are just reorganization of code, no functional changes. But it's time that we finally do this (Refs #5783), we can't just continue to grow executor.cc with no end... 
Closes scylladb/scylladb#29046 * github.com:scylladb/scylladb: test/alternator: add option to "run" script to run with vector search alternator: document vector search test/alternator: fix retries in new_dynamodb_session test/alternator: test for allowed characters in attribute names test/alternator: tests for vector index support alternator, vector: add validation of non-finite numbers in Query alternator: Query: improve error message when VectorSearch is missing alternator: add per-table metrics for vector query alternator: clean up duplicated code alternator: fix default Select of Query alternator: split executor.cc even more alternator: split alternator/executor.cc alternator: validate vector index attribute values on write alternator: DescribeTable for vector index: add IndexStatus and Backfilling alternator: implement Query with a vector index alternator: fix bug in describe_multi_item() alternator: prevent adding GSI conflicting with a vector index alternator: implement UpdateTable with a vector index alternator: implement DescribeTable with a vector index alternator: implement CreateTable with a vector index alternator: reject empty attribute names cdc: fix on_pre_create_column_families to create CDC log for vector search	2026-04-17 10:25:45 +02:00
Avi Kivity	04b54f363b	Merge 'Enable vnodes-to-tablets migrations with arbitrary tokens' from Nikos Dragazis This PR removes the power-of-two token constraint from vnodes-to-tablets migrations, allowing clusters with randomly generated tokens to migrate without manual token reassignment. Previously, migrations required vnode tokens to be a power of two and aligned. In practice, these conditions are not met with Scylla's default random token assignment, so the constraint is a blocker for real-world use. With the introduction of arbitrary tablet boundaries in PR #28459, the tablet layer can now support arbitrary tablet boundaries. This PR builds on that capability to allow arbitrary vnode tokens during migration. When the highest vnode token does not coincide with the end of the token ring, the vnode wraps around, but tablets do not support that. This is handled by splitting it into two tablets: one covering the tail end of the ring and one covering the beginning. Testing has been updated accordingly: existing cluster tests now use randomly generated tokens instead of precomputed power-of-two values, and a new Boost test validates the wrap-around tablet boundary logic. Fixes SCYLLADB-724. New feature, no backport is needed. Closes scylladb/scylladb#29319 * github.com:scylladb/scylladb: test: Use arbitrary tokens in vnodes->tablets migration tests test: boost: Add test for wrap-around vnodes storage_service: Support vnodes->tablets migrations w/ arbitrary tokens storage_service: Hoist migration precondition	2026-04-17 00:46:35 +03:00
Andrei Chekun	745debe9ec	test.py: remove testpy_test_fixture_scope With the migration to pytest, this fixture is useless. Remove it and set the fixture scope to module for most of the tests. Add a dynamic_scope function to support running alternator fixtures in session scope while Test and TestSuite are not yet deleted. This is for the migration period; later on this function should be deleted.	2026-04-16 22:08:33 +02:00
Andrei Chekun	21addb2173	test.py: add logger for 3rd party service With the migration of environment preparation and the starting of 3rd party services to pytest, these services output their logs to the terminal. So this PR binds them to their own log file to avoid polluting the terminal.	2026-04-16 22:08:33 +02:00
Andrei Chekun	13770ab394	test.py: delete dead code in test.py With the latest changes, a lot of the code in test.py is redundant. This PR just cleans up this code. Changes in other files are related to cleaning code from test.py, especially the redundant parameter --test-py-init and moving prepare_environment to pytest itself.	2026-04-16 22:08:31 +02:00
Avi Kivity	999e108139	Merge 'test: lib: fix broken retry in start_docker_service' from Dario Mirovic The retry loop in `start_docker_service` passes the parse callbacks via `std::move` into `create_handler` on each iteration. After the first iteration, the moved-from `std::function` objects are empty. All subsequent retries skip output parsing entirely and immediately treat the service as successfully started. This defeats the entire purpose of the retry mechanism. Fix by passing the callbacks by copy instead of move, so the original callbacks remain valid across retries. Fixes SCYLLADB-1542 This is a CI stability issue and should be backported. Closes scylladb/scylladb#29504 * github.com:scylladb/scylladb: test/lib: fix typos in proc_utils, gcs_fixture, and dockerized_service test: gcs_fixture: rename container from "local-kms" to "fake-gcs-server" test: fix proc_utils.cc formatting from previous commit test: lib: use unique container name per retry attempt test: lib: fix broken retry in start_docker_service	2026-04-16 21:48:25 +03:00
Radosław Cybulski	c5ed6b22ae	alternator: add CHILD_SHARDS filtering Add a `CHILD_SHARDS` filter to the `DescribeStream` command. When it is used, the user needs to pass a parent stream shard id in the JSON ShardFilter.ShardId field. DescribeStream will then return only the list of stream shards that are direct descendants of the passed parent stream shard. Each stream shard covers a consecutive part of the token space. A stream shard Q is considered to be a child of stream shard W when at least one token belongs to the token spaces of both stream shards. The filtering algorithm itself is somewhat complicated - more details in comments in streams.cc. CHILD_SHARDS is Amazon functionality and is required by KCL. Add unit tests. Fixes: #25160 Closes scylladb/scylladb#28189	2026-04-16 18:27:55 +03:00
Andrei Chekun	ba04e1e2c3	codeowners: add owner for the test framework Add @xtrey as a codeowner of the test framework Closes scylladb/scylladb#29518	2026-04-16 17:57:21 +03:00
Piotr Szymaniak	d0c3f78d76	test/alternator: extend local TTL streams timeout Increase the non-AWS wait in the TTL streams test to reduce vnode CI flakes caused by delayed expiration visibility. Fixes SCYLLADB-1556 Closes scylladb/scylladb#29516	2026-04-16 15:53:35 +03:00
copilot-swe-agent[bot]	ec7450bff8	topology_coordinator, tablets: Log active tablet transitions when going idle This will make debugging of stalled tablet transitions easier. We saw several issues when the topology state machine was blocked by active tablet migrations, which was not obvious at first glance of the logs. Now it will be easy to tell if tablet transitions are blocking progress and which transitions are stuck. Closes scylladb/scylladb#28616	2026-04-16 14:34:37 +03:00
Benny Halevy	05a00fe140	compaction_manager: fix use-after-free in postponed_compactions_reevaluation() drain() signals the postponed_reevaluation condition variable to terminate the postponed_compactions_reevaluation() coroutine but does not await its completion. When enable() is called afterwards, it overwrites _waiting_reevalution with a new coroutine, orphaning the old one. During shutdown, really_do_stop() only awaits the latest coroutine via _waiting_reevalution, leaving the orphaned coroutine still alive. After sharded::stop() destroys the compaction_manager, the orphaned coroutine resumes and reads freed memory (is_disabled() accesses _state). Fix by introducing stop_postponed_compactions(), awaiting the reevaluation coroutine in both drain() and stop() after signaling it, if postponed_compactions_reevaluation() is running. It uses an std::optional<future<>> for _waiting_reevalution and std::exchange to leave _waiting_reevalution disengaged when postponed_compactions_reevaluation() is not running. This prevents a race between drain() and stop(). While at it, fix typo in _waiting_reevalution -> _waiting_reevaluation. Fixes: SCYLLADB-1463 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes scylladb/scylladb#29443	2026-04-16 14:33:31 +03:00
Nadav Har'El	d3d5db37d7	test/alternator: add option to "run" script to run with vector search Add to test/alternator/run the option "-vs" which runs a vector store alongside Scylla, to allow running Alternator tests with vector indexing. To get the vector store, do git clone git@github.com:scylladb/vector-store.git cargo build --release "run -vs" looks for an executable in ../vector-store/target/*/vector-store but can also be overridden by the VECTOR_STORE environment variable. test/alternator/run runs the vector store exactly like it runs Scylla - in a temporary directory, on a temporary IP address in the localhost subnet (127.0.0.0/8), killing it when the test ends, and showing the output of both programs (Scylla and vector store). These transient runs of Scylla and vector store are configured to be able to communicate with each other. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2026-04-16 14:30:18 +03:00
Nadav Har'El	3d8463ccd2	alternator: document vector search This patch adds a new document, docs/alternator/vector-search.md, on the new vector search feature in Alternator. It introduces this feature, and the DynamoDB APIs that we extended to support it. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2026-04-16 14:30:17 +03:00
Nadav Har'El	164b0e37e1	test/alternator: fix retries in new_dynamodb_session The new_dynamodb_session() function had a bug which we never noticed because we hardly used it, but it became more noticable when the new test/alternator/test_vector.py started to use it: By default, boto3 retries a request up to 9 times when it encounters a retriable error (such as an Internal Server Error). We don't want such retries in our tests - it makes failures slower, but more importantly it can hide "flaky" bugs by retrying 9 times until it happens to succeed. The new_dynamodb_session() had code (copied from the dynamodb fixture) to set boto3's "max_attempts" configuration to 0, to disable this retry. But this code had an incorrect "if" to only be done if we're testing on "localhost". This is wrong: We almost never use "localhost" as the target of the test; Both test/cqlpy/run and test.py pick an IP address in the localhost subnet (127/8) and uses that IP address - not the string "localhost". This bug only existed in new_dynamodb_session() - the more commonly used "dynamodb" fixture didn't have this bug. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2026-04-16 14:30:17 +03:00
Nadav Har'El	858dee0b30	test/alternator: test for allowed characters in attribute names One of the tests in the previous patch checked that strange characters are allowed in attribute names used for vector indexing. It turns out we never had a test that verifies that regardless of vector indexes - any character whatsoever is allowed in attribute names. This is different from table names which are much more limited. So this patch adds the missing test. As usual, the new test also passes on DynamoDB, showing that these stange characters in attribute names are also allowed by DynamoDB.	2026-04-16 14:30:17 +03:00
Nadav Har'El	58538e18e8	test/alternator: tests for vector index support In this patch we add a large collection of basic functional tests for the vector index support, covering the CreateTable, UpdateTable, DescribeTable and Query operations and the various ways in which those are allowed to work - or expected to fail. These tests were written in parallel with writing the code so they (hopefully) cover all the corner cases considered during development, and make sure these corner cases are all handled correctly and will not regress in the future. Some of these tests do not involve querying of the index and focus on the structure of requests and the kind of syntax allowed. But other tests are end-to-end, requiring the vector store to be running and trying to index Alternator data and query it. These tests are marked "needs_vector_store", and are immediately skipped in Scylla is not configured to connect to a vector store. In a later patch we'll add a an option to test/alternator/run to be able to run these end-to-end tests by automatically running both Scylla and the Vector Store. We'll have additional end-to-end tests in the vector-store repository. Note that vector search is a new API feature that doesn't exist in DynamoDB, so we are adding new parameters and outputs to existing operations. The AWS SDKs don't normally allow doing that, so the test added here begins by teaching the Python SDK to use the new APIs we added. This piece of code can also be used by end-users to use vector search (at least in Python...) before we officially add this support to ScyllaDB's SDK wrappers.	2026-04-16 14:30:17 +03:00
Nadav Har'El	fe5a5a813f	alternator, vector: add validation of non-finite numbers in Query Non-finite numbers (Inf, NaN) don't make sense in vector search, and also not allowed in the DynamoDB API as numbers. But the parsing code in Query's QueryVector accepted "Inf" and "NaN" and then failed to send the request to the vector store, resulting in a strange error message. Let's fix it in the parsing code. We have a test (test_query_vectorsearch_queryvector_bad_number_string) that verifies this fix. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2026-04-16 14:30:17 +03:00
Nadav Har'El	aa070fae5b	alternator: Query: improve error message when VectorSearch is missing Before this patch, if we attempt a Query with IndexName is a vector index but forget a "VectorSearch" parameter, the error is misleading: The code expects a GSI or LSI, and when it can't find a GSI or LSI with that name, it reports that the index is missing. But this is not helpful. So in this patch we produce a more helpful message: That the index does exist, and is a vector index, so a "VectorSearch" parameter is mandatory and is missing.	2026-04-16 14:30:16 +03:00
Nadav Har'El	f932f94422	alternator: add per-table metrics for vector query The per-table metrics for Query were not incremented for the vector variant of the Query operation; only the global metrics were incremented. This patch fixes this oversight, and adds a test that reproduces it (the new test fails before this patch, and passes after).	2026-04-16 14:30:16 +03:00
Nadav Har'El	8cf510e06c	alternator: clean up duplicated code De-duplicate some code introduced in earlier patches, such as two nearly-identical loops over the indexes (one to check if there is a vector index, the second to get its dimensions), and two nearly-identical chunks of code to get the item contents when there is or there isn't a clustering key. There should be no functional changes in this patch. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2026-04-16 14:30:16 +03:00
Nadav Har'El	f15c6634a7	alternator: fix default Select of Query In earlier patches, when Query'ing a vector index, we set the default Select to ALL_ATTRIBUTES. However, according to the DynamoDB documentation for Query, "If neither Select nor ProjectionExpression are specified, DynamoDB defaults to ALL_ATTRIBUTES when accessing a table, and ALL_PROJECTED_ATTRIBUTES when accessing an index." This default should also apply to a vector index, so this patch fixes this. The new behavior is not only more compatible with DynamoDB, it is also much more efficient by default, as ALL_PROJECTED_ATTRIBUTES does not need to read from the base table - it returns the results that the vector store returned. Of course, if the user needs the less efficient ALL_ATTRIBUTES, this option is still available - it's just no longer the default. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2026-04-16 14:30:16 +03:00
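The defaulting rule quoted in the commit above can be written down as a small Python sketch. The function name `effective_select` and its parameters are hypothetical and are not taken from the Alternator source; the `SPECIFIC_ATTRIBUTES` branch follows the DynamoDB rule that a ProjectionExpression without an explicit Select implies specific attributes.

```python
from typing import Optional


def effective_select(select: Optional[str],
                     projection_expression: Optional[str],
                     querying_index: bool) -> str:
    """Return the Select value a Query would use (illustrative sketch only)."""
    if select is not None:
        return select  # an explicit Select always wins
    if projection_expression is not None:
        return "SPECIFIC_ATTRIBUTES"
    # Neither was given: the default depends on what is being queried.
    return "ALL_PROJECTED_ATTRIBUTES" if querying_index else "ALL_ATTRIBUTES"
```

Under this rule a Query on any index, including a vector index, only reads the base table when the caller asks for ALL_ATTRIBUTES explicitly.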
Nadav Har'El	2e274bbdba	alternator: split executor.cc even more This patch continues the effort to split the huge executor.cc (5000 lines before this patch) even more. In this patch we introduce a new source file, executor_util.cc, for various utility functions that are used for many different operations and therefore are useful to have in a header file. These utility functions will now be in executor_util.cc and executor_util.hh - instead of executor.cc and executor.hh. Various source files, including executor.cc, the executor_read.cc introduced in the previous patch, as well as older source files such as streams.cc, ttl.cc and serialization.cc, use the new header file. This patch removes over 700 lines of code from executor.cc, and also removes a large number of utility function declarations from executor.hh. Originally, executor.hh was meant to be about the interface that the Alternator server needs to execute the different DynamoDB API operations - and after this patch it returns closer to this original goal. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2026-04-16 14:30:16 +03:00
Nadav Har'El	751da00692	alternator: split alternator/executor.cc Already six years ago, in #5783, we noticed that alternator/executor.cc has grown too large. The previous patches added hundreds more lines to it to implement vector search, and it reached a whopping 7,000 lines of code. This is too much. This patch splits from executor.cc two major chunks: 1. The implementation of read requests - GetItem, BatchGetItem, Query (base table, GSI/LSI, and vector-search), and Scan - was moved to a new source file alternator/executor_read.cc. The new file has 2,000 lines. 2. Moved 250 lines of template functions dealing with attribute paths and maps of them to a new header file, attribute_path.hh. These utilities are used for many different operations - various read operations use them for ProjectionExpression, and UpdateItem uses them for modifications to nested attributes, so we need the new header file from both executor.cc and executor_read.cc. The remaining executor.cc is still pretty big, 5,000 lines, and contains write operations (PutItem, UpdateItem, DeleteItem, BatchWriteItem) as well as various table and other operations, and also many utility functions used by many types of operations, so we can later continue this refactoring effort. Refs #5783 Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2026-04-16 14:30:10 +03:00
Emil Maskovsky	91df3795fc	encryption: cover system.raft table in system_info_encryption Extend system_info_encryption to encrypt system.raft SSTables. system.raft contains the Raft log, which may hold sensitive user data (e.g. batched mutations), so it warrants the same treatment as system.batchlog and system.paxos. During upgrade, existing unencrypted system.raft SSTables remain readable. Existing data is rewritten encrypted via compaction, or immediately via nodetool upgradesstables -a. Update the operator-facing system_info_encryption description to mention system.raft and add a focused test that verifies the schema extension is present on system.raft. Fixes: CUSTOMER-268 Backport: 2026.1 - closes an encryption-at-rest coverage gap: system.raft may persist sensitive user-originated data unencrypted; backport to the current LTS. Closes scylladb/scylladb#29242	2026-04-16 13:22:10 +02:00
Botond Dénes	d006c4c476	Merge 'Untie (partially) cql3/statements from db::config' from Pavel Emelyanov There's a bunch of db::config options that are used by cql3/statements/ code. For that they use data_dictionary/database as a proxy to get db::config reference. This PR moves most of these accessed options onto cql_config. Options migrated to cql_config: 1. select_internal_page_size 2. strict_allow_filtering 3. enable_parallelized_aggregation 4. batch_size_warn_threshold_in_kb 5. batch_size_fail_threshold_in_kb 6. 7 keyspace replication restriction options 7. 2 TWCS restriction options 8. restrict_future_timestamp 9. strict_is_not_null_in_views (with view_restrictions struct) 10. enable_create_table_with_compact_storage Some options need special treatment and are still abused via database, namely: 1. enable_logstor 2. cluster_name 3. partitioner 4. endpoint_snitch Fixing components inter-dependencies, not backporting Closes scylladb/scylladb#29424 * github.com:scylladb/scylladb: cql3: Move enable_create_table_with_compact_storage to cql_config cql3: Move strict_is_not_null_in_views to cql_config cql3: Move restrict_future_timestamp to cql_config cql3: Move TWCS restriction options to cql_config cql3: Move keyspace restriction options to cql_config cql3: Move batch_size_fail_threshold_in_kb to cql_config cql3: Move batch_size_warn_threshold_in_kb to cql_config cql3: Move enable_parallelized_aggregation to cql_config cql3: Move strict_allow_filtering to cql_config cql3: Move select_internal_page_size to cql_config test: Fix cql_test_env to use updateable cql_config from db::config cql3: Add cql_config parameter to parsed_statement::prepare()	2026-04-16 14:04:43 +03:00
Botond Dénes	88a8324e68	Merge 'db: store large data records in SSTable metadata and serve via virtual tables' from Benny Halevy `system.large_partitions`, `system.large_rows`, and `system.large_cells` store records keyed by SSTable name. When SSTables are migrated between shards or nodes (resharding, streaming, decommission), the records are lost because the destination never writes entries for the migrated SSTables. This patch series moves the source of truth for large data records into the SSTable's scylla metadata component (new `LargeDataRecords` tag 13) and reimplements the three `system.large_` tables as virtual tables that query live SSTables on demand. A cluster feature flag (`LARGE_DATA_VIRTUAL_TABLES`) gates the transition for safe rolling upgrades. When the cluster feature is enabled, each node drops the old system large_ tables and starts serving the corresponding tables using virtual tables that represent the large data records now stored on the sstables. Note that the virtual tables will be empty after upgrade until the sstables that contained large data are rewritten, therefore it is recommended to run upgrade sstables compaction or major compaction to repopulate the sstables scylla-metadata with large data records. 1. keys: move key_to_str() to keys/keys.hh - make the helper reusable across large_data_handler, virtual tables, and scylla-sstable 2. sstables: add LargeDataRecords metadata type (tag 13) - new struct with binary-serialized key fields, scylla-sstable JSON support, format documentation 3. large_data_handler: rename partition_above_threshold to above_threshold_result - generalize the struct for reuse 4. large_data_handler: return above_threshold_result from maybe_record_large_cells - separate booleans for cell size vs collection elements thresholds 5. sstables: populate LargeDataRecords from writer - bounded min-heaps (one per large_data_type), configurable top-N via `compaction_large_data_records_per_sstable` 6. test: add LargeDataRecords round-trip unit tests - verify write/read, top-N bounding, below-threshold behavior 7. db: call initialize_virtual_tables from shard 0 only - preparatory refactoring to enable cross-shard coordination 8. db: implement large_data virtual tables with feature flag gating - three virtual table classes, feature flag activation, legacy SSTable fallback, dual-threshold dedup, cross-shard collection Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-1276 * Although this fixes a bug where large data entries are effectively lost when sstables are renamed or migrated, the changes are intrusive and do not warrant a backport Closes scylladb/scylladb#29257 * github.com:scylladb/scylladb: db: implement large_data virtual tables with feature flag gating db: call initialize_virtual_tables from shard 0 only test: add LargeDataRecords round-trip unit tests sstables: populate LargeDataRecords from writer large_data_handler: return above_threshold_result from maybe_record_large_cells large_data_handler: rename partition_above_threshold to above_threshold_result sstables: add LargeDataRecords metadata type (tag 13) sstables: add fmt::formatter for large_data_type keys: move key_to_str() to keys/keys.hh	2026-04-16 14:03:31 +03:00
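The "bounded min-heap, top-N" technique mentioned in step 5 of the merge above can be sketched generically in Python. This is not the ScyllaDB writer code; the class `TopNRecords`, its `threshold` and `limit` parameters, and the `(size, key)` record shape are all assumptions chosen for illustration.

```python
import heapq


class TopNRecords:
    """Keep only the `limit` largest records that exceed `threshold`.

    Hypothetical sketch: a min-heap holds the current top-N, so the smallest
    kept record sits at the root and is the first to be evicted.
    """

    def __init__(self, threshold: int, limit: int):
        self._threshold = threshold
        self._limit = limit
        self._heap = []  # min-heap of (size, key) tuples

    def add(self, size: int, key: str) -> None:
        if size <= self._threshold or self._limit <= 0:
            return  # below-threshold records are never stored
        entry = (size, key)
        if len(self._heap) < self._limit:
            heapq.heappush(self._heap, entry)
        elif entry > self._heap[0]:
            heapq.heapreplace(self._heap, entry)  # evict the smallest kept record

    def largest_first(self) -> list:
        return sorted(self._heap, reverse=True)
```

Memory stays bounded by `limit` no matter how many records are offered, and each insertion costs O(log N), which is why a min-heap suits a writer that sees records one at a time.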
Pavel Emelyanov	4d352c7cf5	sstables: Remove ignore_component_digest_mismatch from sstable_open_config The ignore_component_digest_mismatch flag is now initialized at sstable construction time from sstables_manager::config (which is populated from db::config at boot time). Remove the flag from sstable_open_config struct and all call sites that were setting it explicitly. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-16 13:49:14 +03:00
Pavel Emelyanov	9107e055b3	sstables: Move ignore_component_digest_mismatch initialization to constructor Initialize the ignore_component_digest_mismatch flag from sstables_manager::config in the sstable constructor initializer list instead of in load(). This ensures the flag value is set at construction time when the manager config is available, rather than at load time. Mark the member const to reflect its immutability after construction. Fixes the bootstrap path which now correctly reads the flag from manager config initialized from db::config at boot time, instead of using the default value. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-16 13:49:00 +03:00
Pavel Emelyanov	8abfd9af00	sstables: Add ignore_component_digest_mismatch to sstables_manager config Copy the ignore_component_digest_mismatch flag from db::config to sstables_manager::config during database initialization. This makes the flag available early in the boot process, before SSTables are loaded, enabling later commits to move the flag initialization from load-time to construction-time. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-16 13:48:49 +03:00
Nadav Har'El	83670d2493	alternator: validate vector index attribute values on write When a table has a vector index, writes to the indexed attribute (via PutItem, UpdateItem, or BatchWriteItem) must supply a value that is a vector of the appropriate length: It must be a list of exactly the declared number of elements, where each element is a numeric type ("N") representable as a 32-bit float. Before this patch, invalid values were silently accepted and the item was simply not indexed (it was skipped by the vector store when it read this item). Now these writes are rejected with a ValidationException. This is analogous to the existing validation of GSI/LSI key attribute values - in DynamoDB after a certain attribute becomes the key of a GSI or LSI, the user is no longer allowed to write a value of the wrong type. The implementation we add here is also analogous to the implementation of the GSI/LSI key validation. The GSI/LSI key validation is done by validate_value_if_index_key / si_key_attributes, and in this patch we add the vector-index parallels: vector_index_attributes() collects the attribute name and declared dimensions for every vector index in the schema, and validate_value_if_vector_index_attribute() enforces the type limitations. For efficiency in the common case where a table has no vector indexes and no GSIs/LSIs, both validation functions are out-of-line and each call site guards the call with an explicit empty() check, so no function-call overhead is incurred when there is nothing to validate. For UpdateItem, the map of vector index attributes is cached in update_item_operation (alongside the existing _key_attributes cache) to avoid recomputing it on every call to update_attribute().	2026-04-16 13:31:49 +03:00

53337 Commits