scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-22 07:42:16 +00:00

Author	SHA1	Message	Date
Szymon Malewski	6b2fce03f9	alternator: optional stripping of http response headers In Alternator's HTTP API, response headers can dominate bandwidth for small payloads. The Server, Date, and Content-Type headers were sent on every response but many clients never use them. This patch introduces three Alternator config options: - alternator_http_response_server_header, - alternator_http_response_disable_date_header, - alternator_http_response_disable_content_type_header, which allow customizing or suppressing the respective HTTP response headers. All three options support live update (no restart needed). The Server header is no longer sent by default; the Date and Content-Type defaults preserve the existing behavior. The Server and Date header suppression uses Seastar's set_server_header() and set_generate_date_header() APIs added in https://github.com/scylladb/seastar/pull/3217. This patch also fixes deprecation warnings from older Seastar HTTP APIs. Tests are in test/alternator/test_http_headers.py. Fixes https://scylladb.atlassian.net/browse/SCYLLADB-70 Closes scylladb/scylladb#28288	2026-05-19 10:47:13 +03:00
Nadav Har'El	cd61a44ab8	test/alternator: test response compression of tiny responses This patch adds to the existing collection of tests for Alternator response compression another test with a tiny response being compressed. This test serves two purposes: 1. It verifies setting alternator_response_compression_threshold_in_bytes to a tiny number like 1 really means that tiny responses would be compressed. 2. It verifies that our compression code, which has a special code path for the small chunk at the end of the compression, works correctly. The original motivation for writing this test was a false alarm by Claude Code which claimed that Alternator's response compression code has a serious, exploitable, memory overrun bug, because it set the wrong size limit on that last chunk. Claude was wrong, there is no such bug. We did set an oversized limit on the last chunk (so this patch fixes this typo), but it didn't matter - because the code used deflateBound - the guaranteed maximum size of the compressed data - for the buffer's size, so the buffer was unconditionally big enough, no matter which avail_out limit we passed to deflate() it could never overflow. The included test passes even before this patch, even with ASAN enabled to detect memory overflows - no overflow was happening. It also passes after the typo correction in this patch. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#29718	2026-05-19 10:02:26 +03:00
Szymon Malewski	cb8e11653f	test/alternator: Number normalization tests DynamoDB normalizes Number values, so different string representations of the same number (e.g., "1000" vs "1e3") should be treated as the same value in all contexts. In Alternator this is true in most cases, thanks to implicit normalization in the Decimal `to_string()` function. However this is fragile - and in fact this function should be fixed due to an OOM vulnerability in CQL use (#8002). This patch adds tests that should prevent regression in cases that work currently. Unfortunately not all contexts work currently - mainly the HASH keys are not normalized and the backend handles them by byte representation. The added tests replicate this incorrect behaviour. All added tests pass with DynamoDB, with one exception: weirdly, DynamoDB doesn't recognise unnormalized numbers in BatchGetItem as duplicate keys. Ref SCYLLADB-1575 Closes scylladb/scylladb#29501	2026-05-18 09:42:33 +03:00
Nadav Har'El	4082fdf350	alternator: add ReturnScores option to VectorSearch A vector search operation in Alternator (VectorSearch option to Query) returns items sorted by decreasing similarity to the searched vector. Although the items are sorted by decreasing similarity scores, before this patch the user had no way to see the values of these scores. This patch adds a new VectorSearch option, `ReturnScores`. This option defaults to `NONE`. But if set to `SIMILARITY`, the query will return an array `Scores` with the same length as `Items`, which gives the similarity score for each item. As usual, this patch includes the implementation, the documentation, and tests for the new feature. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2026-05-13 14:19:17 +03:00
Nadav Har'El	85c6cafb1d	alternator: add optimized vector type for vector search Today in Alternator vector search, vectors are presented to the API as lists of numbers. I.e., in JSON a vector is sent in requests and responses as: {"L": [{"N": "3.14159"}, {"N": "6.7"}]} This format is verbose and inefficient for long vectors. Even worse, because the "N" number format has precision guarantees in DynamoDB, we cannot optimize the storage of such vectors by, for example, storing the numbers as 32-bit floats. We actually store these vectors as JSON, exactly as shown above. So in this patch we introduce a new DynamoDB type, "FLOAT32VECTOR", for vectors. The above vector will look like this in JSON: {"FLOAT32VECTOR": [3.14159, 6.7]} Note that each number is an unquoted JSON number, not a JSON string. Importantly, the definition of the "FLOAT32VECTOR" type specifies that components of the vector only have 32-bit precision. This means that Scylla may store these vectors internally as lists of 32-bit floats - not as JSON. And indeed, this patch includes this optimization: Top-level vector attributes are now encoded in an optimized way, as a byte 5 (alternator_type::FLOAT32VECTOR) followed by the elements of the vector, just 4 bytes each (the 4-byte big-endian IEEE 754 representation of each floating-point component). This patch also includes documentation, and extensive tests that the new "FLOAT32VECTOR" type works (which also serve as an example of how to use it in the boto3 SDK), that it is indeed encoded internally as 32-bit floats and not wasteful JSON strings, and that vector search on such items works. The last thing requires cooperation from the vector store, of course - it needs to be able to understand the new optimized encoding of vector attributes in addition to the old unoptimized one. Note that the old unoptimized ("list of numbers") vectors are still supported.
Although not recommended for general use, some users might still want to use the unoptimized type if they have pre-existing data created on DynamoDB or Alternator without vector search in mind, and the vectors already exist as lists of numbers. Although this is less important, the new vector type "FLOAT32VECTOR" is also allowed in a Query's QueryVector. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2026-05-13 11:57:45 +03:00
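The optimized encoding described in commit 85c6cafb1d (a type byte 5 followed by 4-byte big-endian IEEE 754 floats) can be illustrated with a minimal Python sketch. The helper names `encode_float32_vector` and `decode_float32_vector` are hypothetical and are not Scylla's actual code; only the tag value and the byte layout come from the commit message.

```python
import struct

# Type tag from the commit message: "a byte 5 (alternator_type::FLOAT32VECTOR)".
FLOAT32VECTOR_TAG = 5


def encode_float32_vector(components):
    """Hypothetical helper: one tag byte, then each component as a
    4-byte big-endian IEEE 754 float32."""
    return bytes([FLOAT32VECTOR_TAG]) + struct.pack(f">{len(components)}f", *components)


def decode_float32_vector(blob):
    """Hypothetical inverse of encode_float32_vector()."""
    if not blob or blob[0] != FLOAT32VECTOR_TAG:
        raise ValueError("not a FLOAT32VECTOR blob")
    payload = blob[1:]
    if len(payload) % 4 != 0:
        raise ValueError("payload length is not a multiple of 4 bytes")
    return list(struct.unpack(f">{len(payload) // 4}f", payload))


if __name__ == "__main__":
    blob = encode_float32_vector([3.14159, 6.7])
    # 1 tag byte + 2 components * 4 bytes each
    print(len(blob), decode_float32_vector(blob))
```

Because each component is stored as a float32, decoding returns values that are only approximately equal to the inputs, which matches the commit's point that the type carries 32-bit precision only.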
Nadav Har'El	ea910acdd4	alternator: add SimilarityFunction option to vector index creation Before this patch, vector search always used the COSINE similarity function. In this patch we add the ability to choose a different similarity function when creating a new vector index (with CreateTable or UpdateTable) by using the SimilarityFunction option. We still default to "COSINE" if SimilarityFunction isn't specified. Allowed similarity functions are COSINE, DOT_PRODUCT, and EUCLIDEAN. DescribeTable can also retrieve a vector index's SimilarityFunction. As usual, this patch also includes documentation for the new feature, and tests. Some of the tests can run without a vector store - verifying the API syntax and which similarity function is supported - but we also add tests that require the vector store and check that the different similarity functions actually sort the nearest items in the expected order. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2026-05-13 11:57:45 +03:00
Nadav Har'El	70283967d3	alternator: add vector search metrics Before this patch, we did not have any special metrics for vector search in Alternator. We had a count of "Query" operations, but there was no distinction between "standard" queries - of a base table or GSI/LSI - and vector-search queries. This patch adds four new metrics: * vector_search_query - counting how many Query requests are actually vector searches. * vector_search_query_returned_items - counting how many items were returned by vector searches. * vector_search_query_items_from_vs - counting how many results were retrieved from the vector-store backend. * vector_search_query_items_from_base_table - counting how many items were read from the base table during vector-search queries. Some vector search queries using SELECT=ALL_PROJECTED_ATTRIBUTES or COUNT are optimized to not need to read items from the base table. This patch also includes documentation for the four new metrics, and tests that they count what we want them to count. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2026-05-13 11:57:44 +03:00
Piotr Szymaniak	459c1dc32f	test/alternator: stop avoiding tablets in Streams tests Alternator Streams now supports tablets, so stop skipping the TTL Streams test in tablet mode and stop forcing vnodes in the Streams audit test. Refs SCYLLADB-463 Closes scylladb/scylladb#29697	2026-05-10 22:13:15 +03:00
Nadav Har'El	df8c9b17b8	Merge 'alternator: Graduate Alternator Streams from experimental' from Piotr Szymaniak As a final step for https://scylladb.atlassian.net/browse/SCYLLADB-461 we need to graduate Alternator Streams from experimental. So let's remove `--experimental-features=alternator-streams` and map the obsolete config string to `UNUSED` for backward compatibility. Also, remove the related gating of the feature. Finally, stop providing the config flag in test configs. Fixes SCYLLADB-1680 Fixes #16367 The documentation work tracked by https://scylladb.atlassian.net/browse/SCYLLADB-462 still remains. This PR needs to hit 2026.2, so (only) if it branches before the PR is merged to `master`, we'd need to backport. Closes scylladb/scylladb#29604 * github.com:scylladb/scylladb: test: Stop providing alternator-streams experimental flag alternator: Graduate Alternator Streams from experimental	2026-05-10 22:10:03 +03:00
Nadav Har'El	63927e07ea	Merge 'alternator/streams: keep disabled streams usable and purge on re-enable' from Piotr Szymaniak When an Alternator stream is disabled, the data should continue to be accessible so that consumers can finish reading. When the stream is later re-enabled, a new StreamArn is produced and only then the old data is purged. On disable, the existing CDC options (including preimage and postimage) are preserved so that DescribeStream can still report StreamViewType. All stream APIs continue to work on the disabled stream, with all shards reported as closed (EndingSequenceNumber set). No new CDC records are written; existing data expires via TTL after 24 hours. On re-enable, the old CDC log table is dropped as a separate Raft group0 schema change and a fresh one is created with a new UUID, giving a new StreamArn. This is Alternator-specific — CQL CDC keeps reusing the log table. Re-enabling is the only way to immediately purge old stream data. Old stream data is removed immediately upon re-enable (a discrepancy with DynamoDB, which keeps it readable for 24 hours through the old StreamArn). Tests updated to cover the new disable and re-enable behavior. Fixes #7239 Fixes SCYLLADB-523 Closes scylladb/scylladb#29413 * github.com:scylladb/scylladb: alternator/streams: remove dead next_iter in get_records test/alternator: fix stream wait timeouts to use wall-clock time docs/alternator: document stream disable/re-enable behavior alternator/streams: keep disabled streams usable and purge on re-enable	2026-05-10 22:04:35 +03:00
Piotr Szymaniak	744848a85f	test/alternator: fix stream wait timeouts to use wall-clock time Both disable_stream and wait_for_active_stream used time.process_time() for their timeouts, but process_time measures CPU time, not wall-clock time. Since these loops spend most of their time sleeping and waiting on API calls, the timeouts could last far longer than intended. Use time.time() instead to enforce actual wall-clock deadlines.	2026-05-07 14:45:42 +02:00
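The difference that commit 744848a85f relies on can be shown with a small sketch: a polling loop whose deadline is computed from `time.time()`, so that time spent sleeping counts toward the timeout. The helper name `wait_until` and its parameters are hypothetical; the real `disable_stream` and `wait_for_active_stream` test helpers are not reproduced here.

```python
import time


def wait_until(predicate, timeout_s, poll_interval_s=0.01):
    """Hypothetical helper: poll predicate() until it is true or until
    timeout_s seconds of wall-clock time have elapsed."""
    deadline = time.time() + timeout_s  # wall-clock deadline
    while time.time() < deadline:
        if predicate():
            return True
        time.sleep(poll_interval_s)  # sleeping uses almost no CPU time
    return predicate()


if __name__ == "__main__":
    wall_start, cpu_start = time.time(), time.process_time()
    wait_until(lambda: False, timeout_s=0.2)
    # The wall-clock delta covers the sleeps; the CPU-time delta barely moves,
    # which is why a process_time()-based deadline can overshoot badly.
    print("wall:", time.time() - wall_start, "cpu:", time.process_time() - cpu_start)
```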
Piotr Szymaniak	38bd068f78	alternator/streams: keep disabled streams usable and purge on re-enable Previously, disabling Alternator Streams would create a blank cdc::options with only enabled=false, which meant losing access also to stored Streams's data (including preimage and postimage). Now, when a stream is disabled: - The existing CDC options are preserved (only 'enabled' is flipped to false), so StreamViewType remains available. - DescribeStream enumerates all shards with EndingSequenceNumber set, indicating they are closed. - GetRecords omits NextShardIterator for disabled streams. - DescribeTable (supplement_table_stream_info) reports the stream ARN and StreamEnabled: false when the CDC log table still exists. - ListStreams uses get_base_table instead of is_log_for_some_table so that disabled streams whose log table still exists are listed. When a stream is re-enabled on an Alternator table that has an existing (disabled) CDC log table, the old log table is dropped and a fresh one is created with a new UUID, producing a new StreamArn. This is Alternator-specific behavior; CQL CDC tables continue to reuse the existing log table. The old stream data is lost immediately upon re-enable. DynamoDB keeps it readable for 24 hours. Tests: - test_streams_closed_read, test_streams_disabled_stream: remove xfail now that disabled streams are usable. - test_streams_reenable: new test verifying that re-enabling produces a new ARN and the old data is still readable via the old ARN (xfail because Scylla currently purges old data on re-enable). Fixes scylladb/scylladb#7239	2026-05-07 14:45:42 +02:00
Piotr Szymaniak	9a86044c63	test: Stop providing alternator-streams experimental flag Now that alternator-streams is no longer an experimental feature, stop passing it in test configurations.	2026-04-22 15:25:37 +02:00
Botond Dénes	eb3326b417	Merge 'test.py: migrate all bare skips to typed skip markers' from Artsiom Mishuta should be merged after #29235 Complete the typed skip markers migration started in the plugin PR. Every bare `@pytest.mark.skip` decorator and `pytest.skip()` runtime call across the test suite is replaced with a typed equivalent, making skip reasons machine-readable in JUnit XML and Allure reports. 62 files changed across 8 commits, covering ~127 skip sites in total. Bare `pytest.skip` provides only a free-text reason string. CI dashboards (JUnit, Allure) cannot distinguish between a test skipped due to a known bug, a missing feature, a slow test, or an environment limitation. This makes it hard to track skip debt, prioritize fixes, or filter dashboards by skip category. The typed markers (`skip_bug`, `skip_not_implemented`, `skip_slow`, `skip_env`) introduced by the `skip_reason_plugin` solve this by embedding a `skip_type` field into every skip report entry. \| Type \| Count \| Files \| Description \| \|------\|-------\|-------\|-------------\| \| `skip_bug` \| 24 \| 16 \| Skip reason references a known bug/issue \| \| `skip_not_implemented` \| 10 \| 5 \| Feature not yet implemented in Scylla \| \| `skip_slow` \| 4 \| 3 \| Test too slow for regular CI runs \| \| `skip_not_implemented` (bare) \| 2 \| 1 \| Bare `@pytest.mark.skip` with no reason (COMPACT STORAGE, #3882) \| \| Type \| Count \| Files \| Description \| \|------\|-------\|-------\|-------------\| \| `skip_env` \| ~85 \| 34 \| Feature/config/topology not available at runtime \| \| `skip_bug` \| 2 \| 2 \| Known bugs: Streams on tablets (#23838), coroutine task not found (#22501) \| - Comments: 7 comments/docstrings across 5 files updated from `pytest.skip()` to `skip()` - Plugin hardened: `warnings.warn()` → `pytest.UsageError` for bare `@pytest.mark.skip` at collection time — bare skips are now a hard error, not a warning - Guard tests: New `test/pylib_test/test_no_bare_skips.py` with 3 tests that 
prevent regression: - AST scan for bare `@pytest.mark.skip` decorators - AST scan for bare `pytest.skip()` runtime calls - Real `pytest --collect-only` against all Python test directories Runtime skip sites use the convenience wrappers from `test.pylib.skip_types`: ```python from test.pylib.skip_types import skip_env ``` Usage: ```python skip_env("Tablets not enabled") ``` 1. test: migrate @pytest.mark.skip to @pytest.mark.skip_bug for known bugs — 24 decorator sites, 16 files 2. test: migrate @pytest.mark.skip to @pytest.mark.skip_not_implemented — 10 decorator sites, 5 files 3. test: migrate @pytest.mark.skip to @pytest.mark.skip_slow — 4 decorator sites, 3 files 4. test: migrate bare @pytest.mark.skip to skip_not_implemented — 2 bare decorators, 1 file 5. test: migrate runtime pytest.skip() to typed skip_env() — ~85 sites, 34 files 6. test: migrate runtime pytest.skip() to typed skip_bug() — 2 sites, 2 files 7. test: update comments referencing pytest.skip() to skip() — 7 comments, 5 files 8. 
test/pylib: reject bare pytest.mark.skip and add codebase guards — plugin hardening + 3 guard tests - All 60 plugin + guard tests pass (`test/pylib_test/`) - No bare `@pytest.mark.skip` or `pytest.skip()` calls remain in the codebase - `pytest --collect-only` succeeds across all test directories with the hardened plugin SCYLLADB-1349 Closes scylladb/scylladb#29305 * github.com:scylladb/scylladb: test/alternator: replace bare pytest.skip() with typed skip helpers test: migrate new bare skips introduced by upstream after rebase test/pylib: reject bare pytest.mark.skip and add codebase guards test: update comments referencing pytest.skip() to skip_env() test: migrate runtime pytest.skip() to typed skip_bug() test: migrate runtime pytest.skip() to typed skip_env() test: migrate bare @pytest.mark.skip to skip_not_implemented test: migrate @pytest.mark.skip to @pytest.mark.skip_slow test: migrate @pytest.mark.skip to @pytest.mark.skip_not_implemented test: migrate @pytest.mark.skip to @pytest.mark.skip_bug for known bugs	2026-04-22 15:48:27 +03:00
Radosław Cybulski	6f7bf30a14	alternator: increase wait time to tablet sync When forcing tablet count change via cql command, the underlying tablet machinery takes some time to adjust. Original code waited at most 0.1s for tablet data to be synchronized. This seems to be not enough on debug builds, so we add exponential backoff and increase maximum waiting time. Now the code will wait 0.1s first time and continue waiting with each time doubling the time, up to maximum of 6 times - or total time ~6s. Fixes: SCYLLADB-1655 Closes scylladb/scylladb#29573	2026-04-21 17:38:07 +02:00
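The retry schedule described in commit 6f7bf30a14 (start at 0.1s, double each time, at most 6 waits) can be sketched as follows. The function names and the `check` and `sleep` parameters are hypothetical and only illustrate the arithmetic; the actual Alternator code is not shown in the commit message.

```python
import time


def backoff_delays(initial_s=0.1, factor=2.0, attempts=6):
    """Yield the waits of an exponential backoff: 0.1, 0.2, 0.4, ..."""
    delay = initial_s
    for _ in range(attempts):
        yield delay
        delay *= factor


def wait_with_backoff(check, sleep=time.sleep, **schedule):
    """Hypothetical helper: call check() before each wait and once more at the end."""
    for delay in backoff_delays(**schedule):
        if check():
            return True
        sleep(delay)
    return check()


if __name__ == "__main__":
    delays = list(backoff_delays())
    # Six doubling waits starting at 0.1s add up to roughly 6.3s,
    # consistent with the "~6s" total quoted in the commit message.
    print(delays, sum(delays))
```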
Marcin Maliszkiewicz	9f11920b15	Merge 'alternator: fix remaining problems with new Stream ARN format' from Nadav Har'El This small series includes a few followups to the patch that changed Alternator Stream ARNs from using our own UUID format to something that resembles Amazon's Stream ARNs (and the KCL library won't reject as bogus-looking ARNs). The first patch is the most important one, fixing ListStreams's LastEvaluatedStreamArn to also use the new ARN format. It fixes SCYLLADB-539. The following patches are additional cleanups and tests for the new ARN code. Closes scylladb/scylladb#29474 * github.com:scylladb/scylladb: alternator: fix ListStreams paging if table is deleted during paging test/alternator: test DescribeStream on non-existent table alternator: ListStreams: on last page, avoid LastEvaluatedStreamArn alternator: remove dead code stream_shard_id alternator: fix ListStreams to return real ARN as LastEvaluatedStreamArn	2026-04-20 14:42:28 +02:00
Artsiom Mishuta	dce0c24a02	test/alternator: replace bare pytest.skip() with typed skip helpers	2026-04-19 17:34:41 +02:00
Avi Kivity	9fb67e3e96	Revert "alternator: optional stripping of http response headers" This reverts commit `73f0deef6d`. It prevents `2943d30b0c`, which causes high flakiness, from being reverted.	2026-04-19 15:14:48 +03:00
Artsiom Mishuta	0b6b380b80	test: update comments referencing pytest.skip() to skip_env() Update 7 comments/docstrings across 5 files that still referenced pytest.skip() to reference the typed skip_env() wrapper for consistency with the migrated code.	2026-04-19 11:14:03 +02:00
Artsiom Mishuta	b10028e556	test: migrate runtime pytest.skip() to typed skip_bug() Migrate 2 runtime pytest.skip() calls referencing known bugs to use the typed skip_bug() wrapper from test.pylib.skip_types: - test/alternator/test_ttl.py: Streams on tablets (#23838) - test/scylla_gdb/test_task_commands.py: coroutine task not found (#22501)	2026-04-19 11:10:42 +02:00
Artsiom Mishuta	8a80e2c3be	test: migrate runtime pytest.skip() to typed skip_env() Migrate runtime pytest.skip() calls across 34 files to use the typed skip_env() wrapper from test.pylib.skip_types. These sites skip at runtime because a required feature, config option, library version, build mode, or runtime topology is not available. Also fixes 'raise pytest.skip(...)' in test_audit.py — skip_env() already raises internally, so the explicit raise was incorrect. Each file gains one new import: from test.pylib.skip_types import skip_env	2026-04-19 11:09:29 +02:00
Szymon Malewski	73f0deef6d	alternator: optional stripping of http response headers In Alternator's HTTP API, response headers can dominate bandwidth for small payloads. The Server, Date, and Content-Type headers were sent on every response but many clients never use them. This patch introduces three Alternator config options: - alternator_http_response_server_header, - alternator_http_response_disable_date_header, - alternator_http_response_disable_content_type_header, which allow customizing or suppressing the respective HTTP response headers. All three options support live update (no restart needed). The Server header is no longer sent by default; the Date and Content-Type defaults preserve the existing behavior. The Server and Date header suppression uses Seastar's set_server_header() and set_generate_date_header() APIs added in https://github.com/scylladb/seastar/pull/3217. This patch also fixes deprecation warnings from older Seastar HTTP APIs. Tests are in test/alternator/test_http_headers.py. Fixes https://scylladb.atlassian.net/browse/SCYLLADB-70 Closes scylladb/scylladb#28288	2026-04-19 09:22:04 +03:00
Nadav Har'El	0d05e3b4a4	alternator: fix ListStreams paging if table is deleted during paging Currently, ListStreams paging works by looking in the list of tables for ExclusiveStartStreamArn and starting there. But it's possible that during the paging process, one of the tables got deleted and ExclusiveStartStreamArn no longer points to an existing table. In the current implementation this caused the paging to stop (thinking it had reached the end). The solution is simple: ListStreams will now sort the list of tables by name (it needs to be sorted by something anyway to be consistent across pages), and will look with std::upper_bound for the first table after the ExclusiveStartStreamArn - we don't need to find that table name itself. The patch also includes a test reproducing this bug. As usual, the test passes on DynamoDB, fails on Alternator before this patch, and passes with the patch. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2026-04-19 09:12:02 +03:00
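The paging approach from commit 0d05e3b4a4 can be sketched in Python, where `bisect.bisect_right` plays the role of `std::upper_bound`. This is a simplified model under stated assumptions: it pages over plain table names rather than real stream ARNs, and the function name `list_streams_page` and its `limit` parameter are hypothetical.

```python
import bisect


def list_streams_page(table_names, exclusive_start=None, limit=2):
    """Return (page, last_evaluated) over a name-sorted table list.

    The start position is the first name strictly greater than
    exclusive_start, so paging still works if that table was deleted
    between two calls.
    """
    names = sorted(table_names)
    start = 0 if exclusive_start is None else bisect.bisect_right(names, exclusive_start)
    page = names[start:start + limit]
    has_more = start + limit < len(names)
    # No paging cookie when this page reaches the end of the list.
    last_evaluated = page[-1] if page and has_more else None
    return page, last_evaluated


if __name__ == "__main__":
    tables = ["a", "b", "c", "d"]
    page, cookie = list_streams_page(tables)
    print(page, cookie)
    tables.remove("b")  # the table named by the cookie is deleted mid-paging
    print(list_streams_page(tables, exclusive_start=cookie))
```

An exact-match lookup for the cookie would fail once table "b" is gone; the upper-bound search simply resumes at "c".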
Nadav Har'El	930fb4c330	test/alternator: test DescribeStream on non-existent table We already had a test that DescribeStream called on a bogus ARN returns a ValidationException. But if the stream ARN is more legitimate-looking but refers to a non-existent table (e.g., an ARN taken in the past from a table that no longer exists), we should return ResourceNotFoundException. In this patch we add a test that verifies we indeed do this correctly. Moreover, Alternator's current stream ARNs include both a keyspace name and a table name, and either one being incorrect should lead to ResourceNotFoundException, and indeed the new test validates that it works as expected - there is no bug here (AI guessed we have a bug in the missing keyspace case, but this guess was wrong).	2026-04-19 09:12:02 +03:00
Nadav Har'El	02d474fca8	alternator: ListStreams: on last page, avoid LastEvaluatedStreamArn When ListStreams is on its last page and has run out of streams to list, it shouldn't return a paging cookie (LastEvaluatedStreamArn) at all. Before this patch it does, and forces the user to make another call just to get another empty page, which is silly. This patch includes a fix and a reproducer test (that, as usual, passes on DynamoDB and fails on Alternator before the patch and succeeds after). Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2026-04-19 09:12:02 +03:00
Nadav Har'El	1ac910c2ab	alternator: fix ListStreams to return real ARN as LastEvaluatedStreamArn Alternator Streams' "ListStreams" does paging by returning a "cookie" LastEvaluatedStreamArn from one request, that the user passes to the next request as ExclusiveStartStreamArn. In the past, Alternator's stream ARNs were UUIDs, but we recently changed them to match DynamoDB's ARN format which the KCL library requires. However, we didn't change ListStreams's cookie format, and it remained UUIDs. This, however, goes against the documentation of DynamoDB, which states that LastEvaluatedStreamArn should be "the stream ARN of the item where the operation stopped". It shouldn't be some weird opaque cookie. So in this patch we add a test that confirms that indeed, in DynamoDB the LastEvaluatedStreamArn is really the last returned ARN and not an opaque cookie. The new test passes on DynamoDB, and fails on Alternator before the simple fix that this patch then does. Fixes SCYLLADB-539.	2026-04-19 09:12:01 +03:00
Piotr Szymaniak	4b6937b570	alternator/streams: Block tablet merges when Alternator Streams are enabled DynamoDB Streams API can only convey a single parent per stream shard. Tablet merges produce 2 parents, which is incompatible. When streams are requested on a tablet table, block tablet merges via tablet_merge_blocked (the allocator suppresses new merge decisions and revokes any active merge decision). add_stream_options() sets tablet_merge_blocked=true alongside enabled=true, so CreateTable needs no special handling — the flag is inert on vnode tables and immediately effective on tablet tables. For UpdateTable, CDC enablement is deferred: store the user's intent via enable_requested, and let the topology coordinator finalize enablement once no in-progress merges remain. A new helper, defer_enabling_streams_block_tablet_merges(), amends the CDC options to this deferred state. Disabling streams clears all flags, immediately re-allowing merges. The tablet allocator accesses the merge-blocked flag through a schema::tablet_merges_forbidden() accessor rather than reaching into CDC options directly. Mark test_parent_children_merge as xfail and remove downward (merge) steps from tablet_multipliers in test_parent_filtering and test_get_records_with_alternating_tablets_count.	2026-04-19 03:54:33 +02:00
Nadav Har'El	31e0315710	Merge 'alternator: fix unnecesary cdc log entries' from Radosław Cybulski Fix cdc writing unnecessary entries to its log, for example when Alternator deletes an item which in reality doesn't exist. Originally @wps0 tackled this issue. This patch is an extension of his work. His work involved adding a `should_skip` function to cdc, which would process a `mutation` object and decide whether changes in the object should be added to the cdc log or not. The issue with his approach is that a `mutation` object might contain changes for more than one row. If - for example - the `mutation` object contains two changes, a delete of a non-existing row and a create of a non-existing row, the `should_skip` function will detect changes in the second item and allow the whole `mutation` (BOTH items) to be added. For example (using python's boto3) running this on an empty table: ``` with table.batch_writer() as batch: batch.put_item({'p': 'p', 'c': 'c0'}) batch.delete_item(Key={'p': 'p', 'c': 'c1'}) ``` will emit two events ("put" event and "delete" event), even though the item with `c` set to `c1` does not exist (thus can't be deleted). Note that both entries in the batch write must use the same partition key, otherwise the upper layer will split them into separate `mutation` objects and the issue will not happen. The solution is to do similar processing, but consider each change separately from the others. This is tricky to implement due to the way cdc works. When cdc processes a `mutation` object (containing X changes), it emits cdc entries in phases. Phase 1 - emit `preimage` (old state) for each change (if requested). Phase 2 - for each change emit the actual "diff" (update / delete and so on). Phase 3 - emit `postimage` (new state). We will know if a change needs to be skipped during phase 2. By that time phase 1 is completed and the preimage for the change is emitted.
At that moment we set a flag that the change (identified by clustering key value) needs to be skipped - we add the clustering key to an `ignore-rows` set (`_alternator_clustering_keys_to_ignore` variable) and continue normally. Once all phases finish we add a `postprocess` phase (`clean_up_noop_rows` function). It will go through the generated cdc mutations and skip all modifications for which the clustering key is in the `ignore-rows` set. After skipping we need to do a "cleanup" operation - each generated cdc mutation contains an index (incremented by one); if we skipped some parts, the index is not consecutive anymore, so we reindex the final changes. There's a special case worth mentioning - Alternator tables without clustering keys. In that case the `mutation` object passed to cdc can contain exactly one change (since different partition keys are split by upper layers and Alternator will never emit a `mutation` object containing two (or more) changes with the same primary key). Here, when we decide the change is to be skipped we add an empty `bytes` object to the `ignore-rows` set. When checking the `ignore-rows` set, we check if it's empty or not (we don't check for the presence of the empty `bytes` object). Note: there might be some confusion between this patch and the #28452 patch. Both started from the same error observation and use similar tests for validation, as both are easily triggered by BatchWrite commands (both need the `mutation` object passed to cdc to contain more than one single change). This issue, though, is about wrong data written in the cdc log and is fixed in cdc, whereas #28452 is about a wrong way of parsing correct cdc data and is fixed on the Alternator side of things. Note that we need #28452 to truly verify (otherwise we will emit correct cdc entries, but Alternator will incorrectly parse them). Note: to benefit from / notice this patch you need the `alternator_streams_increased_compatibility` flag turned on.
Note: rework is quite "broad" and covers a lot of ground - every operation, that might result in a no-change to the database state should be tested. An additional test was added - trying to remove a column from non-existing item, as well as trying to remove non-existing column from existing item. Fixes: #28368 Fixes: SCYLLADB-1528 Fixes: SCYLLADB-538 Closes scylladb/scylladb#28544 * github.com:scylladb/scylladb: alternator: remove unnecesary code alternator: fix Alternator writing unnecesary cdc entries alternator: add failing tests for Streams	2026-04-18 00:07:51 +03:00
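The post-processing step described in merge 31e0315710 (drop log rows whose clustering key is in the ignore set, then renumber the remaining rows) can be modelled with a short sketch. It represents CDC log rows as plain dictionaries, which is an assumption made for illustration; the function name `drop_ignored_rows_and_reindex` and the field names `ck`, `op` and `batch_seq_no` are hypothetical stand-ins, not the real C++ structures.

```python
def drop_ignored_rows_and_reindex(log_rows, keys_to_ignore):
    """Remove rows for ignored clustering keys, then make the sequence
    numbers of the remaining rows consecutive again."""
    kept = [row for row in log_rows if row["ck"] not in keys_to_ignore]
    return [dict(row, batch_seq_no=i) for i, row in enumerate(kept)]


if __name__ == "__main__":
    # Model of the batch from the commit message: put 'c0', delete a missing 'c1'.
    log_rows = [
        {"batch_seq_no": 0, "ck": "c0", "op": "put"},
        {"batch_seq_no": 1, "ck": "c1", "op": "preimage"},
        {"batch_seq_no": 2, "ck": "c1", "op": "delete"},
        {"batch_seq_no": 3, "ck": "c0", "op": "postimage"},
    ]
    # The delete of 'c1' turned out to be a no-op, so its key is ignored.
    for row in drop_ignored_rows_and_reindex(log_rows, {"c1"}):
        print(row)
```

Filtering after all phases have run is what lets an already emitted preimage row be removed together with the no-op change it belongs to.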
Radosław Cybulski	9a6aed721b	alternator: add streams with tablets tests Add tests for Streams when the table uses tablets underneath. One test verifies filtering using the CHILD_SHARDS feature. The other one makes sure we read all data while the table undergoes a tablet count change. Add `--tablet-load-stats-refresh-interval-in-seconds=1` to the `alternator/run` script, as otherwise the newly added tests will fail. The setting changes how often scylla refreshes tablet metadata. This can't be done using `scylla_config_temporary`, as 1) the default is 60 seconds 2) scylla will wait the full timeout (60s) to read the configuration variable again.	2026-04-17 18:58:27 +02:00
Radosław Cybulski	6be16cf224	alternator: remove antitablet guards when using Streams Remove the `if` condition that prevented tables with tablets from working with Streams. Remove a test that verifies that Alternator rejects tables with tablets underneath when the Streams feature is enabled on them. Update a few tests that were expected to fail on tablets to enable their normal execution.	2026-04-17 18:58:26 +02:00
Radosław Cybulski	6e5aaa85b6	alternator: fix Alternator writing unnecesary cdc entries Work in this patch is the result of two bugs: a spurious MODIFY event when remove column is used in `update_item` on a non-existing item, and spurious events when a batch write item mixes noop operations with operations involving actual changes (the former would still emit cdc log entries). The latter issue required a rework of Piotr Wieczorek's algorithm, which fixed the former issue as well. Piotr Wieczorek previously wrote checks that should prevent unnecessary cdc events from being written. His implementation missed the fact that a single `mutation` object passed to cdc code to be analysed for cdc log entries can contain modifications for multiple rows (with the same timestamp, for example as a result of a BatchWriteItem call). His code tries to skip the whole `mutation`, which in such a case is not possible, because BatchWriteItem might have one item that does nothing and a second item that does a modification (this is the reason for the second bug). His algorithm was extended and moved. Originally it worked as follows: the user would send a `mutation` object with some changes to be "augmented". The cdc would process those changes and build a set of cdc log changes based on them, which would be added to the cdc log table. Piotr added a `should_skip` function, which processed user changes and tried to determine whether they should all be dropped or not. The new version, instead of trying to skip adding rows to the cdc log `mutation` object, builds a rows-to-ignore set. After the whole cdc log `mutation` object is completed, it goes through it row by row. Any row that was previously added to the `rows_to_ignore` set is removed. The remaining rows are written to a new cdc log `mutation` with a new clustering key (the `cdc$batch_seq_no` index value should probably be consecutive - we just want to be safe here), and the new `mutation` object is returned to be sent to the cdc log table. 
The first bug is fixed as a side effect of new algorithm, which contains more precise checks detecting, if given mutation actually made a difference. Fixes: #28368 Fixes: SCYLLADB-538 Fixes: SCYLLADB-1528 Refs: #28452	2026-04-17 18:00:25 +02:00
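The rows-to-ignore approach described in this commit can be sketched in Python as follows. This is only an illustration under stated assumptions, not the C++ code from the patch: the row dictionaries and the `ck`, `op` and `batch_seq_no` field names are hypothetical stand-ins for cdc log rows and the `cdc$batch_seq_no` column.

```python
def clean_up_noop_rows(log_rows, rows_to_ignore):
    """Drop log rows for no-op changes, then renumber the survivors.

    log_rows: list of dicts with a "ck" entry (base-table clustering key)
    and a "batch_seq_no" entry; both names are hypothetical.
    rows_to_ignore: set of clustering keys whose changes were no-ops.
    """
    kept = [row for row in log_rows if row["ck"] not in rows_to_ignore]
    # Renumber so the sequence numbers are consecutive again after removals.
    return [{**row, "batch_seq_no": i} for i, row in enumerate(kept)]


# A batch of three changes where the middle one changed nothing.
batch = [
    {"ck": b"a", "batch_seq_no": 0, "op": "put"},
    {"ck": b"b", "batch_seq_no": 1, "op": "noop delete"},
    {"ck": b"c", "batch_seq_no": 2, "op": "put"},
]
cleaned = clean_up_noop_rows(batch, {b"b"})
```

Filtering first and renumbering afterwards keeps the two concerns separate: the per-row checks only have to record a key, and the sequence numbers are repaired once at the end.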
Radosław Cybulski	2894542e57	alternator: add failing tests for Streams Add failing tests for Streams functionality. Trying to remove a column from a non-existing item produces a MODIFY event (while it should produce none). Doing a batch write with operations working on the same partition, where one operation has no side effects and the second does, will produce events for both operations, even though the first changes nothing. The first test has two versions - with and without a clustering key. The second has only the clustering-key version, as we can't produce a batch write with two items for the same partition - a batch write can't use a primary key more than once in a single call. We also add a test for batch write, where one of three operations has no observable side effects and should not show up in Streams output, but in the current Scylla version it does.	2026-04-17 16:28:14 +02:00
Botond Dénes	facb50cbf9	Merge 'test.py: refactor test.py' from Andrei Chekun With the latest changes, a lot of code in test.py has become redundant. This PR just cleans up this code. It also narrows the use of dynamic scope for fixtures to test/alternator and test/cqlpy. All the rest will have module scope by default. test.py will mostly be a wrapper for pytest, for CI use. For now, test.py has the important job of calculating the number of threads to start pytest with. This is not possible to do in pytest itself. No backport needed, framework enhancement only. Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-666 Closes scylladb/scylladb#28852 * github.com:scylladb/scylladb: test.py: remove testpy_test_fixture_scope test.py: add logger for 3rd party service test.py: delete dead code in test.py	2026-04-17 12:51:14 +03:00
Piotr Dulikowski	37fc1507f0	Merge 'Alternator: Add vector search support' from Nadav Har'El This series adds support for vector search in Alternator based on the existing implementation in CQL. The series adds APIs for `CreateTable` and `UpdateTable` to add or remove vector indexes to Alternator tables, `DescribeTable` to list them and check the indexing status, and `Query` to perform a vector search - which contacts the vector store for the actual ANN (approximate nearest neighbor) search. Correct functionality of these features depends on some features of the vector store, which were already done (see https://github.com/scylladb/vector-store/pull/394). This initial implementation is fully functional, and can already be useful, but we do not yet support all the features we hope to eventually support. Here are things that we have not done yet, and plan to do later in follow-up pull requests: 1. Support a new optimized vector type ("V") - in addition to the "list of numbers" type supported in this version. 2. Allow choosing a different similarity function when creating an index, by SimilarityFunction in VectorIndex definition. 3. Allow choosing quantization (f32/f16/bf16/i8/b1) to ask the vector index to compress stored vectors. 4. Support oversampling and rescoring, defined per-index and per-query. 5. Support HNSW tuning parameters: maximum_node_connections, construction_beam_width, search_beam_width. 6. Support pre-filtering over key columns, which are available at the vector store, by sending the filter to the vector store (translated from DynamoDB filter syntax to the vector store's filter syntax). A decision still needs to be made on whether this will use KeyConditionExpression or FilterExpression. This version supports only post-filtering (with `FilterExpression`). 7. Support projecting non-key attributes into the index (Projection=INCLUDE and Projection=ALL), and then 1. pre-filtering using these attributes, and 2. 
efficiently return these attributes (using Select=ALL_PROJECTED_ATTRIBUTES, which today returns just the key columns). 8. Optimize the performance of `Query`, which today is inefficient for Select=ALL_ATTRIBUTES because it serially retrieves the matching items one at a time. 9. Returning the similarity scores with the items (the design proposes ReturnVectorSearchSimilarity). 10. Add more vector-search-specific metrics, beyond the metric we already have counting Query requests. For example separate latency and request-count metrics for vector-search Queries (distinct from GSI/LSI queries), and a metric accumulating the total Limit (K) across all vector search queries. 11. Consider how (and if at all) we want to run the tests in test/alternator/test_vector.py that need the vector store in the CI. Currently they are skipped in CI and only run manually (with `test/alternator/run --vs test_vector`). 12. UpdateTable 'Update' operation to modify index parameters. Only some can be modified, e.g., Oversampling. 13. Support for "local index" (separate index for each partition). 14. Make sure that vector search and Streams can be enabled concurrently on the same table - both need CDC but we need to verify that one doesn't confuse the other or disables options that the other needs. We can only do this after we have Alternator Streams running on tablets (since vector store requires tablets). Testing the new Alternator vector search end-to-end requires running both Scylla and the vector store together. We will have such end-to-end tests in the vector store repository (see https://github.com/scylladb/vector-store/pull/392), but we also add in this pull request many end-to-end tests written in Python, that can be run with the command "test/alternator/run --vs test_vector.py". The "--vs" option tells the run script to run both Scylla and the vector store (currently assumed to be in `.../vector-store/target/release/vector-store`). 
About 65% of the tests in this pull request check supported syntax and error paths so can run without the vector store, while about 35% of the tests do perform actual Query operations and require the vector store to be running. Currently, the tests that do require the vector store will not get run by CI, but can be easily re-run manually with `test/alternator/run --vs test_vector.py`. In total, this series includes 78 functional tests in 2200 lines of Python code. This series also includes documentation for the new Alternator feature and the new APIs introduced. You can see a more detailed design document here: https://docs.google.com/document/d/1cxLI7n-AgV5hhH1DTyU_Es8_f-t8Acql-1f58eQjZLY/edit Two patches in this series split the huge alternator/executor.cc, after this series continued to grow it and it reached a whopping 7,000 lines. These patches are just a reorganization of code, with no functional changes. But it's time that we finally do this (Refs #5783), we can't just continue to grow executor.cc with no end... 
Closes scylladb/scylladb#29046 * github.com:scylladb/scylladb: test/alternator: add option to "run" script to run with vector search alternator: document vector search test/alternator: fix retries in new_dynamodb_session test/alternator: test for allowed characters in attribute names test/alternator: tests for vector index support alternator, vector: add validation of non-finite numbers in Query alternator: Query: improve error message when VectorSearch is missing alternator: add per-table metrics for vector query alternator: clean up duplicated code alternator: fix default Select of Query alternator: split executor.cc even more alternator: split alternator/executor.cc alternator: validate vector index attribute values on write alternator: DescribeTable for vector index: add IndexStatus and Backfilling alternator: implement Query with a vector index alternator: fix bug in describe_multi_item() alternator: prevent adding GSI conflicting with a vector index alternator: implement UpdateTable with a vector index alternator: implement DescribeTable with a vector index alternator: implement CreateTable with a vector index alternator: reject empty attribute names cdc: fix on_pre_create_column_families to create CDC log for vector search	2026-04-17 10:25:45 +02:00
Andrei Chekun	745debe9ec	test.py: remove testpy_test_fixture_scope With the migration to pytest this fixture is useless. Remove it and set the scope to module for most of the tests. Add a dynamic_scope function to support running alternator fixtures in session scope while Test and TestSuite are not yet deleted. This is for the migration period; later on this function should be deleted.	2026-04-16 22:08:33 +02:00
Radosław Cybulski	c5ed6b22ae	alternator: add CHILD_SHARDS filtering Add a `CHILD_SHARDS` filter to the `DescribeStream` command. When it is used, the user needs to pass a parent stream shard id in the JSON ShardFilter.ShardId field. DescribeStream will then return only the list of stream shards that are direct descendants of the passed parent stream shard. Each stream shard covers a consecutive part of the token space. A stream shard Q is considered to be a child of stream shard W when at least one token belongs to the token ranges of both shards. The filtering algorithm itself is somewhat complicated - more details in comments in streams.cc. CHILD_SHARDS is an Amazon functionality and is required by KCL. Add unit tests. Fixes: #25160 Closes scylladb/scylladb#28189	2026-04-16 18:27:55 +03:00
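The parent/child rule stated in this commit (two shards are related when their token ranges share at least one token) can be illustrated with a short Python sketch. It assumes, purely for illustration, that each shard is an inclusive `(first_token, last_token)` pair of integers with no wrap-around; the real algorithm in streams.cc is described as more complicated, and the function and shard names here are hypothetical.

```python
def ranges_overlap(a, b):
    """True when inclusive ranges a and b share at least one token."""
    return a[0] <= b[1] and b[0] <= a[1]


def child_shards(parent_range, candidates):
    """Return ids of candidate shards whose token range overlaps the parent's.

    candidates: dict mapping a (hypothetical) shard id to its inclusive range.
    """
    return [shard_id for shard_id, rng in candidates.items()
            if ranges_overlap(parent_range, rng)]


# A parent covering tokens 0..99 that was later split into narrower shards.
newer_generation = {
    "shard-1": (0, 49),
    "shard-2": (50, 120),
    "shard-3": (121, 200),
}
children = child_shards((0, 99), newer_generation)
```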
Piotr Szymaniak	d0c3f78d76	test/alternator: extend local TTL streams timeout Increase the non-AWS wait in the TTL streams test to reduce vnode CI flakes caused by delayed expiration visibility. Fixes SCYLLADB-1556 Closes scylladb/scylladb#29516	2026-04-16 15:53:35 +03:00
Nadav Har'El	d3d5db37d7	test/alternator: add option to "run" script to run with vector search Add to test/alternator/run the option "-vs" which runs a vector store alongside Scylla, to allow running Alternator tests with vector indexing. To get the vector store, do git clone git@github.com:scylladb/vector-store.git cargo build --release "run -vs" looks for an executable in ../vector-store/target/*/vector-store but this can also be overridden by the VECTOR_STORE environment variable. test/alternator/run runs the vector store exactly like it runs Scylla - in a temporary directory, on a temporary IP address in the localhost subnet (127.0.0.0/8), killing it when the test ends, and showing the output of both programs (Scylla and vector store). These transient runs of Scylla and vector store are configured to be able to communicate with each other. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2026-04-16 14:30:18 +03:00
Nadav Har'El	164b0e37e1	test/alternator: fix retries in new_dynamodb_session The new_dynamodb_session() function had a bug which we never noticed because we hardly used it, but it became more noticeable when the new test/alternator/test_vector.py started to use it: By default, boto3 retries a request up to 9 times when it encounters a retriable error (such as an Internal Server Error). We don't want such retries in our tests - they make failures slower, but more importantly they can hide "flaky" bugs by retrying 9 times until the request happens to succeed. The new_dynamodb_session() had code (copied from the dynamodb fixture) to set boto3's "max_attempts" configuration to 0, to disable this retry. But this code had an incorrect "if" so that it was only done if we're testing on "localhost". This is wrong: We almost never use "localhost" as the target of the test; Both test/cqlpy/run and test.py pick an IP address in the localhost subnet (127/8) and use that IP address - not the string "localhost". This bug only existed in new_dynamodb_session() - the more commonly used "dynamodb" fixture didn't have this bug. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2026-04-16 14:30:17 +03:00
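Disabling boto3 retries unconditionally, as this commit describes, might look like the sketch below. It is not the code from test/alternator: the function names, region and credentials are hypothetical placeholders, and boto3 is imported lazily only so that the settings helper stays usable without it. The `retries={"max_attempts": 0}` setting is botocore's documented way to request no retries after the initial attempt.

```python
def no_retry_settings():
    """Keyword arguments for botocore's Config that disable retries."""
    return {"retries": {"max_attempts": 0}}


def new_session_resource(endpoint_url):
    """Open a fresh DynamoDB resource that never retries failed requests.

    No check on the endpoint's host name is made: the setting is applied
    for every target, whether it is "localhost" or an address in 127/8.
    """
    import boto3
    from botocore.config import Config

    session = boto3.session.Session()
    return session.resource(
        "dynamodb",
        endpoint_url=endpoint_url,
        region_name="us-east-1",            # placeholder value
        aws_access_key_id="placeholder",     # placeholder value
        aws_secret_access_key="placeholder",  # placeholder value
        config=Config(**no_retry_settings()),
    )
```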
Nadav Har'El	858dee0b30	test/alternator: test for allowed characters in attribute names One of the tests in the previous patch checked that strange characters are allowed in attribute names used for vector indexing. It turns out we never had a test that verifies that regardless of vector indexes - any character whatsoever is allowed in attribute names. This is different from table names which are much more limited. So this patch adds the missing test. As usual, the new test also passes on DynamoDB, showing that these strange characters in attribute names are also allowed by DynamoDB.	2026-04-16 14:30:17 +03:00
Nadav Har'El	58538e18e8	test/alternator: tests for vector index support In this patch we add a large collection of basic functional tests for the vector index support, covering the CreateTable, UpdateTable, DescribeTable and Query operations and the various ways in which those are allowed to work - or expected to fail. These tests were written in parallel with writing the code so they (hopefully) cover all the corner cases considered during development, and make sure these corner cases are all handled correctly and will not regress in the future. Some of these tests do not involve querying of the index and focus on the structure of requests and the kind of syntax allowed. But other tests are end-to-end, requiring the vector store to be running and trying to index Alternator data and query it. These tests are marked "needs_vector_store", and are immediately skipped if Scylla is not configured to connect to a vector store. In a later patch we'll add an option to test/alternator/run to be able to run these end-to-end tests by automatically running both Scylla and the Vector Store. We'll have additional end-to-end tests in the vector-store repository. Note that vector search is a new API feature that doesn't exist in DynamoDB, so we are adding new parameters and outputs to existing operations. The AWS SDKs don't normally allow doing that, so the test added here begins by teaching the Python SDK to use the new APIs we added. This piece of code can also be used by end-users to use vector search (at least in Python...) before we officially add this support to ScyllaDB's SDK wrappers.	2026-04-16 14:30:17 +03:00
Nadav Har'El	f932f94422	alternator: add per-table metrics for vector query The per-table metrics for Query were not incremented for the vector variant of the Query operations, only the global metrics were incremented. This patch fixes this oversight, and adds a test that reproduces it (the new test fails before this patch, and passes after).	2026-04-16 14:30:16 +03:00
Nadav Har'El	f15c6634a7	alternator: fix default Select of Query In earlier patches, when Query'ing a vector index, we set the default Select to ALL_ATTRIBUTES. However, according to the DynamoDB documentation for Query, "If neither Select nor ProjectionExpression are specified, DynamoDB defaults to ALL_ATTRIBUTES when accessing a table, and ALL_PROJECTED_ATTRIBUTES when accessing an index." This default should also apply to a vector index, so this patch fixes this. The new behavior is not only more compatible with DynamoDB, it is also much more efficient by default, as ALL_PROJECTED_ATTRIBUTES does not need to read from the base table - it returns the results that the vector store returned. Of course, if the user needs the less efficient ALL_ATTRIBUTES, this option is still available - it's just no longer the default. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2026-04-16 14:30:16 +03:00
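The defaulting rule quoted from the DynamoDB documentation can be written as a small Python function. This is an illustrative sketch, not Alternator's C++ code: the function name and the `querying_index` flag are hypothetical, and the `SPECIFIC_ATTRIBUTES` branch reflects DynamoDB's documented behavior when a ProjectionExpression is given rather than anything stated in this commit.

```python
def effective_select(request, querying_index):
    """Resolve the Select value a Query request should be treated as having.

    request: dict of Query parameters as sent by the client.
    querying_index: True when the Query targets an index (including a
    vector index), False when it targets the base table.
    """
    if "Select" in request:
        return request["Select"]
    if "ProjectionExpression" in request:
        return "SPECIFIC_ATTRIBUTES"
    return "ALL_PROJECTED_ATTRIBUTES" if querying_index else "ALL_ATTRIBUTES"
```

A client that still wants full items from an index query can pass `Select` explicitly, which the first branch honors.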
Nadav Har'El	0afc730b7b	alternator: reject empty attribute names Alternator has a function validate_attr_name_length() used to validate an attribute name passed in different operations like PutItem, UpdateItem, GetItem, etc. It fails the request if the attribute name is longer than 65535 characters. It turns out that we forgot to check if the attribute name length isn’t 0 - which should be forbidden as well! This patch fixes the validation code, and also adds a test that confirms that after this patch empty attribute names are rejected - just like DynamoDB does - whereas before this patch they were silently accepted. We want to fix this issue now, because in a later patch we intend to use the same validation function also for vector indexes - and want it to be accurate. Fixes SCYLLADB-1069. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2026-04-16 13:28:15 +03:00
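The corrected check (reject empty names as well as names longer than 65535) can be sketched in Python as below. The real validate_attr_name_length() is C++ inside Alternator; the exception type, the messages, the `accepts` helper and the choice to measure length with `len()` are assumptions made for this illustration.

```python
MAX_ATTR_NAME_LENGTH = 65535


def validate_attr_name_length(name):
    """Raise ValueError for attribute names that are empty or too long."""
    if len(name) == 0:
        raise ValueError("attribute name must not be empty")
    if len(name) > MAX_ATTR_NAME_LENGTH:
        raise ValueError(
            f"attribute name is longer than {MAX_ATTR_NAME_LENGTH} characters")


def accepts(name):
    """Hypothetical helper: True when the name passes validation."""
    try:
        validate_attr_name_length(name)
    except ValueError:
        return False
    return True
```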
Piotr Szymaniak	4c93c2af62	audit/alternator: support audit_tables=alternator.<table> shorthand The real keyspace name of an Alternator table T is "alternator_T". Expand the "alternator.T" format used in the audit_tables config flag to the real keyspace name at parse time, so users don't need to spell out the internal "alternator_T.T" form.	2026-04-15 12:29:15 +02:00
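The shorthand expansion described in this commit amounts to a simple string rewrite, sketched here in Python. The function name is hypothetical and the real parsing is done in C++ when the audit_tables option is read; only the mapping from "alternator.T" to "alternator_T.T" comes from the commit message.

```python
def expand_audit_table(entry):
    """Expand the "alternator.<table>" shorthand to "<keyspace>.<table>".

    Entries that do not use the shorthand are returned unchanged.
    """
    keyspace, dot, table = entry.partition(".")
    if dot and keyspace == "alternator" and table:
        # The real keyspace of Alternator table T is "alternator_T".
        return f"alternator_{table}.{table}"
    return entry
```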
Piotr Szymaniak	0714d8aded	audit/alternator: Add negative audit tests Add tests for the unhappy path of Alternator audit logging: - Category filtering: operations are not logged when their category (DML, QUERY, DDL) is excluded from audit_categories. - Keyspace filtering: operations on a keyspace not listed in audit_keyspaces are not logged. - Error entries: a failed operation (thrown exception after audit_info is set) produces an audit entry with error=true. - Empty-keyspace bypass: global operations like ListTables and DescribeEndpoints are logged regardless of audit_keyspaces because should_log() short-circuits on an empty keyspace.	2026-04-15 12:29:15 +02:00
Piotr Szymaniak	ad05b44931	audit/alternator: Add testing of auditing There is a new test file created, `test/alternator/test_audit.py`. The file contains a suite of tests of all auditing operations.	2026-04-15 12:29:15 +02:00
Avi Kivity	0ae22a09d4	LICENSE: Update to version 1.1 Updated terms of non-commercial use (must be a never-customer).	2026-04-12 19:46:33 +03:00
Artsiom Mishuta	b1e9c0b867	test/pylib: add typed skip markers plugin Add skip_reason_plugin.py — a framework-agnostic pytest plugin that provides typed skip markers (skip_bug, skip_not_implemented, skip_slow, skip_env) so that the reason a test is skipped is machine-readable in JUnit XML and Allure reports. Bare untyped pytest.mark.skip now triggers a warning (to become an error after full migration). Runtime skips via skip() are also enriched by parsing the [type] prefix from the skip message. The plugin is a class (SkipReasonPlugin) that receives the concrete SkipType enum and an optional report_callback from conftest.py, keeping it decoupled from allure and project-specific types. Extract SkipType enum and convenience runtime skip wrappers (skip_bug, skip_env, etc.) into test/pylib/skip_types.py so callers only need a single import instead of importing both SkipType and skip() separately. conftest.py imports SkipType from the new module and registers the plugin instance unconditionally (for all test runners). 
New files: - test/pylib/skip_reason_plugin.py: core plugin — typed marker processing, bare-skip warnings, JUnit/Allure report enrichment (including runtime skip() parsing via _parse_skip_type helper) - test/pylib/skip_types.py: SkipType enum and convenience wrappers (skip_bug, skip_not_implemented, skip_slow, skip_env) - test/pylib_test/test_skip_reason_plugin.py: 17 pytester-based test functions (51 cases across 3 build modes) covering markers, warnings, reports, callbacks, and skip_mode interaction Infrastructure changes: - test/conftest.py: import SkipType from skip_types, register SkipReasonPlugin with allure report callback - test/pylib/runner.py: set SKIP_TYPE_KEY/SKIP_REASON_KEY stash keys for skip_mode so the report hook can enrich JUnit/Allure with skip_type=mode without longrepr parsing - test/pytest.ini: register typed marker definitions (required for --strict-markers even when plugin is not loaded) Migrated test files (representative samples): - test/cluster/test_tablet_repair_scheduler.py: skip -> skip_bug (#26844), skip -> skip_not_implemented - test/cqlpy/.../timestamp_test.py: skip -> skip_slow - test/cluster/dtest/schema_management_test.py: skip -> skip_not_implemented - test/cluster/test_change_replication_factor_1_to_0.py: skip -> skip_bug (#20282) - test/alternator/conftest.py: skip -> skip_env - test/alternator/test_https.py: use skip_env() wrapper Fixes SCYLLADB-79 Closes scylladb/scylladb#29235	2026-04-08 10:38:56 +03:00
Nadav Har'El	a0e79f391f	Merge 'alternator: fix batch write item squashing cdc entries' from Radosław Cybulski When `BatchWriteItem` operates on multiple items sharing the same partition key in `always_use_lwt` write isolation mode, all CDC log entries are emitted under a single timestamp. The previous `get_records` parsing algorithm in `alternator/streams.cc` assumed that all CDC log entries sharing the same timestamp correspond to a single DynamoDB item change. As a result, it would incorrectly squash multiple distinct item changes into a single Streams record, producing wrong event data (e.g., one INSERT instead of four, with mismatched key/attribute values). Note: the bug is specific to `always_use_lwt` mode because only in LWT mode does the entire batch share a single timestamp. In non-LWT modes, each item in the batch receives a separate timestamp, so the entries naturally stay separate. Commit 1: alternator: add BatchWriteItem Streams test - Adds new tests `test_streams_batchwrite_no_clustering_deletes_non_existing_items` and `test_streams_batchwrite_no_clustering_deletes_existing_items` that cover the corner cases of batch-deleting an existing and a non-existing item in a table without a clustering key. CDC tables without clustering keys are handled differently, and this path was previously untested for delete operations. - Adds a new test `test_streams_batchwrite_into_the_same_partition_will_report_wrong_stream_data`, which is a simple way to trigger the bug. - Adds a new test `test_streams_batchwrite_into_the_same_partition_deletes_existing_items`, which validates various combinations of puts and deletes in a single BatchWrite against the same partition. - Adds a new `test_table_ss_new_and_old_images_write_isolation_always` fixture and extends `create_table_ss` to accept `additional_tags`, enabling tests with a specific write isolation mode. 
Commit 2: alternator: fix BatchWriteItem squashed Streams entries The core fix rewrites the CDC log entry parsing in `get_records` to distinguish items by their clustering key: - Introduces `managed_bytes_ptr_hash` and `managed_bytes_ptr_equal` helper structs for pointer-based hash map lookups on `managed_bytes`. - Replaces the single `record`/`dynamodb` pair with a `std::unordered_map<const managed_bytes, Record, ...>` (`records_map`) keyed by the base table's clustering key value from each CDC log row. For tables without a clustering key, all entries map to a single sentinel key. - Adds a validation that Alternator tables have at most one clustering key column (as required by the DynamoDB data model). - On end-of-record (`eor`), flushes all accumulated per-clustering-key records into the output, each with a unique `eventID` (the `event_id` format now includes an index suffix). - Adjusts the limit check: since a single CDC timestamp bucket can now produce multiple output records, the limit may be slightly exceeded to avoid breaking mid-batch. Fixes #28439 Fixes: SCYLLADB-540 Closes scylladb/scylladb#28452 github.com:scylladb/scylladb: alternator/test: explain why 'always' write isolation mode is used in tests alternator/test: add scylla_only to always write isolation fixture alternator: fix BatchWriteItem squashed Streams entries alternator: add BatchWriteItem test (failing)	2026-04-07 17:49:23 +03:00
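The idea behind the fix (within one CDC timestamp bucket, keep one record per clustering key and flush them all at end-of-record) can be sketched in Python. The row dictionaries, the `ck` and `value` fields, the empty-bytes sentinel and the `"<timestamp>:<index>"` event id format are all assumptions for illustration; the commit only says that the `event_id` format now includes an index suffix.

```python
def flush_timestamp_bucket(timestamp_id, log_rows):
    """Turn the CDC log rows sharing one timestamp into Streams records.

    Rows are grouped by the base table's clustering key, so a batch that
    touched several items yields several records instead of one.
    """
    records_map = {}
    for row in log_rows:
        # Tables without a clustering key map everything to one sentinel.
        key = row.get("ck", b"")
        records_map.setdefault(key, []).append(row)
    return [
        {"eventID": f"{timestamp_id}:{index}", "rows": rows}
        for index, rows in enumerate(records_map.values())
    ]


bucket = [
    {"ck": b"c1", "value": 1},
    {"ck": b"c2", "value": 2},
    {"ck": b"c1", "value": 3},
]
records = flush_timestamp_bucket("ts42", bucket)
```

Because a whole bucket is flushed at once, a caller enforcing a record limit may see it slightly exceeded, which matches the limit adjustment the commit mentions.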


630 Commits