scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-23 00:02:37 +00:00

Author	SHA1	Message	Date
Szymon Malewski	6b2fce03f9	alternator: optional stripping of http response headers In Alternator's HTTP API, response headers can dominate bandwidth for small payloads. The Server, Date, and Content-Type headers were sent on every response but many clients never use them. This patch introduces three Alternator config options: - alternator_http_response_server_header, - alternator_http_response_disable_date_header, - alternator_http_response_disable_content_type_header, which allow customizing or suppressing the respective HTTP response headers. All three options support live update (no restart needed). The Server header is no longer sent by default; the Date and Content-Type defaults preserve the existing behavior. The Server and Date header suppression uses Seastar's set_server_header() and set_generate_date_header() APIs added in https://github.com/scylladb/seastar/pull/3217. This patch also fixes deprecation warnings from older Seastar HTTP APIs. Tests are in test/alternator/test_http_headers.py. Fixes https://scylladb.atlassian.net/browse/SCYLLADB-70 Closes scylladb/scylladb#28288	2026-05-19 10:47:13 +03:00
Nadav Har'El	63927e07ea	Merge 'alternator/streams: keep disabled streams usable and purge on re-enable' from Piotr Szymaniak When an Alternator stream is disabled, the data should continue to be accessible so that consumers can finish reading. When the stream is later re-enabled, a new StreamArn is produced and only then the old data is purged. On disable, the existing CDC options (including preimage and postimage) are preserved so that DescribeStream can still report StreamViewType. All stream APIs continue to work on the disabled stream, with all shards reported as closed (EndingSequenceNumber set). No new CDC records are written; existing data expires via TTL after 24 hours. On re-enable, the old CDC log table is dropped as a separate Raft group0 schema change and a fresh one is created with a new UUID, giving a new StreamArn. This is Alternator-specific — CQL CDC keeps reusing the log table. Re-enabling is the only way to immediately purge old stream data. Old stream data is removed immediately upon re-enable (a discrepancy with DynamoDB, which keeps it readable for 24 hours through the old StreamArn). Tests updated to cover the new disable and re-enable behavior. Fixes #7239 Fixes SCYLLADB-523 Closes scylladb/scylladb#29413 * github.com:scylladb/scylladb: alternator/streams: remove dead next_iter in get_records test/alternator: fix stream wait timeouts to use wall-clock time docs/alternator: document stream disable/re-enable behavior alternator/streams: keep disabled streams usable and purge on re-enable	2026-05-10 22:04:35 +03:00
Piotr Szymaniak	744848a85f	test/alternator: fix stream wait timeouts to use wall-clock time Both disable_stream and wait_for_active_stream used time.process_time() for their timeouts, but process_time measures CPU time, not wall-clock time. Since these loops spend most of their time sleeping and waiting on API calls, the timeouts could last far longer than intended. Use time.time() instead to enforce actual wall-clock deadlines.	2026-05-07 14:45:42 +02:00
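The difference between the two clocks can be sketched as follows. This is a minimal illustration, not the actual test helper: `wait_until` and its parameters are hypothetical names, and only `time.time()`, `time.process_time()` and `time.sleep()` come from the standard library.

```python
import time

def wait_until(predicate, timeout=5.0, interval=0.01):
    """Poll predicate() until it is true or `timeout` wall-clock seconds pass."""
    deadline = time.time() + timeout          # wall-clock deadline
    while time.time() < deadline:
        if predicate():
            return True
        time.sleep(interval)                  # sleeping uses almost no CPU time
    return False

# Sleeping advances wall-clock time but barely advances CPU time, which is
# why a process_time()-based deadline in a sleep-heavy loop overshoots.
wall_start, cpu_start = time.time(), time.process_time()
time.sleep(0.2)
wall_elapsed = time.time() - wall_start
cpu_elapsed = time.process_time() - cpu_start
```

With a `process_time()` deadline, the loop above would keep polling long after the intended timeout, because the CPU clock hardly moves while the process sleeps.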
Piotr Szymaniak	38bd068f78	alternator/streams: keep disabled streams usable and purge on re-enable Previously, disabling Alternator Streams would create a blank cdc::options with only enabled=false, which meant losing access also to stored Streams's data (including preimage and postimage). Now, when a stream is disabled: - The existing CDC options are preserved (only 'enabled' is flipped to false), so StreamViewType remains available. - DescribeStream enumerates all shards with EndingSequenceNumber set, indicating they are closed. - GetRecords omits NextShardIterator for disabled streams. - DescribeTable (supplement_table_stream_info) reports the stream ARN and StreamEnabled: false when the CDC log table still exists. - ListStreams uses get_base_table instead of is_log_for_some_table so that disabled streams whose log table still exists are listed. When a stream is re-enabled on an Alternator table that has an existing (disabled) CDC log table, the old log table is dropped and a fresh one is created with a new UUID, producing a new StreamArn. This is Alternator-specific behavior; CQL CDC tables continue to reuse the existing log table. The old stream data is lost immediately upon re-enable. DynamoDB keeps it readable for 24 hours. Tests: - test_streams_closed_read, test_streams_disabled_stream: remove xfail now that disabled streams are usable. - test_streams_reenable: new test verifying that re-enabling produces a new ARN and the old data is still readable via the old ARN (xfail because Scylla currently purges old data on re-enable). Fixes scylladb/scylladb#7239	2026-05-07 14:45:42 +02:00
Botond Dénes	eb3326b417	Merge 'test.py: migrate all bare skips to typed skip markers' from Artsiom Mishuta should be merged after #29235 Complete the typed skip markers migration started in the plugin PR. Every bare `@pytest.mark.skip` decorator and `pytest.skip()` runtime call across the test suite is replaced with a typed equivalent, making skip reasons machine-readable in JUnit XML and Allure reports. 62 files changed across 8 commits, covering ~127 skip sites in total. Bare `pytest.skip` provides only a free-text reason string. CI dashboards (JUnit, Allure) cannot distinguish between a test skipped due to a known bug, a missing feature, a slow test, or an environment limitation. This makes it hard to track skip debt, prioritize fixes, or filter dashboards by skip category. The typed markers (`skip_bug`, `skip_not_implemented`, `skip_slow`, `skip_env`) introduced by the `skip_reason_plugin` solve this by embedding a `skip_type` field into every skip report entry. \| Type \| Count \| Files \| Description \| \|------\|-------\|-------\|-------------\| \| `skip_bug` \| 24 \| 16 \| Skip reason references a known bug/issue \| \| `skip_not_implemented` \| 10 \| 5 \| Feature not yet implemented in Scylla \| \| `skip_slow` \| 4 \| 3 \| Test too slow for regular CI runs \| \| `skip_not_implemented` (bare) \| 2 \| 1 \| Bare `@pytest.mark.skip` with no reason (COMPACT STORAGE, #3882) \| \| Type \| Count \| Files \| Description \| \|------\|-------\|-------\|-------------\| \| `skip_env` \| ~85 \| 34 \| Feature/config/topology not available at runtime \| \| `skip_bug` \| 2 \| 2 \| Known bugs: Streams on tablets (#23838), coroutine task not found (#22501) \| - Comments: 7 comments/docstrings across 5 files updated from `pytest.skip()` to `skip()` - Plugin hardened: `warnings.warn()` → `pytest.UsageError` for bare `@pytest.mark.skip` at collection time — bare skips are now a hard error, not a warning - Guard tests: New `test/pylib_test/test_no_bare_skips.py` with 3 tests that 
prevent regression: - AST scan for bare `@pytest.mark.skip` decorators - AST scan for bare `pytest.skip()` runtime calls - Real `pytest --collect-only` against all Python test directories Runtime skip sites use the convenience wrappers from `test.pylib.skip_types`: ```python from test.pylib.skip_types import skip_env ``` Usage: ```python skip_env("Tablets not enabled") ``` 1. test: migrate @pytest.mark.skip to @pytest.mark.skip_bug for known bugs — 24 decorator sites, 16 files 2. test: migrate @pytest.mark.skip to @pytest.mark.skip_not_implemented — 10 decorator sites, 5 files 3. test: migrate @pytest.mark.skip to @pytest.mark.skip_slow — 4 decorator sites, 3 files 4. test: migrate bare @pytest.mark.skip to skip_not_implemented — 2 bare decorators, 1 file 5. test: migrate runtime pytest.skip() to typed skip_env() — ~85 sites, 34 files 6. test: migrate runtime pytest.skip() to typed skip_bug() — 2 sites, 2 files 7. test: update comments referencing pytest.skip() to skip() — 7 comments, 5 files 8. 
test/pylib: reject bare pytest.mark.skip and add codebase guards — plugin hardening + 3 guard tests - All 60 plugin + guard tests pass (`test/pylib_test/`) - No bare `@pytest.mark.skip` or `pytest.skip()` calls remain in the codebase - `pytest --collect-only` succeeds across all test directories with the hardened plugin SCYLLADB-1349 Closes scylladb/scylladb#29305 * github.com:scylladb/scylladb: test/alternator: replace bare pytest.skip() with typed skip helpers test: migrate new bare skips introduced by upstream after rebase test/pylib: reject bare pytest.mark.skip and add codebase guards test: update comments referencing pytest.skip() to skip_env() test: migrate runtime pytest.skip() to typed skip_bug() test: migrate runtime pytest.skip() to typed skip_env() test: migrate bare @pytest.mark.skip to skip_not_implemented test: migrate @pytest.mark.skip to @pytest.mark.skip_slow test: migrate @pytest.mark.skip to @pytest.mark.skip_not_implemented test: migrate @pytest.mark.skip to @pytest.mark.skip_bug for known bugs	2026-04-22 15:48:27 +03:00
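The AST scans mentioned for the guard tests could look roughly like the sketch below. It is an illustration under assumptions, not the contents of `test/pylib_test/test_no_bare_skips.py`: the function name `find_bare_skips` and the sample source are hypothetical, and only the standard-library `ast` module is used.

```python
import ast

def find_bare_skips(source):
    """Return line numbers of untyped pytest.mark.skip decorators and pytest.skip() calls."""
    def is_attr_chain(node, names):
        # Match a dotted name such as pytest.mark.skip against ["pytest", "mark", "skip"].
        for name in reversed(names[1:]):
            if not (isinstance(node, ast.Attribute) and node.attr == name):
                return False
            node = node.value
        return isinstance(node, ast.Name) and node.id == names[0]

    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            for dec in node.decorator_list:
                # A decorator may be written with or without a call: @x or @x(...)
                target = dec.func if isinstance(dec, ast.Call) else dec
                if is_attr_chain(target, ["pytest", "mark", "skip"]):
                    hits.append(dec.lineno)
        elif isinstance(node, ast.Call) and is_attr_chain(node.func, ["pytest", "skip"]):
            hits.append(node.lineno)
    return sorted(hits)

SAMPLE = '''
import pytest

@pytest.mark.skip(reason="flaky")
def test_a():
    pass

@pytest.mark.skip_bug(reason="issue 1")
def test_b():
    pass

def test_c():
    pytest.skip("no tablets")
'''

bare = find_bare_skips(SAMPLE)
```

Typed markers such as `skip_bug` are not reported because the attribute name must equal `skip` exactly.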
Artsiom Mishuta	dce0c24a02	test/alternator: replace bare pytest.skip() with typed skip helpers	2026-04-19 17:34:41 +02:00
Avi Kivity	9fb67e3e96	Revert "alternator: optional stripping of http response headers" This reverts commit `73f0deef6d`. It prevents `2943d30b0c`, which causes high flakiness, from being reverted.	2026-04-19 15:14:48 +03:00
Szymon Malewski	73f0deef6d	alternator: optional stripping of http response headers In Alternator's HTTP API, response headers can dominate bandwidth for small payloads. The Server, Date, and Content-Type headers were sent on every response but many clients never use them. This patch introduces three Alternator config options: - alternator_http_response_server_header, - alternator_http_response_disable_date_header, - alternator_http_response_disable_content_type_header, which allow customizing or suppressing the respective HTTP response headers. All three options support live update (no restart needed). The Server header is no longer sent by default; the Date and Content-Type defaults preserve the existing behavior. The Server and Date header suppression uses Seastar's set_server_header() and set_generate_date_header() APIs added in https://github.com/scylladb/seastar/pull/3217. This patch also fixes deprecation warnings from older Seastar HTTP APIs. Tests are in test/alternator/test_http_headers.py. Fixes https://scylladb.atlassian.net/browse/SCYLLADB-70 Closes scylladb/scylladb#28288	2026-04-19 09:22:04 +03:00
Nadav Har'El	0d05e3b4a4	alternator: fix ListStreams paging if table is deleted during paging Currently, ListStreams paging works by looking in the list of tables for ExclusiveStartStreamArn and starting there. But it's possible that during the paging process, one of the tables got deleted and ExclusiveStartStreamArn no longer points to an existing table. In the current implementation this caused the paging to stop (as if it had reached the end). The solution is simple: ListStreams will now sort the list of tables by name (it needs to be sorted by something anyway to be consistent across pages), and will use std::upper_bound to find the first table after the ExclusiveStartStreamArn - we don't need to find that table name itself. The patch also includes a test reproducing this bug. As usual, the test passes on DynamoDB, fails on Alternator before this patch, and passes with the patch. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2026-04-19 09:12:02 +03:00
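The sorted-list lookup described in this commit can be illustrated in Python, where `bisect_right` plays the role of `std::upper_bound`. This is a sketch only: the real code is C++ and works with stream ARNs, while `list_page` and the plain table names here are hypothetical.

```python
from bisect import bisect_right

def list_page(table_names, exclusive_start=None, limit=2):
    """Return up to `limit` names that sort strictly after `exclusive_start`."""
    names = sorted(table_names)
    # First index whose name is greater than the cookie; the cookie itself
    # does not have to be present in the list any more.
    start = 0 if exclusive_start is None else bisect_right(names, exclusive_start)
    return names[start:start + limit]

tables = ["a", "b", "c", "d"]
first = list_page(tables)
tables.remove("b")  # the table named by the cookie is deleted mid-paging
second = list_page(tables, exclusive_start=first[-1])
```

Because the lookup only needs an ordering, paging resumes at `"c"` even though `"b"` no longer exists.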
Nadav Har'El	930fb4c330	test/alternator: test DescribeStream on non-existent table We already had a test verifying that DescribeStream called on a bogus ARN returns a ValidationException. But if the stream ARN is more legitimate-looking but refers to a non-existent table (e.g., an ARN taken in the past from a table that no longer exists), we should return ResourceNotFoundException. In this patch we add a test that verifies we indeed do this correctly. Moreover, Alternator's current stream ARNs include both a keyspace name and a table name, and either one being incorrect should lead to ResourceNotFoundException, and indeed the new test validates that it works as expected - there is no bug here (AI guessed we have a bug in the missing keyspace case, but this guess was wrong).	2026-04-19 09:12:02 +03:00
Nadav Har'El	02d474fca8	alternator: ListStreams: on last page, avoid LastEvaluatedStreamArn When ListStreams is on its last page and ran out of streams to list, it shouldn't return a paging cookie (LastEvaluatedStreamArn) at all. Before this patch it does, and forces the user to make another call just to get another empty page, which is silly. This patch includes a fix and a reproducer test (that, as usual, passes on DynamoDB and fails on Alternator before the patch and succeeds after). Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2026-04-19 09:12:02 +03:00
Nadav Har'El	1ac910c2ab	alternator: fix ListStreams to return real ARN as LastEvaluatedStreamArn Alternator Streams' "ListStreams" does paging by returning a "cookie" LastEvaluatedStreamArn from one request, that the user passes to the next request as ExclusiveStartStreamArn. In the past, Alternator's stream ARNs were UUIDs, but we recently changed them to match DynamoDB's ARN format which the KCL library requires. However, we didn't change ListStream's cookie format, and it remained UUIDs. This, however, goes against the documentation of DynamoDB, which states that LastEvaluatedStreamArn should be "the stream ARN of the item where the operation stopped". It shouldn't be some weird opaque cookie. So in this patch we add a test that confirms that indeed, in DynamoDB the LastEvaluatedStreamARN is really the last returned ARN and not an opaque cookie. The new test passes on DynamoDB, and fails on Alternator before the simple fix that this patch then does. Fixes SCYLLADB-539.	2026-04-19 09:12:01 +03:00
Nadav Har'El	31e0315710	Merge 'alternator: fix unnecesary cdc log entries' from Radosław Cybulski Fix cdc writing unnecesary entries to it's log, like for example when Alternator deletes an item which in reality doesn't exist. Originally @wps0 tackled this issue. This patch is an extension of his work. His work involved adding `should_skip` function to cdc, which would process a `mutation` object and decide, wherever changes in the object should be added to cdc log or not. The issue with his approach is that `mutation` object might contain changes for more than one row. If - for example - the `mutation` object contains two changes, delete of non-existing row and create of non-existing row, `should_skip` function will detect changes in second item and allow whole `mutation` (BOTH items) to be added. For example (using python's boto3) running this on empty table: ``` with table.batch_writer() as batch: batch.put_item({'p': 'p', 'c': 'c0'}) batch.delete_item(Key={'p': 'p', 'c': 'c1'}) ``` will emit two events ("put" event and "delete" event), even though the item with `c` set to `c1` does not exist (thus can't be deleted). Note, that both entries in batch write must use the same partition key, otherwise upper layer with split them into separate `mutation` objects and the issue will not happen. The solution is to do similar processing, but consider each change separated from others. This is tricky to implement due to a way cdc works. When cdc processes `mutation` object (containing X changes), it emits cdc entries in phases. Phase 1 - emit `preimage` (old state) for each change (if requested). Phase 2 - for each change emit actual "diff" (update / delete and so on). Phase 3 - emit `postimage` (new state). We will know if change needs to be skipped during phase 2. By that time phase 1 is completed and preimage for the change is emited. 
At that moment we set a flag that the change (identified by clustering key value) needs to be skipped - we add a clustering key to a `ignore-rows` set (`_alternator_clustering_keys_to_ignore` variable) and continue normally. Once all phases finish we add a `postprocess` phase (`clean_up_noop_rows` function). It will go through generated cdc mutations and skip all modifications, for which clustering key is in `ignore-rows` set. After skipping we need to do a "cleanup" operation - each generated cdc mutation contain index (incremented by one), if we skipped some parts, the index is not consecutive anymore, so we reindex final changes. There's a special case worth mentioning - Alternator tables without clustering keys. At that point `mutation` object passed to cdc can contain exactly one change (since different partition keys are splitted by upper layers and Alternator will never emit `mutation` object containing two (or more) changes with the same primary key. Here, when we decide the change is to be skipped we add empty `bytes` object to `ignore-rows` set. When checking `ignore-rows` set, we check if it's empty or not (we don't check for presence of empty `bytes` object). Note: there might be some confusion between this patch and #28452 patch. Both started from the same error observation and use similar tests for validation, as both are easily triggered by BatchWrite commands (both needs `mutation` object passed to cdc to contain more than one single change). This issue tho is about wrong data written in cdc log and is fixed at cdc, where #28452 is about wrong way of parsing correct cdc data and is fixed at Alternator side of things. Note, that we need #28452 to truly verify (otherwise we will emit correct cdc entries, but Alternator will incorrectly parse them). Note: to benefit / notice this patch you need `alternator_streams_increased_compatibility` flag turned on. 
Note: rework is quite "broad" and covers a lot of ground - every operation, that might result in a no-change to the database state should be tested. An additional test was added - trying to remove a column from non-existing item, as well as trying to remove non-existing column from existing item. Fixes: #28368 Fixes: SCYLLADB-1528 Fixes: SCYLLADB-538 Closes scylladb/scylladb#28544 * github.com:scylladb/scylladb: alternator: remove unnecesary code alternator: fix Alternator writing unnecesary cdc entries alternator: add failing tests for Streams	2026-04-18 00:07:51 +03:00
Radosław Cybulski	9a6aed721b	alternator: add streams with tablets tests Add tests for Streams when the table uses tablets underneath. One test verifies filtering using the CHILD_SHARDS feature. The other makes sure we read all data while the table undergoes a tablet count change. Add `--tablet-load-stats-refresh-interval-in-seconds=1` to the `alternator/run` script, as otherwise the newly added tests will fail. The setting changes how often scylla refreshes tablet metadata. This can't be done using `scylla_config_temporary`, because 1) the default is 60 seconds and 2) scylla will wait the full timeout (60s) before reading the configuration variable again.	2026-04-17 18:58:27 +02:00
Radosław Cybulski	6e5aaa85b6	alternator: fix Alternator writing unnecesary cdc entries Work in this patch is a result of two bugs - spurious MODIFY event, when remove column is used in `update_item` on non-existing item and spurious events, when batch write item mixed noop operations with operations involving actual changes (the former would still emit cdc log entries). The latter issue required rework of Piotr Wieczorek's algorithm, which fixed former issue as well. Piotr Wieczorek previously wrote checks, that should prevent unnecesary cdc events from being written. His implementation missed the fact, that a single `mutation` object passed to cdc code to be analysed for cdc log entries can contain modifications for multiple rows (with the same timestamp - for example as a result to BatchWriteItem call). His code tries to skip whole `mutation`, which in such case is not possible, because BatchWriteItem might have one item that does nothing and second item that does modification (this is the reason for the second bug). His algorithm was extended and moved. Originally it was working as follows - user would sent a `mutation` object with some changes to be "augmented". The cdc would process those changes and built a set of cdc log changes based on them, that would be added to cdc log table. Piotr added a `should_skip` function, which processes user changes and tried to determine if they all should be dropped or not. New version, instead of trying to skip adding rows to cdc log `mutation` object, builds a rows-to-ignore set. After whole cdc log `mutation` object is completed, it processes it and go through it row by row. Any row that was previously added to a `rows_to_ignore` set will now be removed. Remaining rows are written to new cdc log `mutation` with new clustering key (`cdc$batch_seq_no` index value should probably be consecutive - we just want to be safe here) and returns new `mutation` object to be sent to cdc log table. 
The first bug is fixed as a side effect of new algorithm, which contains more precise checks detecting, if given mutation actually made a difference. Fixes: #28368 Fixes: SCYLLADB-538 Fixes: SCYLLADB-1528 Refs: #28452	2026-04-17 18:00:25 +02:00
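The post-processing step described above (drop every row whose key is in the rows-to-ignore set, then renumber what remains) can be sketched in a few lines. This is only an illustration: the real code is C++ and operates on `mutation` objects, and the function name `drop_ignored_rows` and the dict fields `ck`, `op` and `batch_seq_no` are hypothetical stand-ins.

```python
def drop_ignored_rows(log_rows, rows_to_ignore):
    """Remove log rows for no-op changes and renumber the remaining rows from 0."""
    kept = [row for row in log_rows if row["ck"] not in rows_to_ignore]
    # Re-index so the sequence numbers stay consecutive after the removal.
    return [dict(row, batch_seq_no=i) for i, row in enumerate(kept)]

log_rows = [
    {"ck": "c0", "op": "put", "batch_seq_no": 0},
    {"ck": "c1", "op": "delete", "batch_seq_no": 1},  # delete of a non-existing item
    {"ck": "c2", "op": "put", "batch_seq_no": 2},
]
cleaned = drop_ignored_rows(log_rows, rows_to_ignore={"c1"})
```

Deciding per row rather than per `mutation` is what lets a batch mixing a no-op with a real change emit only the real change.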
Radosław Cybulski	2894542e57	alternator: add failing tests for Streams Add failing tests for Streams functionality. Trying to remove a column from a non-existing item produces a MODIFY event (while it should produce none). Doing a batch write with operations working on the same partition, where one operation has no side effects and the second does, will produce events for both operations, even though the first changes nothing. The first test has two versions - with and without a clustering key. The second has only a version with a clustering key, as we can't produce a batch write with two items for the same partition - a batch write can't use the same primary key more than once in a single call. We also add a test for batch write where one of three operations has no observable side effects and should not show up in Streams output, but in the current scylla version it does show.	2026-04-17 16:28:14 +02:00
Radosław Cybulski	c5ed6b22ae	alternator: add CHILD_SHARDS filtering Add a `CHILD_SHARDS` filter to the `DescribeStream` command. When it is used, the user needs to pass a parent stream shard id as the JSON ShardFilter.ShardId field. DescribeStream will then return only the stream shards that are direct descendants of the passed parent stream shard. Each stream shard covers a consecutive part of the token space. A stream shard Q is considered to be a child of stream shard W when at least one token belongs to the token spaces of both shards. The filtering algorithm itself is somewhat complicated - more details in comments in streams.cc. CHILD_SHARDS is an Amazon feature and is required by KCL. Add unit tests. Fixes: #25160 Closes scylladb/scylladb#28189	2026-04-16 18:27:55 +03:00
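The parent/child definition above reduces to a range-overlap test. The sketch below illustrates only that definition, not the algorithm in streams.cc (which the commit describes as more complicated); the inclusive integer ranges, the shard ids and the function names are all assumptions made for the example.

```python
def overlaps(a, b):
    """True if inclusive token ranges a=(lo, hi) and b=(lo, hi) share a token."""
    return a[0] <= b[1] and b[0] <= a[1]

def child_shards(parent_range, next_generation):
    """Return ids of shards in the next generation that overlap the parent's range."""
    return [sid for sid, rng in next_generation.items() if overlaps(parent_range, rng)]

parent = (0, 99)
next_generation = {"s1": (0, 49), "s2": (50, 149), "s3": (150, 199)}
children = child_shards(parent, next_generation)
```

Here `s2` counts as a child even though it only partly overlaps the parent, since a single shared token is enough.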
Avi Kivity	0ae22a09d4	LICENSE: Update to version 1.1 Updated terms of non-commercial use (must be a never-customer).	2026-04-12 19:46:33 +03:00
Radosław Cybulski	1dc20cc8f9	alternator/test: explain why 'always' write isolation mode is used in tests Improve test comments for test_streams_batchwrite_into_the_same_partition_deletes_existing_items and test_streams_batchwrite_into_the_same_partition_will_report_wrong_stream_data to explain why 'always' write isolation mode is required: in always_use_lwt mode all items in a batch get the same CDC timestamp, which triggers the squashing bug. In other modes each item gets a separate timestamp so the bug doesn't manifest. Also fix the example in the second test comment to use cleaner key values and correct event type (INSERT, not MODIFY, since items are inserted into an empty table), and fix the issue reference from #28452 (the PR) to #28439 (the issue).	2026-03-25 15:15:20 +01:00
Radosław Cybulski	ded62b2c5e	alternator/test: add scylla_only to always write isolation fixture Add scylla_only fixture dependency to the test_table_ss_new_and_old_images_write_isolation_always fixture. This ensures all tests using the 'always' write isolation mode are skipped when running against DynamoDB (--aws), since the system:write_isolation tag is a Scylla-only feature.	2026-03-25 12:38:09 +01:00
Radosław Cybulski	7d404cdd51	alternator: fix BatchWriteItem squashed Streams entries BatchWriteItem with items for the same partition (and write isolation set to always) will trigger LWT and run a different cdc code path, which results in wrong Streams data being returned to the user: changes are randomly squashed together. For example, the batch write: batch.put_item(Item={'p': 'p', 'c': 'c0'}) batch.put_item(Item={'p': 'p', 'c': 'c1'}) batch.put_item(Item={'p': 'p', 'c': 'c2'}) instead of producing 3 modify / insert events will produce one: type=INSERT, key={'c': {'S': 'c0'}, 'p': {'S': 'p'}}, old_image=None, new_image={'c': {'S': 'c2'}, 'p': {'S': 'p'}} with `new_image` having a different `c` key from the `key` field. This happens because BatchWriteItem (when using LWT) emits its changes to cdc under the same timestamp. This results in all log entries being put in a single cdc "bucket" (under the same cdc$timestamp key). The previous parsing algorithm would interpret those changes as a change to a single item and squash them together. The patch rewrites the algorithm to use `std::unordered_map` for records, keyed by the value of the clustering key that is added to every cdc log entry. This allows rebuilding all item modifications. Fixes #28439 Fixes: SCYLLADB-540	2026-03-25 11:40:53 +01:00
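The idea of keying records by clustering key instead of by timestamp alone can be shown with a small Python sketch. It is an illustration under assumptions: the patch itself uses `std::unordered_map` in C++, and the entry fields `ts`, `p` and `c` below are hypothetical stand-ins for the cdc log columns.

```python
from collections import defaultdict

def group_by_clustering_key(log_entries):
    """Split log entries sharing one timestamp into one group per clustering key."""
    groups = defaultdict(list)
    for entry in log_entries:
        groups[entry["c"]].append(entry)
    return dict(groups)

# Three puts from one batch, all written under the same timestamp.
entries = [
    {"ts": 1, "p": "p", "c": "c0"},
    {"ts": 1, "p": "p", "c": "c1"},
    {"ts": 1, "p": "p", "c": "c2"},
]
per_item = group_by_clustering_key(entries)
```

Each group can then be turned into its own stream event, so three puts yield three events rather than one squashed record.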
Radosław Cybulski	85da03c88d	alternator: add BatchWriteItem test (failing) Add additional BatchWriteItem tests (some failing): - `test_streams_batchwrite_no_clustering_deletes_non_existing_items` and `test_streams_batchwrite_no_clustering_deletes_existing_items` - these tests pass; we add them here for completeness, as tables without clustering keys trigger different paths. - `test_streams_batchwrite_into_the_same_partition_deletes_existing_items` - a failing test that checks combinations of puts and deletes in a single batch write (so for example 3 items, 2 puts and 1 delete). - `test_streams_batchwrite_into_the_same_partition_will_report_wrong_stream_data` - a simple failing test. The tests fail because the current implementation, when writing cdc log entries, squashes all changes done to the same partition together. The data is still there, but when GetRecords is called and we parse cdc log entries, we don't correctly recover it (see issue #28439 for more details).	2026-03-25 11:40:53 +01:00
Nadav Har'El	92ee959e9b	test/alternator: speed up test_streams.py by using module-scope fixtures Previously, all stream-table fixtures in this test file used scope="function", forcing a fresh table to be created for every test, slowing down the test a bit (though not much), and discouraging writing small new tests. This was a workaround for a DynamoDB quirk (that Alternator doesn't have): LATEST shard iterators have a time slack and may point slightly before the true stream head, causing leftover events from a previous test to appear in the next test's reads. We fix this by draining the stream inside latest_iterators() and shards_and_latest_iterators() after obtaining the LATEST iterators: fetch records in a loop until two consecutive polling rounds both return empty, guaranteeing the iterators are positioned past all pre-existing events before the caller writes anything. With this guarantee in place, all stream-table fixtures can safely use scope="module". After this patch, test_streams.py continues to pass on DynamoDB. On Alternator, the test file's run time went down a bit, from 20.2 seconds to 17.7 seconds. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2026-03-10 17:14:04 +02:00
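The draining loop described in this commit (keep fetching until two consecutive polling rounds are both empty) might be sketched like this. The real helpers work with shard iterators and GetRecords; here `poll` is a hypothetical callable that returns a list of records, and `drain` is an illustrative name.

```python
def drain(poll, required_empty_rounds=2):
    """Call poll() until it returns nothing `required_empty_rounds` times in a row."""
    drained = []
    empty_rounds = 0
    while empty_rounds < required_empty_rounds:
        records = poll()
        if records:
            drained.extend(records)
            empty_rounds = 0      # a non-empty round resets the streak
        else:
            empty_rounds += 1
    return drained

# A fake stream: a leftover event shows up after a single empty round.
rounds = iter([["old1"], [], ["old2"], [], []])
leftovers = drain(lambda: next(rounds, []))
```

Requiring two empty rounds in a row is what catches the late `old2` record, which a stop-on-first-empty loop would have missed.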
Nadav Har'El	6ac1f1333f	test/alternator: test_streams.py don't use fixtures in 4 tests In the next patch, we plan to make the fixtures in test_streams.py shared between tests. Most tests work well with shared tables, but two (test_streams_trim_horizon and test_streams_starting_sequence_number) were written to expect a new table with an empty history, and two others (test_streams_closed_read and test_streams_disabled_stream) want to disable streaming and would break a shared table. So in this patch we modify these four tests to create their own new table instead of using a fixture. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2026-03-10 17:12:33 +02:00
Nadav Har'El	16e7a88a02	test/alternator: fix do_test() in test_streams.py Many tests in test/alternator/test_streams.py use a do_test() function which performs a user-defined function that runs some write requests, and then verifies that the expected output appears on the stream. Because DynamoDB drops do-nothing changes from the stream - such as writing to an item a value that it already has - these tests need to write to a different item each time, so do_test() invents a random key and passes it to the user-defined function to use. But... we had a bug, the random number generation was done only once, instead of every time. The fix is to do the random number generation on every call. We never noticed this bug when each test used a brand new table. But the next patch will make the tests share the test table, and tests start to fail. It's especially visible if you run the same test twice against DynamoDB, e.g., test/alternator/run --count 2 --aws \ test_streams.py::test_streams_putitem_keys_only Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2026-03-09 19:21:53 +02:00
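The commit does not show how the once-only random generation came about, so the following is just one common way such a bug arises in Python: a default argument evaluated a single time at function definition. All names here (`random_key`, `do_test_buggy`, `do_test_fixed`) are hypothetical and not taken from test_streams.py.

```python
import random
import string

def random_key(length=10):
    """Return a random lowercase string to use as an item key."""
    return "".join(random.choices(string.ascii_lowercase, k=length))

def do_test_buggy(write, key=random_key()):
    # The default is evaluated once, when the function is defined,
    # so every call without an explicit key reuses the same value.
    write(key)

def do_test_fixed(write):
    # A fresh key is generated on every call.
    write(random_key())

buggy_keys, fixed_keys = [], []
for _ in range(2):
    do_test_buggy(buggy_keys.append)
    do_test_fixed(fixed_keys.append)
```

With a fresh table per test the reused key goes unnoticed, but on a shared table the second write repeats an existing value and DynamoDB drops it from the stream as a do-nothing change.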
Marcin Maliszkiewicz	6eca74b7bb	Merge 'More Alternator tests for BatchWriteItem' from Nadav Har'El The goal of this small pull request is to reproduce issue #28439, which found a bug in the Alternator Streams output when BatchWriteItem is called to write multiple items in the same partition, and always_use_lwt write isolation mode is used. * The first patch reproduces this specific bug in Alternator Streams. * The second patch adds missing (Fixes #28171) tests for BatchWriteItem in different write modes, and shows that BatchWriteItem itself works correctly - the bug is just in Alternator Streams' reporting of this write. Closes scylladb/scylladb#28528 * github.com:scylladb/scylladb: test/alternator: add test for BatchWriteItem with different write isolations test/alternator: reproducer for Alternator Streams bug	2026-02-05 10:07:29 +01:00
Nadav Har'El	c63f43975f	test/alternator: reproducer for Alternator Streams bug This patch adds a reproducer for an Alternator Streams bug described in issue #28439, where the stream returns the wrong events (and fewer of them) in the following specific combination of circumstances: 1. A BatchWriteItem operation writing multiple items to the same partition. 2. The "always_use_lwt" write isolation mode is used (the bug doesn't occur in other write isolation modes). We didn't catch this bug earlier because the Alternator Streams test we had for BatchWriteItem had multiple items in multiple partitions, and we missed the multiple-items-in-one-partition case. Moreover, today we run all the tests in only_rmw_uses_lwt mode (in the past, we did use always_use_lwt, but changed recently in commit `e7257b1393` following commit `76a766c` that changed test.py). As issue #28439 explains, the underlying cause of the bug is that the always_use_lwt mode causes the multiple items to be written with the same timestamp, which confused the Alternator Streams code reading the CDC log. The bug is not in BatchWriteItem itself, or in ScyllaDB CDC, but just in the Alternator Streams layer. The test in this patch is parameterized to run on each of the four write isolation modes, and currently fails (and so is marked xfail) just for the one mode 'always_use_lwt'. The test is scylla_only, as its purpose is to check the different write isolation modes, which don't exist in AWS DynamoDB. Refs #28439 Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2026-02-04 09:17:48 +02:00
Radosław Cybulski	03ff091bee	alternator: improve events output when test failed Improve event printing when a test in test_streams.py fails. The new code prints both expected and received events (keys, previous image, new image and type) and explicitly marks the output event at which the comparison failed. Fixes #28455 Closes scylladb/scylladb#28476	2026-02-03 21:55:07 +02:00
Nadav Har'El	5c2ca56adf	test/alternator: fix test passing a spurious parameter The test test_streams.py::test_streams_putitem_new_item_overrides_old_lsi failed on DynamoDB (Refs #26079) because we passed an unused parameter NonKeyAttributes to the Projection setting an LSI. NonKeyAttributes is only allowed when ProjectionType=INCLUDE, but we used ProjectionType=ALL. DynamoDB refuses to create an LSI with such inconsistent parameters, and we just need to remove this unnecessary parameter from this test. The reason why this test didn't fail on Alternator is that Alternator doesn't yet support or even parse the Projection parameter (Refs #5036). We also add an xfailing test (passes on DynamoDB, fails on Alternator) checking that a spurious NonKeyAttributes parameter is rejected. When we get around to implement the projection feature (#5036), this will be yet another acceptance test for this feature. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2026-01-05 13:51:01 +02:00
Nadav Har'El	84df5cfaf8	test/alternator: delete unnecessary "pass" Fixing something that never bothered anyone but our automated "code quality" tool: there's an unnecessary call to "pass" in one of our tests. Just remove it. Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com> Closes scylladb/scylladb#27645	2025-12-16 19:29:23 +03:00
Pavel Emelyanov	31f90c089c	Merge 'test/alternator: remove unused variable assignments and statements' from Nadav Har'El Copilot found in test/alternator a bunch of places where we unnecessarily assign a variable that we don't use, or had a duplicated statement which doesn't do anything. This patch fixes all of them. AI still doesn't know how to prepare a patch that looks anything close to reasonable, so I did this part manually, and also carefully investigated each and every change (this took a lot of human time). These patches don't change anything in the functionality of any of the tests. It's all cosmetic. Closes scylladb/scylladb#27655 * github.com:scylladb/scylladb: test/alternator: remove unnecessary duplicate statement test/alternator: remove unused variable assignments	2025-12-16 19:23:34 +03:00
Nadav Har'El	a3959fe3db	test/alternator: remove unused variable assignments Copilot noticed that in many of the Alternator tests, we have some unnecessary assignments. For example, in a few places, we use the idiom: with pytest.raises(...): ret = ... The "ret=" part is unnecessary, as this test expects the statement to fail (hence the raises()), and ret is never assigned. The assignment was only there because we copied this statement from another place in the test, which does expect the statement to pass and wants to validate the returned value. So we should just drop the "ret=" from these tests. Another common occurrence is that we used the idiom response = table.do_something() without checking the response and with no intention to check it (either we know it will work, or we just want to check it doesn't throw). So we can drop the "response=" here too. All of the unused variables in this patch were discovered by Copilot, but I reviewed each of them carefully myself and prepared this patch. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2025-12-15 18:07:05 +02:00
Nadav Har'El	db15c212a6	test/alternator: reproducer for issue 27375 This patch adds a reproducer for issue #27375, where even with alternator_streams_increased_compatibility set to true, if an attribute is set to the same value it had but using a different JSON representation - an Alternator Streams event is unduly produced. For example, if a map {'dog': 1, 'cat': 2} is changed to {'cat': 2, 'dog': 1}, this non-change should not be reported. The new test added in this patch passes on DynamoDB (an event is not generated) but fails on Alternator (an event is generated), so the new test is marked with xfail. Refs #27375. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2025-12-11 11:34:19 +02:00
Piotr Dulikowski	44c605e59c	Merge 'Fix the types of change events in Alternator Streams' from Piotr Wieczorek This patch increases the compatibility with DynamoDB Streams by integrating the DynamoDB's event type rules (described in https://github.com/scylladb/scylladb/issues/6918) into Alternator. The main changes are: - introduce a new flag `alternator_streams_strict_compatibility`, meant as a guard of performance-intensive operations that increase the compatibility with DynamoDB Streams. If enabled, Alternator always performs a RBW before a data-modifying operation, and propagates its result to CDC. Then, the old item is compared to the new one, to determine the mutation type (INSERT vs MODIFY). This option is a no-op for tables with disabled Alternator Streams, - reduce splitting of simple Alternator mutations, - correctly distinguish event types described in #6918, except for item deletes. Deleting a missing item with DeleteItem, BatchWriteItem, or a missing field with UpdateItem still emit REMOVEs. To summarize, the emitted events of the data manipulation operations should be as follows: - DeleteItem/BatchWriteItem.DeleteItem of existing item: REMOVE (OK) - DeleteItem of nonexistent item: nothing (OK) - BatchWriteItem.DeleteItem of nonexistent item: nothing (OK) - PutItem/UpdateItem/BatchWriteItem.PutItem of existing and not equal item: MODIFY (OK) - PutItem/UpdateItem/BatchWriteItem.PutItem of existing and equal item: nothing (OK) - PutItem/UpdateItem/BatchWriteItem.PutItem of nonexistent item: INSERT (OK) No backport is necessary. 
Refs https://github.com/scylladb/scylladb/pull/26149 Refs https://github.com/scylladb/scylladb/pull/26396 Refs https://github.com/scylladb/scylladb/issues/26382 Fixes https://github.com/scylladb/scylladb/issues/6918 Closes scylladb/scylladb#26121 * github.com:scylladb/scylladb: test/alternator: Enable the tests failing because of #6918 alternator, cdc: Don't emit events for no-op removes alternator, cdc: Don't emit an event for equal items alternator/streams, cdc: Differentiate item replace and item update in CDC alternator: Change the return type of rmw_operation_return config: Add alternator_streams_strict_compatibility flag cdc: Don't split a row marker away from row cells	2025-11-30 07:20:22 +01:00
Piotr Szymaniak	63897370cb	alternator: Fix tag name to request vnodes The tag was lately renamed from `experimental:initial_tablets` to `system::initial_tablets`. This commit fixes both the tests as well as the exceptions sent to the user instructing how to create table with vnodes.	2025-11-09 12:52:29 +02:00
Piotr Wieczorek	0398bc0056	test/alternator: Enable the tests failing because of #6918 The tests pass only with alternator_streams_strict_compatibility flag enabled, because of a suspected non-negligible performance impact (i.e. an additional entire-item comparison and type conversions). Refs https://github.com/scylladb/scylladb/issues/6918	2025-10-30 08:38:31 +01:00
Piotr Wieczorek	2812e67f47	cdc: Emit a preimage for non-clustered tables Until this patch, CDC hasn't fetched a preimage for mutations containing only a partition tombstone. Therefore, single-row deletions in a table without a clustering key didn't include a preimage, which was inconsistent with single-row clustered deletions. This commit addresses this inconsistency. A second reason is compatibility with DynamoDB Streams, which doesn't support entire-partition deletes. Alternator uses partition tombstones for single-row deletions, though, and in these cases the 'OldImage' was missing from REMOVE records. Fixes https://github.com/scylladb/scylladb/issues/26382 Closes scylladb/scylladb#26578	2025-10-29 17:54:58 +02:00
Piotr Wieczorek	15c399ed40	test/alternator: Add more Streams tests for UpdateItem and BatchWriteItem This commit adds tests to `test_streams.py` (i.e. Alternator Streams) checking the following cases: * putting an item with BatchWriteItem shouldn't emit a log if the old item and the new item are identical, * deleting an item with BatchWriteItem shouldn't emit a log if the item doesn't exist, * UpdateItem shouldn't emit a log if the old item and the new item are identical. These cases haven't been tested until this commit. Refs https://github.com/scylladb/scylladb/issues/6918 Closes scylladb/scylladb#26396	2025-10-16 09:34:12 +03:00
Nadav Har'El	06108ea020	test/alternator: a small cleanup for a test in test_streams.py This patch makes three small mostly-cosmetic improvements to a test in test/alternator/test_streams.py: 1. The test is renamed "test_streams_deleteitem_old_image_no_ck" to emphasize its focus on the combination of deleteitem, old image, and no ck. The "putitem" we had in the name was not relevant, and the "old_image" was missing and important. 2. Moreover, using PutItem in this test just to set up the test scenario mixed the bug which the test tries to reproduce with a different only-recently-fixed bug (that PutItem also generated a spurious "REMOVE" event). So I replaced the use of PutItem with UpdateItem, to make this test independent of the other bug. Test independence is important because it allows us - if we want - to backport a fix for just one bug independently of the fix to the other bug. 3. Also improved the comment in front of the test to mention where we already tested the with-ck case, and also to mention issue 26382 which this test reproduces (the xfail line also mentions it, but the xfail line will be removed when the bug is fixed - but the mention in the comment will remain - and should remain). Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#26526	2025-10-13 17:42:31 +03:00
Piotr Wieczorek	8cd9f5d271	test/alternator: Add a Streams test reproducing #26382 This commit adds a test that reproduces an issue, wherein OldImage isn't included in the REMOVE events produced by Alternator Streams. Refs https://github.com/scylladb/scylladb/issues/26382 Closes scylladb/scylladb#26383	2025-10-12 11:09:57 +03:00
Piotr Wieczorek	4be0bdbc07	alternator: Don't emit a redundant REMOVE event in Alternator Streams for PutItem calls Until now, every PutItem operation appeared in the Alternator Streams as two events - a REMOVE and a MODIFY. DynamoDB Streams emits only INSERT or MODIFY, depending on whether a row was replaced, or created anew. A related issue scylladb#6918 concerns distinguishing the mutation type properly. This was because each call to PutItem emitted the two CDC rows, returned by GetRecords. Since this patch, we use a collection tombstone for the `:attrs` column, and a separate tombstone for each regular column in the table's schema. We don't expect that new tables would have any other regular column, except for the `:attrs` and keys, but we may encounter them in upgraded tables which had old GSIs or LSIs. Fixes: scylladb#6930. Closes scylladb/scylladb#24991	2025-09-30 13:12:16 +03:00
Michael Litvak	65351fda29	alternator: update references to alternator streams issue update all the references about the issue of tablets support for alternator streams to issue #23838 instead of #16317. The issue #16317 is about support of CDC with tablets, but it is now closed and it didn't address alternator streams. the remaining issues about alternator streams should be addressed as part of #23838, so fix the references in order for them not to be missed.	2025-09-22 09:56:23 +02:00
Piotr Wieczorek	5add43e15c	alternator: streams: Address minor incompatibilities with DynamoDB in GetRecords response. This commit adds missing fields to GetRecords responses: `awsRegion` and `eventVersion`. We also considered changing `eventSource` from `scylladb:alternator` to `aws:dynamodb` and setting `SizeBytes` subfield inside the `dynamodb` field. We set `awsRegion` to the datacenter's name of the node that received the request. This is in line with the AWS documentation, except that Scylla has no direct equivalent of a region, so we use the datacenter's name, which is analogous to DynamoDB's concept of region. The field `eventVersion` determines the structure of a Record. It is updated whenever the structure changes. We think that adding a field `userIdentity` bumped the version from `1.0` to `1.1`. Currently, Scylla doesn't support this field (#11523), hence we use the older 1.0 version. We have decided to leave `eventSource` as is, since it's easy to modify it in case of problems to `aws:dynamodb` used by DynamoDB. Not setting `SizeBytes` subfield inside the `dynamodb` field was dictated by the lack of apparent use cases. The documentation is unclear about how `SizeBytes` is calculated and after experimenting a little bit, I haven't found an obvious pattern. Fixes: #6931 Closes scylladb/scylladb#24903	2025-08-31 14:55:47 +03:00
Nadav Har'El	3ed8e269f9	alternator: don't crash when adding Streams to long table name Currently, in Alternator it is possible to create a table whose name has 222 characters, and then trying to add Streams to that table results in an attempt to create a CDC log table with the same name plus a 15-character suffix "_scylla_cdc_log", which resulted (Ref #24598) in an IO-error and a Scylla shutdown. This patch adds code to the Stream-adding operations (both CreateTable and UpdateTable) that validates that the table's name, plus that 15 character suffix, doesn't exceed max_auxiliary_table_name_length, i.e., 222. After this patch, if you have a table whose name is between 207 and 222 characters, attempting to enable Streams on it will fail with: "Streams cannot be added if the table name is longer than 207 characters." Note that in the future, if we lower max_table_name_length to below 207, e.g., to 192, then it will always be possible to add a stream to any legal table, and the new checks we had here will be mostly redundant. But only "mostly" - not entirely: Checking in UpdateTable is still important because of the possibility that an upgrading user might have a pre-existing table whose name is longer than the new limit, and might try to enable Streams. After this patch, the crash reported in #24598 can no longer happen, so in this sense the bug is solved. However, we still want to lower max_table_name_length from 222 to 192, so that it will always be possible to enable streams on any table with a legal name length. We'll do this in the next patch. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2025-07-07 11:58:13 +03:00
Nadav Har'El	50d370f06e	test/alternator: reproducer for streams bug with long table name The two tests in this patch reproduce issue #24598: When enabling Alternator streams on an Alternator table with a very long name, such as the maximum allowed name length 222, the result is an I/O error and a Scylla shutdown. The two tests are currently marked "skip", otherwise they would crash the Scylla being tested. Refs #24598 Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2025-06-29 11:40:55 +03:00
Nadav Har'El	84fd52315f	alternator: in GetRecords, enforce Limit to be <= 1000 Alternator Streams' "GetRecords" operation has a "Limit" parameter on how many records to return. The DynamoDB documentation says that the upper limit on this Limit parameter is 1000 - but Alternator didn't enforce this. In this patch we begin enforcing this highest Limit, and also add a test for verifying this enforcement. As usual, the new test passes on DynamoDB, and after this patch - also on Alternator. The reason why it's useful to have some upper limit on Limit is that the existing executor::get_records() implementation does not really have preemption points in all the necessary places. In particular, we have a loop on all returned records without preemption points. We also store the returned records in a RapidJson vector, which requires a contiguous allocation. Even before this patch, GetRecords had a hard limit of 1 MB of results. But still, in some cases 1 MB of results may be a lot of results, and we can see stalls in the aforementioned places being O(number of results). Fixes #23534 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#23547	2025-04-07 12:52:03 +03:00
Nadav Har'El	c0821842de	alternator: document the state of tablet support in Alternator In commit `c24bc3b` we decided that creating a new table in Alternator will by default use vnodes - not tablets - because of all the missing features in our tablets implementation that are important for Alternator, namely - LWT, CDC and Alternator TTL. We never documented this, or the fact that we support a tag `experimental:initial_tablets` which allows to override this decision and create an Alternator table using tablets. We also never documented what exactly doesn't work when Alternator uses tablet. This patch adds the missing documentation in docs/alternator/new-apis.md (which is a good place for describing the `experimental:initial_tablets` tag). The patch also adds a new test file, test_tablets.py, which includes tests for all the statements made in the document regarding how `experimental:initial_tablets` works and what works or doesn't work when tablets are enabled. Two existing tests - for TTL and Streams non-support with tablets - are moved to the new test file. When the tablets feature will finally be completed, both the document and the tests will need to be modified (some of the tests should be outright deleted). But it seems this will not happen for at least several months, and that is too long to wait without accurate documentation. Fixes #21629 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#22462	2025-03-14 14:03:15 +03:00
Piotr Szymaniak	c1f186c98a	alternator: re-enabling/changing existing stream's StreamViewType as well as disabling the nonexistent stream Table updates that try to enable stream (while changing or not the StreamViewType) on a table that already has the stream enabled will result in ValidationError. Table updates that try to disable stream on a table that does not have the stream enabled will result in ValidationError. Add two tests to verify the above. Mark the test for changing the existing stream's StreamViewType not to xfail. Fixes scylladb/scylladb#6939 Closes scylladb/scylladb#22827	2025-02-16 09:57:49 +02:00
Avi Kivity	f3eade2f62	treewide: relicense to ScyllaDB-Source-Available-1.0 Drop the AGPL license in favor of a source-available license. See the blog post [1] for details. [1] https://www.scylladb.com/2024/12/18/why-were-moving-to-a-source-available-license/	2024-12-18 17:45:13 +02:00
Nadav Har'El	6391550bbc	test/alternator: add another check to test_stream_list_tables The test test_streams.py::test_stream_list_tables reproduces a bug where enabling streams added a spurious result to ListTables. A reviewer of that patch asked to also add a check that name of the table itself doesn't disappear from ListTables when a stream is enabled, so this is what this patch adds. This theoretical scenario (a table's name disappearing from ListTables) never happened, so the new check doesn't reproduce any known bug, but I guess it never hurts to make the test stronger for regression testing. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#19934	2024-08-29 08:45:22 +03:00


94 Commits