scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-25 19:10:42 +00:00

Author	SHA1	Message	Date
Marcin Maliszkiewicz	741969cf4c	test: boost: add auth cache tests. The cache is already covered by the general auth dtests, but some cases are trickier and easier to express as direct calls to the cache class, so a Boost test file was added for them.	2026-02-17 18:18:40 +01:00
Marcin Maliszkiewicz	c11eb73a59	auth: add cache size metrics	2026-02-17 18:18:40 +01:00
Marcin Maliszkiewicz	a059798de9	docs: conf: update permissions cache documentation	2026-02-17 18:18:40 +01:00
Marcin Maliszkiewicz	a23e503e7b	auth: remove old permissions cache	2026-02-17 17:56:27 +01:00
Marcin Maliszkiewicz	9d9184e5b7	auth: use unified cache for permissions	2026-02-17 17:56:27 +01:00
Marcin Maliszkiewicz	7eedf50c12	auth: ldap: add permissions reload to unified cache. The LDAP server may change role-chain assignments without notifying Scylla. As a result, effective permissions can change, so some form of polling is required. Currently, this is handled via cache expiration. However, the unified cache is designed to be consistent and does not support expiration. To provide an equivalent mechanism for LDAP, we will periodically reload the permissions portion of the new cache at intervals matching the previously configured expiration time.	2026-02-17 17:56:27 +01:00
Marcin Maliszkiewicz	10996bd0fb	auth: add permissions cache to auth/cache. We want to get rid of the loading cache because its periodic refresh logic generates a lot of internal load when there are many entries. Also, our operational procedures involve tweaking the config, while the new unified cache is supposed to work out of the box.	2026-02-17 17:56:27 +01:00
Marcin Maliszkiewicz	03c4e4bb10	auth: add service::revoke_all as main entry point. In the following commit, we'll need to add some cache-related logic (removing resource permissions). This logic doesn't depend on the authorizer, so it should be managed by the service itself.	2026-02-17 17:56:27 +01:00
Marcin Maliszkiewicz	070d0bfc4c	auth: explicitly life-extend resource in auth_migration_listener. Otherwise, it's easy to trigger a use-after-free when the code changes slightly.	2026-02-17 17:56:27 +01:00
Łukasz Paszkowski	f45465b9f6	test_out_of_space_prevention.py: Lower the critical disk utilization threshold. After PR https://github.com/scylladb/scylladb/pull/28396 reduced the test volumes to 20MiB to speed up test_out_of_space_prevention.py, keeping the original 0.8 critical disk utilization threshold can make the tests flaky: transient disk usage (e.g. commitlog segment churn) can push the node into ENOSPC during the run. These tests do not write much data, so reduce the critical disk utilization threshold to 0.5. With 20MiB volumes, this leaves ~10MiB of headroom for temporary growth during the test. Fixes: https://github.com/scylladb/scylladb/issues/28463 Closes scylladb/scylladb#28593	2026-02-16 15:10:18 +02:00
Andrei Chekun	e26cf0b2d6	test/cluster: fix two flaky tests. test_maintenance_socket is flaky with the new way of running tests. It looks like the driver tries to reconnect using an old maintenance socket from the previous driver and fails. This PR adds a whitelist for connections, which stabilizes the test. test_no_removed_node_event_on_ip_change was flaky on CI, although the issue never reproduced locally. The assumption is that under load there is a race condition, and the test checks the logs before the message has arrived. A small retry loop was added to avoid this situation. Closes scylladb/scylladb#28635	2026-02-16 14:50:54 +02:00
Patryk Jędrzejczak	0693091aff	test: test_restart_leaving_replica_during_cleanup: reconnect driver after restart. The test can currently fail like this: ``` > await cql.run_async(f"ALTER TABLE {ks}.test WITH tablets = {{'min_tablet_count': 1}}") E cassandra.cluster.NoHostAvailable: ('Unable to complete the operation against any hosts', {<Host: 127.158.27.9:9042 datacenter1>: <Error from server: code=0000 [Server error] message="Failed to apply group 0 change due to concurrent modification">}) ``` The following happens: - node A is restarted and becomes the group0 leader, - the driver sends the ALTER TABLE request to node B, - the request hits the group 0 concurrent modification error 10 times and fails because node A performs tablet migrations at the same time. What is unexpected is that even though the driver session uses the default retry policy, the driver doesn't retry the request on node A. The request is guaranteed to succeed on node A because it's the only node adding group0 entries. The driver doesn't retry the request on node A because of a missing `wait_for_cql_and_get_hosts` call. We add it in this commit. We also reconnect the driver, just in case, to avoid hitting scylladb/python-driver#295. Moreover, we can revert the workaround from `4c9efc08d8`, as the fix from this commit also prevents DROP KEYSPACE failures. The commit has been tested in byo with `_concurrent_ddl_retries{0}` to verify that node A really can't hit the group 0 concurrent modification error and always receives the ALTER TABLE request from the driver. All 300 runs in each build mode passed. Fixes #25938 Closes scylladb/scylladb#28632	2026-02-16 12:56:18 +01:00
Marcin Maliszkiewicz	6a4aef28ae	Merge 'test: explicitly set compression algorithm in test_autoretrain_dict' from Andrzej Jackowski. When `test_autoretrain_dict` was originally written, the default `sstable_compression_user_table_options` was `LZ4Compressor`. The test assumed (correctly) that initially the compression doesn't use a trained dictionary, and later in the test scenario, it changed the algorithm to one with a dictionary. However, the default `sstable_compression_user_table_options` is now `LZ4WithDictsCompressor`, so the old assumption is no longer correct. As a result, the assertion that data is initially not compressed well may or may not fail depending on dictionary training timing. To fix this, this commit explicitly sets `ZstdCompressor` as the initial `sstable_compression_user_table_options`, ensuring that the assumption that initial compression is without a dictionary is always met. Note: `ZstdCompressor` differs from the former default `LZ4Compressor`. However, it's a better choice: the test aims to show the benefit of using a dictionary, not the benefit of Zstd over LZ4 (and the test uses ZstdWithDictsCompressor as the algorithm with the dictionary). Fixes: https://github.com/scylladb/scylladb/issues/28204 Backport: 2025.4, as the test already failed there (and also backport to 2026.1 to make everything consistent). Closes scylladb/scylladb#28625 * github.com:scylladb/scylladb: test: explicitly set compression algorithm in test_autoretrain_dict; test: remove unneeded semicolons from python test	2026-02-16 11:38:24 +01:00
Ernest Zaslavsky	034c6fbd87	s3_client: limit multipart upload concurrency. Prevent launching hundreds or thousands of fibers during multipart uploads by capping concurrent part submissions at 16. Closes scylladb/scylladb#28554	2026-02-16 13:32:58 +03:00
Botond Dénes	9f57d6285b	Merge 'test: improve error reporting and retries in get_scylla_2025_1_executable' from Marcin Maliszkiewicz. Harden get_scylla_2025_1_executable() by improving error reporting when subprocesses fail, increasing curl's retry count for more resilient downloads, and enabling --retry-all-errors to retry on all failures. Fixes https://github.com/scylladb/scylladb/issues/27745 Backport: no, it's not a bug fix. Closes scylladb/scylladb#28628 * github.com:scylladb/scylladb: test: pylib: retry on all errors in get_scylla_2025_1_executable curl's call; test: pylib: increase curl's number of retries when downloading scylla; test: pylib: improve error reporting in get_scylla_2025_1_executable	2026-02-16 10:09:17 +02:00
Andrei Chekun	8c5c1096c2	test: ensure that the table used in cqlpy/test_tools has at least 3 PKs. One of the tests checks that the number of PKs is greater than 2, but the method that creates the table can return a table with fewer keys. This leads to flakiness; to avoid it, this PR ensures that the table has at least 3 PKs. Closes scylladb/scylladb#28636	2026-02-16 09:50:58 +02:00
Anna Mikhlin	33cf97d688	.github/workflows: ignore quoted comments for trigger CI. Prevent CI from being triggered when the trigger-ci command appears inside quoted (>) comment text. Fixes: https://scylladb.atlassian.net/browse/RELENG-271 Closes scylladb/scylladb#28604	2026-02-16 09:33:16 +02:00
Andrei Chekun	e144d5b0bb	test.py: fix JUnit double test case records. Move the hook that overwrites the XML reporter so that it runs first, to avoid duplicate records. Closes scylladb/scylladb#28627	2026-02-15 19:02:24 +02:00
Jenkins Promoter	69249671a7	Update pgo profiles - aarch64	2026-02-15 05:22:17 +02:00
Jenkins Promoter	27aaafb8aa	Update pgo profiles - x86_64	2026-02-15 04:26:36 +02:00
Piotr Dulikowski	9c1e310b0d	Merge 'vector_search: Fix flaky vector_store_client_https_rewrite_ca_cert' from Karol Nowacki. Most likely, the root cause of the flaky test was that the TLS handshake hung for an extended period (60s). This caused the test case to fail because the ANN request duration exceeded the test case timeout. The PR introduces two changes: * Mitigation of the hanging TLS handshake: this issue likely occurred because the test performed certificate rewrites simultaneously with ANN requests that use those certificates. * Production code fix: this addresses a bug where the TLS handshake itself was not covered by the connection timeout. Since tls::connect does not perform the handshake immediately, the handshake only occurs during the first write operation, potentially bypassing the connect timeout. Fixes: #28012 Backport to 2026.01 and 2025.04 is needed, as these branches are also affected and may experience CI flakiness due to this test. Closes scylladb/scylladb#28617 * github.com:scylladb/scylladb: vector_search: Fix missing timeout on TLS handshake; vector_search: test: Fix flaky cert rewrite test	2026-02-13 19:03:50 +01:00
Marcin Maliszkiewicz	1b0a68d1de	test: pylib: retry on all errors in get_scylla_2025_1_executable curl's call. It's difficult to say whether our download backend always reports transient errors correctly so that curl can retry. It's more robust to always retry on error.	2026-02-12 16:18:52 +01:00
Marcin Maliszkiewicz	8ca834d4a4	test: pylib: increase curl's number of retries when downloading scylla. By default, curl uses exponential backoff, and we want to keep that, but the delay is capped at 10 minutes, so with 40 retries we'd wait a very long time. Instead, we set the cap to 60 seconds. Total waiting time (excluding the time spent on the requests themselves): before: 17m, after: 35m.	2026-02-12 16:18:52 +01:00
Marcin Maliszkiewicz	70366168aa	test: pylib: improve error reporting in get_scylla_2025_1_executable. Errors from curl and the other tools this function calls are now logged at the place where they fail, instead of triggering a plain assert.	2026-02-12 16:18:52 +01:00
Andrzej Jackowski	9ffa62a986	test: explicitly set compression algorithm in test_autoretrain_dict. When `test_autoretrain_dict` was originally written, the default `sstable_compression_user_table_options` was `LZ4Compressor`. The test assumed (correctly) that initially the compression doesn't use a trained dictionary, and later in the test scenario, it changed the algorithm to one with a dictionary. However, the default `sstable_compression_user_table_options` is now `LZ4WithDictsCompressor`, so the old assumption is no longer correct. As a result, the assertion that data is initially not compressed well may or may not fail depending on dictionary training timing. To fix this, this commit explicitly sets `ZstdCompressor` as the initial `sstable_compression_user_table_options`, ensuring that the assumption that initial compression is without a dictionary is always met. Note: `ZstdCompressor` differs from the former default `LZ4Compressor`. However, it's a better choice: the test aims to show the benefit of using a dictionary, not the benefit of Zstd over LZ4 (and the test uses ZstdWithDictsCompressor as the algorithm with the dictionary). Fixes: scylladb/scylladb#28204	2026-02-12 14:58:39 +01:00
Andrzej Jackowski	e63cfc38b3	test: remove unneeded semicolons from python test	2026-02-12 14:49:17 +01:00
Aleksandra Martyniuk	f955a90309	test: fix test_remove_node_violating_rf_rack_with_rack_list. test_remove_node_violating_rf_rack_with_rack_list creates a cluster with four nodes. One of the nodes is excluded, then another one is stopped, excluded, and removed. If the two stopped nodes were both voters, the majority is lost and the cluster loses its Raft leader. As a result, the node cannot be removed and the operation times out. Add a 5th node to the cluster, so that a majority is always up. Fixes: https://github.com/scylladb/scylladb/issues/28596. Closes scylladb/scylladb#28610	2026-02-12 12:58:48 +02:00
Ferenc Szili	4ca40929ef	test: add read barrier to test_balance_empty_tablets. The test creates a single-node cluster, then creates 3 tables which remain empty. Then it adds another node with half the disk capacity of the first one, and then it waits for the balancer to migrate tablets to the newly added node by calling the quiesce topology API. The number of tablets on the smaller node should be exactly half the number of tablets on the larger node. After waiting for topology to quiesce, we could end up querying the number of tablets from a node that still hasn't processed the last tablet migrations and updated system.tablets. This patch adds a read barrier so that both nodes see the same tablets metadata before we query the number of tablets. Fixes: SCYLLADB-603 Closes scylladb/scylladb#28598	2026-02-12 11:16:34 +02:00
Karol Nowacki	079fe17e8b	vector_search: Fix missing timeout on TLS handshake. Currently, the TLS handshake in the vector search client does not have a timeout. This is because tls::connect does not perform the handshake itself; the handshake is deferred until the first read/write operation is performed. This can lead to long hangs on ANN requests. This commit calls tls::check_session_is_resumed() after tls::connect to force the handshake to happen immediately and to run under with_timeout.	2026-02-12 10:08:37 +01:00
Karol Nowacki	aef5ff7491	vector_search: test: Fix flaky cert rewrite test. The test is most likely flaky because when a TLS certificate rewrite happens simultaneously with an ANN request, the handshake can hang for a long time (~60s). This leads to a timeout in the test case. This change introduces a checkpoint in the test so that it waits for the certificate rewrite to happen before sending an ANN request, which should prevent the handshake from hanging and make the test more reliable. Fixes: #28012	2026-02-12 09:58:54 +01:00
Piotr Dulikowski	38c4a14a5b	Merge 'test: cluster: Fix test_sync_point' from Dawid Mędrek. The test `test_sync_point` had a few shortcomings that made it flaky or simply wrong: 1. We were verifying that hints were written by checking the size of in-flight hints. However, that could lead to problems in rare situations. For instance, if all of the hints failed to be written to disk, the size of in-flight hints would drop to zero, but a sync point created at that moment would correspond to the empty state. In such a situation, we should fail immediately and indicate what the cause was. 2. A sync point corresponds to the hints that have already been written to disk. The number of those is tracked by the metric `written`. It's a much more reliable way to make sure that hints have been written to the commitlog. That ensures that the sync point we'll create will really correspond to those hints. 3. The auxiliary function `wait_for` used in the test works like this: it executes the passed callback and looks at the result. If it's `None`, it retries it. Otherwise, the callback is deemed to have finished its execution and no further retries will be attempted. Before this commit, we simply returned a bool, and so the code was wrong. We improve it. --- Note that this fixes scylladb/scylladb#28203, which was a manifestation of scylladb/scylladb#25879. We created a sync point that corresponded to the empty state, and so it immediately resolved, even when node 3 was still dead. As a bonus, we rewrite the auxiliary code responsible for fetching metrics and manipulating sync points. Now it's asynchronous and uses the existing standard mechanisms available to developers. Furthermore, we reduce the time needed for executing `test_sync_point` by 27 seconds. 
--- The total difference in time needed to execute the whole test file (on my local machine, in dev mode): Before: real 2m7.811s, user 0m25.446s, sys 0m16.733s (CPU utilization: 0.9%). After: real 1m40.288s, user 0m25.218s, sys 0m16.566s (CPU utilization: 1.1%). --- Refs scylladb/scylladb#25879 Fixes scylladb/scylladb#28203 Backport: This improves the stability of our CI, so let's backport it to all supported versions. Closes scylladb/scylladb#28602 * github.com:scylladb/scylladb: test: cluster: Reduce wait time in test_sync_point; test: cluster: Fix test_sync_point; test: cluster: Await sync points asynchronously; test: cluster: Create sync points asynchronously; test: cluster: Fetch hint metrics asynchronously	2026-02-12 09:34:09 +01:00
Dawid Mędrek	f83f911bae	test: cluster: Reduce wait time in test_sync_point. If everything is OK, the sync point will not resolve while node 3 is dead. As a result, the waiting will use all of the time we allocate for it, i.e. 30 seconds. That's a lot of time. There's no easy way to verify that the sync point will NOT resolve, but let's at least reduce the waiting to 3 seconds. If there's a bug, that should still be enough to trigger it at some point, while reducing the average time needed for CI.	2026-02-10 17:05:02 +01:00
Dawid Mędrek	a256ba7de0	test: cluster: Fix test_sync_point. The test had a few shortcomings that made it flaky or simply wrong: 1. We were verifying that hints were written by checking the size of in-flight hints. However, that could lead to problems in rare situations. For instance, if all of the hints failed to be written to disk, the size of in-flight hints would drop to zero, but a sync point created at that moment would correspond to the empty state. In such a situation, we should fail immediately and indicate what the cause was. 2. A sync point corresponds to the hints that have already been written to disk. The number of those is tracked by the metric `written`. It's a much more reliable way to make sure that hints have been written to the commitlog. That ensures that the sync point we'll create will really correspond to those hints. 3. The auxiliary function `wait_for` used in the test works like this: it executes the passed callback and looks at the result. If it's `None`, it retries it. Otherwise, the callback is deemed to have finished its execution and no further retries will be attempted. Before this commit, we simply returned a bool, and so the code was wrong. We improve it. Note that this fixes scylladb/scylladb#28203, which was a manifestation of scylladb/scylladb#25879. We created a sync point that corresponded to the empty state, and so it immediately resolved, even when node 3 was still dead. Refs scylladb/scylladb#25879 Fixes scylladb/scylladb#28203	2026-02-10 17:05:02 +01:00
Dawid Mędrek	c5239edf2a	test: cluster: Await sync points asynchronously. There's a dedicated HTTP API for communicating with the cluster, so let's use it instead of yet another custom solution.	2026-02-10 17:05:02 +01:00
Dawid Mędrek	ac4af5f461	test: cluster: Create sync points asynchronously. There's a dedicated HTTP API for communicating with the nodes, so let's use it instead of yet another custom solution.	2026-02-10 17:05:01 +01:00
Dawid Mędrek	628e74f157	test: cluster: Fetch hint metrics asynchronously. There's a dedicated API for fetching metrics now. Let's use it instead of developing yet another solution that's also worse.	2026-02-10 17:04:59 +01:00
Pawel Pery	81d11a23ce	Revert "Merge 'vector_search: add validator tests' from Pawel Pery". This reverts commit `bcd1758911`, reversing changes made to `b2c2a99741`. A design decision was made not to introduce an additional test orchestration tool for scylladb.git (see comments for #27499). One commit has already been reverted in `55c7bc7`. Recent CI runs showed the validator tests to be flaky, so it is time to remove all remaining validator tests. This needs a backport to 2026.1 to remove the remaining validator tests there. Fixes: VECTOR-497 Closes scylladb/scylladb#28568	2026-02-08 16:29:58 +02:00
Avi Kivity	bb99bfe815	test: scylla_gdb: tighten check for Error output from gdb. When running a gdb command, we check that the string 'Error' does not appear within the output. However, if the command output includes the string 'Error' as part of its normal operation, this generates a false positive. In fact, the task_histogram output can include the string 'error::Error' from the Rust core::error module. Allow for that and only match 'Error' that isn't 'error::Error'. Fixes #28516. Closes scylladb/scylladb#28574	2026-02-08 09:48:23 +02:00
Anna Stuchlik	dc8f7c9d62	doc: replace the OS Support page with a link to the new location. We've moved that page to another place; see https://github.com/scylladb/scylladb/issues/28561. This commit replaces the page with a link to the new location and adds a redirect. Fixes https://github.com/scylladb/scylladb/issues/28561 Closes scylladb/scylladb#28562	2026-02-06 11:38:21 +02:00
Avi Kivity	7a3ce5f91e	test: minio: disable web console. minio starts a web console on a random port. This was seen to interfere with the nodetool tests when the web console port clashed with the mock API port. Fix by disabling the web console. Fixes https://scylladb.atlassian.net/browse/SCYLLADB-496 Closes scylladb/scylladb#28492	2026-02-05 20:11:32 +02:00
Nikos Dragazis	5d1e6243af	test/cluster: Remove short_tablet_stats_refresh_interval injection. The test `test_size_based_load_balancing.py::test_balance_empty_tablets` waits for tablet load stats to be refreshed and uses the `short_tablet_stats_refresh_interval` injection to speed up the refresh interval. This injection has no effect; it was replaced by the `tablet_load_stats_refresh_interval_in_seconds` config option (patch: `1d6808aec4`), so the test currently waits for 60 seconds (the default refresh interval). Use the config option instead. This reduces the execution time to ~8 seconds. Fixes SCYLLADB-556. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com> Closes scylladb/scylladb#28536	2026-02-05 20:11:32 +02:00
Pavel Emelyanov	10c278fff7	database: Remove _flush_sg member from replica::database. This field is only used to initialize the _memtable_controller member that follows it. It's simpler to initialize _memtable_controller directly with the value the field itself was initialized with, and drop the field. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#28539	2026-02-05 13:02:35 +02:00
Petr Hála	a04dbac369	open-coredump: Change to use new backtrace. * This is a breaking change, which removes compatibility with the old backtrace. See https://staging.backtrace.scylladb.com/api/docs#/default/search_by_build_id_search_build_id_post for the API docs. * Add timestamp field to log. * Tested locally. Closes scylladb/scylladb#28325	2026-02-05 11:50:47 +02:00
Marcin Maliszkiewicz	0753d9fae5	Merge 'test: remove xfail marker from a few passing tests' from Nadav Har'El. This patch fixes the few remaining cases of XPASS in test/cqlpy and test/alternator. These are tests which, when written, reproduced a bug and therefore were marked "xfail", but some time later the bug was fixed and we either did not notice it was fixed, or just forgot to remove the xfail marker. Removing the no-longer-needed xfail markers is good for test hygiene, but more importantly, it is needed to avoid regressions in those already-fixed areas (if a test is already marked xfail, it can start to fail in a new way and we wouldn't notice). Backport not needed; XPASS doesn't bother anyone. Closes scylladb/scylladb#28441 * github.com:scylladb/scylladb: test/cqlpy: remove xfail from tests for fixed issue 7972; test/cqlpy: remove xfail from tests for fixed issue 10358; test/cqlpy: remove xfail from passing test testInvalidNonFrozenUDTRelation; test/alternator: remove xfail from passing test_update_item_increases_metrics_for_new_item_size_only	2026-02-05 10:10:43 +01:00
Marcin Maliszkiewicz	6eca74b7bb	Merge 'More Alternator tests for BatchWriteItem' from Nadav Har'El. The goal of this small pull request is to reproduce issue #28439, which describes a bug in the Alternator Streams output when BatchWriteItem is called to write multiple items in the same partition and the always_use_lwt write isolation mode is used. * The first patch reproduces this specific bug in Alternator Streams. * The second patch adds missing (Fixes #28171) tests for BatchWriteItem in different write modes, and shows that BatchWriteItem itself works correctly; the bug is just in Alternator Streams' reporting of this write. Closes scylladb/scylladb#28528 * github.com:scylladb/scylladb: test/alternator: add test for BatchWriteItem with different write isolations; test/alternator: reproducer for Alternator Streams bug	2026-02-05 10:07:29 +01:00
Yaron Kaikov	b30ecb72d5	ci: fix PR number extraction for unlabeled events. When the workflow is triggered by removing the 'conflicts' label (pull_request_target unlabeled event), github.event.issue.number is not available. Use github.event.pull_request.number as a fallback. Fixes: https://scylladb.atlassian.net/browse/RELENG-245 Closes scylladb/scylladb#28543	2026-02-05 08:41:43 +02:00
Michał Hudobski	6b9fcc6ca3	auth: add CDC streams and timestamps to vector search permissions. It turns out that the CDC driver requires permissions on two additional system tables. This patch adds them to VECTOR_SEARCH_INDEXING and updates the unit tests. The integration with vector store was tested manually; integration tests will be added to the vector-store repository in a follow-up PR. Fixes: SCYLLADB-522 Closes scylladb/scylladb#28519	2026-02-04 09:10:08 +01:00
Nadav Har'El	47e827262f	test/alternator: add test for BatchWriteItem with different write isolations. Alternator's various write operations have different code paths for the different write isolation modes. Because most of the test suite runs in only a single write mode (currently only_rmw_uses_lwt), we already introduced a test file test/alternator/test_write_isolation.py for checking the different write operations in all four write isolation modes. But we missed testing one write operation: BatchWriteItem. This operation isn't very "interesting" because it doesn't support any read-modify-write option (it doesn't support UpdateExpression, ConditionExpression or ReturnValues), but even without those, the pure write code still has different code paths with and without LWT, and should be tested. So we add the missing test here, and it passes. In issue #28439 we discovered a bug that can be seen in Alternator Streams in the case of BatchWriteItem with multiple writes to the same partition and always_use_lwt mode. The fact that the test added here passes shows that the bug is NOT in BatchWriteItem itself, which works correctly in this case, but only in the Alternator Streams layer. Fixes #28171 Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2026-02-04 09:24:29 +02:00
Nadav Har'El	c63f43975f	test/alternator: reproducer for Alternator Streams bug. This patch adds a reproducer for an Alternator Streams bug described in issue #28439, where the stream returns the wrong events (and fewer of them) in the following specific combination of circumstances: 1. A BatchWriteItem operation writing multiple items to the same partition. 2. The "always_use_lwt" write isolation mode is used (the bug doesn't occur in other write isolation modes). We didn't catch this bug earlier because the Alternator Streams test we had for BatchWriteItem had multiple items in multiple partitions, and we missed the multiple-items-in-one-partition case. Moreover, today we run all the tests in only_rmw_uses_lwt mode (in the past, we did use always_use_lwt, but this changed recently in commit `e7257b1393` following commit `76a766c` that changed test.py). As issue #28439 explains, the underlying cause of the bug is that always_use_lwt causes the multiple items to be written with the same timestamp, which confused the Alternator Streams code reading the CDC log. The bug is not in BatchWriteItem itself, or in ScyllaDB CDC, but just in the Alternator Streams layer. The test in this patch is parameterized to run on each of the four write isolation modes, and currently fails (and so is marked xfail) just for the one mode 'always_use_lwt'. The test is scylla_only, as its purpose is to check the different write isolation modes, which don't exist in AWS DynamoDB. Refs #28439 Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2026-02-04 09:17:48 +02:00
Radosław Cybulski	03ff091bee	alternator: improve events output when test failed. Improve event printing when a test in test_streams.py fails. The new code prints both expected and received events (keys, previous image, new image and type), and explicitly marks the event at which the comparison failed. Fixes #28455 Closes scylladb/scylladb#28476	2026-02-03 21:55:07 +02:00
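The `wait_for` contract described in a256ba7de0 (retry while the callback returns `None`, stop on any other value) explains why returning a bool was a bug: `False` is not `None`, so the wait ends on the first call. The sketch below is a hypothetical stand-in written for illustration, not the helper from the ScyllaDB test library; the names `wait_for_sketch` and `make_predicate` and the `deadline`/`period` parameters are assumptions.

```python
import asyncio
import time
from typing import Awaitable, Callable, Optional, TypeVar

T = TypeVar("T")


async def wait_for_sketch(pred: Callable[[], Awaitable[Optional[T]]],
                          deadline: float, period: float = 0.01) -> T:
    """Hypothetical helper: await `pred` until it returns a non-None value."""
    while True:
        result = await pred()
        if result is not None:
            # Any non-None value, including False, ends the wait.
            return result
        if time.monotonic() > deadline:
            raise TimeoutError("condition not met before deadline")
        await asyncio.sleep(period)


def make_predicate(ready_after: int, none_while_waiting: bool):
    """Build a predicate that becomes ready on its `ready_after`-th call."""
    calls = 0

    async def pred():
        nonlocal calls
        calls += 1
        if calls >= ready_after:
            return True
        # Returning False here is the bug: wait_for_sketch treats it as "done".
        return None if none_while_waiting else False

    return pred


async def main():
    deadline = time.monotonic() + 5
    buggy = await wait_for_sketch(make_predicate(3, none_while_waiting=False), deadline)
    fixed = await wait_for_sketch(make_predicate(3, none_while_waiting=True), deadline)
    return buggy, fixed


if __name__ == "__main__":
    print(asyncio.run(main()))  # (False, True)
```

The bool-returning predicate "finishes" immediately with `False`, while the `None`-returning one is retried until it reports readiness.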

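The before/after totals in 8ca834d4a4 are consistent with curl's documented `--retry` backoff (wait 1 s, then double the wait for each further retry until it reaches the cap). The arithmetic below reproduces both figures under two assumptions that the commit does not state: the old setup used 10 retries with curl's default 10-minute cap, and the new one uses 40 retries with a 60 s cap. How the 60 s cap is configured is not shown here, and only sleep time between attempts is counted.

```python
def total_backoff_seconds(retries: int, cap: float, first: float = 1.0) -> float:
    """Sum of waits for `retries` retries: delay doubles from `first`, capped at `cap`."""
    total, delay = 0.0, first
    for _ in range(retries):
        total += min(delay, cap)
        delay *= 2
    return total


before = total_backoff_seconds(retries=10, cap=600)  # assumed old settings
after = total_backoff_seconds(retries=40, cap=60)    # assumed new settings
print(before, after)  # 1023.0 2103.0, i.e. roughly 17 and 35 minutes
```

With the 60 s cap, the first six waits sum to 63 s and the remaining 34 waits are 60 s each, so raising the retry count buys resilience without the 10-minute waits.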
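The tightened check from bb99bfe815 (count 'Error' only when it is not part of 'error::Error') maps naturally onto a negative lookbehind. This is an illustrative sketch; `has_gdb_error` is a hypothetical name, and the actual scylla_gdb test may implement the check differently.

```python
import re

# Match "Error" unless it is immediately preceded by "error::"
# (e.g. Rust's core::error::Error type name in task_histogram output).
_GDB_ERROR = re.compile(r"(?<!error::)Error")


def has_gdb_error(output: str) -> bool:
    return _GDB_ERROR.search(output) is not None


print(has_gdb_error("1 core::error::Error vtable"))          # False
print(has_gdb_error("Error: something went wrong"))           # True
print(has_gdb_error("core::error::Error\nError: no symbol"))  # True
```

A lookbehind keeps the check a single search over the whole output, and a genuine error line still matches even when a Rust type name appears elsewhere in the same output.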

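Commit 034c6fbd87 caps concurrent part submissions at 16 instead of starting one fiber per part. The real s3_client is C++ on Seastar; the asyncio sketch below only illustrates the same bounded-concurrency idea, and `upload_part`, `upload_all_parts` and `MAX_CONCURRENT_PARTS` are hypothetical names. A slot is acquired before each task is created, so at most 16 uploads are ever in progress, rather than creating all tasks up front and having them queue on a semaphore.

```python
import asyncio

MAX_CONCURRENT_PARTS = 16  # cap taken from the commit message


async def upload_part(part_no: int, stats: dict) -> int:
    """Hypothetical stand-in for uploading one part; records peak concurrency."""
    stats["in_flight"] += 1
    stats["peak"] = max(stats["peak"], stats["in_flight"])
    await asyncio.sleep(0.001)  # simulate network I/O
    stats["in_flight"] -= 1
    return part_no


async def upload_all_parts(n_parts: int) -> dict:
    sem = asyncio.Semaphore(MAX_CONCURRENT_PARTS)
    stats = {"in_flight": 0, "peak": 0}
    tasks = []
    for part_no in range(n_parts):
        # Wait for a free slot *before* creating the task, so no more than
        # MAX_CONCURRENT_PARTS part uploads are in progress at the same time.
        await sem.acquire()
        task = asyncio.create_task(upload_part(part_no, stats))
        task.add_done_callback(lambda _t: sem.release())
        tasks.append(task)
    stats["parts"] = await asyncio.gather(*tasks)
    return stats


if __name__ == "__main__":
    result = asyncio.run(upload_all_parts(1000))
    print(result["peak"], len(result["parts"]))  # peak never exceeds 16
```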
51930 Commits
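Commit 33cf97d688 stops the trigger-ci command from firing when it only appears inside quoted (>) comment text. The workflow's actual matching logic is not shown in the commit message, so the following is a sketch under assumptions: the command is the literal token `trigger-ci`, quoting means a line starting with `>`, and `should_trigger_ci` is a hypothetical name.

```python
import re

# Assumption: the command is the literal token "trigger-ci".
_TRIGGER = re.compile(r"(?<![\w-])trigger-ci(?![\w-])")


def should_trigger_ci(comment_body: str) -> bool:
    """Return True only if an unquoted line of the comment contains the command."""
    for line in comment_body.splitlines():
        if line.lstrip().startswith(">"):
            continue  # Markdown blockquote: someone quoting an earlier comment
        if _TRIGGER.search(line):
            return True
    return False
```

For example, a reply that quotes "> trigger-ci" and adds "Thanks for the review!" does not trigger CI, while a new line saying "please trigger-ci" does.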