scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-25 09:11:10 +00:00

Author	SHA1	Message	Date
Tomasz Grabiec	0e0f9f41b3	sstables: mx: index_reader: Keep index_entry directly in the vector Partition index entries are relatively small, and if the workload has small partitions, index pages have a lot of elements. Currently, index entries are indirected via managed_ref, which causes increased cost of LSA eviction and compaction. This patch amortizes this cost by storing them directly in the managed_chunked_vector. This gives about 23% improvement in throughput in perf-simple-query for a workload where the index doesn't fit in memory: scylla perf-simple-query -c1 -m200M --duration 6000 --partitions=1000000 Before: 7774.96 tps (166.0 allocs/op, 521.7 logallocs/op, 54.0 tasks/op, 802428 insns/op, 430457 cycles/op, 0 errors) 7511.08 tps (166.1 allocs/op, 527.2 logallocs/op, 54.0 tasks/op, 804185 insns/op, 430752 cycles/op, 0 errors) 7740.44 tps (166.3 allocs/op, 526.2 logallocs/op, 54.2 tasks/op, 805347 insns/op, 432117 cycles/op, 0 errors) 7818.72 tps (165.2 allocs/op, 517.6 logallocs/op, 53.7 tasks/op, 794965 insns/op, 427751 cycles/op, 0 errors) 7865.49 tps (165.1 allocs/op, 513.3 logallocs/op, 53.6 tasks/op, 788898 insns/op, 425171 cycles/op, 0 errors) After: 9560.42 tps (172.2 allocs/op, 19.6 logallocs/op, 57.7 tasks/op, 567741 insns/op, 345158 cycles/op, 0 errors) 9445.95 tps (173.1 allocs/op, 19.7 logallocs/op, 58.1 tasks/op, 579075 insns/op, 352173 cycles/op, 0 errors) 9576.75 tps (172.2 allocs/op, 19.6 logallocs/op, 57.6 tasks/op, 572004 insns/op, 347373 cycles/op, 0 errors) 9597.16 tps (172.2 allocs/op, 19.6 logallocs/op, 57.6 tasks/op, 569615 insns/op, 346618 cycles/op, 0 errors) 9454.07 tps (173.5 allocs/op, 19.8 logallocs/op, 58.3 tasks/op, 579213 insns/op, 351569 cycles/op, 0 errors) Disabling the partition index doesn't improve the throughput beyond that.	2026-03-18 16:25:20 +01:00
Tomasz Grabiec	b6bfdeb111	dht: Introduce raw_token Most tokens stored in data structures are for key-scoped tokens, and we don't need to pay for token::kind storage.	2026-03-18 16:25:20 +01:00
Tomasz Grabiec	3775593e53	test: perf_simple_query: Add 'sstable-format' command-line option	2026-03-18 16:25:20 +01:00
Tomasz Grabiec	6ee9bc63eb	test: perf_simple_query: Add 'sstable-summary-ratio' command-line option	2026-03-18 16:25:20 +01:00
Tomasz Grabiec	38d130d9d0	test: perf-simple-query: Add option to disable index cache	2026-03-18 16:25:20 +01:00
Tomasz Grabiec	5ee61f067d	test: cql_test_env: Respect enable-index-cache config Mirrors the code in main.cc	2026-03-18 16:25:20 +01:00
Aleksandra Martyniuk	d4fdeb4839	tasks: pass token_metadata_ptr to task_manager::virtual_task::impl::get_children In get_children we get the vector of alive nodes with get_nodes. Yet, between this and sending rpc to those nodes there might be a preemption. Currently, the liveness of a node is checked once again before the rpcs (only with gossiper not in topology - unlike get_nodes). Modify get_children, so that it keeps a token_metadata_ptr, preventing topology from changing between get_nodes and rpcs. Remove test_get_children as it checked if the get_children method won't fail if a node is down after get_nodes - which cannot happen currently.	2026-03-18 15:37:24 +01:00
Calle Wilund	0013f22374	memtable_test::memtable_flush_period: Change sleep to use injection signal instead Fixes: SCYLLADB-942 Adds an injection signal _from_ table::seal_active_memtable to allow us to reliably wait for flushing. And does so. Closes scylladb/scylladb#29070	2026-03-18 16:23:13 +02:00
Botond Dénes	ae17596c2a	Merge 'Demote log level on split failure during shutdown' from Raphael Raph Carvalho Since commit `509f2af8db`, gate_closed_exception can be triggered for ongoing split during shutdown. The commit is correct, but it causes split failure on shutdown to log an error, which causes CI instability. Previously, aborted_exception would be triggered instead which is logged as warning. Let's do the same. Fixes https://scylladb.atlassian.net/browse/SCYLLADB-951. Fixes https://github.com/scylladb/scylladb/issues/24850. Only 2026.1 is affected. Closes scylladb/scylladb#29032 * github.com:scylladb/scylladb: replica: Demote log level on split failure during shutdown service: Demote log level on split failure during shutdown	2026-03-18 16:21:05 +02:00
Pavel Emelyanov	d68c92ec04	test: Replace a bunch of ternary operators with an if-else block A followup of the merge of two test cases that happened in the previous patch. Both used `foo = N if domain == bar else M` to evaluate the parameters for topology. Using if-else block makes it immediately obvious which topology and scope apply for each domain value without having to evaluate multiple inline conditionals. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-03-18 13:08:36 +03:00
Pavel Emelyanov	b1d4fc5e6e	test: Squash test_restore_primary_replica_same\|different_domain tests The two tests differ only in the way they set up the topology for the cluster and the post-restore checks against the resulting streams. The merge happens with the help of a "scope_is_same" boolean parameter and corresponding updates in the topology setup and post-checks. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-03-18 13:08:36 +03:00
Pavel Emelyanov	21c603a79e	test: Use the same regexp in test_restore_primary_replica_different\|same_domain-s The one in "different domain" test is simpler because the test performs fewer checks. Next patch will merge both tests and making regexp-s look identical makes the merge even smoother. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-03-18 13:07:09 +03:00
Benny Halevy	c2a6d1e930	test/boost/database_test: add test_snapshot_ctl_details_exception_handling Verify that the directory listers opened by get_snapshot_details are properly closed when handling an (injected) exception. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2026-03-18 09:37:44 +02:00
Dario Mirovic	71e6918f28	test: use NetworkTopologyStrategy in maintenance socket tests NetworkTopologyStrategy is the preferred choice. We should not use SimpleStrategy anymore. This patch changes the topology strategy for all the maintenance socket tests. Refs SCYLLADB-1070	2026-03-17 20:20:47 +01:00
Dario Mirovic	278535e4e3	test: use cleanup fixture in maintenance socket auth tests Add a cql_clusters pytest fixture that tracks CQL driver Cluster objects and shuts them down automatically after test completion. This replaces manual shutdown() calls at the end of each test. Also consolidate shutdown() calls in retry helpers into finally blocks for consistent cleanup. Refs SCYLLADB-1070	2026-03-17 20:15:30 +01:00
Dario Mirovic	2e4b72c6b9	auth: add maintenance_socket_authorizer GRANT/REVOKE fails on the maintenance socket connections, because maintenance_auth_service uses allow_all_authorizer. allow_all_authorizer allows all operations, but not GRANT/REVOKE, because they make no sense in its context. This has been observed during PGO run failure in operations from ./pgo/conf/auth.cql file. This patch introduces maintenance_socket_authorizer that supports the capabilities of default_authorizer ('CassandraAuthorizer') without needing authorization. Refs SCYLLADB-1070	2026-03-17 19:19:41 +01:00
Botond Dénes	172c786079	Merge 'perf-alternator: wait for alternator port before running workload' from Marcin Maliszkiewicz This patch is mostly for the purpose of running pgo CI job. We may receive connection error if asyncio.sleep(5) in pgo.py is not sufficient waiting time. In pgo.py we do wait for port but only for cql, anyway it's better to have high level check than trying to wait for alternator port there. Fixes https://scylladb.atlassian.net/browse/SCYLLADB-1071 Backport: 2026.1 - it failed on CI for that build Closes scylladb/scylladb#29063 * github.com:scylladb/scylladb: perf: add abort_source support to wait-for-port loops perf-alternator: wait for alternator port before running workload	2026-03-17 18:38:11 +02:00
Botond Dénes	5d868dcc55	Merge 's3_client: fix s3::range max value for object size' from Ernest Zaslavsky - fix s3::range max value for object size which is 50TiB and not 5. - refactor constants to make it accessible for all interested parties, also reuse these constants in tests No need to backport, doubt we will encounter an object larger than 5TiB Closes scylladb/scylladb#28601 * github.com:scylladb/scylladb: s3_client: reorganize tests in part_size_calculation_test s3_client: switch using s3 limits constants in tests s3_client: fix the s3::range max object size s3_client: remove "aws" prefix from object limits constants s3_client: make s3 object limits accessible	2026-03-17 16:34:42 +02:00
Dawid Mędrek	a8dd13731f	Merge 'Improve debuggability of test/cluster/test_data_resurrection_in_memtable.py' from Botond Dénes This test was observed to fail in CI recently but there is not enough information in the logs to figure out what went wrong. This PR makes a few improvements to make the next investigation easier, should it be needed: * storage-service: add table name to mutation write failure error messages. * database: the `database_apply` error injection used to cause trouble, catching writes to bystander tables, making tests flaky. To eliminate this, it gained a filter to apply only to non-system keyspaces. Unfortunately, this still allows it to catch writes to the trace tables. While this should not fail the test, it reduces observability, as some traces disappear. Improve this error injection to only apply to selected table. Also merge it with the `database_apply_wait` error injection, to streamline the code a bit. * test/test_data_resurrection_in_memtable.py: dump data from the table, before the checks for expected data, so if checks fail, the data in the table is known. Refs: SCYLLADB-812 Refs: SCYLLADB-870 Fixes: SCYLLADB-1050 (by restricting `database_apply` error injection, so it doesn't affect writes to system traces) Backport: test related improvement, no backport Closes scylladb/scylladb#28899 * github.com:scylladb/scylladb: test/cluster/test_data_resurrection_in_memtable.py: dump rows before check replica/database: consolidate the two database_apply error injections service/storage_proxy: add name of table to error message for write errors	2026-03-17 13:35:19 +01:00
Botond Dénes	318aa07158	Merge ' test/alternator: use module-scope fixtures in test_streams.py ' from Nadav Har'El Previously, all stream-table fixtures in test_streams.py used scope="function", forcing a fresh table to be created for every test, slowing down the test a bit (though not much), and discouraging writing small new tests. This was a workaround for a DynamoDB quirk (that Alternator doesn't have): LATEST shard iterators have a time slack and may point slightly before the true stream head, causing leftover events from a previous test to appear in the next test's reads. The first two tests in this series fix small problems that turn up once we start sharing test tables in test_streams.py. The final patch fixes the "LATEST" problem and enables sharing the test table by using "module" scope fixtures instead of "function". After this series, test_streams.py run time went down a bit, from 20.2 seconds to 17.7 seconds. Closes scylladb/scylladb#28972 * github.com:scylladb/scylladb: test/alternator: speed up test_streams.py by using module-scope fixtures test/alternator: test_streams.py don't use fixtures in 4 tests test/alternator: fix do_test() in test_streams.py	2026-03-17 13:56:16 +02:00
Botond Dénes	dbe70cddca	test/boost/querier_cache_test: make test_time_based_cache_eviction less sensitive to timing This test relies on the cache entry being evicted after 200ms past the TTL. This may not happen on a busy CI machine. Make the test less reliant on timing by using eventually_true(). Simplify the test by dropping the second entry, it doesn't add anything to the test. Fixes: SCYLLADB-811 Closes scylladb/scylladb#28958	2026-03-17 10:32:23 +01:00
Botond Dénes	0fd51c4adb	test/nodetool: rest_api_mock_server: add retry for status code 404 This fixture starts the mock server and immediately connects to it to set up the expected requests. The connection attempt might be too early, so there is a retry loop with a timeout. The loop currently checks for requests.exceptions.ConnectionError. We've seen a case where the connection is successful but the request fails with 404. The mock started the server but didn't set up the routes yet. Add a retry for HTTP 404 to handle this. Fixes: SCYLLADB-966 Closes scylladb/scylladb#29003	2026-03-17 10:30:23 +01:00
Botond Dénes	035aa90d4b	Merge 'Alternator: add per-table batch latency metrics and test coverage' from Amnon Heiman This series fixes a metrics visibility gap in Alternator and adds regression coverage. Until now, BatchGetItem and BatchWriteItem updated global latency histograms but did not consistently update per-table latency histograms. As a result, table-level latency dashboards could miss batch traffic. It updates the batch read/write paths to compute request duration once and record it in both global and per-table latency metrics. Add the missing tests, including a metric-agnostic helper and a dedicated per-table latency test that verifies latency counters increase for item and batch operations. This change is metrics-only (no API/behavior change for requests) and improves observability consistency between global and per-table views. Fixes #28721 We assume the alternator per-table metrics exist, but the batch ones are not updated Closes scylladb/scylladb#28732 * github.com:scylladb/scylladb: test(alternator): add per-table latency coverage for item and batch ops alternator: track per-table latency for batch get/write operations	2026-03-16 17:18:00 +02:00
Botond Dénes	9de8d6798e	Merge 'reader_concurrency_semaphore: skip preemptive abort for permits waiting for memory' from Łukasz Paszkowski Permits in the `waiting_for_memory` state represent already-executing reads that are blocked on memory allocation. Preemptively aborting them is wasteful -- these reads have already consumed resources and made progress, so they should be allowed to complete. Restrict the preemptive abort check in maybe_admit_waiters() to only apply to permits in the `waiting_for_admission` state, and tighten the state validation in `on_preemptive_aborted()` accordingly. Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-1016 Backport not needed. The commit introducing replica load shedding is not part of 2026.1 Closes scylladb/scylladb#29025 * github.com:scylladb/scylladb: reader_concurrency_semaphore: skip preemptive abort for permits waiting for memory reader_concurrency_semaphore_test: detect memory leak on preemptive abort of waiting_for_memory permit	2026-03-16 17:14:25 +02:00
Marcin Maliszkiewicz	9318c80203	perf: add abort_source support to wait-for-port loops Check abort_source on each retry iteration in wait_for_alternator and wait_for_cql so the wait can be interrupted on shutdown. Didn't use sleep_abortable as the sleep is very short anyway.	2026-03-16 16:14:10 +01:00
Calle Wilund	a5df2e79a7	storage_service: Wait for snapshot/backup before decommission Fixes: SCYLLADB-244 Disables snapshot control such that any active ops finish/fail before proceeding with decommission. Note: snapshot control provided as argument, not member ref due to storage_service being used from both main and cql_test_env. (The latter has no snapshot_ctl to provide). Could do the snapshot lockout on API level, but want to do pre-checks before this. Note: this just disables backup/snapshot fully. Could re-enable after decommission, but this seems somewhat pointless. v2: * Add log message to snapshot shutdown * Make test use log waiting instead of timeouts Closes scylladb/scylladb#28980	2026-03-16 17:12:57 +02:00
Marcin Maliszkiewicz	edf0148bee	perf-alternator: wait for alternator port before running workload This patch is mostly for the purpose of running pgo CI job. We may receive connection error if asyncio.sleep(5) in pgo.py is not sufficient waiting time. In pgo.py we do wait for port but only for cql, anyway it's better to have high level check than trying to wait for alternator port there.	2026-03-16 16:07:52 +01:00
bitpathfinder	85d5073234	test: Fix non-awaited coroutine in test_gossiper_empty_self_id_on_shadow_round The line with the error was not actually needed and has therefore been removed. Fixes: SCYLLADB-906 Closes scylladb/scylladb#28884	2026-03-16 17:07:36 +02:00
Botond Dénes	3e4e0c57b8	Merge 'Relax rf-rack-valid-keyspace option in backup/restore tests' from Pavel Emelyanov Some tests, when creating a cluster, configure nodes with the rf-rack-valid option, because sometimes they want to have it OFF. For that the option is explicitly carried around, but the cluster-creating helper can guess this option itself from the provided topology and replication factor. Removing this option simplifies the code and (which is a nicer outcome) the test "signature" that's used e.g. on the command line to run a specific test. Improving tests, not backporting Closes scylladb/scylladb#28860 * github.com:scylladb/scylladb: test: Relax topology_rf_validity parameter for some tests test: Auto detect rf-rack-valid option in create_cluster()	2026-03-16 17:06:46 +02:00
Raphael S. Carvalho	ee87b66033	replica: Demote log level on split failure during shutdown Dtest failed with: table - Failed to load SSTable .../me-3gyn_0qwi_313gw2n2y90v2j4fcv-big-Data.db of origin memtable due to std::runtime_error (Cannot split .../me-3gyn_0qwi_313gw2n2y90v2j4fcv-big-Data.db because manager has compaction disabled, reason might be out of space prevention), it will be unlinked... The reason is that the error above is being triggered when the cause is shutdown, not out of space prevention. Let's distinguish between the two cases and log the error with warning level on shutdown. Fixes https://github.com/scylladb/scylladb/issues/24850. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2026-03-16 12:03:17 -03:00
Patryk Jędrzejczak	526e5986fe	test: test_raft_no_quorum: decrease group0_raft_op_timeout_in_ms after quorum loss `test_raft_no_quorum.py::test_cannot_add_new_node` is currently flaky in dev mode. The bootstrap of the first node can fail due to `add_entry()` timing out (with the 1s timeout set by the test case). Other test cases in this test file could fail in the same way as well, so we need a general fix. We don't want to increase the timeout in dev mode, as it would slow down the test. The solution is to keep the timeout unchanged, but set it only after quorum is lost. This prevents unexpected timeouts of group0 operations with almost no impact on the test running time. A note about the new `update_group0_raft_op_timeout` function: waiting for the log seems to be necessary only for `test_quorum_lost_during_node_join_response_handler`, but let's do it for all test cases just in case (including `test_can_restart` that shouldn't be flaky currently). Fixes https://scylladb.atlassian.net/browse/SCYLLADB-913 Closes scylladb/scylladb#28998	2026-03-16 16:58:15 +02:00
Raphael S. Carvalho	b508f3dd38	service: Demote log level on split failure during shutdown Since commit `509f2af8db`, gate_closed_exception can be triggered for ongoing split during shutdown. The commit is correct, but it causes split failure on shutdown to log an error, which causes CI instability. Previously, aborted_exception would be triggered instead which is logged as warning. Let's do the same. Fixes https://scylladb.atlassian.net/browse/SCYLLADB-951. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2026-03-16 11:52:00 -03:00
Artsiom Mishuta	755d528135	test.py: fix warnings Changes in this commit: 1) rename class from 'TestContext' to 'Context' so pytest will not consider this class as a test 2) extend pytest filterwarnings list to ignore warnings from external libs 3) use datetime.datetime.now(datetime.UTC) instead of datetime.datetime.utcnow() 4) use ResultSet.one() instead of ResultSet[0] Fixes SCYLLADB-904 Fixes SCYLLADB-908 Related SCYLLADB-902 Closes scylladb/scylladb#28956	2026-03-15 12:00:10 +02:00
Karol Nowacki	7659a5b878	vector_search: test: fix flaky test The test assumes that the sleep duration will be at least the value of the sleep parameter. However, the actual sleep time can be slightly less than requested (e.g., a 100ms sleep request might result in a 99ms sleep). This commit adjusts the test's time comparison to be more lenient, preventing test flakiness.	2026-03-13 16:28:22 +01:00
Karol Nowacki	5474cc6cc2	vector_search: fix race condition on connection timeout When a `with_connect` operation timed out, the underlying connection attempt continued to run in the reactor. This could lead to a crash if the connection was established/rejected after the client object had already been destroyed. This issue was observed during the teardown phase of an upcoming high-availability test case. This commit fixes the race condition by ensuring the connection attempt is properly canceled on timeout. Additionally, the explicit TLS handshake previously forced during the connection is now deferred to the first I/O operation, which is the default and preferred behavior. Fixes: SCYLLADB-832	2026-03-13 16:28:22 +01:00
Piotr Dulikowski	d8b283e1fb	Merge 'Add CQL forwarding for strongly consistent tables' from Wojciech Mitros In this series we add support for forwarding strongly consistent CQL requests to suitable replicas, so that clients can issue reads/writes to any node and have the request executed on an appropriate tablet replica (and, for writes, on the Raft leader). We return the same CQL response as what the user would get while sending the request to the correct replica and we perform the same logging/stats updates on the request coordinator as if the coordinator was the appropriate replica. The core mechanism of forwarding a strongly consistent request is sending an RPC containing the user's cql request frame to the appropriate replica and returning back a ready, serialized `cql_transport::response`. We do this in the CQL server - it is most prepared for handling these types and forwarding a request containing a CQL frame allows us to reuse near-top-level methods for CQL request handling in the new RPC handler (such as the general `process`) For sending the RPC, the CQL server needs to obtain the information about who should it forward the request to. This requires knowledge about the tablet raft group members and leader. We obtain this information during the execution of a `cql3/strong_consistency` statement, and we return this information back to the CQL server using the generalized `bounce_to_shard` `response_message`, where we now store the information about either a shard, or a specific replica to which we should forward to. Similarly to `bounce_to_shard`, we need to handle this `result_message` in a loop - a replica may move during statement execution, or the Raft leader can change. 
We also use it for forwarding strongly consistent writes when we're not a member of the affected tablet raft group - in that case we need to forward the statement twice - once to any replica of the affected tablet, then that replica can find the leader and return this information to the coordinator, which allows the second request to be directed to the leader. This feature also allows passing through exception messages which happened on the target replica while executing the statement. For that, many methods of the `cql_transport::cql_server::connection` for creating error responses needed to be moved to `cql_transport::cql_server`. And for final exception handling on the coordinator, we added additional error info to the RPC response, so that the handling can be performed without having the `result_message::exception` or `exception_ptr` itself. Fixes [SCYLLADB-71](https://scylladb.atlassian.net/browse/SCYLLADB-71) [SCYLLADB-71]: https://scylladb.atlassian.net/browse/SCYLLADB-71?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ Closes scylladb/scylladb#27517 * github.com:scylladb/scylladb: test: add tests for CQL forwarding transport: enable CQL forwarding for strong consistency statements transport: add remote statement preparation for CQL forwarding transport: handle redirect responses in CQL forwarding transport: add exception handling for forwarded CQL requests transport: add basic CQL request forwarding idl: add a representation of client_state for forwarding cql_server: handle query, execute, batch in one case transport: inline process_on_shard in cql_server::process transport: extract process() to cql_server transport: add messaging_service to cql_server transport: add response reconstruction helpers for forwarding transport: generalize the bounce result message for bouncing to other nodes strong consistency: redirect requests to live replicas from the same rack transport: pass foreign_ptr into 
sleep_until_timeout_passes and move it to cql_server transport: extract the error handling from process_request_one transport: move error response helpers from connection to cql_server	2026-03-13 15:03:10 +01:00
Pavel Emelyanov	d544d8602d	test: Relax topology_rf_validity parameter for some tests Tests that call create_cluster() helper no longer need to carry the rf-validity parameter. This simplifies the code and test signature. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-03-13 14:30:32 +03:00
Pavel Emelyanov	313985fed7	test: Auto detect rf-rack-valid option in create_cluster() The helper accepts it as a boolean argument, but it can easily estimate one from the provided topology. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-03-13 14:30:32 +03:00
Łukasz Paszkowski	4c4d043a3b	reader_concurrency_semaphore: skip preemptive abort for permits waiting for memory Permits in the `waiting_for_memory` state represent already-executing reads that are blocked on memory allocation. Preemptively aborting them is wasteful -- these reads have already consumed resources and made progress, so they should be allowed to complete. Restrict the preemptive abort check in maybe_admit_waiters() to only apply to permits in the `waiting_for_admission` state, and tighten the state validation in `on_preemptive_aborted()` accordingly. Adjust the following tests: + test_reader_concurrency_semaphore_abort_preemptively_aborted_permit no longer relies on requesting memory + test_reader_concurrency_semaphore_preemptive_abort_requested_memory_leak adjusted to the fix Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-1016	2026-03-13 09:50:05 +01:00
Botond Dénes	fc8cebd671	Merge 'Verify components digests during component load and scrub in validate mode' from Taras Veretilnyk This PR adds integrity verification for SSTable component files during loading. When component digests are present in Scylla metadata, the loader now validates each component's CRC32 digest against the stored expected value, catching silent corruption of component files. Index, Rows and Partitions components digests are also validated during scrub in validate mode. Added corruption tests that write an SSTable, flip a bit in a specific component file, then verify that reloading the SSTable detects the corruption and throws the expected exception. Depends on https://github.com/scylladb/scylladb/pull/28338 Backport is not required, this is a new feature Fixes https://github.com/scylladb/scylladb/issues/20103 Closes scylladb/scylladb#28761 * github.com:scylladb/scylladb: test/cqlpy: test --ignore-component-digest-mismatch flag in scylla sstable upgrade docs: document --ignore-component-digest-mismatch flag for scylla sstable upgrade sstables: propagate ignore_component_digest_mismatch config to all load sites sstables: add option to ignore component digest mismatches sstable_compaction_test: Add scrub validate test for corrupted index sstables: add tests for component digest validation on corrupted SSTables sstables: validate index components digests during SSTable scrub in validate mode sstables: verify component digests on SSTable load sstables: add digest_file_random_access_reader for CRC32 digest computation	2026-03-13 09:55:55 +02:00
Avi Kivity	ae8a418744	Merge 'Await async calls in test tablets migration' from Benny Halevy Fix several test cases that did not await async tasks: - test_restart_leaving_replica_during_cleanup - test_restart_in_cleanup_stage_after_cleanup - test_tablet_back_and_forth_migration - test_staging_backlog_is_preserved_with_file_based_streaming Fixes SCYLLADB-910 * Minor fixes, no backport needed Closes scylladb/scylladb#28908 * github.com:scylladb/scylladb: test_tablets_migration: test_staging_backlog_is_preserved_with_file_based_streaming: convert for loop to asyncio.gather test_tablets_migration: test_tablet_back_and_forth_migration: await move_tablet test_tablets_migration: test_restart_in_cleanup_stage_after_cleanup: await move_task test_tablets_migration: test_restart_leaving_replica_during_cleanup: await move_task test_tablets_migration: drop unused imports from cassandra.query	2026-03-13 00:20:29 +02:00
Avi Kivity	b228eb26e6	Merge 'dbuild: Use slirp4netns network in dbuild nested containers' from Calle Wilund Fixes #25084 Add slirp4netns and use for nested containers. This will allow nested container port aliasing, helping CI stability. Note: this contains an updated Dockerfile for dbuild image, but since chicken and eggs, right now will force install slirp4netns before anything in dbuild script. Updates the mock server handling to use ephemeral ports and query from container, ensuring we don't get port collisions. (boost as well as pytest). Includes a timeout up, and a tweak to our scylla_cluster handling, ensuring we don't deadlock when pipe size is less than required for our sys notify messages. Closes scylladb/scylladb#28727 * github.com:scylladb/scylladb: gcs_fixture: Change to use docker helper aws_kms_fixture: Modify to use docker helper test/lib/proc_util: Add docker helper pytest: use ephemeral port publish for docker mock servers dbuild: Use container network in dbuild nested containers scylla_cluster: Read notify sock in background to prevent deadlock	2026-03-12 23:49:25 +02:00
Nadav Har'El	ad832c263e	test/cluster: mark test_alternator_concurrent_rmw_same_partition_different_server not strictly xfail A few days ago, in commit `7b30a39` we added to pytest.ini the option xfail_strict. This option causes every time a test XPASSes, i.e., an xfail test actually passes - to be considered an error and fail the test. But some tests demonstrate a timing-related bug and do not reproduce the bug every single time. An example we noticed in one CI run is: test/cluster/test_alternator.py::test_alternator_concurrent_rmw_same_partition_different_server This test reproduces a timing-related bug (if you do an LWT write to one partition on two different coordinators "at the same time", you can get a failure), but only most of the time, not 100% of the time. The solution is to add "strict=False" for the xfail marker on this specific test. This undoes the xfail_strict for this specific test, accepting that this specific test can either pass or fail. Note that this does NOT make this test worthless - we still see this test failing most of the time, and when a developer finally fixes this issue, the test will begin to pass all the time. Fixes https://scylladb.atlassian.net/browse/SCYLLADB-941 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#29016	2026-03-12 23:46:23 +02:00
Tomasz Grabiec	1256a9faa7	tablets: Fix deadlock in background storage group merge fiber When it deadlocks, groups stop merging and compaction group merge backlog will run away. Also, graceful shutdown will be blocked on it. Found by flaky unit test test_merge_chooses_best_replica_with_odd_count, which timed out in 1 in 100 runs. Reason for deadlock: When storage groups are merged, the main compaction group of the new storage group takes a compaction lock, which is appended to _compaction_reenablers_for_merging, and released when the merge completion fiber is done with the whole batch. If we accumulate more than 1 merge cycle for the fiber, deadlock occurs. Lock order will be this Initial state: cg0: main cg1: main cg2: main cg3: main After 1st merge: cg0': main [locked], merging_groups=[cg0.main, cg1.main] cg1': main [locked], merging_groups=[cg2.main, cg3.main] After 2nd merge: cg0'': main [locked], merging_groups=[cg0'.main [locked], cg0.main, cg1.main, cg1'.main [locked], cg2.main, cg3.main] merge completion fiber will try to stop cg0'.main, which will be blocked on compaction lock, which is held by the reenabler in _compaction_reenablers_for_merging, hence deadlock. The fix is to wait for background merge to finish before we start the next merge. It's achieved by holding old erm in the background merge, and doing a topology barrier from the merge finalizing transition. Background merge is supposed to be a relatively quick operation, it's stopping compaction groups. So may wait for active requests. It shouldn't prolong the barrier indefinitely. Tablet boost unit tests which trigger merge need to be adjusted to call the barrier, otherwise they will be vulnerable to the deadlock. Two cluster tests were removed because they assumed that merge happens in the background.
Now that it happens as part of merge finalization, and blocks topology state machine, those tests deadlock because they are unable to make topology changes (node bootstrap) while background merge is blocked. The test "test_tablets_merge_waits_for_lwt" needed to be adjusted. It assumed that merge finalization doesn't wait for the erm held by the LWT operation, and triggered tablet movement afterwards, and assumed that this migration will issue a barrier which will block on the LWT operation. After this commit, it's the barrier in merge finalization which is blocked. The test was adjusted to use an earlier log mark when waiting for "Got raft_topology_cmd::barrier_and_drain", which will catch the barrier in merge finalization. Fixes SCYLLADB-928	2026-03-12 22:45:01 +01:00
Tomasz Grabiec	582a4abeb6	test: boost: tablets_test: Save tablet metadata when ACKing split resize decision Needs to be ordered before split finalization, because storage_group must be in split mode already at finalization time. There must be split-ready compaction groups, otherwise finalization fails with this error: Found 0 split ready compaction groups, but expected 2 instead. Exposed by increased split activity in tests.	2026-03-12 22:45:01 +01:00
Avi Kivity	e2eeef3e01	Merge 'service level: remove remnants of version 1 service level' from Gleb Natapov can_use_effective_service_level_cache() always returns true now, so the function can be dropped entirely and all the code that assumes it may return false can be dropped as well. Also drop async versions of find_effective_service_level and get_user_scheduling_group since they are unused. No need to backport: code removal. Closes scylladb/scylladb#29002 * github.com:scylladb/scylladb: service level: make maybe_update_per_service_level_params synchronous service level: remove unused get_user_scheduling_group function service level: drop async find_effective_service_level service level: remove remnants of version 1 service level	2026-03-12 23:39:41 +02:00
Avi Kivity	e8a6706d6e	Merge 'shorten some sleeps to speed up bootstrap in tests' from Patryk Jędrzejczak This PR shortens two sleeps from 1s to 100ms to speed up bootstrap in tests. The changed sleeps are: - the pause duration in group0 discovery, - the retry period in `wait_for_cql`. Refs: https://scylladb.atlassian.net/browse/SCYLLADB-918 No backport: performance improvements mostly relevant to tests. Closes scylladb/scylladb#29020 * github.com:scylladb/scylladb: test: pylib: util: wait for CQL being ready with a shorter period group0: discovery: shorten the pause duration	2026-03-12 21:17:05 +02:00
Wojciech Mitros	32974770b0	test: add tests for CQL forwarding Add basic cluster tests for CQL forwarding. The test cases include: - basic reads and writes - prepared statements with binds - forwarding from a non-replica - exception passthrough during forwarding (using an injection) - re-preparing a statement on the target node, even if the user query is also an EXECUTE request on a prepared statement - verification of metric updates The existing test_basic_write_read was modified so that a few extra cases could be validated on the same cluster.	2026-03-12 19:43:35 +01:00
Wojciech Mitros	916a9995c1	transport: enable CQL forwarding for strong consistency statements We enable CQL forwarding by starting to return the bounce_to_node result message in redirect_statement() instead of throwing. The forwarding code introduced in the preceding patches reacts to these messages, allowing the requests to be forwarded. With the update, some tests assuming that requests can't be forwarded need to be adjusted, so we do that as well.	2026-03-12 19:43:35 +01:00
Avi Kivity	76b6784c1a	Merge 'cql3: track CQL parsing memory cost and use it for admission control' from Marcin Maliszkiewicz Use rolling_max_tracker to record gross bytes allocated during each CQL parse. The rolling maximum is then added to the memory estimate for incoming QUERY and PREPARE requests so that the admission control in the CQL transport layer accounts for parsing overhead. The measured memory footprint serves as an upper bound rather than an exact number, but its purpose is to prevent OOMs under a load heavy in unprepared statements. In a benchmark, a node with 1G of memory shows a decrease in peak non-LSA memory usage from 320MB (our coordinator budget is 10% of 1G) to 96MB, while tps drops from 1.2 kops to 0.8 kops. The drop in tps is expected as memory admission kicks in trying to prevent OOM. This is phase 1 of OOM prevention; potential next steps: - add second admission in query_processor::get_statement trying to prevent potential thundering herd problem - decrease cql_server memory pool size - count reads in the memory pool - add per service level memory pool and a shared one Related https://scylladb.atlassian.net/browse/SCYLLADB-740 Fixes https://scylladb.atlassian.net/browse/SCYLLADB-938 Backport: no, new feature, but we may reconsider if some customer needs it Closes scylladb/scylladb#28919 * github.com:scylladb/scylladb: cql3: track CQL parsing memory cost and use it for admission control utils: add rolling max tracker	2026-03-12 19:59:52 +02:00
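The idea of adding a rolling maximum of parse cost to each request's memory estimate can be illustrated with a small Python sketch. This is not ScyllaDB's rolling_max_tracker (which is C++ and whose interface is not shown here). The class `RollingMaxTracker`, its sample-count window, and the helper `estimate_request_memory` are all hypothetical names chosen for illustration.

```python
from collections import deque


class RollingMaxTracker:
    """Hypothetical sketch: maximum over the last `window` recorded samples."""

    def __init__(self, window):
        if window <= 0:
            raise ValueError("window must be positive")
        self._window = window
        self._count = 0  # total number of samples seen so far
        # (index, value) pairs with strictly decreasing values; the front
        # entry is always the maximum of the current window.
        self._candidates = deque()

    def add_sample(self, value):
        # Older samples that are not larger than the new one can never be
        # the window maximum again, so drop them from the back.
        while self._candidates and self._candidates[-1][1] <= value:
            self._candidates.pop()
        self._candidates.append((self._count, value))
        self._count += 1
        # Expire the front entry once it has slid out of the window.
        while self._candidates[0][0] < self._count - self._window:
            self._candidates.popleft()

    def current_max(self):
        return self._candidates[0][1] if self._candidates else 0


def estimate_request_memory(request_size, tracker):
    # Assumed admission estimate: raw request size plus the largest
    # parse cost observed recently, treated as an upper bound.
    return request_size + tracker.current_max()


tracker = RollingMaxTracker(window=3)
for parse_bytes in (10, 50, 20, 5, 1):
    tracker.add_sample(parse_bytes)
print(estimate_request_memory(100, tracker))
```

Using the maximum rather than an average deliberately over-estimates, which matches the commit's description of the measurement as an upper bound intended to prevent OOMs.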


11801 Commits