scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-29 11:10:40 +00:00

Author	SHA1	Message	Date
Dario Mirovic	249a6cec1b	test: cluster: dtest: remove old audit tests Since audit tests have been migrated to test/cluster/test_audit.py, old tests in test/cluster/dtest/audit_test.py have to be removed. Refs SCYLLADB-573	2026-03-19 16:12:13 +01:00
Dario Mirovic	adc790a8bf	test: cluster: group migrated audit tests for cluster reuse This patch reorganizes the execution flow of the test functions. They are grouped to enable cluster reuse between specific test functions. One of the main contributors to the test execution time is the cluster preparation. This patch significantly reduces the total test execution time by having way less new cluster preparation calls and more cluster reuse. Performance increase on the developer machine is around 38%: - before: 4m 29s - after: 2m 47s Fixes SCYLLADB-573	2026-03-19 16:11:47 +01:00
Dario Mirovic	967b7ff6bf	test: cluster: enable migrated audit tests and make them work Make audit tests from test/cluster/dtest to test/cluster. test/cluster environment has less overhead, and audit tests are heavy, their execution taking lots of time. This patch is part of an effort to improve audit test suite performance. This patch refactors the tests so that they execute correctly, as well as enables them. A follow up patch will remove the audit tests in test/cluster/dtest. All the tests are confirmed to be running after the change. No dead code present. Test test_audit_categories_invalid is not parametrized anymore. It never used the parametrized helper class, so it just ran the same logic three times. This is why there are now 74, and not 76, test executions. Refs SCYLLADB-573	2026-03-19 16:07:28 +01:00
Dario Mirovic	8367509b3b	test: pylib: manager_client: specify AuthProvider in get_cql_exclusive This patch allows ManagerClient.get_cql_exclusive to accept AuthProvider as parameter. This will be used in a follow up patch which migrates audit test suite to test/cluster and requires this functionality for some tests. Refs SCYLLADB-573	2026-03-19 15:35:24 +01:00
Dario Mirovic	0a7a69345c	test: pylib: scylla cluster after_test log fix Before any test, a pool of ScyllaCluster objects is created. At the beginning of a test suite, a ScyllaClusterManager is created, and given a reference to the pool. At the end of a test suite, the ScyllaClusterManager is destroyed. Before each test case: - ManagerClient is constructed and connected to the ScyllaClusterManager of that test suite - A ScyllaCluster object is fetched from the pool - If the pool is empty, a new ScyllaCluster object is created - If the pool is not empty, a cached ScyllaCluster object is returned After each test case: - Return ScyllaCluster object from ManagerClient to the pool - If the cluster is dirty, the pool destroys it - If the cluster is clean, the pool caches it - ManagerClient is destroyed Many actions mark a cluster as dirty. Normal test execution will always make the cluster be destroyed upon returning to the pool. ManagerClient.mark_clean is not used in the tests. When it is used, the flow with cluster reuse happens. The bug is that the log file is closed even if cluster is not dirty. This causes an error when trying to log to a reused cluster server. The solution in this patch is to not close the log file if the cluster is not dirty. Upon cluster reuse the log file will be open and functional. Another approach would be to reopen the log file if closed, but this approach seems more clean. Refs SCYLLADB-573	2026-03-19 15:35:24 +01:00
Dario Mirovic	899ae71349	test: audit: copy audit test from dtest This patch just copies the audit test suite from dtest and disables it in the test config file. Later patches will update the code and enable the test suite. Refs SCYLLADB-573	2026-03-19 15:35:24 +01:00
Anna Stuchlik	88b98fac3a	doc: update the warning about shared dictionary training This commit updates the inadequate warning on the Advanced Internode (RPC) Compression page. The warning is replaced with a note about how training data is encrypted. Fixes https://github.com/scylladb/scylladb/issues/29109 Closes scylladb/scylladb#29111	2026-03-18 19:35:18 +02:00
Avi Kivity	46a6f8e1d3	Merge 'auth: add maintenance_socket_authorizer' from Dario Mirovic GRANT/REVOKE fails on the maintenance socket connections, because maintenance_auth_service uses allow_all_authorizer. allow_all_authorizer allows all operations, but not GRANT/REVOKE, because they make no sense in its context. This has been observed during PGO run failure in operations from ./pgo/conf/auth.cql file. This patch introduces maintenance_socket_authorizer that supports the capabilities of default_authorizer ('CassandraAuthorizer') without needing authorization. Refs SCYLLADB-1070 This is an improvement, no need for backport. Closes scylladb/scylladb#29080 * github.com:scylladb/scylladb: test: use NetworkTopologyStrategy in maintenance socket tests test: use cleanup fixture in maintenance socket auth tests auth: add maintenance_socket_authorizer	2026-03-18 19:29:57 +02:00
Pavel Emelyanov	d6c01be09b	s3/client: Don't reconstruct regex on every parse_content_range call Make the pattern static const so it is compiled once at first call rather than on every Content-Range header parse. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#29054	2026-03-18 17:56:33 +02:00
Calle Wilund	0013f22374	memtable_test::memtable_flush_period: Change sleep to use injection signal instead Fixes: SCYLLADB-942 Adds an injection signal _from_ table::seal_active_memtable to allow us to reliably wait for flushing. And does so. Closes scylladb/scylladb#29070	2026-03-18 16:23:13 +02:00
Botond Dénes	ae17596c2a	Merge 'Demote log level on split failure during shutdown' from Raphael Raph Carvalho Since commit `509f2af8db`, gate_closed_exception can be triggered for ongoing split during shutdown. The commit is correct, but it causes split failure on shutdown to log an error, which causes CI instability. Previously, aborted_exception would be triggered instead which is logged as warning. Let's do the same. Fixes https://scylladb.atlassian.net/browse/SCYLLADB-951. Fixes https://github.com/scylladb/scylladb/issues/24850. Only 2026.1 is affected. Closes scylladb/scylladb#29032 * github.com:scylladb/scylladb: replica: Demote log level on split failure during shutdown service: Demote log level on split failure during shutdown	2026-03-18 16:21:05 +02:00
Pavel Emelyanov	8b1ca6dcd6	database: Rate limit all tokens from a range The limiter scans ranges to decide whether or not to rate-limit the query. However, when considering each range only the front one's token is accounted. This looks like a misprint. The limiter was introduced in `cc9a2ad41f` Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#29050	2026-03-18 13:50:48 +01:00
Emil Maskovsky	34f3916e7d	.github: update test instructions for unified pytest runner Update test running instructions to reflect unified pytest-based runner. The test.py now requires full test paths with file extensions for both C++ and Python tests. No backport: The change is only relevant for recent test.py changes in master. Closes scylladb/scylladb#29062	2026-03-18 09:28:28 +01:00
Dario Mirovic	71e6918f28	test: use NetworkTopologyStrategy in maintenance socket tests NetworkTopologyStrategy is the preferred choice. We should not use SimpleStrategy anymore. This patch changes the topology strategy for all the maintenance socket tests. Refs SCYLLADB-1070	2026-03-17 20:20:47 +01:00
Dario Mirovic	278535e4e3	test: use cleanup fixture in maintenance socket auth tests Add a cql_clusters pytest fixture that tracks CQL driver Cluster objects and shuts them down automatically after test completion. This replaces manual shutdown() calls at the end of each test. Also consolidate shutdown() calls in retry helpers into finally blocks for consistent cleanup. Refs SCYLLADB-1070	2026-03-17 20:15:30 +01:00
Dario Mirovic	2e4b72c6b9	auth: add maintenance_socket_authorizer GRANT/REVOKE fails on the maintenance socket connections, because maintenance_auth_service uses allow_all_authorizer. allow_all_authorizer allows all operations, but not GRANT/REVOKE, because they make no sense in its context. This has been observed during PGO run failure in operations from ./pgo/conf/auth.cql file. This patch introduces maintenance_socket_authorizer that supports the capabilities of default_authorizer ('CassandraAuthorizer') without needing authorization. Refs SCYLLADB-1070	2026-03-17 19:19:41 +01:00
Botond Dénes	172c786079	Merge 'perf-alternator: wait for alternator port before running workload' from Marcin Maliszkiewicz This patch is mostly for the purpose of running pgo CI job. We may receive connection error if asyncio.sleep(5) in pgo.py is not sufficient waiting time. In pgo.py we do wait for port but only for cql, anyway it's better to have high level check than trying to wait for alternator port there. Fixes https://scylladb.atlassian.net/browse/SCYLLADB-1071 Backport: 2026.1 - it failed on CI for that build Closes scylladb/scylladb#29063 * github.com:scylladb/scylladb: perf: add abort_source support to wait-for-port loops perf-alternator: wait for alternator port before running workload	2026-03-17 18:38:11 +02:00
Botond Dénes	5d868dcc55	Merge 's3_client: fix s3::range max value for object size' from Ernest Zaslavsky - fix s3::range max value for object size which is 50TiB and not 5. - refactor constants to make it accessible for all interested parties, also reuse these constants in tests No need to backport, doubt we will encounter an object larger than 5TiB Closes scylladb/scylladb#28601 * github.com:scylladb/scylladb: s3_client: reorganize tests in part_size_calculation_test s3_client: switch using s3 limits constants in tests s3_client: fix the s3::range max object size s3_client: remove "aws" prefix from object limits constants s3_client: make s3 object limits accessible	2026-03-17 16:34:42 +02:00
Dawid Mędrek	a8dd13731f	Merge 'Improve debuggability of test/cluster/test_data_resurrection_in_memtable.py' from Botond Dénes This test was observed to fail in CI recently but there is not enough information in the logs to figure out what went wrong. This PR makes a few improvements to make the next investigation easier, should it be needed: * storage-service: add table name to mutation write failure error messages. * database: the `database_apply` error injection used to cause trouble, catching writes to bystander tables, making tests flaky. To eliminate this, it gained a filter to apply only to non-system keyspaces. Unfortunately, this still allows it to catch writes to the trace tables. While this should not fail the test, it reduces observability, as some traces disappear. Improve this error injection to only apply to selected table. Also merge it with the `database_apply_wait` error injection, to streamline the code a bit. * test/test_data_resurrection_in_memtable.py: dump data from the datable, before the checks for expected data, so if checks fail, the data in the table is known. Refs: SCYLLADB-812 Refs: SCYLLADB-870 Fixes: SCYLLADB-1050 (by restricting `database_apply` error injection, so it doesn't affect writes to system traces) Backport: test related improvement, no backport Closes scylladb/scylladb#28899 * github.com:scylladb/scylladb: test/cluster/test_data_resurrection_in_memtable.py: dump rows before check replica/database: consolidate the two database_apply error injections service/storage_proxy: add name of table to error message for write errors	2026-03-17 13:35:19 +01:00
Botond Dénes	318aa07158	Merge ' test/alternator: use module-scope fixtures in test_streams.py ' from Nadav Har'El Previously, all stream-table fixtures in test_streams.py used scope="function", forcing a fresh table to be created for every test, slowing down the test a bit (though not much), and discouraging writing small new tests. This was a workaround for a DynamoDB quirk (that Alternator doesn't have): LATEST shard iterators have a time slack and may point slightly before the true stream head, causing leftover events from a previous test to appear in the next test's reads. The first two tests in this series fix small problems that turn up once we start sharing test tables in test_streams.py. The final patch fixes the "LATEST" problem and enables sharing the test table by using "module" scope fixtures instead of "function". After this series, test_streams.py run time went down a bit, from 20.2 seconds to 17.7 seconds. Closes scylladb/scylladb#28972 * github.com:scylladb/scylladb: test/alternator: speed up test_streams.py by using module-scope fixtures test/alternator: test_streams.py don't use fixtures in 4 tests test/alternator: fix do_test() in test_streams.py	2026-03-17 13:56:16 +02:00
Ernest Zaslavsky	7f597aca67	cmake: fix broken build Add raft_util.idl.hh to cmake to generate the code properly Closes scylladb/scylladb#29055	2026-03-17 10:35:34 +01:00
Botond Dénes	dbe70cddca	test/boost/querier_cache_test: make test_time_based_cache_eviction less sensitive to timing This test relies on the cache entry being evicted after 200ms past the TTL. This may not happen on a busy CI machine. Make the test less reliant on timing by using eventually_true(). Simplify the test by dropping the second entry, it doesn't add anything to the test. Fixes: SCYLLADB-811 Closes scylladb/scylladb#28958	2026-03-17 10:32:23 +01:00
Botond Dénes	0fd51c4adb	test/nodetool: rest_api_mock_server: add retry for status code 404 This fixtures starts the mock server and immediately connects to it to setup the expected requests. The connection attempt might be too early, so there is a retry loop with a timeout. The loop currently checks for requests.exception.ConnectionError. We've seen a case where the connection is successful but the request fails with 404. The mock started the server but didn't setup the routes yet. Add a retry for http 404 to handle this. Fixes: SCYLLADB-966 Closes scylladb/scylladb#29003	2026-03-17 10:30:23 +01:00
Asias He	6cb263bab0	repair: Prevent CPU stall during cross-shard row copy and destruction When handling `repair_stream_cmd::end_of_current_rows`, passing the foreign list directly to `put_row_diff_handler` triggered a massive synchronous deep copy on the destination shard. Additionally, destroying the list triggered a synchronous deallocation on the source shard. This blocked the reactor and triggered the CPU stall detector. This commit fixes the issue by introducing `clone_gently()` to copy the list elements one by one, and leveraging the existing `utils::clear_gently()` to destroy them. Both utilize `seastar::coroutine::maybe_yield()` to allow the reactor to breathe during large cross-shard transfers and cleanups. Fixes SCYLLADB-403 Closes scylladb/scylladb#28979	2026-03-17 11:05:15 +02:00
Botond Dénes	035aa90d4b	Merge 'Alternator: add per-table batch latency metrics and test coverage' from Amnon Heiman This series fixes a metrics visibility gap in Alternator and adds regression coverage. Until now, BatchGetItem and BatchWriteItem updated global latency histograms but did not consistently update per-table latency histograms. As a result, table-level latency dashboards could miss batch traffic. It updates the batch read/write paths to compute request duration once and record it in both global and per-table latency metrics. Add the missing tests, including a metric-agnostic helper and a dedicated per-table latency test that verifies latency counters increase for item and batch operations. This change is metrics-only (no API/behavior change for requests) and improves observability consistency between global and per-table views. Fixes #28721 We assume the alternator per-table metrics exist, but the batch ones are not updated Closes scylladb/scylladb#28732 * github.com:scylladb/scylladb: test(alternator): add per-table latency coverage for item and batch ops alternator: track per-table latency for batch get/write operations	2026-03-16 17:18:00 +02:00
Michał Hudobski	40d180a7ef	docs: update vector search filtering to reflect primary key support only Remove outdated references to filtering on columns provided in the index definition, and remove the note about equal relations (= and IN) being the only supported operations. Vector search filtering currently supports WHERE clauses on primary key columns only. Closes scylladb/scylladb#28949	2026-03-16 17:16:16 +02:00
Botond Dénes	9de8d6798e	Merge 'reader_concurrency_semaphore: skip preemptive abort for permits waiting for memory' from Łukasz Paszkowski Permits in the `waiting_for_memory` state represent already-executing reads that are blocked on memory allocation. Preemptively aborting them is wasteful -- these reads have already consumed resources and made progress, so they should be allowed to complete. Restrict the preemptive abort check in maybe_admit_waiters() to only apply to permits in the `waiting_for_admission` state, and tighten the state validation in `on_preemptive_aborted()` accordingly. Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-1016 Backport not needed. The commit introducing replica load shedding is not part of 2026.1 Closes scylladb/scylladb#29025 * github.com:scylladb/scylladb: reader_concurrency_semaphore: skip preemptive abort for permits waiting for memory reader_concurrency_semaphore_test: detect memory leak on preemptive abort of waiting_for_memory permit	2026-03-16 17:14:25 +02:00
Marcin Maliszkiewicz	9318c80203	perf: add abort_source support to wait-for-port loops Check abort_source on each retry iteration in wait_for_alternator and wait_for_cql so the wait can be interrupted on shutdown. Didn't use sleep_abortable as the sleep is very short anyway.	2026-03-16 16:14:10 +01:00
Calle Wilund	a5df2e79a7	storage_service: Wait for snapshot/backup before decommission Fixes: SCYLLADB-244 Disables snapshot control such that any active ops finish/fail before proceeding with decommission. Note: snapshot control provided as argument, not member ref due to storage_service being used from both main and cql_test_env. (The latter has no snapshot_ctl to provide). Could do the snapshot lockout on API level, but want to do pre-checks before this. Note: this just disables backup/snapshot fully. Could re-enable after decommission, but this seems somewhat pointless. v2: * Add log message to snapshot shutdown * Make test use log waiting instead of timeouts Closes scylladb/scylladb#28980	2026-03-16 17:12:57 +02:00
Marcin Maliszkiewicz	edf0148bee	perf-alternator: wait for alternator port before running workload This patch is mostly for the purpose of running pgo CI job. We may receive connection error if asyncio.sleep(5) in pgo.py is not sufficient waiting time. In pgo.py we do wait for port but only for cql, anyway it's better to have high level check than trying to wait for alternator port there.	2026-03-16 16:07:52 +01:00
bitpathfinder	85d5073234	test: Fix non-awaited coroutine in test_gossiper_empty_self_id_on_shadow_round The line with the error was not actually needed and has therefore been removed. Fixes: SCYLLADB-906 Closes scylladb/scylladb#28884	2026-03-16 17:07:36 +02:00
Botond Dénes	3e4e0c57b8	Merge 'Relax rf-rack-valid-keyspace option in backup/restore tests' from Pavel Emelyanov Some tests, when create a cluster, configure nodes with the rf-rack-valid option, because sometimes they want to have it OFF. For that the option is explicitly carried around, but the cluster creating helper can guess this option itself -- out of the provided topology and replication factor. Removing this option simplifies the code and (which a nicer outcome) the test "signature" that's used e.g. in command-line to run a specific test. Improving tests, not backporting Closes scylladb/scylladb#28860 * github.com:scylladb/scylladb: test: Relax topology_rf_validity parameter for some tests test: Auto detect rf-rack-valid option in create_cluster()	2026-03-16 17:06:46 +02:00
Raphael S. Carvalho	ee87b66033	replica: Demote log level on split failure during shutdown Dtest failed with: table - Failed to load SSTable .../me-3gyn_0qwi_313gw2n2y90v2j4fcv-big-Data.db of origin memtable due to std::runtime_error (Cannot split .../me-3gyn_0qwi_313gw2n2y90v2j4fcv-big-Data.db because manager has compaction disabled, reason might be out of space prevention), it will be unlinked... The reason is that the error above is being triggered when the cause is shutdown, not out of space prevention. Let's distinguish between the two cases and log the error with warning level on shutdown. Fixes https://github.com/scylladb/scylladb/issues/24850. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2026-03-16 12:03:17 -03:00
Patryk Jędrzejczak	526e5986fe	test: test_raft_no_quorum: decrease group0_raft_op_timeout_in_ms after quorum loss `test_raft_no_quorum.py::test_cannot_add_new_node` is currently flaky in dev mode. The bootstrap of the first node can fail due to `add_entry()` timing out (with the 1s timeout set by the test case). Other test cases in this test file could fail in the same way as well, so we need a general fix. We don't want to increase the timeout in dev mode, as it would slow down the test. The solution is to keep the timeout unchanged, but set it only after quorum is lost. This prevents unexpected timeouts of group0 operations with almost no impact on the test running time. A note about the new `update_group0_raft_op_timeout` function: waiting for the log seems to be necessary only for `test_quorum_lost_during_node_join_response_handler`, but let's do it for all test cases just in case (including `test_can_restart` that shouldn't be flaky currently). Fixes https://scylladb.atlassian.net/browse/SCYLLADB-913 Closes scylladb/scylladb#28998	2026-03-16 16:58:15 +02:00
Raphael S. Carvalho	b508f3dd38	service: Demote log level on split failure during shutdown Since commit `509f2af8db`, gate_closed_exception can be triggered for ongoing split during shutdown. The commit is correct, but it causes split failure on shutdown to log an error, which causes CI instability. Previously, aborted_exception would be triggered instead which is logged as warning. Let's do the same. Fixes https://scylladb.atlassian.net/browse/SCYLLADB-951. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2026-03-16 11:52:00 -03:00
Dani Tweig	bc0952781a	Update Jira sync calling workflow to consolidated view Replaced multiple per-action workflow jobs with a single consolidated call to main_pr_events_jira_sync.yml. Added 'edited' event trigger. This makes CI actions in PRs more readable and workflow execution faster. Fixes:PM-253 Closes scylladb/scylladb#29042	2026-03-16 08:25:32 +02:00
Artsiom Mishuta	755d528135	test.py: fix warnings changes in this commit: 1)rename class from 'TestContext' to 'Context' so pytest will not consider this class as a test 2)extend pytest filterwarnings list to ignore warnings from external libs 3) use datetime.datetime.now(datetime.UTC) unstead datetime.datetime.utcnow() 4) use ResultSet.one() instead ResultSet[0] Fixes SCYLLADB-904 Fixes SCYLLADB-908 Related SCYLLADB-902 Closes scylladb/scylladb#28956	2026-03-15 12:00:10 +02:00
Piotr Dulikowski	d8b283e1fb	Merge 'Add CQL forwarding for strongly consistent tables' from Wojciech Mitros In this series we add support for forwarding strongly consistent CQL requests to suitable replicas, so that clients can issue reads/writes to any node and have the request executed on an appropriate tablet replica (and, for writes, on the Raft leader). We return the same CQL response as what the user would get while sending the request to the correct replica and we perform the same logging/stats updates on the request coordinator as if the coordinator was the appropriate replica. The core mechanism of forwarding a strongly consistent request is sending an RPC containing the user's cql request frame to the appropriate replica and returning back a ready, serialized `cql_transport::response`. We do this in the CQL server - it is most prepared for handling these types and forwarding a request containing a CQL frame allows us to reuse near-top-level methods for CQL request handling in the new RPC handler (such as the general `process`) For sending the RPC, the CQL server needs to obtain the information about who should it forward the request to. This requires knowledge about the tablet raft group members and leader. We obtain this information during the execution of a `cql3/strong_consistency` statement, and we return this information back to the CQL server using the generalized `bounce_to_shard` `response_message`, where we now store the information about either a shard, or a specific replica to which we should forward to. Similarly to `bounce_to_shard`, we need to handle this `result_message` in a loop - a replica may move during statement execution, or the Raft leader can change. We also use it for forwarding strongly consistent writes when we're not a member of the affected tablet raft group - in that case we need to forward the statement twice - once to any replica of the affected tablet, then that replica can find the leader and return this information to the coordinator, which allows the second request to be directed to the leader. This feature also allows passing through exception messages which happened on the target replica while executing the statement. For that, many methods of the `cql_transport::cql_server::connection` for creating error responses needed to be moved to `cql_transport::cql_server`. And for final exception handling on the coordinator, we added additional error info to the RPC response, so that the handling can be performed without having the `result_message::exception` or `exception_ptr` itself. Fixes [SCYLLADB-71](https://scylladb.atlassian.net/browse/SCYLLADB-71) [SCYLLADB-71]: https://scylladb.atlassian.net/browse/SCYLLADB-71?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ Closes scylladb/scylladb#27517 * github.com:scylladb/scylladb: test: add tests for CQL forwarding transport: enable CQL forwarding for strong consistency statements transport: add remote statement preparation for CQL forwarding transport: handle redirect responses in CQL forwarding transport: add exception handling for forwarded CQL requests transport: add basic CQL request forwarding idl: add a representation of client_state for forwarding cql_server: handle query, execute, batch in one case transport: inline process_on_shard in cql_server::process transport: extract process() to cql_server transport: add messaging_service to cql_server transport: add response reconstruction helpers for forwarding transport: generalize the bounce result message for bouncing to other nodes strong consistency: redirect requests to live replicas from the same rack transport: pass foreign_ptr into sleep_until_timeout_passes and move it to cql_server transport: extract the error handling from process_request_one transport: move error response helpers from connection to cql_server	2026-03-13 15:03:10 +01:00
Tomasz Grabiec	518470e89e	Merge 'load_stats: improve tablet filtering for load stats' from Ferenc Szili When computing table sizes via load_stats to determine if a split/merge is needed, we are filtering tablets which are being migrated, in order to avoid counting them twice (both on leaving and pending replica) in the total table size. The tablets are filtered so that they are counted on the leaving replica until the streaming stage, and on the pending replica after the streaming stage. Currently, the procedure for collecting tablet sizes for load balancing also uses this same filter. This should be changed, because the load balancer needs to have as much information about tablet sizes as possible, and could ignore a node due to missing tablet sizes for tablets in the `write_both_read_new` and `use_new` stages. For tablet size collection, we should include all the tablets which are currently taking up disk space. This means: - on leaving replica, include all tablets until the `cleanup` stage - on pending replica, include all tablets starting with the `write_both_read_new` and later stages While this is an improvement, it causes problems with some of the tests, and therefore needs to be backported to 2026.1 Fixes: SCYLLADB-829 Closes scylladb/scylladb#28587 * github.com:scylladb/scylladb: load_stats: add filtering for tablet sizes load_stats: move tablet filtering for table size computation load_stats: bring the comment and code in sync	2026-03-13 13:08:11 +01:00
Pavel Emelyanov	d544d8602d	test: Relax topology_rf_validity parameter for some tests Tests that call create_cluster() helper no longer need to carry the rf-validity parameter. This simplifies the code and test signature. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-03-13 14:30:32 +03:00
Pavel Emelyanov	313985fed7	test: Auto detect rf-rack-valid option in create_cluster() The helper accepts its as boolean argument, but it can easily estimate one from the provided topology. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-03-13 14:30:32 +03:00
Gleb Natapov	fae5282c82	service level: fix crash during migration to driver server level Before `b59b3d4` the migration code checked that service level controller is on v2 version before migration and the check also implicitly checked that _sl_data_accessor field is already initialized, but now that the check is gone the migration can start before service level controller is fully initialized. Re add the check, but to a different place. Fixes https://scylladb.atlassian.net/browse/SCYLLADB-1049 Closes scylladb/scylladb#29021	2026-03-13 11:24:26 +01:00
Łukasz Paszkowski	4c4d043a3b	reader_concurrency_semaphore: skip preemptive abort for permits waiting for memory Permits in the `waiting_for_memory` state represent already-executing reads that are blocked on memory allocation. Preemptively aborting them is wasteful -- these reads have already consumed resources and made progress, so they should be allowed to complete. Restrict the preemptive abort check in maybe_admit_waiters() to only apply to permits in the `waiting_for_admission` state, and tighten the state validation in `on_preemptive_aborted()` accordingly. Adjust the following tests: + test_reader_concurrency_semaphore_abort_preemptively_aborted_permit no longer relies on requesting memory + test_reader_concurrency_semaphore_preemptive_abort_requested_memory_leak adjusted to the fix Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-1016	2026-03-13 09:50:05 +01:00
Dani Tweig	aa46a0f4e0	Add VECTOR to the list of synced milestones in scylladb.git - Added VECTOR to the comma-separated list of Jira project keys in `call_sync_milestone_to_jira.yml`. - The `jira_project_keys` value changed from `SCYLLADB,CUSTOMER,SMI,RELENG` to `SCYLLADB,CUSTOMER,SMI,RELENG,VECTOR`. - The VECTOR project needs to sync with scylladb.git milestones, so that when a GitHub milestone is created or closed in scylladb/scylladb, the corresponding Jira release is also created or released in the VECTOR project. - Previously only SCYLLADB, CUSTOMER, SMI, and RELENG projects were synced. Fixes:PM-220 Closes scylladb/scylladb#29014	2026-03-13 09:58:41 +02:00
Botond Dénes	fc8cebd671	Merge 'Verify components digests during component load and scrub in validate mode' from Taras Veretilnyk This PR adds integrity verification for SSTable component files during loading. When component digests are present in Scylla metadata, the loader now validates each component's CRC32 digest against the stored expected value, catching silent corruption of component files. Index, Rows and Partitions components digests are also validated duriung scrub in validate mode Added corruption tests that write an SSTable, flip a bit in a specific component file, then verify that reloading the SSTable detects the corruption and throws the expected exception. Depends on https://github.com/scylladb/scylladb/pull/28338 Backport is not required, this is new feature Fixes https://github.com/scylladb/scylladb/issues/20103 Closes scylladb/scylladb#28761 * github.com:scylladb/scylladb: test/cqlpy: test --ignore-component-digest-mismatch flag in scylla sstable upgrade docs: document --ignore-component-digest-mismatch flag for scylla sstable upgrade sstables: propagate ignore_component_digest_mismatch config to all load sites sstables: add option to ignore component digest mismatches sstable_compaction_test: Add scrub validate test for corrupted index sstables: add tests for component digest validation on corrupted SSTables sstables: validate index components digests during SSTable scrub in validate mode sstables: verify component digests on SSTable load sstables: add digest_file_random_access_reader for CRC32 digest computation	2026-03-13 09:55:55 +02:00
Avi Kivity	ae8a418744	Merge 'Await async calls in test tablets migration' from Benny Halevy Fix several test cases that did not await async tasks: - test_restart_leaving_replica_during_cleanup - test_restart_in_cleanup_stage_after_cleanup - test_tablet_back_and_forth_migration - test_staging_backlog_is_preserved_with_file_based_streaming Fixes SCYLLADB-910 * Minor fixes, no backport needed Closes scylladb/scylladb#28908 * github.com:scylladb/scylladb: test_tablets_migration: test_staging_backlog_is_preserved_with_file_based_streaming: convert for loop to asyncio.gather test_tablets_migration: test_tablet_back_and_forth_migration: await move_tablet test_tablets_migration: test_restart_in_cleanup_stage_after_cleanup: await move_task test_tablets_migration: test_restart_leaving_replica_during_cleanup: await move_task test_tablets_migration: drop unused imports from cassandra.query	2026-03-13 00:20:29 +02:00
Avi Kivity	b228eb26e6	Merge 'dbuild: Use slirp4netns network in dbuild nested containers' from Calle Wilund Fixes #25084 Add slirp4netns and use for nested containers. This will allow nested container port aliasing, helping CI stability. Note: this contains and updated Dockerfile for dbuild image, but since chicken and eggs, right now will force install slirp4netns before anything in dbuild script. Updates the mock server handling to use ephemeral ports and query from container, ensuring we don't get port collisions. (boost as well as pytest). Includes a timeout up, and a tweak to our scylla_cluster handling, ensuring we don't deadlock when pipe size is less than requires for our sys notify messages. Closes scylladb/scylladb#28727 * github.com:scylladb/scylladb: gcs_fixture: Change to use docker helper aws_kms_fixture: Modify to use docker helper test/lib/proc_util: Add docker helper pytest: use ephemeral port publish for docker mock servers dbuild: Use container network in dbuild nested containers scylla_cluster: Read notify sock in background to prevent deadlock	2026-03-12 23:49:25 +02:00
Nadav Har'El	ad832c263e	test/cluster: mark test_alternator_concurrent_rmw_same_partition_different_server not strictly xfail A few days ago, in commit `7b30a39` we added to pytest.ini the option xfail_strict. This option causes every time a test XPASSes, i.e., an xfail test actually passes - to be considered an error and fail the test. But some tests demonstrate a timing-related bug and do not reproduce the bug every single time. An example we noticed in one CI run is: test/cluster/test_alternator.py::test_alternator_concurrent_rmw_same_partition_different_server This test reproduces a timing-related bug (if you do an LWT write to one partition on to two different coordinators "at the same time", you can get a failure), but only most of the time, not 100% of the time. The solution is to add "strict=False" for the xfail marker on this specific test. This undoes the xfail_strict for this specific test, accepting that this specific test can either pass or fail. Note that this does NOT make this test worthless - we still see this test failing most of the time, and when a developer finally fixes this issue, the test will begin to pass all the time. Fixes https://scylladb.atlassian.net/browse/SCYLLADB-941 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#29016	2026-03-12 23:46:23 +02:00
Avi Kivity	03186ce60d	Merge 'Cleanup after auth v1 and default superuser code removal' from Marcin Maliszkiewicz This is short cleanup after recent removal of creating default cassandra superuser and auth-v1 code removal. Fixes https://scylladb.atlassian.net/browse/SCYLLADB-1036 Backport: no, just code cleanup Closes scylladb/scylladb#29004 * github.com:scylladb/scylladb: auth: remove DEFAULT_SUPERUSER_NAME constant and dead DEFAULT_USER_PASSWORD auth: use configurable default_superuser in describe_roles auth: move default_superuser to common, remove _superuser member auth: use LOCAL_ONE for all auth queries auth: remove get_auth_ks_name indirection	2026-03-12 23:44:32 +02:00
Avi Kivity	e2eeef3e01	Merge 'service level: remove remnants of version 1 service level' from Gleb Natapov can_use_effective_service_level_cache() always returns true now, so the function can be dropped entirely and all the code that assumes it may return false can be dropped as well. Also drop async versions of find_effective_service_level and get_user_scheduling_group since they are unused. No need to backport, code removal, Closes scylladb/scylladb#29002 * github.com:scylladb/scylladb: service level: make maybe_update_per_service_level_params synchronous service level: remove unused get_user_scheduling_group function service level: drop async find_effective_service_level service level: remove remnants of version 1 service level	2026-03-12 23:39:41 +02:00

1 2 3 4 5 ...

52635 Commits