scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-08 16:03:20 +00:00

Author	SHA1	Message	Date
Andrzej Jackowski	ec42fdfd01	test: add cluster tests for write CL guardrails Most of the functionality is tested in cqlpy tests located in `test_guardrail_write_consistency_level.py`. Add two tests that require the cluster framework: - `test_invalid_write_cl_guardrail_config` checks the node startup path when incorrect `write_consistency_levels_warned` and `write_consistency_levels_disallowed` values are used. - `test_write_cl_default` checks the behavior of the default configuration using a multi-node cluster. Tests execution time: - Dev: 10s - Debug: 18s Refs: SCYLLADB-259	2026-03-04 08:00:17 +01:00
Andrzej Jackowski	446539f12f	test: implement test_guardrail_write_consistency_level Implement basic tests for write consistency level guardrails, verifying that they work for each type of write request (inserts, updates, deletes, logged batches, unlogged batches, conditional batches, and counter operations). All tests are marked as Scylla-only because they currently don't pass with Cassandra due to differences in handling superusers (see: SCYLLADB-882). Tests execution time: - Dev: 3s - Debug: 14s Refs: SCYLLADB-259 Refs: SCYLLADB-882	2026-03-04 08:00:13 +01:00
Andrzej Jackowski	bb359b3b78	cql3: start using write CL guardrails Enable verification of write consistency level guardrails in `modification_statement` and `batch_statement`. Neither guardrail is enabled by default, so as not to disrupt clusters that are currently using any of the CLs for writes. The warning guardrail may seem harmless, as it only adds a warning to the CQL response; however, enabling it can significantly increase network traffic (as a warning message is added to each response) and also decrease throughput due to additional allocations required to prepare the warning. Therefore, both guardrails should be enabled with care. The newly added `writes_per_consistency_level` metric, which is incremented unconditionally, can help decide whether a guardrail can be safely enabled in an existing cluster. This commit adds additional `if` instructions on the critical path. However, based on the `perf_simple_query` benchmark for writes, the difference is marginal (~40 additional instructions, which is a relative difference smaller than 0.001). 
BEFORE: ``` 291443.35 tps ( 53.3 allocs/op, 16.0 logallocs/op, 14.2 tasks/op, 48067 insns/op, 18885 cycles/op, 0 errors) throughput: mean= 289743.07 standard-deviation=6075.60 median= 291424.69 median-absolute-deviation=1702.56 maximum=292498.27 minimum=261920.06 instructions_per_op: mean= 48072.30 standard-deviation=21.15 median= 48074.49 median-absolute-deviation=12.07 maximum=48119.87 minimum=48019.89 cpu_cycles_per_op: mean= 18884.09 standard-deviation=56.43 median= 18877.33 median-absolute-deviation=14.71 maximum=19155.48 minimum=18821.57 ``` AFTER: ``` 290108.83 tps ( 53.3 allocs/op, 16.0 logallocs/op, 14.2 tasks/op, 48121 insns/op, 18988 cycles/op, 0 errors) throughput: mean= 289105.08 standard-deviation=3626.58 median= 290018.90 median-absolute-deviation=1072.25 maximum=291110.44 minimum=274669.98 instructions_per_op: mean= 48117.57 standard-deviation=18.58 median= 48114.51 median-absolute-deviation=12.08 maximum=48162.18 minimum=48087.18 cpu_cycles_per_op: mean= 18953.43 standard-deviation=28.76 median= 18945.82 median-absolute-deviation=20.84 maximum=19023.93 minimum=18916.46 ``` Fixes: SCYLLADB-259	2026-03-04 07:26:00 +01:00
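The check described in the commit above can be sketched roughly as follows. This is a minimal Python illustration, not the C++ code in `modification_statement` or `batch_statement`. The class `WriteCLGuardrail`, its `check` method, and the choice to test the disallowed set before the warned set are assumptions made for the example; only the two option names and the `writes_per_consistency_level` metric come from the commit messages.

```python
from collections import Counter


class WriteCLGuardrail:
    """Hypothetical model of the write consistency level guardrails."""

    def __init__(self, warned=(), disallowed=()):
        # Both sets are empty by default, mirroring "neither guardrail is
        # enabled by default" from the commit message.
        self.warned = frozenset(warned)
        self.disallowed = frozenset(disallowed)
        self.writes_per_consistency_level = Counter()
        self.warned_violations = 0
        self.disallowed_violations = 0

    def check(self, cl):
        """Return a warning string, None, or raise if the CL is disallowed."""
        # Incremented unconditionally, regardless of guardrail configuration.
        self.writes_per_consistency_level[cl] += 1
        if cl in self.disallowed:  # assumed to take precedence over "warned"
            self.disallowed_violations += 1
            raise ValueError(f"write consistency level {cl} is disallowed")
        if cl in self.warned:
            self.warned_violations += 1
            return f"write consistency level {cl} is discouraged"
        return None
```

Reading `writes_per_consistency_level` before enabling either set corresponds to the advice in the commit message: it shows which consistency levels existing clients actually use.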
Andrzej Jackowski	371cdb3c81	cql3/query_processor: implement metrics to track CL of writes Add `write_consistency_levels_disallowed_violations` and `write_consistency_levels_warned_violations` metrics to track violations of write_consistency_levels guardrails. Add `writes_per_consistency_level` to track what CL is used by writes, regardless of the guardrails configuration. Data gathered by this metric can be used to decide whether enabling a particular write consistency level guardrail in a particular existing cluster is safe. Refs: SCYLLADB-259	2026-03-03 21:18:11 +01:00
Andrzej Jackowski	3606934458	db: cql3/query_processor: add write_consistency_levels enum_sets Add enum_sets to query_processor that track the configuration values of `write_consistency_levels_warned` and `write_consistency_levels_disallowed`. Refs: SCYLLADB-259	2026-03-03 20:28:57 +01:00
Andrzej Jackowski	e2c4b0a733	config: add write_consistency_levels_* guardrails configuration Add guardrails configuration that can be used later in this patch series. Refs: SCYLLADB-259	2026-02-25 10:30:03 +01:00
Andrei Chekun	1b92b140ee	test.py: improve stdout output for boost test The current way of checking the boost's stdout can have a race condition when pytest will try to read the file before it was really flushed. So this PR should eliminate this possibility. Closes scylladb/scylladb#28783	2026-02-25 00:50:25 +01:00
Ferenc Szili	f70ca9a406	load_stats: implement the optimized sum of tablet sizes PR #28703 was merged into master but not with the latest version of the changes. This patch is an incremental fix for this. Currently, the elements of the tablet_sizes_per_shard vector are incremented in separate shards. This is prone to false sharing of cache lines, and ping-pong of memory, which leads to reduced performance. In this patch, in order to avoid cache line collisions while updating the sum of tablet sizes per shard, we align the counter to 64 bytes. Fixes: SCYLLADB-678 Closes scylladb/scylladb#28757	2026-02-24 22:19:25 +01:00
Alex	5557770b59	test_mv_build_during_shutdown started two async CREATE MATERIALIZED VIEW operations and never awaited them (asyncio.gather(...) without await). This PR adds an await for each of the tasks, so that the MV schema is added successfully before the server shutdown starts. With this change we will not get the shutdown races. Closes scylladb/scylladb#28774	2026-02-24 17:25:05 +01:00
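The difference between calling `asyncio.gather(...)` and awaiting it can be shown with a small standalone sketch. The coroutine `create_view` and the list `created` are hypothetical stand-ins for the test's CREATE MATERIALIZED VIEW calls and the resulting schema state; they are not taken from the test itself.

```python
import asyncio


async def create_view(name, created):
    # Stand-in for an asynchronous schema change that takes some time.
    await asyncio.sleep(0.01)
    created.append(name)


async def without_await():
    created = []
    # gather() schedules the coroutines but does not wait for them.
    pending = asyncio.gather(create_view("mv1", created),
                             create_view("mv2", created))
    seen_at_shutdown = list(created)  # what a shutdown started now would see
    await pending  # only to avoid leaving unfinished tasks behind
    return seen_at_shutdown


async def with_await():
    created = []
    # Awaiting guarantees both views exist before the next step runs.
    await asyncio.gather(create_view("mv1", created),
                         create_view("mv2", created))
    return list(created)
```

In the first variant the snapshot is taken while both operations are still in flight, which is the window in which the shutdown races occurred.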
Anna Stuchlik	64b1798513	doc: remove redundant Java-related information This commit removes: - Instructions to install scylla-jmx (and all references) - The Java 11 requirement for Ubuntu. Fixes https://github.com/scylladb/scylladb/issues/28249 Fixes https://github.com/scylladb/scylladb/issues/28252 Closes scylladb/scylladb#28254	2026-02-24 14:37:39 +01:00
Anna Stuchlik	e2333a57ad	doc: remove the tablets limitation for Alternator This commit removes the information that Alternator doesn't support tablets. The limitation is no longer valid. Fixes SCYLLADB-778 Closes scylladb/scylladb#28781	2026-02-24 14:24:30 +02:00
Andrzej Jackowski	cd4caed3d3	test: fix configuration of test_autoretrain_dict `test_autoretrain_dict` sporadically fails because the default compression algorithm was changed after the test was written. `9ffa62a986815709d0a09c705d2d0caf64776249` was an attempt to fix it by changing the compression configuration during node startup. However, the configuration change had an incorrect YAML format and was ignored by ScyllaDB. This commit fixes it. Fixes: scylladb/scylladb#28204 Closes scylladb/scylladb#28746	2026-02-24 12:08:44 +01:00
Botond Dénes	067bb5f888	test/scylla_gdb: skip coroutine tests if coroutine frame is not found For a while, we have seen coroutine related tests (those that use the coroutine_task fixture) fail occasionally, because no coroutine frame is found. Multiple attempts were made to make this problem self-diagnosing and dump enough information to be able to debug this post-mortem. To no avail so far. A lot of time was invested into this benign issue: See the long discussion at https://github.com/scylladb/scylladb/issues/22501. It is not known if the bug is in gdb, or the gdb script trying to find the coroutine frame. In any case, both are only used for debugging, so we can tolerate occasional failures -- we are forced to do so when working with gdb anyway. Instead of piling on more effort there, just skip these tests when the problem occurs. This solves the CI flakiness. Fixes: #22501 Closes scylladb/scylladb#28745	2026-02-24 10:12:03 +01:00
Marcin Maliszkiewicz	d5684b98c8	test: cluster: add continue-after-error to perf tool tests Add --continue-after-error true to perf-cql-raw and perf-alternator tests, and --stop-on-error false to perf-simple-query test, so that tests don't abort on the first error. Reason for this is that tests are flaky with example failure: Perf test failed: std::runtime_error (server returned ERROR to EXECUTE) When CPU is starved on CI we can return timeouts and/or other errors. The change should make tests more robust at the expense of smaller test scope. But those tests were written mostly to test startup sequence as it differs from Scylla's startup. Fixes https://scylladb.atlassian.net/browse/SCYLLADB-759 Closes scylladb/scylladb#28767	2026-02-24 11:08:34 +02:00
Avi Kivity	0add130b9d	lua: avoid undefined behavior when converting doubles to integers Lua doesn't have separate integer and floating point numbers, so we check if a number can fit in an integer and if so convert it to an integer. The conversion routine invokes undefined behavior (and even acknowledges it!). More recent compilers changed their behavior when casting infinities, breaking test_user_function_double_return which tests this conversion. Fix by tightening the conversion to not invoke undefined behavior. Closes scylladb/scylladb#28503	2026-02-24 10:41:21 +02:00
Botond Dénes	1d5b8cc562	Merge 'Fix use after free in auth cache' from Marcin Maliszkiewicz This patchset: - ensures the loading semaphore is acquired in cross-shard callbacks - fixes iterator invalidation problem when reloading all cached permissions Fixes https://scylladb.atlassian.net/browse/SCYLLADB-780 Backport: no, affected code not released yet Closes scylladb/scylladb#28766 * github.com:scylladb/scylladb: auth: cache: fix permissions iterator invalidation in reload_all_permissions auth/cache: acquire _loading_sem in cross-shard callbacks	2026-02-24 10:35:46 +02:00
Pavel Emelyanov	5a5eb67144	vector_search/dns: Use newer seastar get_host_by_name API The hostent::addr_list is deprecated in favor of address_entry::addr field that contains the very same addresses. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#28565	2026-02-23 21:28:43 +02:00
Pavel Emelyanov	6b02b50e3d	Merge 'object_storage: add retryable machinery to object storage' from Ernest Zaslavsky - add an overload to the rest http client to accept retry strategy instance as an argument - remove hand rolled error handling from object storage client and replace with common machinery that supports handling and retrying when appropriate No backport needed since it is only refactoring Closes scylladb/scylladb#28161 * github.com:scylladb/scylladb: object_storage: add retryable machinery to object storage rest_client: add `simple_send` overload	2026-02-23 21:28:51 +03:00
Botond Dénes	dcd8de86ee	Merge 'docs: update a documentation of adding/removing DC and rebuilding a node' from Aleksandra Martyniuk Describe a procedure to convert tablet keyspace replication factor to rack list. Update the procedures of adding and removing a node to consider tablet keyspaces. Fixes: [SCYLLADB-398](https://scylladb.atlassian.net/browse/SCYLLADB-398) Fixes: https://github.com/scylladb/scylladb/issues/28306. Fixes: https://github.com/scylladb/scylladb/issues/28307. Fixes: https://github.com/scylladb/scylladb/issues/28270. Needs backport to all live branches as they all include tablets. [SCYLLADB-398]: https://scylladb.atlassian.net/browse/SCYLLADB-398?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ Closes scylladb/scylladb#28521 * github.com:scylladb/scylladb: docs: update nodetool rebuild docs docs: update a procedure of decommissioning a DC docs: update a procedure of adding a DC docs: describe upgrade to enforce_rack_list option docs: describe conversion to rack-list RF	2026-02-23 16:15:16 +02:00
Andrei Chekun	6ae58c6fa6	test.py: move storage tests to cluster subdirectory Move the storage test suite from test/storage/ to test/cluster/storage/ to consolidate related cluster-based tests. This removes the standalone test/storage/suite.yaml as the tests will use the cluster's test configuration. Initially these tests were in cluster, but to use unshare at first iteration they were moved outside. Now that they use another way to handle volumes without unshare, they should be in cluster. Closes scylladb/scylladb#28634	2026-02-23 16:14:15 +02:00
Marcin Maliszkiewicz	c5dc086baf	Merge 'vector_search: return NaN for similarity_cosine with all-zero vectors' from Dawid Pawlik The ANN vector queries with all-zero vectors are allowed even on vector indexes with similarity function set to cosine. When enabling the rescoring option, those queries would fail as the rescoring calls `similarity_cosine` function underneath, causing an `InvalidRequest` exception as all-zero vectors were not allowed matching Cassandra's behaviour. To eliminate the discrepancy we want the all-zero vector `similarity_cosine` calls to pass, but return the NaN as the cosine similarity for zero vectors is mathematically incorrect. We decided not to use arbitrary values contrary to USearch, for which the distance (not to be confused with similarity) is defined as cos(0, 0) = 0, cos(0, x) = 1 while supporting the range of values [0, 2]. If we wanted to convert that to similarity, that would mean sim_cos(0, x) = 0.5, which does not support mathematical reasoning why that would be more similar than for example vectors marking obtuse angles. It's safe to assume that all-zero vectors for cosine similarity shouldn't make any impact, therefore we return NaN and eliminate them from best results. Adjusted the tests accordingly to check both proper Cassandra and Scylla's behaviour. Fixes: SCYLLADB-456 Backport to 2026.1 needed, as it fixes the bug for ANN vector queries using rescoring introduced there. Closes scylladb/scylladb#28609 * github.com:scylladb/scylladb: test/vector_search: add reproducer for rescoring with zero vectors vector_search: return NaN for similarity_cosine with all-zero vectors	2026-02-23 13:10:44 +01:00
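The behaviour described in the merge above can be sketched as follows. This is a simplified illustration that computes the raw cosine; any rescaling applied by the real `similarity_cosine` function is not modelled, and the helper `rank_by_similarity` is a hypothetical name used only to show NaN results being dropped from the best results.

```python
import math


def similarity_cosine(a, b):
    """Raw cosine of the angle between a and b, or NaN for a zero vector."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # The angle is undefined, so no arbitrary value is chosen.
        return math.nan
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, candidates):
    """Hypothetical rescoring step: best first, NaN scores eliminated."""
    scored = [(similarity_cosine(query, c), c) for c in candidates]
    kept = [(s, c) for s, c in scored if not math.isnan(s)]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in kept]
```

Returning NaN keeps the call from failing while still letting the caller recognise, and discard, scores that carry no information.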
Aleksandra Martyniuk	9ccc95808f	docs: update nodetool rebuild docs Update nodetool rebuild docs to mention that the command does not work for tablet keyspaces. Fixes: https://github.com/scylladb/scylladb/issues/28270.	2026-02-23 12:45:01 +01:00
Aleksandra Martyniuk	e4c42acd8f	docs: update a procedure of decommissioning a DC Update a procedure of decommissioning a DC for tablet keyspaces. Fixes: https://github.com/scylladb/scylladb/issues/28307.	2026-02-23 12:45:01 +01:00
Aleksandra Martyniuk	1c764cf6ea	docs: update a procedure of adding a DC Update a procedure of adding a DC for tablet keyspaces. Fixes: https://github.com/scylladb/scylladb/issues/28306.	2026-02-23 12:45:01 +01:00
Aleksandra Martyniuk	e08ac60161	docs: describe upgrade to enforce_rack_list option	2026-02-23 12:44:57 +01:00
Aleksandra Martyniuk	eefe66b2b2	docs: describe conversion to rack-list RF Fixes: SCYLLADB-398	2026-02-23 12:41:33 +01:00
Marcin Maliszkiewicz	54dca90e8c	Merge 'test: move dtest/guardrails_test.py to test_guardrails.py' from Andrzej Jackowski This patch series moves `test/cluster/dtest/guardrails_test.py` to `test/cluster/test_guardrails.py`, and migrates it from `cluster/dtest/` to `cluster/` framework. There are two motivations for moving the test: - Execution time reduction (from 12s to 9s in 'dev' in my env) - Facilitate adding new tests to the `guardrails_test.py` file No backport, `dtest/guardrails_test.py` is only on master Closes scylladb/scylladb#28737 * github.com:scylladb/scylladb: test: move dtest/guardrails_test.py to test_guardrails.py test: prepare guardrails_test.py to be moved to test/cluster/	2026-02-23 12:34:43 +01:00
Marcin Maliszkiewicz	1293b94039	auth: cache: fix permissions iterator invalidation in reload_all_permissions The inner loops in reload_all_permissions iterate role's permissions and _anonymous_permissions maps across yield points. Concurrent load_permissions calls (which don't hold _loading_sem) can emplace into those same maps during a yield, potentially triggering a rehash that invalidates the active iterator. We want to avoid adding semaphore acquire in load_permissions because it's on a common path (get_permissions). Fixing by snapshotting the keys into a vector before iterating with yields, so no long-lived map iterator is held across suspension points.	2026-02-23 12:14:22 +01:00
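The snapshot-then-iterate fix in the commit above has a close analogue in Python, sketched here under stated assumptions. The names `reload_all` and `loader` are hypothetical, and a plain `dict` stands in for the permissions map. In Python, resizing a dict during iteration raises `RuntimeError` rather than invalidating an iterator, but the remedy is the same: copy the keys before the first suspension point.

```python
import asyncio


async def reload_all(perms):
    # Snapshot the keys so that no live map iterator is held across a yield.
    for key in list(perms):
        await asyncio.sleep(0)  # suspension point; other tasks may run here
        if key in perms:        # the entry may have been removed meanwhile
            perms[key] = "reloaded"


async def loader(perms):
    # Concurrent insertions, as load_permissions might perform.
    for i in range(3):
        perms[f"new{i}"] = "loaded"
        await asyncio.sleep(0)


async def main():
    perms = {"a": "stale", "b": "stale"}
    await asyncio.gather(reload_all(perms), loader(perms))
    return perms
```

Entries inserted during the reload are simply not visited in that pass, which is the trade-off accepted in exchange for keeping the semaphore off the common path.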
Piotr Dulikowski	a4c389413c	Merge 'Hardens MV shutdown behavior by fixing lifecycle tracking for detached view-builder callbacks' from Alex Dathskovsky This series hardens MV shutdown behavior by fixing lifecycle tracking for detached view-builder callbacks and aligning update handling with the same async dispatch style used by create/drop. Patch 1 refactors on_update_view to use a dedicated coroutine dispatcher (dispatch_update_view), keeping update logic serialized under the existing view-builder lock and consistent with the callback architecture already used for create/drop paths. Patch 2 adds explicit callback lifetime coordination in view_builder: - introduce a seastar::gate member - acquire _ops_gate.hold() when launching detached create/update/drop dispatch futures - keep the hold alive until each detached future resolves - close the gate during view_builder::drain() so shutdown waits for in-flight callback work before final teardown Together, these changes reduce shutdown race exposure in MV event handling while preserving existing behavior for normal operation. Testing: - pytest --test-py-init test/cluster/mv (47 passed, 7 skipped) backport: not required started happening in master fixes: SCYLLADB-687 Closes scylladb/scylladb#28648 * github.com:scylladb/scylladb: db/view: gate detached view-builder callbacks during shutdown db:view: refactor on_update_view to use coroutine dispatcher	2026-02-23 11:28:37 +01:00
Marcin Maliszkiewicz	75d4bc26d3	auth/cache: acquire _loading_sem in cross-shard callbacks distribute_role() modifies _roles on non-zero shards via invoke_on_others() without holding _loading_sem. Similarly, load_all()'s invoke_on_others() callback calls prune_all() without the semaphore. When these run concurrently with reload_all_permissions(), which iterates _roles across yield points, an insertion can trigger absl::flat_hash_map::resize(), freeing the backing storage while an iterator still references it. Fix by acquiring _loading_sem on the target shard in both distribute_role()'s and load_all()'s invoke_on_others callbacks, serializing all _roles mutations with coroutines that iterate the map.	2026-02-23 10:30:03 +01:00
Ernest Zaslavsky	321d4caf0c	object_storage: add retryable machinery to object storage remove hand rolled error handling from object storage client and replace with common machinery that supports exception handling and retrying when appropriate	2026-02-22 14:00:44 +02:00
Ernest Zaslavsky	24972da26d	rest_client: add `simple_send` overload add an overload to rest client `simple_send` to accept a retry_strategy for http's make_request	2026-02-22 14:00:44 +02:00
Patryk Jędrzejczak	e8efcae991	Merge 'Use standard ks/cf/data creation methods in object_store/test_basic.py test' from Pavel Emelyanov The test uses create_ks_and_cf helper duplicating the existing code that does the same. This PR patches basic tests to use standard facilities. Also it prepares the ground for testing keyspace storage options with rf=3 Cleaning tests, not backporting Closes scylladb/scylladb#28600 * https://github.com/scylladb/scylladb: test/object_store: Remove create_ks_and_cf() helper test/object_store: Replace create_ks_and_cf() usage with standard methods test/object_store: Shift indentation right for test cases	2026-02-20 15:53:38 +01:00
Nadav Har'El	d01915131a	test/cqlpy: make test_indexing_paging_and_aggregation much faster Currently, test_secondary_index.py::test_indexing_paging_and_aggregation is very slow, and the slowest test in the test/cqlpy framework: It takes around 13 seconds on dev build, and because it is CPU-bound (doesn't sleep), it is much slower on debug builds. The reason for this slowness is that it needs to set up and read over 10,000 rows which is the default select_internal_page_size. But after the patches in pull request (#25368), we can configure select_internal_page_size, so in this patch we change the test to temporarily reduce this option to just 50, and then the test can reach the same code paths with just 142 rows instead of 20120 rows before this patch. As a result, the test should now be 140 times faster than it was before. In practice, because of some fixed overheads (the test creates several tables and indexes), in dev build mode the test run speedup is "only" 26-fold (to around half a second). I verified that removing the code added in `bb08af7` indeed makes the new shorter test fail - and this is the only test in test_secondary_index.py that starts to fail besides test_index_paging_group_by which is also related (so my revert didn't just break secondary indexing completely). So the shorter test is still a good regression test. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#28268	2026-02-20 15:44:53 +02:00
Avi Kivity	92bc5568c5	tools: toolchain: build sanitizers for future toolchain The future toolchain did not build the sanitizers, so debug executables did not link. Fix by not disabling the sanitizers. Closes scylladb/scylladb#28733	2026-02-20 15:44:24 +02:00
Botond Dénes	6c04e02f66	Merge 'Fix restoration test's validation of streaming directions' from Pavel Emelyanov The test_restore_with_streaming_scopes among other things checks how data streams flow while restoring. Whether or not to check the streams is decided based on the min tablet count value, which is compared with a hardcoded 512. This value of 512 matched the tablet count used by this test until it was "optimized" by #27839, where this number changed to 5 and streaming checks became off. Good news is that the very same checks are still performed by test_refresh_with_streaming_scopes. But it's better to have a working restoration test anyway. Minor test fix, not backporting Closes scylladb/scylladb#28607 * github.com:scylladb/scylladb: test: Fix the condition for streaming directions validation test: Split test_backup.py::check_data_is_back() into two	2026-02-20 15:42:10 +02:00
Botond Dénes	6f88c0dbd3	Merge ' test_tablets_parallel_decommission: Fix flakiness due to delayed task appearance' from Tomasz Grabiec Currently, the test assumes that when 'topology_coordinator_pause_before_processing_backlog: waiting' is logged, the task for decommission must be there. This was based on the assumption that topology coordinator is idle and decommission request wakes it up. But if the server is slow enough, it may still be running the load balancer in reaction to table creation, and block on that injection point before decommission request was added. Fix by waiting for the task to appear rather than the injection. Fixes SCYLLADB-715 Only 2026.1 vulnerable. Closes scylladb/scylladb#28688 * github.com:scylladb/scylladb: test_tablets_parallel_decommission: Fix flakiness due to delayed task appearance test: cluster: task_manager_client: Introduce wait_task_appears() tests: pylib: util: Add exponential backoff to wait_for	2026-02-20 15:05:36 +02:00
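Waiting for a condition with exponential backoff, as in the last commit of the merge above, might look roughly like the sketch below. The function name `wait_for_with_backoff`, its parameters, and their defaults are assumptions for illustration and do not reproduce the signature of the `wait_for` helper in the test library.

```python
import asyncio
import time


async def wait_for_with_backoff(pred, deadline, period=0.01,
                                backoff=2.0, max_period=1.0):
    """Poll the async predicate until it returns a non-None value.

    `deadline` is an absolute time.monotonic() timestamp. The sleep between
    polls grows by `backoff` each round, capped at `max_period`.
    """
    while True:
        result = await pred()
        if result is not None:
            return result
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError("condition not met before the deadline")
        # Never sleep past the deadline.
        await asyncio.sleep(min(period, remaining))
        period = min(period * backoff, max_period)
```

Polling for the task itself, instead of for a log line that only usually implies the task exists, removes the timing assumption that made the test flaky.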
Pavel Emelyanov	c96420c015	tests: Re-use manager.get_server_exe() There's a bunch of incremental repair tests that want to call scylla sstable command. For that they try to find where scylla binary by scanning /proc directory (see local_process_id and get_scylla_path helpers). There's shorter way -- just call manager.get_server_exe(). Same for backup-restore test. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#28676	2026-02-20 14:59:30 +02:00
Pavel Emelyanov	a4a0d75eee	test/object_store: Parametrize test_simple_backup_and_restore() There are three tests and a function with a pair of boolean parameters called by those. It's less code if the function becomes a test with parameters. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#28677	2026-02-20 14:57:30 +02:00
Pavel Emelyanov	a2e1293f86	test/object_store: Squash two simple-backup tests together The test_backup_simple creates a ks/cf, takes a snapshot, backs it up, then checks that the files were uploaded. The test_backup_move does the same, but also plays with 'move_files' parameter to be true/false. In fact, the "move" test was the copy of "simple" one that dropped check for scheduling group being "streaming" (backup with --move-files can check the same, it's not bad), and check for destination bucket to contain needed files (same here -- checking that files arrived to bucket after --move-files is good). At the end of the day, after the change backup test is run two times, instead of three, and performs extra checks for --move-files case. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#28606	2026-02-20 14:49:30 +02:00
Botond Dénes	7e90ed657c	Merge 'Fix `client_options` docs' from Karol Baryła https://github.com/scylladb/scylladb/pull/25746 added a new column to `system.clients`: `client_options frozen<map<text, text>>`. This column stores all options sent by the client in the `STARTUP` message. This PR also added `CLIENT_OPTIONS` to the list of values sent in `SUPPORTED` message, and documented that drivers can send their configuration (as JSON) in `STARTUP` under this key. Documentation for the new column was not added to the description of `system.clients` table, and documentation about the new `STARTUP` key was added in `protocol-extensions.md`, but in the section about shard awareness extension. This PR adds missing `system.clients` column description, moves the documentation of `CLIENT_OPTIONS` into its own section, and expands it a bit. Backport: none, because this fixes internal documentation. Closes scylladb/scylladb#28126 * github.com:scylladb/scylladb: protocol-extensions.md: Fix client_options docs system_keyspace.md: Add client_options column system_keyspace.md: Fix order in system.clients	2026-02-20 14:23:34 +02:00
Pavel Emelyanov	525cb5b3eb	table: Use fmt::to_string() to stringify compaction group ID Doing it with format("{}", foo) is correct, but to_string is a bit more lightweight. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#28630	2026-02-20 14:13:15 +02:00
Patryk Jędrzejczak	d399a197f5	Merge 'raft: Await instead of returning future in wait_for_state_change' from Dawid Mędrek The `try-catch` expression is pretty much useless in its current form. If we return the future, the awaiting will only be performed by the caller, completely circumventing the exception handling. As a result, instead of handling `raft::request_aborted` with a proper error message, the user will face `seastar::abort_requested_exception` whose message is cryptic at best. It doesn't even point to the root of the problem. Fixes SCYLLADB-665 Backport: This is a small improvement and may help when debugging, so let's backport it to all supported versions. Closes scylladb/scylladb#28624 * https://github.com/scylladb/scylladb: test: raft: Add test_aborting_wait_for_state_change raft: Describe exception types for wait_for_state_change and wait_for_leader raft: Await instead of returning future in wait_for_state_change	2026-02-20 12:17:22 +01:00
Andrzej Jackowski	eb5a564df2	test: move dtest/guardrails_test.py to test_guardrails.py This commit moves `guardrails_test.py`, prepared in the previous commit of this patch series, to `test/cluster/test_guardrails.py`. It also cleans up `suite.yaml`.	2026-02-20 11:39:52 +01:00
Andrzej Jackowski	9df426d2ae	test: prepare guardrails_test.py to be moved to test/cluster/ Disable `test/cluster/dtest/guardrails_test.py` in `suite.yaml` and make it compatible with the `test/cluster/` framework. This will allow moving this file from `test/cluster/dtest/` to `test/cluster/` in the next commit of this patch series. There are two motivations for moving the test: - Execution time reduction (from 12s to 9s in 'dev' in my env) - Facilitate adding new tests to the `guardrails_test.py` file	2026-02-20 11:39:43 +01:00
Raphael S. Carvalho	f33f324f77	mutation_compactor: Fix tombstone GC metrics to account for only expired There are 3 metrics (that go into every compaction_history entry): total_tombstone_purge_attempt total_tombstone_purge_failure_due_to_overlapping_with_memtable total_tombstone_purge_failure_due_to_overlapping_with_uncompacting_sstable When a tombstone is not expired (e.g. doesn't satisfy "gc_before" or grace period), it can be currently accounted as failure due to overlapping with either memtable or uncompacting sstable. So those 2 last metrics have noise of unexpired tombstones. What we should do is to only account for expired tombstones in all those 3 metrics. We lose the info of knowing the amount of tombstones processed by compaction, now we'll only know about the expired ones. But those metrics were primarily added for explaining why expired tombstones cannot be removed. We could have alternatively added a new field purge_failure_due_to_being_unexpired or something, but it requires adding a new field to compaction_history. Fixes https://scylladb.atlassian.net/browse/SCYLLADB-737. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes scylladb/scylladb#28669	2026-02-20 10:43:58 +02:00
Botond Dénes	0bf4c68af5	Merge 'docs: fix link to docker build readme in the README.MD' from Marcin Szopa Links were pointing to the `debian` subdirectory. However, the docker build was refactored to use `redhat`: `1abf981a73`, see https://github.com/scylladb/scylladb/pull/22910 No backport, just a README link fixes. Closes scylladb/scylladb#28699 * github.com:scylladb/scylladb: docs: fix path to the build_docker.sh which was moved from debian to redhat subdirectory docs: fix link to docker build README.MD	2026-02-20 08:21:46 +02:00
Avi Kivity	66bef0ed36	lua, tools: adjust for lua 5.5 lua_newstate seed parameter Lua 5.5 adds a seed parameter to lua_newstate(), provide it with a strong random seed. Closes scylladb/scylladb#28734	2026-02-20 06:52:37 +02:00
Avi Kivity	27a5502f14	Merge 'Reapply "main: test: add future and abort_source to after_init_func"' from Marcin Maliszkiewicz The patchset fixes abort_source implementation for perf-alternator and perf-cql-raw. It moves run_standalone function to common code in perf.hh with necessary templating. We also add extensive testing so that it's more difficult to break the tooling in the future. Fixes SCYLLADB-560 Backport: no, internal tooling improvement Closes scylladb/scylladb#28541 * github.com:scylladb/scylladb: test: cluster: add tests for perf tools test: perf: fix port race condition on startup in connect workload test: perf: prepare benchmarks to bind to custom host test: perf: make perf-alternator remote port configurable test: perf: fix ASAN leak warnings in perf-alternator Reapply "main: test: add future and abort_source to after_init_func"	2026-02-19 19:12:46 +02:00
Dawid Mędrek	c9d192c684	Merge 'raft topology: prevent crashes of multiple nodes' from Patryk Jędrzejczak Some assertions in the Raft-based topology are likely to cause crashes of multiple nodes due to the consistent nature of the Raft-based code. If the failing assertion is executed in the code run by each follower (e.g., the code reloading the in-memory topology state machine), then all nodes can crash. If the failing assertion is executed only by the leader (e.g., the topology coordinator fiber), then multiple consecutive group0 leaders will chain-crash until there is no group0 majority. Crashing multiple nodes is much more severe than necessary. It's enough to prevent the topology state machine from making more progress. This will naturally happen after throwing a runtime error. The problematic fiber will be killed or will keep failing in a loop. Note that it should be safe to block the topology state machine, but not the whole group0, as the topology state machine is mostly isolated from the rest of group0. We replace some occurrences of `on_fatal_internal_error` and `SCYLLA_ASSERT` with `on_internal_error`. These are not all occurrences, as some fatal assertions make sense, for example, in the bootstrap procedure. We also raise an internal error to prevent a segmentation fault in a few places. Fixes #27987 Backporting this PR is not required, but we can consider it at least for 2026.1 because: - it is LTS, - the changes are low-risk, - there shouldn't be many conflicts. Closes scylladb/scylladb#28558 * github.com:scylladb/scylladb: raft topology: prevent accessing nullptr returned by topology::find raft topology: make some assertions non-crashing	2026-02-19 16:50:03 +01:00


52099 Commits