scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-27 20:05:10 +00:00

Author	SHA1	Message	Date
Botond Dénes	8dbcd8a0b3	tools/scylla-sstable: create_table_in_cql_env(): register UDTs recursively It is not enough to go over all column types and register the UDTs. UDTs might be nested in other types, like collections. One has to do a traversal of the type tree and register every UDT on the way. That is what this patch does. This function is used by the query and write operations, which should now both work with nested UDTs. Add a test which fails before and passes after this patch.	2026-02-25 08:51:25 +02:00
Botond Dénes	cf39a5e610	tools/scylla-sstable: generalize dump_if_user_type Rename to invoke_on_user_type() and make the action taken on user types a function parameter. Enables reuse of the traverse logic by other code.	2026-02-25 08:51:25 +02:00
Botond Dénes	80049c88e9	tools/scylla-sstable: move dump_if_user_type() definition So it can be used by create_table_in_cql_env() code.	2026-02-25 08:51:25 +02:00
Pavel Emelyanov	5a5eb67144	vector_search/dns: Use newer seastar get_host_by_name API The hostent::addr_list is deprecated in favor of address_entry::addr field that contains the very same addresses. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#28565	2026-02-23 21:28:43 +02:00
Pavel Emelyanov	6b02b50e3d	Merge 'object_storage: add retryable machinery to object storage' from Ernest Zaslavsky - add an overload to the rest http client to accept retry strategy instance as an argument - remove hand rolled error handling from object storage client and replace with common machinery that supports handling and retrying when appropriate No backport needed since it is only refactoring Closes scylladb/scylladb#28161 * github.com:scylladb/scylladb: object_storage: add retryable machinery to object storage rest_client: add `simple_send` overload	2026-02-23 21:28:51 +03:00
Botond Dénes	dcd8de86ee	Merge 'docs: update a documentation of adding/removing DC and rebuilding a node' from Aleksandra Martyniuk Describe a procedure to convert tablet keyspace replication factor to rack list. Update the procedures of adding and removing a node to consider tablet keyspaces. Fixes: [SCYLLADB-398](https://scylladb.atlassian.net/browse/SCYLLADB-398) Fixes: https://github.com/scylladb/scylladb/issues/28306. Fixes: https://github.com/scylladb/scylladb/issues/28307. Fixes: https://github.com/scylladb/scylladb/issues/28270. Needs backport to all live branches as they all include tablets. [SCYLLADB-398]: https://scylladb.atlassian.net/browse/SCYLLADB-398?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ Closes scylladb/scylladb#28521 * github.com:scylladb/scylladb: docs: update nodetool rebuild docs docs: update a procedure of decommissioning a DC docs: update a procedure of adding a DC docs: describe upgrade to enforce_rack_list option docs: describe conversion to rack-list RF	2026-02-23 16:15:16 +02:00
Andrei Chekun	6ae58c6fa6	test.py: move storage tests to cluster subdirectory Move the storage test suite from test/storage/ to test/cluster/storage/ to consolidate related cluster-based tests. This removes the standalone test/storage/suite.yaml as the tests will use the cluster's test configuration. Initially these tests were in cluster, but to use unshare at first iteration they were moved outside. Now they use another way to handle volumes without unshare, so they should be in cluster Closes scylladb/scylladb#28634	2026-02-23 16:14:15 +02:00
Marcin Maliszkiewicz	c5dc086baf	Merge 'vector_search: return NaN for similarity_cosine with all-zero vectors' from Dawid Pawlik The ANN vector queries with all-zero vectors are allowed even on vector indexes with similarity function set to cosine. When enabling the rescoring option, those queries would fail as the rescoring calls `similarity_cosine` function underneath, causing an `InvalidRequest` exception as all-zero vectors were not allowed matching Cassandra's behaviour. To eliminate the discrepancy we want the all-zero vector `similarity_cosine` calls to pass, but return the NaN as the cosine similarity for zero vectors is mathematically incorrect. We decided not to use arbitrary values contrary to USearch, for which the distance (not to be confused with similarity) is defined as cos(0, 0) = 0, cos(0, x) = 1 while supporting the range of values [0, 2]. If we wanted to convert that to similarity, that would mean sim_cos(0, x) = 0.5, which does not support mathematical reasoning why that would be more similar than for example vectors marking obtuse angles. It's safe to assume that all-zero vectors for cosine similarity shouldn't make any impact, therefore we return NaN and eliminate them from best results. Adjusted the tests accordingly to check both proper Cassandra and Scylla's behaviour. Fixes: SCYLLADB-456 Backport to 2026.1 needed, as it fixes the bug for ANN vector queries using rescoring introduced there. Closes scylladb/scylladb#28609 * github.com:scylladb/scylladb: test/vector_search: add reproducer for rescoring with zero vectors vector_search: return NaN for similarity_cosine with all-zero vectors	2026-02-23 13:10:44 +01:00
Aleksandra Martyniuk	9ccc95808f	docs: update nodetool rebuild docs Update nodetool rebuild docs to mention that the command does not work for tablet keyspaces. Fixes: https://github.com/scylladb/scylladb/issues/28270.	2026-02-23 12:45:01 +01:00
Aleksandra Martyniuk	e4c42acd8f	docs: update a procedure of decommissioning a DC Update a procedure of decommissioning a DC for tablet keyspaces. Fixes: https://github.com/scylladb/scylladb/issues/28307.	2026-02-23 12:45:01 +01:00
Aleksandra Martyniuk	1c764cf6ea	docs: update a procedure of adding a DC Update a procedure of adding a DC for tablet keyspaces. Fixes: https://github.com/scylladb/scylladb/issues/28306.	2026-02-23 12:45:01 +01:00
Aleksandra Martyniuk	e08ac60161	docs: describe upgrade to enforce_rack_list option	2026-02-23 12:44:57 +01:00
Aleksandra Martyniuk	eefe66b2b2	docs: describe conversion to rack-list RF Fixes: SCYLLADB-398	2026-02-23 12:41:33 +01:00
Marcin Maliszkiewicz	54dca90e8c	Merge 'test: move dtest/guardrails_test.py to test_guardrails.py' from Andrzej Jackowski This patch series moves `test/cluster/dtest/guardrails_test.py` to `test/cluster/test_guardrails.py`, and migrates it from `cluster/dtest/` to `cluster/` framework. There are two motivations for moving the test: - Execution time reduction (from 12s to 9s in 'dev' in my env) - Facilitate adding new tests to the `guardrails_test.py` file No backport, `dtest/guardrails_test.py` is only on master Closes scylladb/scylladb#28737 * github.com:scylladb/scylladb: test: move dtest/guardrails_test.py to test_guardrails.py test: prepare guardrails_test.py to be moved to test/cluster/	2026-02-23 12:34:43 +01:00
Piotr Dulikowski	a4c389413c	Merge 'Hardens MV shutdown behavior by fixing lifecycle tracking for detached view-builder callbacks' from Alex Dathskovsky This series hardens MV shutdown behavior by fixing lifecycle tracking for detached view-builder callbacks and aligning update handling with the same async dispatch style used by create/drop. Patch 1 refactors on_update_view to use a dedicated coroutine dispatcher (dispatch_update_view), keeping update logic serialized under the existing view-builder lock and consistent with the callback architecture already used for create/drop paths. Patch 2 adds explicit callback lifetime coordination in view_builder: - introduce a seastar::gate member - acquire _ops_gate.hold() when launching detached create/update/drop dispatch futures - keep the hold alive until each detached future resolves - close the gate during view_builder::drain() so shutdown waits for in-flight callback work before final teardown Together, these changes reduce shutdown race exposure in MV event handling while preserving existing behavior for normal operation. Testing: - pytest --test-py-init test/cluster/mv (47 passed, 7 skipped) backport: not required started happening in master fixes: SCYLLADB-687 Closes scylladb/scylladb#28648 * github.com:scylladb/scylladb: db/view: gate detached view-builder callbacks during shutdown db:view: refactor on_update_view to use coroutine dispatcher	2026-02-23 11:28:37 +01:00
Ernest Zaslavsky	321d4caf0c	object_storage: add retryable machinery to object storage remove hand rolled error handling from object storage client and replace with common machinery that supports exception handling and retrying when appropriate	2026-02-22 14:00:44 +02:00
Ernest Zaslavsky	24972da26d	rest_client: add `simple_send` overload add an overload to rest client `simple_send` to accept a retry_strategy for http's make_request	2026-02-22 14:00:44 +02:00
Patryk Jędrzejczak	e8efcae991	Merge 'Use standard ks/cf/data creation methods in object_store/test_basic.py test' from Pavel Emelyanov The test uses create_ks_and_cf helper duplicating the existing code that does the same. This PR patches basic tests to use standard facilities. Also it prepares the ground for testing keyspace storage options with rf=3 Cleaning tests, not backporting Closes scylladb/scylladb#28600 * https://github.com/scylladb/scylladb: test/object_store: Remove create_ks_and_cf() helper test/object_store: Replace create_ks_and_cf() usage with standard methods test/object_store: Shift indentation right for test cases	2026-02-20 15:53:38 +01:00
Nadav Har'El	d01915131a	test/cqlpy: make test_indexing_paging_and_aggregation much faster Currently, test_secondary_index.py::test_indexing_paging_and_aggregation is very slow, and the slowest test in the test/cqlpy framework: It takes around 13 seconds on dev build, and because it is CPU-bound (doesn't sleep), it is much slower on debug builds. The reason for this slowness is that it needs to set up and read over 10,000 rows which is the default select_internal_page_size. But after the patches in pull request (#25368), we can configure select_internal_page_size, so in this patch we change the test to temporarily reduce this option to just 50, and then the test can reach the same code paths with just 142 rows instead of 20120 rows before this patch. As a result, the test should now be 140 times faster than it was before. In practice, because of some fixed overheads (the test creates several tables and indexes), in dev build mode the test run speedup is "only" 26-fold (to around half a second). I verified that removing the code added in `bb08af7` indeed makes the new shorter test fail - and this is the only test in test_secondary_index.py that starts to fail besides test_index_paging_group_by which is also related (so my revert didn't just break secondary indexing completely). So the shorter test is still a good regression test. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#28268	2026-02-20 15:44:53 +02:00
Avi Kivity	92bc5568c5	tools: toolchain: build sanitizers for future toolchain The future toolchain did not build the sanitizers, so debug executables did not link. Fix by not disabling the sanitizers. Closes scylladb/scylladb#28733	2026-02-20 15:44:24 +02:00
Botond Dénes	6c04e02f66	Merge 'Fix restoration test's validation of streaming directions' from Pavel Emelyanov The test_restore_with_streaming_scopes among other things checks how data streams flow while restoring. Whether or not to check the streams is decided based on the min tablet count value, which is compared with a hardcoded 512. This value of 512 matched the tablet count used by this test until it was "optimized" by #27839, where this number changed to 5 and streaming checks became off. Good news is that the very same checks are still performed by test_refresh_with_streaming_scopes. But it's better to have a working restoration test anyway. Minor test fix, not backporting Closes scylladb/scylladb#28607 * github.com:scylladb/scylladb: test: Fix the condition for streaming directions validation test: Split test_backup.py::check_data_is_back() into two	2026-02-20 15:42:10 +02:00
Botond Dénes	6f88c0dbd3	Merge ' test_tablets_parallel_decommission: Fix flakiness due to delayed task appearance' from Tomasz Grabiec Currently, the test assumes that when 'topology_coordinator_pause_before_processing_backlog: waiting' is logged, the task for decommission must be there. This was based on the assumption that topology coordinator is idle and decommission request wakes it up. But if the server is slow enough, it may still be running the load balancer in reaction to table creation, and block on that injection point before decommission request was added. Fix by waiting for the task to appear rather than the injection. Fixes SCYLLADB-715 Only 2026.1 vulnerable. Closes scylladb/scylladb#28688 * github.com:scylladb/scylladb: test_tablets_parallel_decommission: Fix flakiness due to delayed task appearance test: cluster: task_manager_client: Introduce wait_task_appears() tests: pylib: util: Add exponential backoff to wait_for	2026-02-20 15:05:36 +02:00
Pavel Emelyanov	c96420c015	tests: Re-use manager.get_server_exe() There's a bunch of incremental repair tests that want to call scylla sstable command. For that they try to find where scylla binary by scanning /proc directory (see local_process_id and get_scylla_path helpers). There's shorter way -- just call manager.get_server_exe(). Same for backup-restore test. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#28676	2026-02-20 14:59:30 +02:00
Pavel Emelyanov	a4a0d75eee	test/object_store: Parametrize test_simple_backup_and_restore() There are three tests and a function with a pair of boolean parameters called by those. It's less code if the function becomes a test with parameters. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#28677	2026-02-20 14:57:30 +02:00
Pavel Emelyanov	a2e1293f86	test/object_store: Squash two simple-backup tests together The test_backup_simple creates a ks/cf, takes a snapshot, backs it up, then checks that the files were uploaded. The test_backup_move does the same, but also plays with 'move_files' parameter to be true/false. In fact, the "move" test was the copy of "simple" one that dropped check for scheduling group being "streaming" (backup with --move-files can check the same, it's not bad), and check for destination bucket to contain needed files (same here -- checking that files arrived to bucket after --move-files is good). In the end of the day, after the change backup test is run two times, instead of three, and performs extra checks for --move-files case. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#28606	2026-02-20 14:49:30 +02:00
Botond Dénes	7e90ed657c	Merge 'Fix `client_options` docs' from Karol Baryła https://github.com/scylladb/scylladb/pull/25746 added a new column to `system.clients`: `client_options frozen<map<text, text>>`. This column stores all options sent by the client in the `STARTUP` message. This PR also added `CLIENT_OPTIONS` to the list of values sent in `SUPPORTED` message, and documented that drivers can send their configuration (as JSON) in `STARTUP` under this key. Documentation for the new column was not added to the description of `system.clients` table, and documentation about the new `STARTUP` key was added in `protocol-extensions.md`, but in the section about shard awareness extension. This PR adds missing `system.clients` column description, moves the documentation of `CLIENT_OPTIONS` into its own section, and expands it a bit. Backport: none, because this fixes internal documentation. Closes scylladb/scylladb#28126 * github.com:scylladb/scylladb: protocol-extensions.md: Fix client_options docs system_keyspace.md: Add client_options column system_keyspace.md: Fix order in system.clients	2026-02-20 14:23:34 +02:00
Pavel Emelyanov	525cb5b3eb	table: Use fmt::to_string() to stringify compation group ID Doing it with format("{}", foo) is correct, but to_string is a bit more lightweight. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#28630	2026-02-20 14:13:15 +02:00
Patryk Jędrzejczak	d399a197f5	Merge 'raft: Await instead of returning future in wait_for_state_change' from Dawid Mędrek The `try-catch` expression is pretty much useless in its current form. If we return the future, the awaiting will only be performed by the caller, completely circumventing the exception handling. As a result, instead of handling `raft::request_aborted` with a proper error message, the user will face `seastar::abort_requested_exception` whose message is cryptic at best. It doesn't even point to the root of the problem. Fixes SCYLLADB-665 Backport: This is a small improvement and may help when debugging, so let's backport it to all supported versions. Closes scylladb/scylladb#28624 * https://github.com/scylladb/scylladb: test: raft: Add test_aborting_wait_for_state_change raft: Describe exception types for wait_for_state_change and wait_for_leader raft: Await instead of returning future in wait_for_state_change	2026-02-20 12:17:22 +01:00
Andrzej Jackowski	eb5a564df2	test: move dtest/guardrails_test.py to test_guardrails.py This commit moves `guardrails_test.py`, prepared in the previous commit of this patch series, to `test/cluster/test_guardrails.py`. It also cleans up `suite.yaml`.	2026-02-20 11:39:52 +01:00
Andrzej Jackowski	9df426d2ae	test: prepare guardrails_test.py to be moved to test/cluster/ Disable `test/cluster/dtest/guardrails_test.py` in `suite.yaml` and make it compatible with the `test/cluster/` framework. This will allow moving this file from `test/cluster/dtest/` to `test/cluster/` in the next commit of this patch series. There are two motivations for moving the test: - Execution time reduction (from 12s to 9s in 'dev' in my env) - Facilitate adding new tests to the `guardrails_test.py` file	2026-02-20 11:39:43 +01:00
Raphael S. Carvalho	f33f324f77	mutation_compactor: Fix tombstone GC metrics to account for only expired There are 3 metrics (that goes in every compaction_history entry): total_tombstone_purge_attempt total_tombstone_purge_failure_due_to_overlapping_with_memtable total_tombstone_purge_failure_due_to_overlapping_with_uncompacting_sstable When a tombstone is not expired (e.g. doesn't satisfy "gc_before" or grace period), it can be currently accounted as failure due to overlapping with either memtable or uncompacting sstable. So those 2 last metrics have noise of unexpired tombstones. What we should do is to only account for expired tombstones in all those 3 metrics. We lose the info of knowing the amount of tombstones processed by compaction, now we'll only know about the expired ones. But those metrics were primarily added for explaining why expired tombstones cannot be removed. We could have alternatively added a new field purge_failure_due_to_being_unexpired or something, but it requires adding a new field to compaction_history. Fixes https://scylladb.atlassian.net/browse/SCYLLADB-737. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes scylladb/scylladb#28669	2026-02-20 10:43:58 +02:00
Botond Dénes	0bf4c68af5	Merge 'docs: fix link to docker build readme in the README.MD' from Marcin Szopa Links were pointing to the `debian` subdirectory. However, there docker build was refactored to use `redhat`: `1abf981a73`, see https://github.com/scylladb/scylladb/pull/22910 No backport, just a README link fixes. Closes scylladb/scylladb#28699 * github.com:scylladb/scylladb: docs: fix path to the build_docker.sh which was moved from debian to redhat subdirectory docs: fix link to docker build README.MD	2026-02-20 08:21:46 +02:00
Avi Kivity	66bef0ed36	lua, tools: adjust for lua 5.5 lua_newstate seed parameter Lua 5.5 adds a seed parameter to lua_newstate(), provide it with a strong random seed. Closes scylladb/scylladb#28734	2026-02-20 06:52:37 +02:00
Avi Kivity	27a5502f14	Merge 'Reapply "main: test: add future and abort_source to after_init_func"' from Marcin Maliszkiewicz The patchset fixes abort_source implementation for perf-alternator and perf-cql-raw. It moves run_standalone function to common code in perf.hh with necessary templating. We also add extensive testing so that it's more difficult to break the tooling in the future. Fixes SCYLLADB-560 Backport: no, internal tooling improvement Closes scylladb/scylladb#28541 * github.com:scylladb/scylladb: test: cluster: add tests for perf tools test: perf: fix port race condition on startup in connect workload test: perf: prepare benchmarks to bind to custom host test: perf: make perf-alterantor remote port configurable test: perf: fix ASAN leak warnings in perf-alternator Reapply "main: test: add future and abort_source to after_init_func"	2026-02-19 19:12:46 +02:00
Dawid Mędrek	c9d192c684	Merge 'raft ropology: prevent crashes of multiple nodes' from Patryk Jędrzejczak Some assertions in the Raft-based topology are likely to cause crashes of multiple nodes due to the consistent nature of the Raft-based code. If the failing assertion is executed in the code run by each follower (e.g., the code reloading the in-memory topology state machine), then all nodes can crash. If the failing assertion is executed only by the leader (e.g., the topology coordinator fiber), then multiple consecutive group0 leaders will chain-crash until there is no group0 majority. Crashing multiple nodes is much more severe than necessary. It's enough to prevent the topology state machine from making more progress. This will naturally happen after throwing a runtime error. The problematic fiber will be killed or will keep failing in a loop. Note that it should be safe to block the topology state machine, but not the whole group0, as the topology state machine is mostly isolated from the rest of group0. We replace some occurrences of `on_fatal_internal_error` and `SCYLLA_ASSERT` with `on_internal_error`. These are not all occurrences, as some fatal assertions make sense, for example, in the bootstrap procedure. We also raise an internal error to prevent a segmentation fault in a few places. Fixes #27987 Backporting this PR is not required, but we can consider it at least for 2026.1 because: - it is LTS, - the changes are low-risk, - there shouldn't be many conflicts. Closes scylladb/scylladb#28558 * github.com:scylladb/scylladb: raft topology: prevent accessing nullptr returned by topology::find raft topology: make some assertions non-crashing	2026-02-19 16:50:03 +01:00
Marcin Maliszkiewicz	22c3d8d609	Merge 'db/config: enable table audit by default' from Piotr Smaron In https://github.com/scylladb/scylladb/pull/27262 table audit has been re-enabled by default in `scylla.yaml`, logging certain categories to a table, which should make new Scylla deployments have audit enabled. Now, in the next release, we also want to enable audit in `db/config.cc`, which should enable audit for all deployments, which don't explicitly configure audit otherwise in `scylla.yaml` (or via cmd line). BTW. Because this commit aligns audit's default config values in `db/config.cc` to those of `scylla.yaml`, `docs/reference/configuration-parameters.rst`, which is based on `db/config.cc` will start showing that table audit is the default. Refs: https://github.com/scylladb/scylladb/issues/28355 Refs: https://scylladb.atlassian.net/browse/SCYLLADB-222 No backport: table audit has been enabled in 2026.1 in `scylla.yaml`, and should be always on starting from the next release, which is the release we're currently merging to (2026.2). Closes scylladb/scylladb#28376 * github.com:scylladb/scylladb: docs: decommission: note audit ks may require ALTERing docs: mention table audit enabled by default audit: disable DDL by default db/config: enable table audit by default test/cluster: fix `test_table_desc_read_barrier` assertion test/cluster: adjust audit in tests involving decommissioning its ks audit_test: fix incorrect config in `test_audit_type_none`	2026-02-19 16:30:11 +01:00
Pavel Emelyanov	b4b9b547ce	replica: Remove unused sched groups from keyspace and table configs Compaction and statement groups are carried over on those configs, but are in fact unused. Drop both. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#28540	2026-02-19 15:47:31 +01:00
Patryk Jędrzejczak	45115415fb	Merge 'Parametrize and merge several restoration test cases' from Pavel Emelyanov There are four tests that check how restore with primary-replica-only option works in various scopes and topologies. Cases that check same-racks and same-datacenters are very very similar, so are those that check different-racks and different-datacenters. Parametrizing them and merging saves lots of code (+30 lines, -116 lines) It's probably worth merging the resulting same-domain with different-domain tests, because the similarity is still large in both, but the result becomes too if-y, so not done here. Maybe later. Improving tests, not backporting Closes scylladb/scylladb#28569 * https://github.com/scylladb/scylladb: test: Merge test_restore_primary_replica_different_... tests test: Merge test_restore_primary_replica_same_... tests test: Don't specify expected_replicas in test_restore_primary_replica_different_dc_scope_all test: Remove local r_servers variable from test_restore_primary_replica_different_dc_scope_all	2026-02-19 15:42:55 +01:00
Pavel Emelyanov	26372e65df	Merge 's3_perf: Fix the s3 perf test' from Ernest Zaslavsky Fix the build of the test and the upload operation flow No need to backport since it is only a test we barely use Closes scylladb/scylladb#28595 * github.com:scylladb/scylladb: s3_perf: fix upload operation flow s3_perf: fix the CMake build	2026-02-19 15:31:43 +02:00
Avi Kivity	7ec710c250	Merge 'tablets: Reduce per-shard migration concurrency to 2' from Tomasz Grabiec Tablet migration keeps sstable snapshot during streaming, which may cause temporary increase in disk utilization if compaction is running concurrently. SSTables compacted away are kept on disk until streaming is done with them. The more tablets we allow to migrate concurrently, the higher disk space can rise. When the target tablet size is configured correctly, every tablet should own about 1% of disk space. So concurrency of 4 shouldn't put us at risk. But target tablet size is not chosen dynamically yet, and it may not be aligned with disk capacity. Also, tablet sizes can temporarily grow above the target, up to 2x before the split starts, and some more because splits take a while to complete. To reduce the impact from this, reduce concurrency of migration. Concurrency of 2 should still be enough to saturate resources on the leaving shard. Also, reducing concurrency means that load balancing is more responsive to preemption. There will be less bandwidth sharing, so scheduled migrations complete faster. This is important for scale-out, where we bootstrap a node and want to start migrations to that new node as soon as possible. Refs scylladb/siren#15317 Closes scylladb/scylladb#28563 * github.com:scylladb/scylladb: tablets, config: Reduce migration concurrency to 2 tablets: load_balancer: Always accept migration if the load is 0 config, tablets: Make tablet migration concurrency configurable	2026-02-19 15:31:43 +02:00
Dawid Mędrek	fae71f79c2	test: raft: Add test_aborting_wait_for_state_change	2026-02-19 14:21:01 +01:00
Dawid Mędrek	e4f2b62019	raft: Describe exception types for wait_for_state_change and wait_for_leader The methods of `raft::server` are abortable and if the passed `abort_source` is triggered, they throw `raft::request_aborted`. We document that. Although `raft::server` is an interface, this is consistent with the descriptions of its other methods.	2026-02-19 12:47:14 +01:00
Dawid Mędrek	c36623baad	raft: Await instead of returning future in wait_for_state_change The `try-catch` expression is pretty much useless in its current form. If we return the future, the awaiting will only be performed by the caller, completely circumventing the exception handling. As a result, instead of handling `raft::request_aborted` with a proper error message, the user will face `seastar::abort_requested_exception` whose message is cryptic at best. It doesn't even point to the root of the problem. Fixes SCYLLADB-665	2026-02-19 12:47:14 +01:00
Marcin Maliszkiewicz	de4e5e10af	test: perf: fix prepared statements logic in perf-simple-query Due to lack of checks present in process_execute_internal from transport/server.cc needs_authorization bool was always set to true doing some extra work (check_access()) for each request. We mirror the logic in this patch in test env which perf-simple-query uses. This can also potentially improve runtime of unittests (marginally). Note that bug is only in perf tool not scylla itself, the fix decreases insns/op by around 10%: Before: 41065 insns/op After: 37452 insns/op Command: ./build/release/scylla perf-simple-query --duration 5 --smp 1 Fixes https://github.com/scylladb/scylladb/issues/27941 Closes scylladb/scylladb#28704	2026-02-19 12:42:07 +02:00
Avi Kivity	58a662b9db	dist: refresh container base image (ubi9-minimal) Using an outdated image can cause problems when `microdnf update` runs, if the distribution doesn't maintain good update hygiene. Although, I suspect that when update failures happen they're really caused by propagation delay of packages to mirrors. Fix by using --pull=always to get a fresh image. Ref https://scylladb.atlassian.net/browse/SCYLLADB-714 Closes scylladb/scylladb#28680	2026-02-19 12:42:43 +03:00
Ferenc Szili	f1bc17bd4c	load_stats: fix race condition when computing sum_tablet_sizes In storage_service::load_stats_for_tablet_based_tables(), we are passing a reference to sum_tablet_sizes to the lambda which increments this value on each shard via map_reduce0(). This means we could have a race condition because this is executed on separate threads/CPUs. This patch fixed the problem by collecting the sums by shard into a vector, then summing those up. Refs: SCYLLADB-678 Closes scylladb/scylladb#28703	2026-02-19 12:29:25 +03:00
Avi Kivity	dee868b71a	interval: avoid clang 23 warning on throw statement in potentially noexcept function interval_data's move constructor is conditionally noexcept. It contains a throw statement for the case that the underlying type's move constructor can throw; that throw statement is never executed if we're in the noexcept branch. Clang 23 however doesn't understand that, and warns about throwing in a noexcept function. Fix that by rewriting the logic using seastar::defer(). In the noexcept case, the optimizer should eliminate it as dead code. Closes scylladb/scylladb#28710	2026-02-19 12:24:20 +03:00
Ernest Zaslavsky	45d824e0fe	s3_perf: fix upload operation flow Correct the upload operation logic. The previous flow incorrectly checked for the test file on S3 even when performing operations that do not download the file, such as uploads.	2026-02-19 11:14:59 +02:00
Botond Dénes	b637e17b19	db/config: don't use RBNO for scaling Remove bootstrap and decommission from allowed_repair_based_node_ops. Using RBNO over streaming for these operations has no benefits, as they are not exposed to the out-of-date replica problem that replace, removenode and rebuild are. On top of that, RBNO is known to have problems with empty user tables. Using streaming for bootstrap and decommission is safe and faster than RBNO in all conditions, especially when the table is small. One test needs adjustment as it relies on RBNO being used for all node ops. Fixes: SCYLLADB-105 Closes scylladb/scylladb#28080	2026-02-19 09:51:09 +01:00
Calle Wilund	8e71a6f52a	gcp: Add handling of 429 (too many requests) to exponential backoff Fixes: SCYLLADB-611 Adds http error code 429 to codes handled by exponential backoff. Closes scylladb/scylladb#28588	2026-02-19 09:42:39 +01:00
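
The zero-vector behaviour described in merge c5dc086baf (return NaN from `similarity_cosine` for all-zero vectors and drop those rows from the best results) can be illustrated with a minimal Python sketch. This is not the ScyllaDB implementation, which is C++. The function below returns the raw cosine of the angle and does not try to reproduce any scaling the real function may apply. The `rescore` helper and its parameters are hypothetical names introduced only for this illustration.

```python
import math


def similarity_cosine(a, b):
    """Raw cosine of the angle between a and b; NaN if either vector is all-zero."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # The cosine is undefined for a zero vector, so return NaN
        # instead of raising or picking an arbitrary value.
        return math.nan
    return dot / (norm_a * norm_b)


def rescore(query, candidates, limit):
    """Hypothetical rescoring step: rank candidates by similarity, skipping NaN scores."""
    scored = [(similarity_cosine(query, c), c) for c in candidates]
    ranked = [(s, c) for s, c in scored if not math.isnan(s)]
    ranked.sort(key=lambda pair: pair[0], reverse=True)
    return ranked[:limit]
```

Filtering NaN scores before sorting matters in this sketch because NaN compares false against every number, so leaving it in the list would make the sort order unreliable.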


52084 Commits