scylladb

Author	SHA1	Message	Date
Łukasz Paszkowski	f06094aa95	topology_coordinator: add write_both_read_old_fallback_cleanup state Yet another barrier-failure scenario exists in the `write_both_read_new` state. When the barrier fails, the tablet is expected to transition to `cleanup_target`, but because barrier execution is asynchronous, the cleanup transition can be skipped entirely and the tablet may continue forward instead. Both `write_both_read_new` and `cleanup_target` modify read and write selectors. In this situation, a barrier is required, and transitioning directly between these states without one is unsafe. Introduce an intermediate `write_both_read_old_fallback_cleanup` state that modifies only a read selector and can be entered without a barrier (there is no need to wait for all nodes to start using the "new" read selector). From there, the tablet can proceed to `cleanup_target`, where the required barriers are enforced. This also avoids changing both selectors in a single step. A direct transition from `write_both_read_new` to `cleanup_target` updates both selectors at once, which can leave coordinators using the old selector for writes and the new selector for reads, causing reads to miss preceding writes. By routing through the fallback state, selectors are updated in order—read first, then write—preserving read-after-write correctness.	2026-01-26 13:14:37 +01:00
Patryk Jędrzejczak	67045b5f17	Merge 'raft_topology, tablets: Drain tablets in parallel with other topology operations' from Tomasz Grabiec Allows other topology operations to execute while tablets are being drained on decommission. In particular, bootstrap on scale-out. This is important for elasticity. Allows multiple decommission/removenode to happen in parallel, which is important for efficiency. Flow of decommission/removenode request: 1) pending and paused, has tablet replicas on target node. Tablet scheduler will start draining tablets. 2) No tablets on target node, request is pending but not paused 3) Request is scheduled, node is in transition 4) Request is done Nodes are considered draining as soon as there is a leave or remove request on them. If there are tablet replicas present on the target node, the request is in a paused state and will not be picked by topology coordinator. The paused state is computed from topology state automatically on reload. When request is not paused, its execution starts in write_both_read_old state. The old tablet_draining state is not entered (it's deprecated now). Tablet load balancing will yield the state machine as soon as some request is no longer paused and ready to be scheduled, based on standard preemption mechanics. 
Fixes #21452 Closes scylladb/scylladb#24129 * https://github.com/scylladb/scylladb: docs: Document parallel decommission and removenode and relevant task API test: Add tests for parallel decommission/removenode test: util: Introduce ensure_group0_leader_on() test: tablets: Check that there are no migrations scheduled on draining nodes test: lib: topology_builder: Introduce add_draining_request() topology_coordinator, tablets: Fail draining operations when tablet migration fails due to critical disk utilization tablets: topology_coordinator: Refactor to propagate reason for migration rollback tablet_allocator: Skip co-location on draining nodes node_ops: task_manager_module: Populate entity field also for active requests tasks: node_ops: Put node id in the entity field tasks, node_ops: Unify setting of task_stats in get_status() and get_stats() topology: Protect against empty cancelation reason tasks, topology: Make pending node operations abortable doc: topology-over-raft.md: Fix diagram for replacing, tablet_draining is not engaged raft_topology, tablets: Drain tablets in parallel with other topology operations virtual_tables: Show draining and excluded fields in system.cluster_status and system.load_by_node locator: topology: Add "draining" flag to a node topology_coordinator: Extract generate_cancel_request_update() storage_service: Drop dependency in topology_state_machine.hh in the header locator: Extract common code in assert_rf_rack_valid_keyspace() topology_coordinator, storage_service: Validate node removal/decommission at request submission time	2026-01-22 13:06:53 +01:00
Tomasz Grabiec	a009644c7d	raft_topology, tablets: Drain tablets in parallel with other topology operations Allows other topology operations to execute while tablets are being drained on decommission. In particular, bootstrap on scale-out. This is important for elasticity. Allows multiple decommission/removenode to happen in parallel, which is important for efficiency. Flow of decommission/removenode request: 1) pending and paused, has tablet replicas on target node. Tablet scheduler will start draining tablets. 2) No tablets on target node, request is pending but not paused 3) Request is scheduled, node is in transition 4) Request is done Nodes are considered draining as soon as there is a leave or remove request on them. If there are tablet replicas present on the target node, the request is in a paused state and will not be picked by topology coordinator. The paused state is computed from topology state automatically on reload. When request is not paused, its execution starts in write_both_read_old state. The old tablet_draining state is not entered (it's deprecated now). Tablet load balancing will yield the state machine as soon as some request is no longer paused and ready to be scheduled, based on standard preemption mechanics. The test case test_explicit_tablet_movement_during_decommission is removed. It verifies that tablet move API works during tablet draining transition. After this PR, we no longer enter this transition, so the test doesn't work. It loses its purpose, because movement during normal tablet balancing is not special and tested elsewhere.	2026-01-18 15:36:05 +01:00
Calle Wilund	da17e8b18b	gossiper/main: Extend special treatment of node ID resolve for rpc_address Refs #27429 If running with broadcast_address != listen/cql/rpc address, topology gets confused about the varying addresses. Need to special case resolve both addresses as "self". I.e. extend broadcast_address treatment to cql_address as well. Added export of this via gossiper for symmetry.	2026-01-13 14:12:19 +01:00
Botond Dénes	af6cb0d0a4	Merge 'raft topology: preserve IP -> ID mapping of a replacing node on restart' from Patryk Jędrzejczak We currently do it only for a bootstrapping node, which is a bug. The missing IP can cause an internal error, for example, in the following scenario: - replace fails during streaming, - all live nodes are shut down before the rollback of replace completes, - all live nodes are restarted, - live nodes start hitting internal error in all operations that require IP of the replacing node (like client requests or REST API requests coming from nodetool). We fix the bug here, but we do it separately for replace with different IP and replace with the same IP. For replace with different IP, we persist the IP -> host ID mapping in `system.peers` just like for bootstrap. That's necessary, since there is no other way to determine IP of the replacing node on restart. For replace with the same IP, we can't do the same. This would require deleting the row corresponding to the node being replaced from `system.peers`. That's fine in theory, as that node is permanently banned, so its IP shouldn't be needed. Unfortunately, we have many places in the code where we assume that IP of a topology member is always present in the address map or that a topology member is always present in the gossiper endpoint set. Examples of such places: - nodetool operations, - REST API endpoints, - `db::hints::manager::store_hint`, - `group0_voter_handler::update_nodes`. We could fix all those places and verify that drivers work properly when they see a node in the token metadata, but not in `system.peers`. However, that would be too risky to backport. We take a different approach. We recover IP of the replacing node on restart based on the state of the topology state machine and `system.peers` just after loading `system.peers`. We rely on the fact that group 0 is set up at this point. 
The only case where this assumption is incorrect is a restart in the Raft-based recovery procedure. However, hitting this problem then seems improbable, and even if it happens, we can restart the node again after ensuring that no client and REST API requests come before replace is rolled back on the new topology coordinator. Hence, it's not worth complicating the fix (e.g. by looking at the persistent topology state instead of the in-memory state machine). Fixes #28057 Backport this PR to all branches as it fixes a problematic bug. Closes scylladb/scylladb#27435 * github.com:scylladb/scylladb: gossiper: add_saved_endpoint: make generations of excluded nodes negative test: introduce test_full_shutdown_during_replace utils: error_injection: allow aborting wait_for_message raft topology: preserve IP -> ID mapping of a replacing node on restart	2026-01-09 14:56:16 +02:00
Patryk Jędrzejczak	eee2b6c7af	Merge 'tablets: Make balancing disabling RPC preempt tablet transitions' from Tomasz Grabiec Disabling of balancing waits for topology state machine to become idle, to guarantee that no migrations are happening or will happen after the call returns. But it doesn't interrupt the scheduler, which means the call can take arbitrary amount of time. It may wait for tablet repair to be finished, which can take many hours. We should do it via topology request, which will interrupt the tablet scheduler. Enabling of balancing can be immediate. Fixes https://github.com/scylladb/scylladb/issues/27647 Fixes #27210 Closes scylladb/scylladb#27736 * https://github.com/scylladb/scylladb: test: Verify that repair doesn't block disabling of tablet load balancing tablets: Make balancing disabling call preempt tablet transitions	2026-01-08 21:55:19 +02:00
Asias He	4f77dd058d	repair: Add tablet repair progress report support This patch adds tablet repair progress report support so that the user could use the /task_manager/task_status API to query the progress. In order to support this, a new system table is introduced to record the user request related info, i.e, start of the request and end of the request. The progress is accurate when tablet split or merge happens in the middle of the request, since the tokens of the tablet are recorded when the request is started and when repair of each tablet is finished. The original tablet repair is considered as finished when the finished ranges cover the original tablet token ranges. After this patch, the /task_manager/task_status API will report correct progress_total and progress_completed. Fixes #22564 Fixes #26896 Closes scylladb/scylladb#27679	2026-01-08 21:55:18 +02:00
Tomasz Grabiec	ccdb301731	tablets: Make balancing disabling call preempt tablet transitions This patch modifies the RESTful API handler which disables tablet balancing to use a topology request to wait for already running tablet transitions. Before, it was just waiting for topology to be idle, so it could wait much longer than necessary, including for operations which are not affected by the flag, like repair. And repair can take hours. A new request type is introduced for this synchronization: noop_request. It will preempt the tablet scheduler, and when the request executes, we know all later tablet transitions will respect the "balancing disabled" flag, and only things which are unaffected by the flag, like repair, will be scheduled. Fixes #27647	2026-01-05 13:22:08 +01:00
Patryk Jędrzejczak	0fed9f94f8	gossiper: add_saved_endpoint: make generations of excluded nodes negative The explanation is in the new comment in `gossiper::add_saved_endpoint`. We add a test for this change. It's "extremely white-box", but it's better than nothing.	2025-12-29 19:13:55 +01:00
Ferenc Szili	b7ebd73e53	load_balancer: add cluster feature for size based balancing This patch adds a cluster feature size_based_load_balancing which, until enabled, will force capacity based balancing. This is needed because during rolling upgrades some of the nodes will have incomplete data in load_stats (missing tablet sizes and effective_capacity) which are needed for size based balancing to make good decisions and issue correct migrations.	2025-12-27 11:39:08 +01:00
Emil Maskovsky	ba6fabfc88	features: add feature flag for removenode via left token ring To improve the behavior of the removenode operation, we want to issue a global topology barrier after the removenode has been applied. However, this requires changing the topology state machine to add a new state (left_token_ring) to the removenode flow, which is not supported by older nodes. To allow rolling upgrades, we add a feature flag REMOVENODE_WITH_LEFT_TOKEN_RING that controls whether the new removenode flow is used.	2025-12-17 13:31:11 +01:00
Andrzej Jackowski	2e7070d3b7	gms: add CLIENT_ROUTES feature The feature will be used later in this patch series: - To avoid unnecessary operations when the feature is not enabled - To guard new API endpoints from being used before the cluster is ready to use them. - To implement update tests (by disabling/enabling the feature) Ref: scylladb/scylla-enterprise#5699	2025-12-15 13:08:04 +01:00
Avi Kivity	24264e24bb	Revert "repair: Add tablet repair progress report support" This reverts commit `faad0167d7`. It causes a regression in test_two_tablets_concurrent_repair_and_migration_repair_writer_level in debug mode (with ~5%-10% probability). Fixes #27510. Closes scylladb/scylladb#27560	2025-12-11 12:18:11 +02:00
Asias He	faad0167d7	repair: Add tablet repair progress report support This patch adds tablet repair progress report support so that the user could use the /task_manager/task_status API to query the progress. In order to support this, a new system table is introduced to record the user request related info, i.e, start of the request and end of the request. The progress is accurate when tablet split or merge happens in the middle of the request, since the tokens of the tablet are recorded when the request is started and when repair of each tablet is finished. The original tablet repair is considered as finished when the finished ranges cover the original tablet token ranges. After this patch, the /task_manager/task_status API will report correct progress_total and progress_completed. Fixes #22564 Fixes #26896 Closes scylladb/scylladb#26924	2025-12-08 13:35:19 +02:00
Pavel Emelyanov	54edb44b20	code: Stop using seastar::compat::source_location And switch to std::source_location. The upcoming seastar update will deprecate its compatibility layer. The patch consists of running `for f in $(git grep -l 'seastar::compat::source_location'); do sed -e 's/seastar::compat::source_location/std::source_location/g' -i $f; done` and removing a few header includes. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#27309	2025-11-27 19:10:11 +02:00
Andrzej Jackowski	e366030a92	treewide: seastar module update The reason for this seastar update is to have the fixed handling of the `integer` type in `seastar-json2code` because it's needed for further development of ScyllaDB REST API. The following changes were introduced to ScyllaDB code to ensure it compiles with the updated seastar: - Remove `seastar/util/modules.hh` includes as the file was removed from seastar - Modified `metrics::impl::labels_type` construction in `test/boost/group0_test.cc` because now it requires `escaped_string` * seastar 340e14a7...8c3fba7a (32): > Merge 'Remove net::packet usage from dns.cc' from Pavel Emelyanov dns: Optimize packet sending for newer c-ares versions dns: Replace net::packet with vector<temporary_buffer> dns: Remove unused local variable dns: Remove pointless for () loop wrapping dns: Introduce do_sendv_tcp() method dns: Introduce do_send_udp() method > test: Add http rules test of matching order > Merge 'Generalize packet_data_source into memory_data_source' from Pavel Emelyanov memcached: Patch test to use memory_data_source memcached: Use memory_data_source in server rpc: Use memory_data_sink without constructing net::packet util: Generalize packet_data_source into memory_data_source > tests: coroutines: restore "explicit this" tests > reactor: remove blocking of SIGILL > Merge 'Update compilers in GH actions scripts' from Pavel Emelyanov github: Use gcc-14 github: Use clang-20 > Merge 'Reinforce DNS reverse resolution test ' from Pavel Emelyanov test: Make test_resolve() try several addresses test: Coroutinize test_resolve() helper > modules: make module support standards-compliant > Merge 'Fix incorrect union access in dns resolver' from Pavel Emelyanov dns: Squash two if blocks together dns: Do not check tcp entry for udp type > coroutine: Fix compilation of execute_involving_handle_destruction_in_await_suspend > promise: Document that promise is resolved at most once > coroutine: exception: workaround broken 
destroy coroutine handle in await_suspend > socket: Return unspecified socket_address for unconnected socket > smp: Fix exception safety of invoke_on_... internal copying > Merge 'Improve loads evaluation by reactor' from Pavel Emelyanov reactor: Keep loads timer on reactor reactor: Update loads evaluation loop > Merge 'scripts: add 'integer' type to seastar-json2code' from Andrzej Jackowski test: extend tests/unit/api.json to use 'integer' type scripts: add 'integer' type to seastar-json2code > Merge 'Sanitize tls::session::do_put(_one)? overloads' from Pavel Emelyanov tls: Rename do_put_one(temporary_buffer) into do_put() tls: Fix indentation after previous patch tls: Move semaphore grab into iterating do_put() > net: tcp: change unsent queue from packets to temporary_buffer:s > timer: Enable highres timer based on next timeout value > rpc: Add a new constructor in closed_error to accept string argument > memcache: Implement own data sink for responses > Merge 'file: recursive_remove_directory: general cleanup' from Avi Kivity file: do_recursive_remove_directory(): move object when popping from queue file: do_recursive_remove_directory(): adjust indentation file: do_recursive_remove_directory(): coroutinize file: do_recursive_remove_directory(): simplify conditional file: do_recursive_remove_directory(): remove wrong const file: do_recursive_remove_directory(): clean up work_entry > tests: Move thread_context_switch_test into perf/ > test: Add unit test for append_challenged_posix_file > Merge 'Prometheus metrics handler optimization' from Travis Downs prometheus: optimize metrics aggregation prometheus: move and test aggregate_by helper prometheus: various optimizations metrics: introduce escaped_string for label values metric:value: implement + in terms of += tests: add prometheus text format acceptance tests extract memory_data_sink.hh metrics_perf: enhance metrics bench > demos: Simplify udp_zero_copy_demo's way of preparing the packet > metrics: Remove 
deprecated make_...-ers > Merge 'Make slab_test be BOOST kind' from Pavel Emelyanov test: Use BOOST_REQUIRE checkers test: Replace some SEASTAR_ASSERT-s with static_assert-s test: Convert slab test into boost kind > Merge 'Coroutinize lister_test' from Pavel Emelyanov test: Fix indentation after previuous patch test: Coroutinize lister_test lister::report() method test: Coroutinize lister_test main code > file: recursive_remove_directory(): use a list instead of a deque > Merge 'Stop using packets in tls data_sink and session' from Pavel Emelyanov tls: Stop using net::packet in session::put() tls: Fix indentation after previous patch tls: Split session::do_put() tls: Mark some session methods private Closes scylladb/scylladb#27240	2025-11-27 12:34:22 +02:00
Yaniv Michael Kaul	765a7e9868	gms/gossiper.cc: fix gossip log to show host-id/ip instead of host-id/host-id Probably a copy-paste error, fixes the log to print host-id/ip. Backport: no need, benign log issue. Fixes: https://github.com/scylladb/scylladb/issues/27113 Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com> Closes scylladb/scylladb#27225	2025-11-25 20:56:20 +01:00
Radosław Cybulski	d589e68642	Add precompiled headers to CMakeLists.txt Add precompiled header support to CMakeLists.txt and configure.py - it improves compilation time by approximately 10%. New header `stdafx.hh` is added, don't include it manually - the compiler will include it for you. The header contains includes from external libraries used by Scylla - seastar, standard library, linux headers and zlib. The feature is enabled by default, use CMake option `Scylla_USE_PRECOMPILED_HEADER` or configure.py --disable-precompiled-header to disable. The feature should be disabled, when trying to check headers - otherwise you might get false negatives on missing includes from seastar / abseil and so on. Note: following configuration needs to be added to ccache.conf: sloppiness = pch_defines,time_macros,include_file_mtime,include_file_ctime Closes scylladb/scylladb#26617	2025-11-21 12:27:41 +02:00
Michael Litvak	9208b2f317	cql3: allow counters with tablets Now that counters work with tablets, allow creating a table with counters in a tablets-enabled keyspace, and remove the warning about counters not being supported when creating a keyspace with tablets. We allow using counters with tablets only when all nodes are upgraded and support counters with tablets. We add a new feature flag to determine if this is the case. Fixes scylladb/scylladb#18180	2025-11-03 16:04:37 +01:00
Gleb Natapov	eb9112a4a2	db: experimental consistent-tablets option The option will be used to hide the consistent tablets feature until it is ready.	2025-10-15 11:27:10 +03:00
Andrzej Jackowski	c59a7db1c9	service_level_controller: automatically create `sl:driver` This commit: - Increases the number of allowed scheduling groups to allow the creation of `sl:driver`. - Adds the `DRIVER_SERVICE_LEVEL` feature, which prevents creating `sl:driver` until all nodes have increased the number of scheduling groups. - Starts using `get_create_driver_service_level_mutations` to unconditionally create `sl:driver` on `raft_initialize_discovery_leader`. The purpose of this code path is ensuring existence of `sl:driver` in new system and tests. - Starts using `migrate_to_driver_service_level` to create `sl:driver` if it is not already present. The creation of `sl:driver` is managed by `topology_coordinator`, similar to other system keyspace updates, such as the `view_builder` migration. The purpose of this code path is handling upgrades. - Modifies related tests to pass after `sl:driver` is added. Later in this patch series, `sl:driver` will be used by `transport/server` to handle selected traffic, such as the driver's schema and topology fetches. Refs: scylladb/scylladb#24411	2025-10-08 08:24:43 +02:00
Tomasz Grabiec	66755db062	locator, cql3: Support rack lists in replication options Allows per-DC replication factor to be either a string, holding a numerical value, or a list of strings, holding a list of rack names. The rack list is not respected yet by the tablet allocator, this is achieved in subsequent commit. This changes the format of options stored in the flattened map in system_schema.keyspaces#replication. Values which are rack lists, are converted into multiple entries, with the list index appended to the key with ':' as the separator: For example, this extended map: { 'dc1': '3', 'dc2': ['rack1', 'rack2'] } is stored as a flattened map: { 'dc1': '3', 'dc2:0': 'rack1', 'dc2:1': 'rack2' } Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Signed-off-by: Tomasz Grabiec <tgrabiec@scylladb.com>	2025-10-02 19:42:39 +02:00
Michał Chojnowski	ef11dc57c1	db/config: expose "ms" format to the users via database config Extend the `sstable_format` config enum with a "ms" value, and, if it's enabled (in the config and in cluster features), use it for new sstables on the node. (Before this commit, writing `ms` sstables should only be possible in unit tests, via internal APIs. After this commit, the format can be enabled in the config and the database will write it during normal operation). As of this commit, the new format is not the default yet. (But it will become the default in a later commit in the same series).	2025-09-29 22:15:25 +02:00
Avi Kivity	1258e7c165	Revert "Merge 'transport: service_level_controller: create and use `driver` service level' from Andrzej Jackowski" This reverts commit `fe7e63f109`, reversing changes made to `b5f3f2f4c5`. It is causing test.py failures around cqlpy. Fixes #26163 Closes scylladb/scylladb#26174	2025-09-22 09:32:46 +03:00
Avi Kivity	fe7e63f109	Merge 'transport: service_level_controller: create and use `driver` service level' from Andrzej Jackowski This patch series: - Increases the number of allowed scheduling groups to allow creation of `sl:driver` - Implements `create_driver_service_level` that creates `sl:driver` with shares=200 if it wasn't already created - Implements creation of `sl:driver` for new systems and tests in `raft_initialize_discovery_leader` - Modifies `topology_coordinator` to create `sl:driver` after upgrades. - Implements using `sl:driver` for new connections in `transport/server` - Adds to `transport/server` recognition of driver's control connections and forcing them to keep using `sl:driver`. - Adds tests to verify the new functionality - Modifies existing tests to let them pass after `sl:driver` is added - Modifies the documentation to contain new `sl:driver` The changes were evaluated by a test with the following scenario ([test_connections-sl-driver.py](https://github.com/user-attachments/files/22021273/test_connections-sl-driver.py)): - Start ScyllaDB with one node - Create 1000 keyspaces, 1 table in each keyspace - Start `cassandra-stress` (`-rate threads=50 -mode native cql3`) - Run connection storm with 1000 sessions (100 python processes, 10 sessions each) The maximum latency during connection storm dropped from 224.94ms to 41.43ms (those numbers are averages from 20 test executions, where max latency was in [140ms, 361ms] before the change and [31.4ms, 61.5ms] after). The snippet of cassandra-stress output from the moment of connection storm: Before: ``` type total ops, op/s, pk/s, row/s, mean, med, .95, .99, .999, max, time, stderr, errors, gc: #, max ms, sum ms, sdv ms, mb ... 
total, 789206, 85887, 85887, 85887, 0.6, 0.3, 2.0, 2.0, 2.5, 5.0, 9.0, 0.09679, 0, 0, 0, 0, 0, 0 total, 909322, 120116, 120116, 120116, 0.4, 0.2, 1.9, 2.0, 2.1, 3.1, 10.0, 0.09053, 0, 0, 0, 0, 0, 0 total, 964392, 55070, 55070, 55070, 0.9, 0.4, 2.0, 4.5, 7.7, 18.9, 11.0, 0.09203, 0, 0, 0, 0, 0, 0 total, 975705, 11313, 11313, 11313, 4.4, 3.5, 6.5, 24.5, 82.7, 83.0, 12.0, 0.11713, 0, 0, 0, 0, 0, 0 total, 987548, 11843, 11843, 11843, 4.2, 3.5, 6.5, 33.7, 48.6, 51.5, 13.0, 0.13366, 0, 0, 0, 0, 0, 0 total, 995422, 7874, 7874, 7874, 6.3, 4.0, 7.7, 85.6, 112.9, 113.5, 14.0, 0.14753, 0, 0, 0, 0, 0, 0 total, 1007228, 11806, 11806, 11806, 4.3, 3.5, 6.5, 29.1, 43.8, 87.1, 15.0, 0.15598, 0, 0, 0, 0, 0, 0 total, 1012840, 5612, 5612, 5612, 8.2, 5.0, 11.5, 121.8, 166.6, 170.1, 16.0, 0.16535, 0, 0, 0, 0, 0, 0 total, 1016186, 3346, 3346, 3346, 13.4, 7.4, 20.1, 204.9, 207.6, 210.4, 17.0, 0.17405, 0, 0, 0, 0, 0, 0 total, 1025462, 9276, 9276, 9276, 6.3, 3.9, 9.6, 74.6, 206.8, 210.0, 18.0, 0.17800, 0, 0, 0, 0, 0, 0 total, 1035979, 10517, 10517, 10517, 4.8, 3.5, 6.7, 38.5, 82.6, 83.0, 19.0, 0.18120, 0, 0, 0, 0, 0, 0 total, 1047488, 11509, 11509, 11509, 4.3, 3.5, 6.0, 32.6, 72.3, 74.0, 20.0, 0.18334, 0, 0, 0, 0, 0, 0 total, 1077456, 29968, 29968, 29968, 1.7, 1.6, 2.9, 3.6, 7.0, 8.2, 21.0, 0.17943, 0, 0, 0, 0, 0, 0 total, 1105490, 28034, 28034, 28034, 1.8, 1.8, 3.5, 4.6, 5.3, 13.8, 22.0, 0.17609, 0, 0, 0, 0, 0, 0 total, 1132221, 26731, 26731, 26731, 1.9, 1.8, 3.8, 5.2, 8.4, 11.1, 23.0, 0.17314, 0, 0, 0, 0, 0, 0 total, 1162149, 29928, 29928, 29928, 1.7, 1.7, 3.0, 4.5, 8.0, 9.1, 24.0, 0.16950, 0, 0, 0, 0, 0, 0 ... ``` After: ``` type total ops, op/s, pk/s, row/s, mean, med, .95, .99, .999, max, time, stderr, errors, gc: #, max ms, sum ms, sdv ms, mb ... 
total, 822863, 94379, 94379, 94379, 0.5, 0.3, 2.0, 2.0, 2.1, 3.7, 9.0, 0.06669, 0, 0, 0, 0, 0, 0 total, 937337, 114474, 114474, 114474, 0.4, 0.2, 2.0, 2.0, 2.1, 3.4, 10.0, 0.06301, 0, 0, 0, 0, 0, 0 total, 986630, 49293, 49293, 49293, 1.0, 1.0, 2.0, 2.1, 17.9, 19.0, 11.0, 0.07318, 0, 0, 0, 0, 0, 0 total, 1026734, 40104, 40104, 40104, 1.2, 1.0, 2.0, 2.2, 6.3, 7.1, 12.0, 0.08410, 0, 0, 0, 0, 0, 0 total, 1066124, 39390, 39390, 39390, 1.3, 1.0, 2.0, 2.2, 2.6, 3.4, 13.0, 0.09108, 0, 0, 0, 0, 0, 0 total, 1103082, 36958, 36958, 36958, 1.3, 1.1, 2.1, 2.5, 3.1, 4.2, 14.0, 0.09643, 0, 0, 0, 0, 0, 0 total, 1141987, 38905, 38905, 38905, 1.3, 1.0, 2.0, 2.4, 11.4, 12.7, 15.0, 0.09894, 0, 0, 0, 0, 0, 0 total, 1180023, 38036, 38036, 38036, 1.3, 1.0, 2.0, 3.7, 5.6, 7.1, 16.0, 0.10070, 0, 0, 0, 0, 0, 0 total, 1216481, 36458, 36458, 36458, 1.4, 1.0, 2.1, 3.6, 4.7, 5.0, 17.0, 0.10210, 0, 0, 0, 0, 0, 0 total, 1256819, 40338, 40338, 40338, 1.2, 1.0, 2.0, 2.2, 3.5, 5.4, 18.0, 0.10173, 0, 0, 0, 0, 0, 0 total, 1295122, 38303, 38303, 38303, 1.3, 1.0, 2.0, 2.4, 21.0, 21.1, 19.0, 0.10136, 0, 0, 0, 0, 0, 0 total, 1334743, 39621, 39621, 39621, 1.3, 1.0, 2.0, 2.3, 3.3, 4.0, 20.0, 0.10055, 0, 0, 0, 0, 0, 0 total, 1375579, 40836, 40836, 40836, 1.2, 1.0, 2.0, 2.1, 3.4, 5.7, 21.0, 0.09927, 0, 0, 0, 0, 0, 0 total, 1415576, 39997, 39997, 39997, 1.2, 1.0, 2.0, 2.3, 3.2, 4.1, 22.0, 0.09807, 0, 0, 0, 0, 0, 0 total, 1449268, 33692, 33692, 33692, 1.5, 1.4, 2.5, 3.2, 4.2, 5.6, 23.0, 0.09800, 0, 0, 0, 0, 0, 0 total, 1471873, 22605, 22605, 22605, 2.2, 2.0, 4.8, 5.9, 7.0, 7.9, 24.0, 0.10015, 0, 0, 0, 0, 0, 0 ... ``` Fixes: https://github.com/scylladb/scylladb/issues/24411 This is a new feature, so no backport needed. 
Closes scylladb/scylladb#25412 * github.com:scylladb/scylladb: docs: workload-prioritization: add driver service level test: add test to verify use of `sl:driver` transport: use `sl:driver` to handle driver's control connections transport: whitespace only change in update_scheduling_group transport: call update_scheduling_group for non-auth connections generic_server: transport: start using `sl:driver` for new connections test: add test_desc_* for driver service level test: service_levels: add tests for sl:driver creation and removal test: add reload_raft_topology_state() to ScyllaRESTAPIClient service_level_controller: automatically create `sl:driver` service_level_controller: methods to create driver service level service_level_controller: handle special sl:driver in DESC output topology_coordinator: add service_level_controller reference system_keyspace: add service_level_driver_created test: add MAX_USER_SERVICE_LEVELS	2025-09-18 19:45:17 +03:00
Andrzej Jackowski	6f678a2d1f	service_level_controller: automatically create `sl:driver` This commit: - Increases the number of allowed scheduling groups to allow the creation of `sl:driver`. - Adds the `DRIVER_SERVICE_LEVEL` feature, which prevents creating `sl:driver` until all nodes have increased the number of scheduling groups. - Starts using `get_create_driver_service_level_mutations` to unconditionally create `sl:driver` on `raft_initialize_discovery_leader`. The purpose of this code path is ensuring existence of `sl:driver` in new system and tests. - Starts using `migrate_to_driver_service_level` to create `sl:driver` if it is not already present. The creation of `sl:driver` is managed by `topology_coordinator`, similar to other system keyspace updates, such as the `view_builder` migration. The purpose of this code path is handling upgrades. - Modifies related tests to pass after `sl:driver` is added. Later in this patch series, `sl:driver` will be used by `transport/server` to handle selected traffic, such as the driver's schema and topology fetches. Refs: scylladb/scylladb#24411	2025-09-18 09:28:32 +02:00
Michael Litvak	5f1caebcc7	cdc: add cdc_with_tablets feature flag add a new feature flag cdc_with_tablets to protect the schema changes that are required for the CDC with tablets feature. we will also use it to allow start using CDC in tablets-based keyspaces only once all nodes are upgraded and support this feature.	2025-09-17 14:47:11 +02:00
Patryk Jędrzejczak	9efe250a8f	Merge 'gossiper: ensure gossiper operations are executed in gossiper scheduling group' from Sergey Zolotukhin Sometimes gossiper operations invoked from storage_service and other components run under a non-gossiper scheduling group. If these operations acquire gossiper locks, priority inversion can occur: higher-priority gossiper tasks may wait behind lower-priority tasks (e.g. streaming), which can cause gossiper slowness or even failures. This patch ensures that gossiper operations requiring locks on gossiper structures are explicitly executed in the gossiper scheduling group. To help detect similar issues in the future, a warning is logged whenever a gossiper lock is acquired under a non-gossiper scheduling group. Fixes scylladb/scylladb#25907 Refs: scylladb/scylladb#25702 Backport: this patch fixes an issue with gossiper operations scheduling group, that might affect topology operations, therefore backport is needed to 2025.1, 2025.2, 2025.3 Closes scylladb/scylladb#25981 * https://github.com/scylladb/scylladb: gossiper: ensure gossiper operations are executed in gossiper scheduling group gossiper: fix wrong gossiper instance used in `force_remove_endpoint`	2025-09-16 10:14:15 +02:00
Sergey Zolotukhin	6c2a145f6c	gossiper: ensure gossiper operations are executed in gossiper scheduling group Sometimes gossiper operations invoked from storage_service and other components run under a non-gossiper scheduling group. If these operations acquire gossiper locks, priority inversion can occur: higher-priority gossiper tasks may wait behind lower-priority tasks (e.g. streaming), which can cause gossiper slowness or even failures. This patch ensures that gossiper operations requiring locks on gossiper structures are explicitly executed in the gossiper scheduling group. To help detect similar issues in the future, a warning is logged whenever a gossiper lock is acquired under a non-gossiper scheduling group. Fixes scylladb/scylladb#25907	2025-09-15 12:49:07 +02:00
Sergey Zolotukhin	340413e797	gossiper: fix wrong gossiper instance used in `force_remove_endpoint` `gossiper::force_remove_endpoint` is always executed on shard 0 using `invoke_on`. Since each shard has its own `gossiper` instance, if `force_remove_endpoint` is called from a shard other than shard 0, `my_host_id()` may be invoked on the wrong `gossiper` object. This results in undefined behavior due to unsynchronized access to resources on another shard.	2025-09-15 08:54:59 +02:00
Emil Maskovsky	99db980899	gossiper: eliminate duplicate code in do_shadow_round Remove a redundant code block inadvertently introduced in commit `4b3d160f34`. While the duplicate did not affect functionality, its presence could cause confusion and maintenance issues. This change does not alter behavior and is purely a cleanup. Fixes: scylladb/scylladb#25999 Backport: The issue exists in all 2025 branches, so it should be backported accordingly. Closes scylladb/scylladb#26001	2025-09-15 08:35:04 +03:00
Sergey Zolotukhin	b34d543f30	gossiper: fix empty initial local node state This change removes the addition of an empty state to `_endpoint_state_map`. Instead, a new state is created locally and then published via replicate, avoiding the issue of an empty state existing in `_endpoint_state_map` before the preemption point. Since this resolves the issue tested in `test_gossiper_empty_self_id_on_shadow_round`, the `xfail` mark has been removed. Fixes: scylladb/scylladb#25831	2025-09-08 11:38:31 +02:00
Sergey Zolotukhin	775642ea23	gossiper: add test for a race condition in start_gossiping This change adds a test for a race condition in `start_gossiping` that can lead to an empty self state sent in `gossip_get_endpoint_states_response`. Test for scylladb/scylladb#25831	2025-09-08 11:38:30 +02:00
Sergey Zolotukhin	f08df7c9d7	gossiper: check for a race condition in `do_apply_state_locally` In do_apply_state_locally, a race condition can occur if a task is suspended at a preemption point while the node entry is not locked. During this time, the host may be removed from _endpoint_state_map. When the task resumes, this can lead to inserting an entry with an empty host ID into the map, causing various errors, including a node crash. This change 1. adds a check after locking the map entry: if a gossip ACK update does not contain a host ID, we verify that an entry with that host ID still exists in the gossiper’s _endpoint_state_map. 2. Removes xfail from the test_gossiper_race test since the issue is now fixed. 3. Adds exception handling in `do_shadow_round` to skip responses from nodes that sent an empty host ID. This re-applies the commit `13392a40d4` that was reverted in `46aa59fe49`, after fixing the issues that caused the CI to fail. Fixes: scylladb/scylladb#25702 Fixes: scylladb/scylladb#25621 Ref: scylladb/scylla-enterprise#5613	2025-09-08 11:38:30 +02:00
Emil Maskovsky	28e0f42a83	test/gossiper: add reproducible test for race condition during node decommission This change introduces a targeted test that simulates the gossiper race condition observed during node decommissioning. The test delays gossip state application and host ID lookup to reliably reproduce the scenario where `gossiper::get_host_id()` is called on a removed endpoint, potentially triggering an abort in `apply_new_states`. There is a specific error injection added to widen the race window, in order to increase the likelihood of hitting the race condition. The error injection is designed to delay the application of gossip state updates, for the specific node that is being decommissioned. This should then result in the server abort in the gossiper. This re-applies the commit `5dac4b38fb` that was reverted in `dc44fca67c`, but modified to relax the check from "on_internal_error" to just a warning log. The stricter check can be re-introduced later once we are sure that all remaining problems are resolved and it will not break the CI. Refs: scylladb/scylladb#25621 Fixes: scylladb/scylladb#25721	2025-09-08 11:38:30 +02:00
Pavel Emelyanov	dc44fca67c	Revert "test/gossiper: add reproducible test for race condition during node decommission" This reverts commit `5dac4b38fb` as per request from #25803	2025-09-05 09:56:46 +03:00
Pavel Emelyanov	46aa59fe49	Revert "gossiper: check for a race condition in `do_apply_state_locally`" This reverts commit `13392a40d4` as per request from #25803	2025-09-05 09:56:21 +03:00
Radosław Cybulski	c242234552	Revert "build: add precompiled headers to CMakeLists.txt" This reverts commit `01bb7b629a`. Closes scylladb/scylladb#25735	2025-09-03 09:46:00 +03:00
Sergey Zolotukhin	13392a40d4	gossiper: check for a race condition in `do_apply_state_locally` In do_apply_state_locally, a race condition can occur if a task is suspended at a preemption point while the node entry is not locked. During this time, the host may be removed from _endpoint_state_map. When the task resumes, this can lead to inserting an entry with an empty host ID into the map, causing various errors, including a node crash. This change adds a check after locking the map entry: if a gossip ACK update does not contain a host ID, we verify that an entry with that host ID still exists in the gossiper’s _endpoint_state_map. Fixes scylladb/scylladb#25702 Fixes scylladb/scylladb#25621 Ref scylladb/scylla-enterprise#5613 Closes scylladb/scylladb#25727	2025-09-02 20:44:21 +02:00
Emil Maskovsky	5dac4b38fb	test/gossiper: add reproducible test for race condition during node decommission This change introduces a targeted test that simulates the gossiper race condition observed during node decommissioning. The test delays gossip state application and host ID lookup to reliably reproduce the scenario where `gossiper::get_host_id()` is called on a removed endpoint, potentially triggering an abort in `apply_new_states`. There is a specific error injection added to widen the race window, in order to increase the likelihood of hitting the race condition. The error injection is designed to delay the application of gossip state updates, for the specific node that is being decommissioned. This should then result in the server abort in the gossiper. Refs: scylladb/scylladb#25621 Fixes: scylladb/scylladb#25721 Backport: The test is primarily for an issue found in 2025.1, so it needs to be backported to all the 2025.x branches. Closes scylladb/scylladb#25685	2025-09-01 13:59:47 +02:00
Piotr Dulikowski	7ccb50514d	Merge 'Introduce view building coordinator' from Michał Jadwiszczak This patch introduces `view_building_coordinator`, a single entity within the whole cluster responsible for building tablet-based views. The view building coordinator takes a slightly different approach than the existing node-local view builder. The whole process is split into smaller view building tasks, one per tablet replica of the base table. The coordinator builds one base table at a time and it can choose another when all views of the base table currently being processed are built. The tasks are started by setting `STARTED` state and they are executed by node-local view building worker. The tasks are scheduled in such a way that each shard processes only one tablet at a time (multiple tasks can be started for a shard on a node because a table can have multiple views but then all tasks have the same base table and tablet (last_token)). Once the coordinator starts the tasks, it sends `work_on_view_building_tasks` RPC to start the tasks and receive their results. This RPC is resilient to RPC failure or raft leader change, meaning if one RPC call started a batch of tasks but then failed (for instance the raft leader was changed and caller aborted waiting for the response), next RPC call will attach itself to the already started batch. The coordinator plugs into handling tablet operations (migration/resize/RF change) and adjusts its tasks accordingly. At the start of each tablet operation, the coordinator aborts necessary view building tasks to prevent https://github.com/scylladb/scylladb/issues/21564. Then, new adjusted tasks are created at the end of the operation. If the operation fails at any moment, aborted tasks are rolled back. The view building coordinator can also handle staging sstables using process_staging view building tasks. 
We do this because we don't want to start generating view updates from a staging sstable prematurely, before the writes are directed to the new replica (https://github.com/scylladb/scylladb/issues/19149). For detailed description check: `docs/dev/view-building-coordinator.md` Fixes https://github.com/scylladb/scylladb/issues/22288 Fixes https://github.com/scylladb/scylladb/issues/19149 Fixes https://github.com/scylladb/scylladb/issues/21564 Fixes https://github.com/scylladb/scylladb/issues/17603 Fixes https://github.com/scylladb/scylladb/issues/22586 Fixes https://github.com/scylladb/scylladb/issues/18826 Fixes https://github.com/scylladb/scylladb/issues/23930 --- This PR is reimplementation of https://github.com/scylladb/scylladb/pull/21942 Closes scylladb/scylladb#23760 * github.com:scylladb/scylladb: test/cluster: add view build status tests test/cluster: add view building coordinator tests utils/error_injection: allow to abort `injection_handler::wait_for_message()` test: adjust existing tests utils/error_injection: add injection with `sleep_abortable()` db/view/view_builder: ignore `no_such_keyspace` exception docs/dev: add view building coordinator documentation db/view/view_building_worker: work on `process_staging` tasks db/view/view_building_worker: register staging sstable to view building coordinator when needed db/view/view_building_worker: discover staging sstables db/view/view_building_worker: add method to register staging sstable db/view/view_update_generator: add method to process staging sstables instantly db/view/view_update_generator: extract generating updates from staging sstables to a method db/view/view_update_generator: ignore tablet-based sstables db/view/view_building_coordinator: update view build status on node join/left db/view/view_building_coordinator: handle tablet operations db/view: add view building task mutation builder service/topology_coordinator: run view building coordinator db/view: introduce `view_building_coordinator` 
db/view/view_building_worker: update built views locally db/view: introduce `view_building_worker` db/view: extract common view building functionalities db/view: prepare to create abstract `view_consumer` message/messaging_service: add `work_on_view_building_tasks` RPC service/topology_coordinator: make `term_changed_error` public db/schema_tables: create/cleanup tasks when an index is created/dropped service/migration_manager: cleanup view building state on drop keyspace service/migration_manager: cleanup view building state on drop view service/migration_manager: create view building tasks on create view test/boost: enable proxy remote in some tests service/migration_manager: pass `storage_proxy` to `prepare_keyspace_drop_announcement()` service/migration_manager: coroutinize `prepare_new_view_announcement()` service/storage_proxy: expose references to `system_keyspace` and `view_building_state_machine` service: reload `view_building_state_machine` on group0 apply() service/vb_coordinator: add currently processing base db/system_keyspace: move `get_scylla_local_mutation()` up db/system_keyspace: add `view_building_tasks` table db/view: add view_building_state and views_state db/system_keyspace: add method to get view build status map db/view: extract `system.view_build_status_v2` cql statements to system_keyspace db/system_keyspace: move `internal_system_query_state()` function earlier db/view: ignore tablet-based views in `view_builder` gms/feature_service: add VIEW_BUILDING_COORDINATOR feature	2025-08-29 17:28:44 +02:00
Radosław Cybulski	01bb7b629a	build: add precompiled headers to CMakeLists.txt Add precompiled header support to CMakeLists.txt and configure.py - it improves compilation time by approximately 10%. New header `stdafx.hh` is added, don't include it manually - the compiler will include it for you. The header contains includes from external libraries used by Scylla - seastar, standard library, linux headers and zlib. The feature is enabled by default, use CMake option `Scylla_USE_PRECOMPILED_HEADER` or configure.py --disable-precompiled-header to disable. The feature should be disabled, when trying to check headers - otherwise you might get false negatives on missing includes from seastar / abseil and so on. Note: following configuration needs to be added to ccache.conf: sloppiness = pch_defines,time_macros Closes #25182	2025-08-27 21:37:54 +03:00
Michał Jadwiszczak	7dba3667c9	gms/feature_service: add VIEW_BUILDING_COORDINATOR feature	2025-08-27 08:55:46 +02:00
Avi Kivity	611918056a	Merge 'repair: Add tablet incremental repair support' from Asias He The central idea of incremental repair is to allow repair participants to select and repair only a portion of the dataset to speed up the repair process. All repair participants must utilize an identical selection method to repair and synchronize the same selected dataset. There are two primary selection methods: time-based and file-based. The time-based method selects data within a specified time frame. It is versatile but it is less efficient because it requires reading all of the dataset and omitting data beyond the time frame. The file-based method selects data from unrepaired SSTables and is more efficient because it allows the entire SSTable to be omitted. This patch implements the file-based selection method. Incremental repair will only be supported for tablet tables; it will not be supported for vnode tables. On one hand, the legacy vnode is less important to support. On the other hand, the incremental repair for vnode is much harder to implement. With vnodes, an SSTable could contain data for multiple vnode ranges. When a given vnode range is repaired, only a portion of the SSTable is repaired. This complicates the manipulation of SSTables significantly during both repair and compaction. With tablets, an entire tablet is repaired so that an SSTable is either fully repaired or not repaired, which is a huge simplification. This patch uses the repaired_at from sstables::statistics component to mark an SSTable as repaired. It uses a virtual clock as the repair timestamp, i.e., using a monotonically increasing number for the repaired_at field of an SSTable and sstables_repaired_at column in system.tablets table. Notice that when an SSTable is not repaired, the repaired_at field will be set to the default value 0. The being_repaired in-memory field of an SSTable is used to explicitly mark that an SSTable is being selected. 
The following variables are used for incremental repair: The repaired_at on disk field of a SSTable is used. - A 64-bit number increases sequentially The sstables_repaired_at is added to the system.tablets table. - repaired_at <= sstables_repaired_at means the sstable is repaired The being_repaired in memory field of a SSTable is added. - A repair UUID tells which sstable has participated in the repair Initial test results: 1) Medium dataset results Node amount: 3 Instance type: i4i.2xlarge Disk usage per node: ~500GB Cluster pre-populated with ~500GB of data before starting repairs job. Results for Repair Timings: The regular repair run took 210 mins. Incremental repair 1st run took 183 mins, 2nd and 3rd runs took around 48s The speedup is: 183 mins / 48s = 228X 2) Small dataset results Node amount: 3 Instance type: i4i.2xlarge Disk usage per node: ~167GB Cluster pre-populated with ~167GB of data before starting the repairs job. Regular repair 1st run took 110s, 2nd and 3rd runs took 110s. Incremental repair 1st run took 110 seconds, 2nd and 3rd run took 1.5 seconds. 
The speedup is: 110s / 1.5s = 73X 3) Large dataset results Node amount: 6 Instance type: i4i.2xlarge, 3 racks 50% of base load, 50% read/write Dataset == Sum of data on each node Dataset Non-incremental repair (minutes) 1.3 TiB 31:07 3.5 TiB 25:10 5.0 TiB 19:03 6.3 TiB 31:42 Dataset Incremental repair (minutes) 1.3 TiB 24:32 3.0 TiB 13:06 4.0 TiB 5:23 4.8 TiB 7:14 5.6 TiB 3:58 6.3 TiB 7:33 7.0 TiB 6:55 Fixes #22472 Closes scylladb/scylladb#24291 * github.com:scylladb/scylladb: replica: Introduce get_compaction_reenablers_and_lock_holders_for_repair compaction: Move compaction_reenabler to compaction_reenabler.hh topology_coordinator: Make rpc::remote_verb_error to warning level repair: Add metrics for sstable bytes read and skipped from sstables test.py: Disable incremental for test_tombstone_gc_for_streaming_and_repair test.py: Add tests for tablet incremental repair repair: Add tablet incremental repair support compaction: Add tablet incremental repair support feature_service: Add TABLET_INCREMENTAL_REPAIR feature tablet_allocator: Add tablet_force_tablet_count_increase and decrease repair: Add incremental helpers sstable: Add being_repaired to sstable sstables: Add set_repaired_at to metadata_collector mutation_compactor: Introduce add operator to compaction_stats tablet: Add sstables_repaired_at to system.tablets table test: Fix drain api in task_manager_client.py	2025-08-19 13:13:22 +03:00
Asias He	2ecd42f369	feature_service: Add TABLET_INCREMENTAL_REPAIR feature	2025-08-11 10:10:08 +08:00
Benny Halevy	0ad1898f0a	feature_service: move UUID_SSTABLE_IDENTIFIERS to supported_feature_set The feature is supported by all live versions since version 5.4 / 2024.1. (Although up to `6da758d74c` it could be disabled using the config option) Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2025-08-08 11:46:15 +03:00
Piotr Dulikowski	ec7832cc84	Merge 'Raft-based recovery procedure: simplify rolling restart with recovery_leader' from Patryk Jędrzejczak The following steps are performed in sequence as part of the Raft-based recovery procedure: - set `recovery_leader` to the host ID of the recovery leader in `scylla.yaml` on all live nodes, - send the `SIGHUP` signal to all Scylla processes to reload the config, - perform a rolling restart (with the recovery leader being restarted first). These steps are not intuitive and more complicated than they could be. In this PR, we simplify these steps. From now on, we will be able to simply set `recovery_leader` on each node just before restarting it. Apart from making necessary changes in the code, we also update all tests of the Raft-based recovery procedure and the user-facing documentation. Fixes scylladb/scylladb#25015 The Raft-based procedure was added in 2025.2. This PR makes the procedure simpler and less error-prone, so it should be backported to 2025.2 and 2025.3. Closes scylladb/scylladb#25032 * github.com:scylladb/scylladb: docs: document the option to set recovery_leader later test: delay setting recovery_leader in the recovery procedure tests gossip: add recovery_leader to gossip_digest_syn db: system_keyspace: peers_table_read_fixup: remove rows with null host_id db/config, gms/gossiper: change recovery_leader to UUID db/config, utils: allow using UUID as a config option	2025-08-04 08:29:32 +02:00
Botond Dénes	2985c343ed	Merge 'repair: Avoid too many fragments in a single repair_row_on_wire' from Asias He When repairing a partition with many rows, we can store many fragments in a repair_row_on_wire object which is sent as a rpc stream message. This could cause reactor stalls when the rpc stream compression is turned on, because the compression compresses the whole message without any split and compression. This patch solves the problem at the higher level by reducing the message size that is sent to the rpc stream. Tests are added to make sure the message split works. Fixes #24808 Closes scylladb/scylladb#25002 * github.com:scylladb/scylladb: repair: Avoid too many fragments in a single repair_row_on_wire repair: Change partition_key_and_mutation_fragments to use chunked_vector utils: Allow chunked_vector::erase to work with non-default-constructible type	2025-07-29 17:45:57 +03:00
Asias He	e28c75aa79	repair: Avoid too many fragments in a single repair_row_on_wire When repairing a partition with many rows, we can store many fragments in a repair_row_on_wire object which is sent as a rpc stream message. This could cause reactor stalls when the rpc stream compression is turned on, because the compression compresses the whole message without any split and compression. This patch solves the problem at the higher level by reducing the message size that is sent to the rpc stream. Tests are added to make sure the message split works. Fixes #24808	2025-07-29 13:43:53 +08:00
Botond Dénes	f3ed27bd9e	Merge 'Move feature-service config creation code out of feature-service itself' from Pavel Emelyanov Nowadays the way to configure an internal service is 1. service declares its config struct 2. caller (main/test/tool) fills the respective config with values it wants 3. the service is started with the config passed by value The feature service code behaves likewise, but provides a helper method to create its config out of db::config. This PR moves this helper out of gms code, so that it doesn't mess with system-wide db::config and only needs its own small struct feature_config. For the reference: similar changes with other services: #23705 , #20174 , #19166 Closes scylladb/scylladb#25118 * github.com:scylladb/scylladb: gms,init: Move get_disabled_features_from_db_config() from gms code: Update callers generating feature service config gms: Make feature_config a simple struct gms: Split feature_config_from_db_config() into two	2025-07-29 08:17:49 +03:00


1383 Commits