scylladb/service at 6da598fa4afb6dc00a8ef27df92fbaebe896031a

mirrors/scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-30 03:30:49 +00:00

Avi Kivity 2239474a87 Merge 'tablets: scheduler: Balance racks separately when rf_rack_valid_keyspaces is true' from Tomasz Grabiec

This greatly improves the performance of plan making, because we no
longer consider candidates in other racks, most of which would fail to
be selected due to replication constraints (no rack overload). It also
slightly reduces the overhead of candidate evaluation, as we don't
have to evaluate rack load.

Enabled only when rf_rack_valid_keyspaces is true, because such setups
guarantee that we never need to (and must not) move tablets across
racks, so we don't need to execute the general algorithm for the whole
DC.
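
The per-rack idea can be illustrated with a minimal Python sketch. This is not the C++ code in tablet_allocator.cc: the function names, the greedy one-tablet-at-a-time policy, and the node and rack names are all assumptions made for illustration. It only shows that when each rack is balanced on its own, every migration stays inside one rack and no cross-rack candidate is ever evaluated.

```python
from collections import defaultdict


def plan_rack(load):
    """Greedily balance one rack (hypothetical helper).

    `load` maps node name -> tablet count. Moves one tablet at a time from
    the most loaded node to the least loaded node until the two differ by
    at most one tablet. Returns a list of (src, dst) migrations.
    """
    load = dict(load)  # work on a copy, leave the caller's data intact
    plan = []
    while True:
        src = max(load, key=load.get)
        dst = min(load, key=load.get)
        if load[src] - load[dst] <= 1:
            return plan
        load[src] -= 1
        load[dst] += 1
        plan.append((src, dst))


def plan_per_rack(node_rack, load):
    """Balance every rack independently and concatenate the per-rack plans.

    `node_rack` maps node name -> rack name. Candidates are only ever taken
    from the same rack as the source node.
    """
    by_rack = defaultdict(dict)
    for node, rack in node_rack.items():
        by_rack[rack][node] = load[node]
    plan = []
    for rack in sorted(by_rack):
        plan.extend(plan_rack(by_rack[rack]))
    return plan


if __name__ == "__main__":
    node_rack = {"a1": "r1", "a2": "r1", "b1": "r2", "b2": "r2"}
    load = {"a1": 4, "a2": 0, "b1": 3, "b2": 1}
    print(plan_per_rack(node_rack, load))
    # prints [('a1', 'a2'), ('a1', 'a2'), ('b1', 'b2')]
```

In this toy model the combined plan is simply the concatenation of the per-rack plans, and each rack's search space contains only that rack's nodes.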

Tested with perf-load-balancing, which performs a single scale-out
operation on a cluster which initially has 10 nodes, 88 shards each, 2
racks, RF=2, 70 tables, 256 tablets per table. Scale out adds 6 new
nodes (same shard count). Time to rebalance the cluster (plan making
only, sum of all iterations, no streaming):

Before:  16 min 25 s
After:    0 min 25 s
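
For scale, the figures above can be turned into a speedup factor and a rough workload size. The short Python calculation below is only arithmetic on the numbers quoted in this message; the variable names are illustrative, and the replica count assumes that all 70 tables use RF=2.

```python
def to_seconds(minutes, seconds):
    """Convert a 'min s' duration to seconds."""
    return minutes * 60 + seconds


before_s = to_seconds(16, 25)   # 985 s
after_s = to_seconds(0, 25)     # 25 s
speedup = before_s / after_s    # about 39.4x

# Workload size, assuming every table has RF=2 (assumption, see above).
tables, tablets_per_table, rf = 70, 256, 2
tablet_replicas = tables * tablets_per_table * rf   # 35840

shards_per_node = 88
shards_before = 10 * shards_per_node          # 880
shards_after = (10 + 6) * shards_per_node     # 1408

if __name__ == "__main__":
    print(f"speedup: {speedup:.1f}x")
    print(f"tablet replicas: {tablet_replicas}")
    print(f"shards: {shards_before} -> {shards_after}")
```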

Before the patch, the cost of plan making (a single incremental
iteration) alternated between fast (0.1 [s]) and slow (14.1 [s]):

  testlog - Rebalance iteration 7 took 14.156 [s]: mig=88, bad=88, first_bad=17741, eval=93874484, skiplist=0, skip: (load=0, rack=17653, node=0)
  testlog - Rebalance iteration 8 took 0.143 [s]: mig=88, bad=88, first_bad=88, eval=865407, skiplist=0, skip: (load=0, rack=0, node=0)
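
To compare such iterations programmatically, the two log lines can be parsed with a small Python helper. This is a sketch: the regular expression is written against the two lines shown here only, and the function and field names (`parse_rebalance_line`, `skip_rack`, and so on) are hypothetical, not part of perf-load-balancing.

```python
import re

# Pattern derived from the two sample lines only; other log variants
# may not match.
LINE_RE = re.compile(
    r"Rebalance iteration (?P<iteration>\d+) took (?P<seconds>[\d.]+) \[s\]: "
    r"mig=(?P<mig>\d+), bad=(?P<bad>\d+), first_bad=(?P<first_bad>\d+), "
    r"eval=(?P<eval>\d+), skiplist=(?P<skiplist>\d+), "
    r"skip: \(load=(?P<skip_load>\d+), rack=(?P<skip_rack>\d+), "
    r"node=(?P<skip_node>\d+)\)"
)

SLOW_LINE = (
    "testlog - Rebalance iteration 7 took 14.156 [s]: mig=88, bad=88, "
    "first_bad=17741, eval=93874484, skiplist=0, "
    "skip: (load=0, rack=17653, node=0)"
)
FAST_LINE = (
    "testlog - Rebalance iteration 8 took 0.143 [s]: mig=88, bad=88, "
    "first_bad=88, eval=865407, skiplist=0, "
    "skip: (load=0, rack=0, node=0)"
)


def parse_rebalance_line(line):
    """Return the counters of one log line as a dict, or None if no match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    return {
        key: float(value) if key == "seconds" else int(value)
        for key, value in m.groupdict().items()
    }


if __name__ == "__main__":
    slow = parse_rebalance_line(SLOW_LINE)
    fast = parse_rebalance_line(FAST_LINE)
    # The slow iteration evaluates roughly 108 times more candidates.
    print(slow["eval"] / fast["eval"])
    print(slow["skip_rack"], fast["skip_rack"])  # prints 17653 0
```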

The slow run chose min and max nodes in different racks, hence the
fast path failed to find any candidates and we switched to an
exhaustive search of candidates on other nodes.

After the patch, all iterations are fast (0.1 [s] per rack, 0.2 [s] per plan-making call). The plan is twice as large because it combines the output of two subsequent (pre-patch) plan-making calls.

Fixes #26016

Closes scylladb/scylladb#26017

* github.com:scylladb/scylladb:
  test: perf: perf-load-balancing: Add parallel-scaleout scenario
  test: perf: perf-load-balancing: Convert to tool_app_template
  tablets: scheduler: Balance racks separately when rf_rack_valid_keyspaces is true

2025-09-23 22:45:35 +03:00

broadcast_tables/experimental

treewide: Move replica related files to replica directory

2025-09-18 08:00:35 +03:00

direct_failure_detector

Move direct_failure_detector from root to service/

2025-04-08 13:03:24 +03:00

treewide: Move query related files to a new query directory

2025-09-16 23:40:47 +03:00

treewide: Move query related files to a new query directory

2025-09-16 23:40:47 +03:00

Revert "Merge 'transport: service_level_controller: create and use driver service level' from Andrzej Jackowski"

2025-09-22 09:32:46 +03:00

Merge 'CDC with tablets' from Michael Litvak

2025-09-18 13:39:37 +02:00

address_map.hh

service: do not include unused headers

2025-03-20 11:18:16 +08:00

cache_hitrate_calculator.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

cas_shard.hh

storage_proxy: add cas_shard class

2025-06-30 10:33:17 +02:00

client_state.cc

client_state: decoroutinize check_internal_table_permissions

2025-08-30 18:46:54 +03:00

client_state.hh

Revert "Merge 'transport: service_level_controller: create and use driver service level' from Andrzej Jackowski"

2025-09-22 09:32:46 +03:00

CMakeLists.txt

vector_store_client: Move to vector_search module

2025-09-22 08:01:47 +02:00

endpoint_lifecycle_subscriber.hh

treewide: pass host id to endpoint_lifecycle_subscriber

2025-03-11 12:09:22 +02:00

load_broadcaster.hh

code: Replace distributed<> with sharded<>

2025-09-19 12:22:51 +02:00

load_meter.hh

code: Replace distributed<> with sharded<>

2025-09-19 12:22:51 +02:00

maintenance_mode.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

mapreduce_service.cc

treewide: Move query related files to a new query directory

2025-09-16 23:40:47 +03:00

mapreduce_service.hh

code: Replace distributed<> with sharded<>

2025-09-19 12:22:51 +02:00

memory_limiter.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

migration_listener.hh

migration_listener: add on_before_allocate_tablet_map notification

2025-09-17 14:47:11 +02:00

migration_manager.cc

cdc: fix create table with cdc if not exists

2025-09-21 09:38:36 +02:00

migration_manager.hh

service/migration_manager: pass storage_proxy to prepare_keyspace_drop_announcement()

2025-08-27 08:55:47 +02:00

misc_services.cc

code: Replace distributed<> with sharded<>

2025-09-19 12:22:51 +02:00

query_state.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

session.cc

service: do not include unused headers

2025-03-20 11:18:16 +08:00

session.hh

service: session: use named gate

2025-04-12 11:28:49 +03:00

state_id.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

storage_proxy_fwd.hh

storage_proxy: introduce node_local_only flag

2025-07-24 19:48:08 +02:00

storage_proxy_stats.hh

storage_proxy_stats: add fenced_out_requests metric

2025-09-15 11:24:53 +02:00

storage_proxy.cc

code: Replace distributed<> with sharded<>

2025-09-19 12:22:51 +02:00

storage_proxy.hh

code: Replace distributed<> with sharded<>

2025-09-19 12:22:51 +02:00

storage_service.cc

Merge 'mv: handle mismatched base/view replica count caused by RF change' from Wojciech Mitros

2025-09-23 08:10:08 +02:00

storage_service.hh

code: Replace distributed<> with sharded<>

2025-09-19 12:22:51 +02:00

tablet_allocator_fwd.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

tablet_allocator.cc

Merge 'tablets: scheduler: Balance racks separately when rf_rack_valid_keyspaces is true' from Tomasz Grabiec

2025-09-23 22:45:35 +03:00

tablet_allocator.hh

service: tablets: Keep load_stats inside tablet_allocator

2025-04-09 20:21:51 +02:00

tablet_operation.hh

service: Add tablet_operation.hh

2025-01-17 16:12:05 +08:00

task_manager_module.cc

api: tasks: task_manager: keep children identities in chunked_{array,vector}

2025-09-15 08:44:16 +03:00

task_manager_module.hh

tasks: replace ip with host_id in task_identity

2025-02-05 10:11:52 +01:00

topology_coordinator.cc

Revert "Merge 'transport: service_level_controller: create and use driver service level' from Andrzej Jackowski"

2025-09-22 09:32:46 +03:00

topology_coordinator.hh

Revert "Merge 'transport: service_level_controller: create and use driver service level' from Andrzej Jackowski"

2025-09-22 09:32:46 +03:00

topology_guard.hh

service: do not include unused headers

2025-03-20 11:18:16 +08:00

topology_mutation.cc

topology coordinator: Implement global topology request queue

2025-06-11 11:29:33 +03:00

topology_mutation.hh

topology coordinator: Implement global topology request queue

2025-06-11 11:29:33 +03:00

topology_state_machine.cc

topology request: make it possible to hold global request types in request_type field

2025-06-09 13:38:49 +03:00

topology_state_machine.hh

topology coordinator: Implement global topology request queue

2025-06-11 11:29:33 +03:00

view_update_backlog_broker.hh

treewide: pass host id to endpoint state change subscribers

2025-03-11 12:09:22 +02:00