scylladb/locator at 969b07fdfd2a51bb9100a45adf87bdec1388ee49

Tomasz Grabiec c4714180cc tablets: Make load balancing capacity-aware

Before this patch the load balancer was equalizing tablet count per
shard, so it achieved balance assuming that:
 1) tablets have the same size
 2) shards have the same capacity

That can cause imbalance of utilization if shards have different
capacity, which can happen in heterogeneous clusters with different
instance types. One of the causes of the capacity difference is that
larger instances run with fewer shards due to vCPUs being dedicated to
IRQ handling. This gives those shards more disk capacity and more CPU
power.

After this patch, the load balancer equalizes shards' storage
utilization, so it no longer assumes that shards have the same
capacity. It still assumes that each tablet has equal size, so it's a
middle step towards full size-aware balancing.
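
The difference between count-based and utilization-based balancing can be illustrated with a minimal sketch. It is not the code from this patch: `shard_load`, `utilization` and `pick_target` are hypothetical names, and the model assumes, as the message above does, that every tablet has the same (average) size.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified model of a shard; not ScyllaDB's actual types.
struct shard_load {
    uint64_t capacity_bytes;  // storage capacity attributed to this shard
    size_t tablet_count;      // tablets currently placed on this shard
};

// Storage utilization under the equal-tablet-size assumption:
// (tablet_count * avg_tablet_size) / capacity.
inline double utilization(const shard_load& s, uint64_t avg_tablet_size) {
    assert(s.capacity_bytes > 0);
    return static_cast<double>(s.tablet_count) * static_cast<double>(avg_tablet_size)
         / static_cast<double>(s.capacity_bytes);
}

// Returns the index of the shard that would have the lowest utilization
// after receiving one more tablet. Ties go to the lowest index.
inline size_t pick_target(const std::vector<shard_load>& shards, uint64_t avg_tablet_size) {
    assert(!shards.empty());
    size_t best = 0;
    double best_u = 0;
    for (size_t i = 0; i < shards.size(); ++i) {
        shard_load after = shards[i];
        after.tablet_count += 1;
        const double u = utilization(after, avg_tablet_size);
        if (i == 0 || u < best_u) {
            best = i;
            best_u = u;
        }
    }
    return best;
}
```

With two shards holding four tablets each, a count-based balancer sees a tie. If the second shard has twice the capacity, its utilization is half that of the first, so `pick_target` prefers it.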

One consequence is that to be able to balance, the load balancer needs
to know about every node's capacity, which is collected with the same
RPC which collects load_stats for average tablet size. This is not a
significant setback because migrations cannot proceed anyway if nodes
are down, due to barriers. We could make intra-node migration
scheduling work without capacity information, but it's pointless due
to the above, so it is not implemented.
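
The all-or-nothing requirement on capacity information could be sketched as follows. This is an illustration under assumptions, not the patch's implementation: `host_id` is a plain string stand-in, and `capacity_view`, `make_capacity_view` and `capacity_of` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Stand-in for a node identifier; the real type in the tree differs.
using host_id = std::string;

// Hypothetical container for per-node capacity known to the balancer.
struct capacity_view {
    std::map<host_id, uint64_t> by_node;
};

// Builds a capacity view only if every node reported its capacity.
// Returns std::nullopt when any node is missing, in which case the
// caller would skip planning migrations in this round.
inline std::optional<capacity_view> make_capacity_view(
        const std::vector<host_id>& nodes,
        const std::map<host_id, uint64_t>& reported) {
    capacity_view view;
    for (const auto& node : nodes) {
        auto it = reported.find(node);
        if (it == reported.end()) {
            return std::nullopt;
        }
        view.by_node.emplace(node, it->second);
    }
    return view;
}

// Looks up a node's capacity in a complete view.
inline uint64_t capacity_of(const capacity_view& view, const host_id& node) {
    auto it = view.by_node.find(node);
    assert(it != view.by_node.end());  // guaranteed for nodes passed to make_capacity_view
    return it->second;
}
```

Returning an empty optional instead of a partial view mirrors the reasoning above: a plan computed without some node's capacity could not be executed anyway while that node is down.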

2025-03-06 13:35:38 +01:00

abstract_replication_strategy.cc

service: migrate from boost::range::remove_if() to std::ranges::remove_if

2025-02-11 09:15:14 +08:00

abstract_replication_strategy.hh

cql: restore validating replication strategies options

2025-02-04 12:27:33 +01:00

azure_snitch.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

azure_snitch.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

CMakeLists.txt

build: cmake: add check-header target

2023-11-13 10:27:06 +02:00

ec2_multi_region_snitch.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

ec2_multi_region_snitch.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

ec2_snitch.cc

treewide: change assert() to SCYLLA_ASSERT()

2024-08-05 08:23:35 +03:00

ec2_snitch.hh

locator::ec2_snitch: change retry logic to exponential backoff

2023-12-25 18:17:23 +02:00

everywhere_replication_strategy.cc

cql: restore validating replication strategies options

2025-02-04 12:27:33 +01:00

everywhere_replication_strategy.hh

cql: restore validating replication strategies options

2025-02-04 12:27:33 +01:00

gce_snitch.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

gce_snitch.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

gossiping_property_file_snitch.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

gossiping_property_file_snitch.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

host_id.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

load_sketch.hh

tablets: Make load balancing capacity-aware

2025-03-06 13:35:38 +01:00

local_strategy.cc

cql: restore validating replication strategies options

2025-02-04 12:27:33 +01:00

local_strategy.hh

cql: restore validating replication strategies options

2025-02-04 12:27:33 +01:00

network_topology_strategy.cc

tablets: load_balancer: Move hints processing to tablet scheduler

2025-02-19 16:29:07 +01:00

network_topology_strategy.hh

tablets: load_balancer: Move hints processing to tablet scheduler

2025-02-19 16:29:07 +01:00

production_snitch_base.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

production_snitch_base.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

rack_inferring_snitch.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

rack_inferring_snitch.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

simple_snitch.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

simple_snitch.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

simple_strategy.cc

cql: restore validating replication strategies options

2025-02-04 12:27:33 +01:00

simple_strategy.hh

cql: restore validating replication strategies options

2025-02-04 12:27:33 +01:00

snitch_base.cc

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

snitch_base.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

tablet_metadata_guard.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

tablet_replication_strategy.hh

tablets: load_balancer: Move hints processing to tablet scheduler

2025-02-19 16:29:07 +01:00

tablet_sharder.hh

locator: do not include unused headers

2025-01-08 14:26:48 +02:00

tablets.cc

storage_service, tablets: Collect per-node capacity in load_stats

2025-03-06 12:17:32 +01:00

tablets.hh

storage_service, tablets: Collect per-node capacity in load_stats

2025-03-06 12:17:32 +01:00

token_metadata_fwd.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

token_metadata.cc

locator: token_metadata: drop update_host_id() function that does nothing now

2025-01-16 16:37:08 +02:00

token_metadata.hh

locator: token_metadata: drop update_host_id() function that does nothing now

2025-01-16 16:37:08 +02:00

token_range_splitter.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

topology.cc

locator: topology: consult local_dc_rack if node not found by host_id

2025-01-22 09:04:24 +02:00

topology.hh

locator: topology: consult local_dc_rack if node not found by host_id

2025-01-22 09:04:24 +02:00

types.hh

treewide: relicense to ScyllaDB-Source-Available-1.0

2024-12-18 17:45:13 +02:00

util.cc

storage_service: move describe ring and get_range_to_endpoint_map to use host ids inside and translate to ips at the last moment

2025-01-16 16:37:06 +02:00

util.hh

storage_service: move describe ring and get_range_to_endpoint_map to use host ids inside and translate to ips at the last moment

2025-01-16 16:37:06 +02:00