scylladb/service at 3ccfed37f69ec166f2b4180ae5f98f692076ed2b - scylladb - Anomalous Gitea

mirrors/scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-26 11:30:36 +00:00

Patryk Jędrzejczak 3ccfed37f6 storage_service: set up topology properly in maintenance mode

We currently make the local node the only token owner (that owns the
whole ring) in maintenance mode, but we don't update the topology properly.
The node is present in the topology, but in the `none` state. That's how
it's inserted by `tm.get_topology().set_host_id_cfg(host_id);` in
`scylla_main`. As a result, a node started in maintenance mode crashes
in the following way when a vnodes-based keyspace using
NetworkTopologyStrategy is present:
```
scylla: locator/network_topology_strategy.cc:207:
    locator::natural_endpoints_tracker::natural_endpoints_tracker(
    const token_metadata &, const network_topology_strategy::dc_rep_factor_map &):
    Assertion `!_token_owners.empty() && !_racks.empty()' failed.
```
Both `_token_owners` and `_racks` are empty. The reason is that
`_tm.get_datacenter_token_owners()` and
`_tm.get_datacenter_racks_token_owners()` called above filter out nodes
in the `none` state.

This bug basically made maintenance mode unusable in customer clusters.

We fix it by changing the node state to `normal`. We also update its
rack, datacenter, and shard count. Rack and datacenter are already present
in the topology, but there is nothing wrong with updating them again.
The shard count is missing, so we had better set it to avoid other
issues.
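
The failure mode and the fix can be illustrated with a toy model. This is only a sketch under stated assumptions: `node`, `node_state`, `datacenter_token_owners`, `tracker_precondition_holds`, and `set_up_maintenance_topology` are hypothetical names invented for illustration and are not Scylla's actual `locator::topology` or `token_metadata` API. The model only reproduces the idea that nodes in the `none` state are filtered out of the token-owner view, which leaves the view empty until the local node is moved to `normal`.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Toy model only: hypothetical types, far simpler than Scylla's
// locator::topology and locator::token_metadata.
enum class node_state { none, normal };

struct node {
    std::string host_id;
    std::string dc;
    std::string rack;
    node_state state = node_state::none;
    unsigned shard_count = 0;
};

using dc_token_owners = std::map<std::string, std::set<std::string>>;

// Models the filtering described in the commit message: nodes in the
// `none` state are skipped, so they never show up as token owners.
dc_token_owners datacenter_token_owners(const std::vector<node>& nodes) {
    dc_token_owners result;
    for (const auto& n : nodes) {
        if (n.state == node_state::none) {
            continue;
        }
        result[n.dc].insert(n.host_id);
    }
    return result;
}

// Simplified stand-in for the asserted precondition; the real assertion
// also requires the per-rack view to be non-empty.
bool tracker_precondition_holds(const std::vector<node>& nodes) {
    return !datacenter_token_owners(nodes).empty();
}

// Sketch of the fix: mark the local node `normal` and set its
// datacenter, rack, and shard count.
void set_up_maintenance_topology(node& local, std::string dc,
                                 std::string rack, unsigned shards) {
    local.state = node_state::normal;
    local.dc = std::move(dc);
    local.rack = std::move(rack);
    local.shard_count = shards;
}
```

In this model, a single-node list whose only node is still in the `none` state fails `tracker_precondition_holds`, and the check passes once `set_up_maintenance_topology` has been called on that node.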

Fixes #27988

(cherry picked from commit a08c53ae4b)

2026-02-03 11:33:51 +01:00

broadcast_tables/experimental

treewide: Move replica related files to replica directory

2025-09-18 08:00:35 +03:00

direct_failure_detector

direct_failure_detector: run direct failure detector in the gossiper scheduling group

2025-12-09 17:19:31 +02:00

treewide: Move query related files to a new query directory

2025-09-16 23:40:47 +03:00

paxos_state: get_replica_lock: remove shard check

2025-11-02 11:14:48 +01:00

service/qos: Do not crash Scylla if auth_integration absent

2025-11-15 22:11:06 +00:00

raft: drop invoke_on from the pinger verb handler

2025-12-07 14:57:50 +00:00

address_map.hh

address_map: Use barrier() to wait for replication

2025-11-23 21:05:34 +00:00

cache_hitrate_calculator.hh

…

cas_shard.hh

storage_proxy: add cas_shard class

2025-06-30 10:33:17 +02:00

client_state.cc

auth: add system table permissions to VECTOR_SEARCH_INDEXING

2026-01-09 14:55:34 +01:00

client_state.hh

cql: allow VECTOR_SEARCH_INDEXING users to select

2025-10-30 10:13:16 +00:00

CMakeLists.txt

vector_store_client: Move to vector_search module

2025-09-22 08:01:47 +02:00

endpoint_lifecycle_subscriber.hh

hinted_handoff: drain hints after the target node stops owning tokens

2025-09-24 07:11:59 +02:00

load_broadcaster.hh

code: Replace distributed<> with sharded<>

2025-09-19 12:22:51 +02:00

load_meter.hh

code: Replace distributed<> with sharded<>

2025-09-19 12:22:51 +02:00

maintenance_mode.hh

…

mapreduce_service.cc

treewide: Move query related files to a new query directory

2025-09-16 23:40:47 +03:00

mapreduce_service.hh

code: Replace distributed<> with sharded<>

2025-09-19 12:22:51 +02:00

memory_limiter.hh

…

migration_listener.hh

migration_manager: pass timestamp to pre_create

2025-11-15 22:10:55 +00:00

migration_manager.cc

db/view/view_building_state: replace task's state with aborted flag

2025-11-26 17:47:16 +01:00

migration_manager.hh

service/migration_manager: pass storage_proxy to prepare_keyspace_drop_announcement()

2025-08-27 08:55:47 +02:00

misc_services.cc

code: Replace distributed<> with sharded<>

2025-09-19 12:22:51 +02:00

query_state.hh

…

session.cc

service: do not include unused headers

2025-03-20 11:18:16 +08:00

session.hh

service: session: use named gate

2025-04-12 11:28:49 +03:00

state_id.hh

…

storage_proxy_fwd.hh

storage_proxy: introduce node_local_only flag

2025-07-24 19:48:08 +02:00

storage_proxy_stats.hh

storage_proxy_stats: add fenced_out_requests metric

2025-09-15 11:24:53 +02:00

storage_proxy.cc

storage_proxy: drop stop() method

2026-01-23 19:24:06 +00:00

storage_proxy.hh

storage_proxy: drop stop() method

2026-01-23 19:24:06 +00:00

storage_service.cc

storage_service: set up topology properly in maintenance mode

2026-02-03 11:33:51 +01:00

storage_service.hh

topology coordinator: complete pending operation for a replaced node

2026-01-19 09:42:20 +02:00

tablet_allocator_fwd.hh

…

tablet_allocator.cc

repair: Add tablet repair progress report support

2026-01-19 09:39:13 +02:00

tablet_allocator.hh

service: tablets: Keep load_stats inside tablet_allocator

2025-04-09 20:21:51 +02:00

tablet_operation.hh

…

task_manager_module.cc

repair: Add tablet repair progress report support

2026-01-19 09:39:13 +02:00

task_manager_module.hh

…

topology_coordinator.cc

repair: Add tablet repair progress report support

2026-01-19 09:39:13 +02:00

topology_coordinator.hh

Revert "Merge 'transport: service_level_controller: create and use driver service level' from Andrzej Jackowski"

2025-09-22 09:32:46 +03:00

topology_guard.hh

service: do not include unused headers

2025-03-20 11:18:16 +08:00

topology_mutation.cc

topology coordinator: Implement global topology request queue

2025-06-11 11:29:33 +03:00

topology_mutation.hh

treewide: Move mutation related files to a mutation directory

2025-09-24 13:23:38 +03:00

topology_state_machine.cc

topology request: make it possible to hold global request types in request_type field

2025-06-09 13:38:49 +03:00

topology_state_machine.hh

topology coordinator: Implement global topology request queue

2025-06-11 11:29:33 +03:00

view_update_backlog_broker.hh

…