scylladb/service at ca2bbbad97aead8b4f36dbe0feaba48e2bc51ed9

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-30 11:36:54 +00:00

Dawid Mędrek c9d192c684 Merge 'raft topology: prevent crashes of multiple nodes' from Patryk Jędrzejczak

Some assertions in the Raft-based topology are likely to cause crashes of
multiple nodes due to the consistent nature of the Raft-based code. If the
failing assertion is executed in the code run by each follower (e.g., the code
reloading the in-memory topology state machine), then all nodes can crash. If
the failing assertion is executed only by the leader (e.g., the topology
coordinator fiber), then multiple consecutive group0 leaders will chain-crash
until there is no group0 majority.

Crashing multiple nodes is much more severe than necessary. It's enough to
prevent the topology state machine from making more progress. This will
naturally happen after throwing a runtime error. The problematic fiber will be
killed or will keep failing in a loop. Note that it should be safe to block
the topology state machine, but not the whole group0, as the topology state
machine is mostly isolated from the rest of group0.

We replace some occurrences of `on_fatal_internal_error` and `SCYLLA_ASSERT`
with `on_internal_error`. These are not all occurrences, as some fatal
assertions make sense, for example, in the bootstrap procedure.
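The difference between a crashing assertion and a thrown internal error can be sketched as follows. This is only an illustration under stated assumptions, not the code from this PR: `report_internal_error`, `topology_state`, `apply_step`, and `run_fiber_once` are hypothetical names, and the real `on_internal_error` helper has a different signature and also logs the failure.

```cpp
#include <iostream>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a non-fatal internal-error helper:
// it throws instead of aborting the whole process.
[[noreturn]] inline void report_internal_error(const std::string& msg) {
    throw std::runtime_error("internal error: " + msg);
}

// Hypothetical, heavily simplified topology state.
struct topology_state {
    int version = 0;
    bool consistent = true;
};

// A step of the state machine. With an aborting assertion, a violated
// invariant would terminate the node; here it throws, so only this
// step fails.
void apply_step(topology_state& st) {
    if (!st.consistent) {
        report_internal_error("inconsistent topology state");
    }
    ++st.version;
}

// One iteration of a fiber driving the state machine. On failure the
// state machine makes no progress, but the process keeps running.
bool run_fiber_once(topology_state& st) {
    try {
        apply_step(st);
        return true;
    } catch (const std::runtime_error& e) {
        std::cerr << "topology step failed: " << e.what() << '\n';
        return false;
    }
}
```

In this sketch a failed invariant leaves `version` unchanged and the caller decides whether to retry or stop the fiber, which mirrors the "blocked state machine, live node" outcome described above.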

We also raise an internal error to prevent a segmentation fault in a few places.
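A minimal sketch of that null-check pattern, assuming a lookup that returns a raw pointer and `nullptr` when the node is missing. The types `topology_sketch` and `node_info` and the helper `find_or_throw` are hypothetical and do not reproduce the actual `topology::find` signature.

```cpp
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical per-node record.
struct node_info {
    std::string state;
};

// Hypothetical container whose find() returns nullptr for unknown ids.
class topology_sketch {
    std::map<int, node_info> _nodes;
public:
    void add(int id, std::string state) {
        _nodes[id] = node_info{std::move(state)};
    }
    const node_info* find(int id) const {
        auto it = _nodes.find(id);
        return it == _nodes.end() ? nullptr : &it->second;
    }
};

// Dereferencing the result of find() unconditionally would be undefined
// behavior for a missing node. Checking first turns that into an
// exception the caller can handle.
const node_info& find_or_throw(const topology_sketch& t, int id) {
    const node_info* n = t.find(id);
    if (!n) {
        throw std::runtime_error(
            "node " + std::to_string(id) + " not found in topology");
    }
    return *n;
}
```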

Fixes #27987

Backporting this PR is not required, but we can consider it at least for 2026.1
because:
- it is LTS,
- the changes are low-risk,
- there shouldn't be many conflicts.

Closes scylladb/scylladb#28558

* github.com:scylladb/scylladb:
  raft topology: prevent accessing nullptr returned by topology::find
  raft topology: make some assertions non-crashing

2026-02-19 16:50:03 +01:00

broadcast_tables/experimental

treewide: Move replica related files to replica directory

2025-09-18 08:00:35 +03:00

direct_failure_detector

treewide: #include Seastar headers with angle brackets

2026-01-13 14:56:15 +02:00

mv: allow setting concurrency in PRUNE MATERIALIZED VIEW

2025-11-27 00:02:28 +01:00

paxos_state: get_replica_lock: remove shard check

2025-10-31 21:37:39 +01:00

Populate all sl:* groups into dedicated top-level supergroup

2026-01-21 14:14:48 +02:00

raft topology: make some assertions non-crashing

2026-02-12 13:10:03 +01:00

strong_consistency

strong consistency: implement coordinator::query()

2026-01-21 14:56:01 +01:00

address_map.hh

code: Stop using seastar::compat::source_location

2025-11-27 19:10:11 +02:00

cache_hitrate_calculator.hh

…

cas_shard.hh

…

client_routes.cc

db: api: service: Fix ClientConnectorError in test_client_routes The bug was caused by capturing local variables by reference in lambdas passed to with_retry(), which is a coroutine. When the coroutine suspends, the lambda frame exits and the referenced locals are destroyed, leading to use-after-lifetime issues. This change fixes the problem by ensuring safe ownership across suspension points and also refactors how route_keys and route_entries are passed from the caller. Previously they were passed as const lvalue references, which cannot be moved and therefore ended up being repeatedly copied across function calls and lambda invocations. The new approach avoids unnecessary copies and makes the lifetime semantics explicit and safe.

2025-12-22 14:52:47 +02:00

client_routes.hh

db: api: service: Fix ClientConnectorError in test_client_routes The bug was caused by capturing local variables by reference in lambdas passed to with_retry(), which is a coroutine. When the coroutine suspends, the lambda frame exits and the referenced locals are destroyed, leading to use-after-lifetime issues. This change fixes the problem by ensuring safe ownership across suspension points and also refactors how route_keys and route_entries are passed from the caller. Previously they were passed as const lvalue references, which cannot be moved and therefore ended up being repeatedly copied across function calls and lambda invocations. The new approach avoids unnecessary copies and makes the lifetime semantics explicit and safe.

2025-12-22 14:52:47 +02:00

client_state.cc

auth: add CDC streams and timestamps to vector search permissions

2026-02-04 09:10:08 +01:00

client_state.hh

service: remove unused has_schema_access

2026-01-28 10:18:26 +02:00

CMakeLists.txt

strong_consistency: add coordinator

2026-01-21 14:56:01 +01:00

endpoint_lifecycle_subscriber.hh

service: transport: add CLIENT_ROUTES_CHANGE event

2025-12-15 18:19:37 +01:00

load_broadcaster.hh

code: Replace distributed<> with sharded<>

2025-09-19 12:22:51 +02:00

load_meter.hh

code: Replace distributed<> with sharded<>

2025-09-19 12:22:51 +02:00

maintenance_mode.hh

…

mapreduce_service.cc

treewide: Move query related files to a new query directory

2025-09-16 23:40:47 +03:00

mapreduce_service.hh

code: Replace distributed<> with sharded<>

2025-09-19 12:22:51 +02:00

memory_limiter.hh

…

migration_listener.hh

migration_listener: fix deadlock in nested notifications

2025-12-17 14:00:28 +01:00

migration_manager.cc

gossiper: Export its scheduling group for those who need it

2026-01-28 18:29:33 +03:00

migration_manager.hh

migration_manager: Reorder members

2026-01-28 18:29:33 +03:00

misc_services.cc

replica/table: keep track of total pre-compression file size

2025-11-13 00:49:57 +01:00

query_state.hh

…

session.cc

…

session.hh

…

state_id.hh

…

storage_proxy_fwd.hh

…

storage_proxy_stats.hh

storage_proxy_stats: add fenced_out_requests metric

2025-09-15 11:24:53 +02:00

storage_proxy.cc

storage_proxy: hold erms in replica handlers

2026-02-16 08:57:42 +01:00

storage_proxy.hh

hints: Provide explicit scheduling group for hint_sender

2026-01-27 12:50:11 +02:00

storage_service.cc

Merge 'raft topology: prevent crashes of multiple nodes' from Patryk Jędrzejczak

2026-02-19 16:50:03 +01:00

storage_service.hh

topology_coordinator: pass raft_topology_cmd by value

2026-02-16 08:57:42 +01:00

tablet_allocator_fwd.hh

…

tablet_allocator.cc

Merge 'tablets: Reduce per-shard migration concurrency to 2' from Tomasz Grabiec

2026-02-19 15:31:43 +02:00

tablet_allocator.hh

Merge 'Improve load balancer logging and other minor cleanups' from Tomasz Grabiec

2026-01-29 08:25:17 +02:00

tablet_operation.hh

…

task_manager_module.cc

service: tasks: fix type of global_topology_request_virtual_task

2026-01-16 11:36:21 +01:00

task_manager_module.hh

tasks: service: add global_topology_request_virtual_task

2025-12-16 13:31:22 +01:00

topology_coordinator.cc

Merge 'raft topology: prevent crashes of multiple nodes' from Patryk Jędrzejczak

2026-02-19 16:50:03 +01:00

topology_coordinator.hh

topology_coordinator: pass raft_topology_cmd by value

2026-02-16 08:57:42 +01:00

topology_guard.hh

…

topology_mutation.cc

raft topology: make some assertions non-crashing

2026-02-12 13:10:03 +01:00

topology_mutation.hh

db: service: add paused_rf_change_requests to system.topology

2025-12-16 13:25:38 +01:00

topology_state_machine.cc

topology: Protect against empty cancelation reason

2026-01-18 15:36:05 +01:00

topology_state_machine.hh

tasks, topology: Make pending node operations abortable

2026-01-18 15:36:05 +01:00

view_update_backlog_broker.hh

…