Previously, `raft_group0::abort()` was called in `storage_service::do_drain` (introduced in #24418) to stop the group0 Raft server before destroying local storage. This was necessary because `raft::server` depends on storage (via `raft_sys_table_storage` and `group0_state_machine`).

However, this caused issues: services such as `sstable_dict_autotrainer` and `auth::service`, which use `group0_client` but are not stopped by `storage_service`, could trigger a use-after-free if `raft_group0` was destroyed too early. This can happen both during normal shutdown and when `nodetool drain` is used.

This PR reworks the shutdown logic:
* Introduces `abort_and_drain()`, which aborts the server and waits for background tasks to finish, but keeps the server object alive. Clients will see `raft::stopped_error` if they try to access group0 after this method is called.
* Final destruction now happens in `abort_and_destroy()`, called later from `main.cc`, ensuring safe cleanup.

`raft_server_for_group::aborted` is changed to a `shared_future`, as it is now awaited in both abort methods.

Node startup can fail before reaching `storage_service`, in which case `drain_on_shutdown()` and `abort_and_drain()` are never called. To ensure proper cleanup, the `raft_group0` deinitialization logic must be included in both `abort_and_drain()` and `abort_and_destroy()`.

Refs #25115

Fixes #24625

Backport: the changes are complicated and not safe to backport; we'll backport a revert of the original patch (#24418) in a separate PR.

Closes scylladb/scylladb#25151

* https://github.com/scylladb/scylladb:
  raft_group0: split shutdown into abort_and_drain and destroy
  Revert "main.cc: fix group0 shutdown order"
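
For illustration only, the two-phase shutdown described above can be sketched in plain standard C++. This is not the ScyllaDB code: `toy_server`, `toy_group0`, `stopped_error`, `read()`, `alive()` and `shutdown_sequence_ok()` are hypothetical stand-ins, and `std::shared_future` is used in place of the Seastar `shared_future` the patch refers to. The sketch only shows why an abort result that can be awaited more than once lets both `abort_and_drain()` and `abort_and_destroy()` run the same abort step safely, including when startup fails and the drain phase is skipped.

```cpp
#include <cassert>
#include <future>
#include <memory>
#include <stdexcept>

// Hypothetical stand-in for raft::stopped_error.
struct stopped_error : std::runtime_error {
    stopped_error() : std::runtime_error("group0 server is stopped") {}
};

// Hypothetical stand-in for the Raft server. abort() may be called any
// number of times; every caller gets the same shared_future to wait on.
class toy_server {
    bool _stopped = false;
    std::promise<void> _aborted_promise;
    std::shared_future<void> _aborted{_aborted_promise.get_future()};
public:
    std::shared_future<void> abort() {
        if (!_stopped) {
            _stopped = true;
            _aborted_promise.set_value(); // background work is "done" here
        }
        return _aborted;
    }
    int read() const {
        if (_stopped) {
            throw stopped_error();
        }
        return 42;
    }
};

// Hypothetical stand-in for raft_group0 with the split shutdown.
class toy_group0 {
    std::unique_ptr<toy_server> _server = std::make_unique<toy_server>();
public:
    // Phase 1: stop the server and wait, but keep the object alive so
    // late clients get stopped_error instead of touching freed memory.
    void abort_and_drain() {
        if (_server) {
            _server->abort().wait();
        }
    }
    // Phase 2: repeats the abort step (startup may have failed before
    // phase 1 ever ran), then destroys the server.
    void abort_and_destroy() {
        if (_server) {
            _server->abort().wait();
            _server.reset();
        }
    }
    bool alive() const { return _server != nullptr; }
    int read() const {
        if (!_server) {
            throw std::logic_error("server already destroyed");
        }
        return _server->read();
    }
};

// Exercises both paths: normal shutdown and failed startup.
bool shutdown_sequence_ok() {
    toy_group0 normal;
    normal.abort_and_drain();
    bool saw_stopped = false;
    try {
        normal.read();
    } catch (const stopped_error&) {
        saw_stopped = true; // late client sees a clean error
    }
    normal.abort_and_destroy();

    toy_group0 failed_startup;
    failed_startup.abort_and_destroy(); // phase 1 was never called

    return saw_stopped && !normal.alive() && !failed_startup.alive();
}
```

In this sketch a client that calls `read()` between the two phases gets `stopped_error` rather than dereferencing a destroyed server, which is the property the rework aims for.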