storage_proxy: log gate_closed_exception

gate_closed_exception likely signals a shutdown-order issue.
If we just swallow it, we lose the information about which
component was shut down prematurely.

For example, we stopped local storage before group0 during shutdown
in main.cc. If a group0 command arrives, topology_state_load might
try to write something and get mutation_write_failure_exception,
which results in 'applier fiber stopped because of the error'.
In this case the logs contain nothing beyond
'mutation_write_failure_exception'. It's not clear what the
original problem is or which component is triggering it.

In this commit we add a warning to the logs when gate_closed_exception
is thrown from lmutate or rmutate.

Another option is to just remove the try_catch_nested line and allow
gate_closed_exception to be logged as an error below. However,
this might break some tests which check ERROR lines in the logs.
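
To illustrate why the nested exception is worth surfacing, here is a minimal
sketch in standard C++ of finding a specific exception type inside a nested
chain and recovering its message. It is not Seastar's implementation of
try_catch_nested: find_nested, gate_closed and make_write_failure are
hypothetical names used only for this example, and gate_closed merely stands
in for seastar::gate_closed_exception.

```cpp
#include <exception>
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for seastar::gate_closed_exception.
struct gate_closed : std::exception {
    const char* what() const noexcept override { return "gate closed"; }
};

// Walk a chain built with std::throw_with_nested and return the message of
// the first exception of type T, or std::nullopt if the chain has none.
template <typename T>
std::optional<std::string> find_nested(std::exception_ptr eptr) {
    while (eptr) {
        try {
            std::rethrow_exception(eptr);
        } catch (const T& e) {
            return std::string(e.what());
        } catch (const std::nested_exception& n) {
            eptr = n.nested_ptr();  // descend one level and try again
        } catch (...) {
            return std::nullopt;    // end of chain, T not found
        }
    }
    return std::nullopt;
}

// Build "mutation write failure" wrapping the original gate_closed, which
// mimics how the root cause ends up hidden behind a generic write error.
std::exception_ptr make_write_failure() {
    try {
        try {
            throw gate_closed();
        } catch (...) {
            std::throw_with_nested(std::runtime_error("mutation write failure"));
        }
    } catch (...) {
        return std::current_exception();
    }
    return nullptr;  // not reached; keeps compilers quiet
}
```

A caller that only checks whether find_nested<gate_closed>(eptr) has a value
can suppress the error, but it discards the message; logging the returned
string is what keeps the root cause visible.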
Petr Gusev
2025-06-06 10:00:18 +02:00
parent 8aeb404893
commit e456d2d507

@@ -4378,8 +4378,10 @@ void storage_proxy::send_to_live_endpoints(storage_proxy::response_id_type respo
                 msg = stale->what();
             } else if (try_catch_nested<rpc::closed_error>(eptr)) {
                 // ignore, disconnect will be logged by gossiper
-            } else if (try_catch_nested<seastar::gate_closed_exception>(eptr)) {
-                // may happen during shutdown, ignore it
+            } else if (const auto* e = try_catch_nested<seastar::gate_closed_exception>(eptr)) {
+                // may happen during shutdown, log and ignore it
+                slogger.warn("gate_closed_exception during mutation write to {}: {}",
+                        coordinator, e->what());
             } else if (try_catch<timed_out_error>(eptr)) {
                 // from lmutate(). Ignore so that logs are not flooded
                 // database total_writes_timedout counter was incremented.