Fixes #10447

The behavior reported in this issue is expected. However, `abort_requested_exception` is not handled properly.

-- Why this issue appeared

1. The node is drained.
2. `migration_manager::drain` is called and executes `_as.request_abort();`.
3. The coordinator sends read RPCs to the drained replica. On the replica side, `storage_proxy::handle_read` calls `migration_manager::get_schema_for_read`, which delegates to `migration_manager::get_schema_for_write`, defined like this:
   ```cpp
   future<schema_ptr> migration_manager::get_schema_for_write(/* ... */) {
       if (_as.abort_requested()) {
           co_return coroutine::exception(std::make_exception_ptr(abort_requested_exception()));
       }
       /* ... */
   }
   ```
   So `abort_requested_exception` is thrown.
4. RPC doesn't preserve the exception's type; the exception is converted to a string containing its error message.
5. On the coordinator side, it is rethrown as `std::runtime_error`, and `abstract_read_resolver::error()` logs it. However, we don't want to report `abort_requested_exception` there. This exception should be caught and ignored:
   ```cpp
   void error(/* ... */) {
       /* ... */
       else if (try_catch<abort_requested_exception>(eptr)) {
           // do not report aborts, they are triggered by shutdown or timeouts
       }
       /* ... */
   }
   ```

-- Proposed solution

To fix this issue, we can add `abort_requested_exception` to `replica::exception_variant` and make sure that, if it is thrown by `migration_manager::get_schema_for_write`, `storage_proxy::handle_read` encodes it correctly. Thanks to this change, `abstract_read_resolver::error` can correctly handle an `abort_requested_exception` thrown on the replica side by not reporting it.

-- Side effect of the proposed solution

If the replica supports encoding `abort_requested_exception` but the coordinator does not, and all nodes support `feature_service::typed_errors_in_read_rpc`, the coordinator will fail to decode `abort_requested_exception`, and it will be decoded as `unknown_exception` instead.
It will still be rethrown as `std::runtime_error`; however, the message will change from *abort requested* to *unknown exception*.

-- Another issue

Moreover, `handle_write` reports abort requests for the same reason, which also floods the logs, this time on the replica side. I don't think this is intended, so I've changed it too. This change is in the last commit.

Closes #14681

* github.com:scylladb/scylladb:
  service: storage_proxy: do not report abort requests in handle_write
  service: storage_proxy: encode abort_requested_exception in handle_read
  service: storage_proxy: refactor encode_replica_exception_for_rpc
  replica: add abort_requested_exception to exception_variant