Fixes #10447
The behavior reported in this issue is expected. However, `abort_requested_exception` is not handled properly.
-- Why this issue appeared
1. The node is drained.
2. `migration_manager::drain` is called and executes `_as.request_abort();`.
3. The coordinator sends read RPCs to the drained replica. On the replica side, `storage_proxy::handle_read` calls `migration_manager::get_schema_for_read`, which goes through `migration_manager::get_schema_for_write`:
```cpp
future<schema_ptr> migration_manager::get_schema_for_write(/* ... */) {
    if (_as.abort_requested()) {
        co_return coroutine::exception(std::make_exception_ptr(abort_requested_exception()));
    }
    /* ... */
}
```
So, `abort_requested_exception` is thrown.
4. RPC doesn't preserve the exception's type; the exception is converted to a string containing its error message.
5. On the coordinator side, it is rethrown as `std::runtime_error`, and `abstract_read_resolver::error()` logs it. However, we don't want to report `abort_requested_exception` there. This exception should be caught and ignored:
```cpp
void error(/* ... */) {
    /* ... */
    else if (try_catch<abort_requested_exception>(eptr)) {
        // do not report aborts, they are triggered by shutdown or timeouts
    }
    /* ... */
}
```
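Steps 4 and 5 can be illustrated with a small sketch in plain standard C++ (no Seastar). All names here are hypothetical stand-ins: `abort_requested` plays the role of `abort_requested_exception`, `send_over_rpc` models a transport that keeps only the error message, and `holds<T>` is a simplified analogue of `try_catch<T>`. It is not the actual storage_proxy code.

```cpp
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for abort_requested_exception.
struct abort_requested : std::exception {
    const char* what() const noexcept override { return "abort requested"; }
};

// Hypothetical transport: only the message survives, the type is lost.
std::exception_ptr send_over_rpc(std::exception_ptr replica_error) {
    try {
        std::rethrow_exception(replica_error);
    } catch (const std::exception& e) {
        return std::make_exception_ptr(std::runtime_error(e.what()));
    } catch (...) {
        return std::make_exception_ptr(std::runtime_error("unknown"));
    }
}

// Returns true if eptr holds an exception of type T
// (simplified analogue of try_catch<T>).
template <typename T>
bool holds(std::exception_ptr eptr) {
    try {
        std::rethrow_exception(eptr);
    } catch (const T&) {
        return true;
    } catch (...) {
        return false;
    }
}

// Coordinator-side decision: aborts are meant to be silently ignored.
bool should_report(std::exception_ptr eptr) {
    return !holds<abort_requested>(eptr);
}
```

In this sketch, `should_report` returns false for the original exception but true once it has passed through `send_over_rpc`, because the typed check can no longer match a `std::runtime_error`.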
-- Proposed solution
To fix this issue, we can add `abort_requested_exception` to `replica::exception_variant` and make sure that if it is thrown by `migration_manager::get_schema_for_write`, `storage_proxy::handle_read` correctly encodes it. Thanks to this change, `abstract_read_resolver::error` can correctly handle `abort_requested_exception` thrown on the replica side by not reporting it.
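The idea can be sketched as follows, assuming a `std::variant` of known error types. The names in `variant_sketch` (`unknown_error`, `abort_requested`, `encode`, `should_report`) are hypothetical and only mirror the roles of `replica::exception_variant` and the encoding helper; the real types and serialization differ.

```cpp
#include <exception>
#include <stdexcept>
#include <variant>

namespace variant_sketch {

// Hypothetical stand-in for abort_requested_exception.
struct abort_requested : std::exception {
    const char* what() const noexcept override { return "abort requested"; }
};

// Fallback alternative for errors without a typed representation.
struct unknown_error {};

// Hypothetical analogue of replica::exception_variant
// with the new alternative added.
using exception_variant = std::variant<unknown_error, abort_requested>;

// Replica side: map a thrown exception to a typed alternative
// when it is recognized, otherwise fall back to unknown_error.
exception_variant encode(std::exception_ptr eptr) {
    try {
        std::rethrow_exception(eptr);
    } catch (const abort_requested&) {
        return abort_requested{};
    } catch (...) {
        return unknown_error{};
    }
}

// Coordinator side: the type survived the round trip,
// so aborts can be recognized and left unreported.
bool should_report(const exception_variant& v) {
    return !std::holds_alternative<abort_requested>(v);
}

} // namespace variant_sketch
```

Because the variant carries the alternative's identity rather than just a message string, the coordinator-side check works on the decoded value directly.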
-- Side effect of the proposed solution
If the replica supports the new exception type but the coordinator doesn't, and all nodes support `feature_service::typed_errors_in_read_rpc`, the coordinator will fail to decode `abort_requested_exception` and will decode it as `unknown_exception` instead. It will still be rethrown as `std::runtime_error`, but the message will change from *abort requested* to *unknown exception*.
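A rough model of this mixed-version case, assuming errors travel as a numeric tag: the tag values, `message_to_log`, and the `knows_abort_tag` flag are all hypothetical and do not reflect the real wire format.

```cpp
#include <cstdint>
#include <optional>
#include <string>

namespace decode_sketch {

// Hypothetical wire tags; the real encoding is different.
constexpr std::uint32_t older_error_tag = 1;
constexpr std::uint32_t abort_requested_tag = 2;

// Returns the message the coordinator would log, or std::nullopt
// if the error is recognized as an abort and therefore not reported.
// knows_abort_tag models whether the coordinator is new enough
// to decode the abort alternative.
std::optional<std::string> message_to_log(std::uint32_t tag, bool knows_abort_tag) {
    if (tag == abort_requested_tag && knows_abort_tag) {
        return std::nullopt;  // recognized abort: do not report
    }
    if (tag == older_error_tag) {
        return std::string("older error");
    }
    // Anything the coordinator cannot decode falls back here.
    return std::string("unknown exception");
}

} // namespace decode_sketch
```

With `knows_abort_tag` set to false, the abort tag falls through to the fallback branch, which corresponds to the *unknown exception* message described above.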
-- Another issue
Moreover, `handle_write` reports abort requests for the same reason, which also floods the logs, this time on the replica side. I don't think this is intended, so I've changed it too. This change is in the last commit.
Closes #14681
* github.com:scylladb/scylladb:
  service: storage_proxy: do not report abort requests in handle_write
  service: storage_proxy: encode abort_requested_exception in handle_read
  service: storage_proxy: refactor encode_replica_exception_for_rpc
  replica: add abort_requested_exception to exception_variant