Kamil Braun 56c91473f2 Merge 'storage_proxy: silence abort_requested_exception on reads and writes' from Patryk Jędrzejczak
Fixes #10447

The behavior reported in this issue is expected. However, `abort_requested_exception` is not handled properly.

-- Why this issue appeared

1. The node is drained.
2. `migration_manager::drain` is called and executes `_as.request_abort();`.
3. The coordinator sends read RPCs to the drained replica. On the replica side, `storage_proxy::handle_read` calls `migration_manager::get_schema_for_read`, which is defined like this:
```cpp
future<schema_ptr> migration_manager::get_schema_for_read(/* ... */) {
    if (_as.abort_requested()) {
        co_return coroutine::exception(std::make_exception_ptr(abort_requested_exception()));
    }
    /* ... */
```
So, `abort_requested_exception` is thrown.
4. The RPC layer doesn't preserve the exception's type; the exception is converted to a string containing its error message.
5. On the coordinator side, it is rethrown as `std::runtime_error`, and `abstract_read_resolver::error()` logs information about it. However, we don't want to report `abort_requested_exception` there. This exception should be caught and ignored:
```cpp
void error(/* ... */) {
    /* ... */
    else if (try_catch<abort_requested_exception>(eptr)) {
        // do not report aborts, they are triggered by shutdown or timeouts
    }
    /* ... */
```
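
Steps 4 and 5 can be illustrated with a minimal standalone sketch. It does not use Seastar or ScyllaDB code: `abort_requested_exception` here is a local stand-in, and `send_over_rpc`, `rebuild_on_coordinator`, and `should_report` are hypothetical helpers that only model the described behavior (the message survives the trip, the type does not).

```cpp
#include <exception>
#include <stdexcept>
#include <string>

// Local stand-in for the real exception type (assumption: only the type
// identity and the message matter for this illustration).
struct abort_requested_exception : std::exception {
    const char* what() const noexcept override { return "abort requested"; }
};

// Hypothetical model of an RPC layer that transports only the error message.
// Expects a non-null exception_ptr.
std::string send_over_rpc(std::exception_ptr eptr) {
    try {
        std::rethrow_exception(eptr);
    } catch (const std::exception& e) {
        return e.what();
    }
}

// Hypothetical coordinator side: only the message is left, so the error is
// rebuilt as std::runtime_error.
std::exception_ptr rebuild_on_coordinator(const std::string& msg) {
    return std::make_exception_ptr(std::runtime_error(msg));
}

// Models the type-based filter in error(): aborts are not reported,
// everything else is.
bool should_report(std::exception_ptr eptr) {
    try {
        std::rethrow_exception(eptr);
    } catch (const abort_requested_exception&) {
        return false;
    } catch (...) {
        return true;
    }
}
```

In this model, `should_report` returns `false` for a locally created `abort_requested_exception`, but `true` for the same error after `rebuild_on_coordinator(send_over_rpc(...))`, because the filter matches on the type and the type is gone.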

-- Proposed solution

To fix this issue, we can add `abort_requested_exception` to `replica::exception_variant` and make sure that if it is thrown by `migration_manager::get_schema_for_read`, `storage_proxy::handle_read` correctly encodes it. Thanks to this change, `abstract_read_resolver::error` can correctly handle `abort_requested_exception` thrown on the replica side by not reporting it.

-- Side effect of the proposed solution

If the replica supports `abort_requested_exception` in `replica::exception_variant` but the coordinator doesn't, and all nodes support `feature_service::typed_errors_in_read_rpc`, the coordinator will fail to decode `abort_requested_exception` and will decode it as `unknown_exception` instead. It will still be rethrown as `std::runtime_error`, but the message will change from *abort requested* to *unknown exception*.
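
A rough model of this mixed-version case is shown below. The wire representation is not described in this message, so `wire_error`, `tag_abort_requested`, `replica_reply`, and `old_coordinator_message` are all hypothetical; the sketch only reproduces the observable effect, namely which message an older coordinator ends up with.

```cpp
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical tag for the new alternative; an older coordinator does not
// know it.
constexpr std::uint32_t tag_abort_requested = 1;

// Hypothetical reply: either a typed error (tag set) or an untyped error
// that carries only its message.
struct wire_error {
    std::optional<std::uint32_t> tag;
    std::string message;
};

// Replica side, before and after the change.
wire_error replica_reply(bool replica_encodes_abort) {
    if (replica_encodes_abort) {
        return {tag_abort_requested, ""};
    }
    return {std::nullopt, "abort requested"};
}

// Older coordinator: untyped errors keep their message, while a typed error
// with a tag it does not recognize is decoded as an unknown exception.
std::string old_coordinator_message(const wire_error& e) {
    if (!e.tag) {
        return e.message;
    }
    return "unknown exception";
}
```

Under these assumptions, the older coordinator sees "abort requested" when the replica does not encode the abort and "unknown exception" when it does, which matches the change in message described above.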

-- Another issue

Moreover, `handle_write` also reports abort requests, which floods the logs in the same way (this time on the replica side). I don't think it is intended, so I've changed it too. This change is in the last commit.

Closes #14681

* github.com:scylladb/scylladb:
  service: storage_proxy: do not report abort requests in handle_write
  service: storage_proxy: encode abort_requested_exception in handle_read
  service: storage_proxy: refactor encode_replica_exception_for_rpc
  replica: add abort_requested_exception to exception_variant
2023-07-18 17:04:05 +02:00