Before this change, if a read executor had just enough targets to
achieve query's CL, and there was a connection drop (e.g. node failure),
the read executor waited for the entire request timeout to give drivers
time to execute a speculative read in a meantime. Such behavior don't
work well when a very long query timeout (e.g. 1800s) is set, because
the unfinished request blocks topology changes.
This change implements a mechanism to thrown a new
read_failure_exception_with_timeout in the aforementioned scenario.
The exception is caught by CQL server which conducts the waiting, after
ERM is released. The new exception inherits from read_failure_exception,
because layers that don't catch the exception (such as mapreduce
service) should handle the exception just a regular read_failure.
However, when CQL server catch the exception, it returns
read_timeout_exception to the client because after additional waiting
such an error message is more appropriate (read_timeout_exception was
also returned before this change was introduced).
This change:
- Rewrite cql_server::connection::process_request_one to use
seastar::futurize_invoke and try_catch<> instead of utils::result_try
- Add new read_failure_exception_with_timeout and throws it in storage_proxy
- Add sleep in CQL server when the new exception is caught
- Catch local exceptions in Mapreduce Service and convert them
to std::runtime_error.
- Add get_cql_exclusive to manager_client.py
- Add test_long_query_timeout_erm
No backport needed - minor issue fix.
Closesscylladb/scylladb#23156
* github.com:scylladb/scylladb:
test: add test_long_query_timeout_erm
test: add get_cql_exclusive to manager_client.py
mapreduce: catch local read_failure_exception_with_timeout
transport: storage_proxy: release ERM when waiting for query timeout
transport: remove redundant references in process_request_one
transport: fix the indentation in process_request_one
transport: add futures in CQL server exception handling