This PR introduces several key improvements to bolster the reliability of our S3 client, particularly in handling intermittent authentication and TLS-related issues. The changes include:
1. **Automatic Credential Renewal and Request Retry**: When credentials expire, the new retry strategy resets the credentials and puts the client into the retryable state, so the client re-authenticates and automatically retries the request. This change prevents transient authentication failures from propagating as fatal errors.
2. **Enhanced Exception Unwrapping**: The client now extracts the embedded std::system_error from std::nested_exception instances that may be raised by the Seastar HTTP client when using TLS. This allows for more precise error reporting and handling.
3. **Expanded TLS Error Handling**: We've added support for retryable TLS error codes within the std::system_error handler. This modification enables the client to detect and recover from transient TLS issues by retrying the affected operations.
Together, these enhancements improve overall client robustness by ensuring smoother recovery from both credential and TLS-related errors.
No backport needed since it is an enhancement.
Closes scylladb/scylladb#22150
* github.com:scylladb/scylladb:
aws_error: Add GNU TLS codes
s3_client: Handle nested std::system_error exceptions
s3_client: Start using new retry strategy
retry_strategy: Add custom retry strategy for S3 client
retry_strategy: Make `should_retry` awaitable
Enhance error handling by detecting and processing std::system_error exceptions
nested within std::nested_exception. This improvement ensures that system-level
errors wrapped in the exception chain are properly caught and managed, leading
to more robust error reporting and recovery.
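The unwrapping described above can be sketched in standard C++ as follows. This is a minimal illustration, not the actual s3_client code: the helper name `find_system_error` is hypothetical, and it only assumes that the transport layer wraps the original error with `std::throw_with_nested` or an equivalent mechanism.

```cpp
#include <exception>
#include <optional>
#include <stdexcept>
#include <system_error>

// Hypothetical helper (not part of the s3 client): walk a chain of nested
// exceptions and return the error code of the first std::system_error found.
std::optional<std::error_code> find_system_error(std::exception_ptr ep) {
    while (ep) {
        try {
            std::rethrow_exception(ep);
        } catch (const std::system_error& e) {
            // Found the system-level error, possibly after several unwraps.
            return e.code();
        } catch (const std::nested_exception& ne) {
            // Not a system_error itself; continue with the wrapped exception.
            // nested_ptr() may be null, which ends the loop.
            ep = ne.nested_ptr();
        } catch (...) {
            // Some other exception with nothing nested inside.
            return std::nullopt;
        }
    }
    return std::nullopt;
}
```

A caller could then map the returned `std::error_code` to a retryable or non-retryable error, which is the part the commit delegates to the `std::system_error` handler.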
Currently, when client::make_request() is called, the caller can pass a
std::optional<status> argument indicating which status it expects from the
server. If the status doesn't match, the request body handler isn't
called and the request fails with an unexpected status exception.
However, a disengaged expected status implicitly means that the requestor
expects the OK (200) status. This makes it impossible to make a request
whose reply status is not known in advance and is left to the handler
to check.
The lower-level http client allows a disengaged expected status with the
described semantics: the handler checks the status on its own. The s3
client needs this behavior for GET requests. The server can respond with OK
or partial content status depending on the Range header. If the header is
absent or the range is large enough for the requested object to fit into it,
the status is OK; if the object is "trimmed", the status is partial content.
In the end, the requestor cannot "guess" the returned status in
advance and should check it when the response arrives.
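The intended semantics can be illustrated with a small standalone sketch. The `status` enum and the `handle_reply` function below are hypothetical stand-ins, not the Seastar or s3 client API; they only model the rule that an engaged expected status is enforced before the handler runs, while a disengaged one leaves the check to the handler.

```cpp
#include <optional>
#include <stdexcept>

// Hypothetical status codes, modelling only the ones discussed here.
enum class status { ok = 200, partial_content = 206, not_found = 404 };

// Hypothetical dispatcher: if `expected` is engaged, a mismatch fails the
// request before the handler is called. If it is disengaged, the handler
// receives the actual status and decides for itself.
template <typename Handler>
auto handle_reply(status actual, std::optional<status> expected, Handler&& handler) {
    if (expected && *expected != actual) {
        throw std::runtime_error("unexpected status");
    }
    return handler(actual);
}
```

With this model, a ranged GET would pass a disengaged `expected` and let its handler accept both `status::ok` and `status::partial_content`.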
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#23243
* seastar 5b95d1d7...412d058c (62):
> fstream: Export functions for making file_data_source
> build: Include DPDK dependency libraries in Seastar linkage
> demos/tls_echo_server_demo: Modernize with seastar::async
> http/client: Pass abort source by pointer
> rpc: remove deprecated logging function support
> github: Add Alpine Linux workflow to test builds with musl libc
> exception_hacks: Make dl_iterate_phdr resolution manual
> tests: relax test_file_system_space check for empty filesystems
> demos/udp_server_demo: Modernize with seastar::async and proper teardown
> future: remove deprecated functions/concepts
> util: logger: remove deprecated set_stdout_enabled and logger_ostream_type::{stdout,stderr}
> memory: guard __GLIBC_PREREQ usage with __GLIBC__ check
> scheduling_specific: Add noexcept wrapper for free()
> file: Replace __gid_t with standard POSIX gid_t
> aio_storage_context: Use reactor::do_at_exit()
> json2code: support chunked_fifo
> json: remove unused headers
> httpd: test cases for streaming
> build: use find_dependency() instead find_package() in config file
> build: stop using a loop for finding dependencies
> dns: Fix event processing to work safely with recent c-ares
> tutorial: add a section about initialization and cleanup
> reactor: deprecate at_exit()
> httpclient: Add exception handling to connection::close
> file: document max_length-limits for dma_read/write funcs taking vector<iovec>
> build: fix P2582R1 detection in GCC compatibility check
> json2code: optimize string handling using std::string_view
> tests/unit: fix typo in test output
> doc: Update documentation after removing build.sh
> test: Add direct exception passing for awaits for perf test
> github: add Docker build verification workflow
> docker: update LLVM debian repo for Ubuntu Orcular migration
> tests/unit: Use http.HTTPStatus constants instead of raw status codes
> tests/unit: Fix exception verification in json2code_test.py
> httpd: handle streaming results in more handlers
> json: stream_object now moves value
> json: support for rvalue ranges
> chunked_fifo: make copyable
> reactor: deprecate at_destroy()
> testing: prevent test scheduling after reactor exit
> net: Add bytes sent/received metrics
> net: switch rss_key_type to std::span instead of std::string_view
> log: fixes for libc++ 19
> sstring: fixes for lib++ 19
> build: finalize numactl dependency removal
> build: link DPDK against libnuma when detected during build
> memory: remove libnuma dependency
> treewide: replace assert with SEASTAR_ASSERT
> future: fix typo in comment
> http: Unwrap nested exceptions to handle retryable transport errors
> net/ip, net: sed -i 's/to_ulong/to_uint/'
> core: function_traits noexcept specializations
> util/variant: seastar::visit forward value arg
> net/tls: fix missing include
> tls: Add a way to inspect peer certificate chain
> websocket: Extract encode_base64() function
> websocket: Rename wlogger to websocket_logger
> websocket: Extract parts of server_connection usable for client
> websocket: Rename connection to server_connection
> websocket: Extract websocket parser to separate file
> json2code_test: factor out query method
> seastar-json2code: fix error handling
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#23281
- Seastar's HTTP client is known to throw exceptions for various reasons, including network errors, TLS errors and other transient issues.
- Update error handling to correctly capture and process all exceptions from Seastar's HTTP client.
- Previously, only aws_exception was handled, so other retryable errors were missed and `should_retry` was never invoked for them.
- Now, all exceptions trigger the appropriate retry logic per the intended strategy.
- Add tests for the S3 proxy to ensure robustness and reliability of these enhancements.