Commit Graph

146 Commits

Author SHA1 Message Date
Ernest Zaslavsky
a0016bd0cc s3_client: relocate req creation closer to usage
Move the creation of the `req` object to the point where it is
actually used, improving code clarity and reducing premature
initialization.
2025-08-14 16:18:43 +03:00
Ernest Zaslavsky
6ef2b0b510 s3_client: reformat long logging lines for readability
Break up excessively long logging statements to improve readability
and maintain consistent formatting across the codebase.
2025-08-14 16:18:43 +03:00
Ernest Zaslavsky
dd51e50f60 s3_client: add memory fallback in chunked_download_source
Introduce fallback logic in `chunked_download_source` to handle
memory exhaustion. When memory is low, feed the `deque` with only
one uncounted buffer at a time. This allows slow but steady progress
without getting stuck on the memory semaphore.

Fixes: https://github.com/scylladb/scylladb/issues/25453
Fixes: https://github.com/scylladb/scylladb/issues/25262

Closes scylladb/scylladb#25452
2025-08-14 09:52:10 +03:00
Ernest Zaslavsky
380c73ca03 s3_client: make memory semaphore acquisition abortable
Add `abort_source` to the `get_units` call for the memory semaphore
in the S3 client, allowing the acquisition process to be aborted.

Fixes: https://github.com/scylladb/scylladb/issues/25454

Closes scylladb/scylladb#25469
2025-08-13 08:48:55 +03:00
Ernest Zaslavsky
fc2c9dd290 s3_client: Disable Seastar-level retries in HTTP client creation
Prevent Seastar from retrying HTTP requests to avoid buffer double-feed
issues when an entire request is retried. This could cause data
corruption in `chunked_download_source`. The change is global for every
instance of `s3_client`, but it is still safe because:
* Seastar's `http_client` resets connections regardless of retry behavior
* `s3_client` retry logic handles all error types—exceptions, HTTP errors,
  and AWS-specific errors—via `http_retryable_client`
2025-07-21 17:03:23 +03:00
Ernest Zaslavsky
ba910b29ce s3_test: Validate handling of non-aws_error exceptions
Inject exceptions not wrapped in `aws_error` from request callback
lambda to verify they are properly caught and handled.
2025-07-21 16:52:43 +03:00
Ernest Zaslavsky
b7ae6507cd s3_client: Improve error handling in chunked_download_source
Create aws_error from raised exceptions when possible and respond
appropriately. Previously, non-aws_exception types leaked from the
request handler and were treated as non-retryable, causing potential
data corruption during download.
2025-07-21 16:49:47 +03:00
Ernest Zaslavsky
342e94261f s3_client: parse multipart response XML defensively
Ensure robust handling of XML responses when initiating multipart
uploads. Check for the existence of required nodes before access,
and throw an exception if the XML is empty or malformed.

Refs: https://github.com/scylladb/scylladb/issues/24676

Closes scylladb/scylladb#24990
2025-07-17 10:55:04 +03:00
Ernest Zaslavsky
acf15eba8e s3_test: Add s3_client test for non-retryable error handling
Introduce a test that injects a non-retryable error and verifies
that the chunked download source throws an exception as expected.
2025-07-01 18:45:17 +03:00
Ernest Zaslavsky
49e8c14a86 s3_client: Fix edge case when the range is exhausted
Handle case where the download loop exits after consuming all data,
but before receiving an empty buffer signaling EOF. Without this, the
next request is sent with a non-zero offset and zero length, resulting
in "Range request cannot be satisfied" errors. Now, an empty buffer is
pushed to indicate completion and exit the fiber properly.
2025-07-01 18:45:17 +03:00
Ernest Zaslavsky
e50f247bf1 s3_client: Fix indentation in try..catch block
Correct indentation in the `try..catch` block to improve code
readability and maintain consistent formatting.
2025-07-01 18:45:17 +03:00
Ernest Zaslavsky
d2d69cbc8c s3_client: Stop retries in chunked download source
Disable retries for S3 requests in the chunked download source to
prevent duplicate chunks from corrupting the buffer queue. The
response handler now throws an exception to bypass the retry
strategy, allowing the next range to be attempted cleanly.

This exception is only triggered for retryable errors; unretryable
ones immediately halt further requests.
2025-07-01 18:45:17 +03:00
Ernest Zaslavsky
6d9cec558a s3_client: Fix missing negation
Restore a missing `not` in a conditional check that caused
incorrect behavior during S3 client execution.
2025-07-01 18:45:17 +03:00
Ernest Zaslavsky
e73b83e039 s3_client: Refine logging
Fix typo in log message to improve clarity and accuracy during
S3 operations.
2025-07-01 18:45:17 +03:00
Ernest Zaslavsky
f1d0690194 s3_client: Improve logging placement for current_range output
Relocated logging to occur after determining the `current_range`,
ensuring more relevant output during S3 client operations.
2025-07-01 18:45:17 +03:00
Pavel Emelyanov
dc166be663 s3: Mark claimed_buffer constructor noexcept
It just std::move-s a buffer and a semaphore_units objects, both moves
are noexcept, so is the constructor itself.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#24552
2025-06-18 20:36:45 +03:00
Pavel Emelyanov
b0766d1e73 Merge 's3_client: Refactor range class for state validation' from Ernest Zaslavsky
Revamped the `range` class to actively manage its state by enforcing validation on all modifications. This prevents overflow, invalid states, and ensures the object size does not exceed the 5TiB limit in S3. This should address and prevent future problems related to this issue https://github.com/minio/minio/issues/21333

No backport needed since this problem related only to this change https://github.com/scylladb/scylladb/pull/23880

Closes scylladb/scylladb#24312

* github.com:scylladb/scylladb:
  s3_client: headers cleanup
  s3_client: Refactor `range` class for state validation
2025-06-17 10:34:55 +03:00
Ernest Zaslavsky
e398576795 s3_client: Fix hang in get() on EOF by signaling condition variable
* Ensure _get_cv.signal() is called when an empty buffer received
* Prevents `get()` from stalling indefinitely while waiting on EOF
* Found when testing https://github.com/scylladb/scylladb/pull/23695

Closes scylladb/scylladb#24490
2025-06-17 10:33:19 +03:00
Ernest Zaslavsky
1b20e0be4a s3_client: headers cleanup 2025-06-16 16:02:30 +03:00
Ernest Zaslavsky
9ad7a456fe s3_client: Refactor range class for state validation
Revamped the `range` class to actively manage its state by enforcing validation on all modifications. This prevents overflow, invalid states, and ensures the object size does not exceed the 5TiB limit in S3.
2025-06-16 16:02:24 +03:00
Ernest Zaslavsky
2b300c8eb9 s3_client: Improve reporting of S3 client statistics
Revise how we report statistics for `chunked_download_source`. Ensure
metrics for downloaded but unconsumed data are visible, as they do not
contribute to read amplification, which is tracked separately.

Closes scylladb/scylladb#24491
2025-06-16 09:33:57 +03:00
Ernest Zaslavsky
30199552ac s3_client: Mitigate connection exhaustion in download_source
The existing `download_source` implementation optimizes performance
by keeping the connection to S3 open and draining data directly from
the socket. While this eliminates the overhead (60-100ms) of repeatedly
establishing new connections, it leads to rapid exhaustion of client-
side connections.

On a single shard, two `mx_readers` for load and stream are enough to
trigger this issue. Since each client typically holds two connections,
readers keeping index and data sources open can cause deadlocks where
processes stall due to unavailable connections.

Introduce `chunked_download_source`, a new S3 download method built on
`download_source`, to dynamically manage connections:

- Buffers data in 5MiB chunks using a producer-consumer model
- Closes connections once buffers reach capacity, returning them to
  the pool for other clients
- Uses a filling fiber that resumes fetching once buffers are
  consumed from the queue

Performance remains comparable to `download_source`, achieving
95MiB/s for sequential 1GiB downloads from S3. However, preloading
large chunks may cause read amplification.

Fixes: https://github.com/scylladb/scylladb/issues/23785

Closes scylladb/scylladb#23880
2025-06-10 12:58:24 +03:00
Ernest Zaslavsky
a369dda049 s3_client: implement S3 copy object
Add support for the CopyObject API to enable direct copying of S3
objects between locations. This approach eliminates networking
overhead on the client side, as the operation is handled internally
by S3.
2025-04-17 09:47:47 +03:00
Ernest Zaslavsky
8929cb324e s3_client: improve exception message
Clarify that the multipart upload was aborted due to a failure in
parsing ETags.
2025-04-16 18:58:22 +03:00
Ernest Zaslavsky
993953016f s3_client: reposition local function for future use
The local function has been relocated higher in the code
to prepare for its usage in upcoming implementations.
2025-04-16 18:46:31 +03:00
Benny Halevy
d3f498ae59 utils: s3::client::multipart_upload: use named gate
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-04-12 11:47:00 +03:00
Benny Halevy
eea83464c7 utils: s3::client: use named_gate
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-04-12 11:46:51 +03:00
Kefu Chai
55777812d4 s3/client: Optimize file streaming with zero-copy multipart uploads
When streaming files using multipart upload, switch from using
`output_stream::write(const char*, size_t)` to passing buffer objects
directly to `output_stream::write()`. This eliminates unnecessary memory
copying that occurred when the original implementation had to
defensively copy data before sending.

The buffer objects can now be safely reused by the output stream instead
of creating deep copies, which should improve performance by reducing
memory operations during S3 file uploads.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#23567
2025-04-07 12:50:06 +03:00
Pavel Emelyanov
1f301b1c5d s3/client: Introduce data_source_impl for object downloading
The new data source implementation runs a single GET for the whole range
specified and lends the body input_stream for the upper input_stream's
get()-s. Eventually, getting the data from the body stream EOFs or
fails. In either case, the existing body is closed and a new GET is
spawn with the updater Range header so that not to include the bytes
read so far.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-03-21 12:01:06 +03:00
Pavel Emelyanov
d47719f70e s3/client: Detach format_range_header() helper
The get_object_contiguous() formats the 'bytes=X-Y' one for its GET
request. The very same code will be needed by next patch.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-03-21 12:01:06 +03:00
Ernest Zaslavsky
2fb5c7402e s3_client: Rearrange credentials providers chain
As the IAM role is not configured to assume a role at this moment, it
makes sense to move the instance metadata credentials provider up in
the chain. This avoids unnecessary network calls and prevents log
clutter caused by failure messages.

Closes scylladb/scylladb#23360
2025-03-20 17:43:04 +03:00
Pavel Emelyanov
23089e1387 Merge 'Enhance S3 client robustness' from Ernest Zaslavsky
This PR introduces several key improvements to bolster the reliability of our S3 client, particularly in handling intermittent authentication and TLS-related issues. The changes include:

1. **Automatic Credential Renewal and Request Retry**: When credentials expire, the new retry strategy now resets the credentials and set the client to the retryable state, so the client will re-authenticate, and automatically retry the request. This change prevents transient authentication failures from propagating as fatal errors.
2. **Enhanced Exception Unwrapping**: The client now extracts the embedded std::system_error from std::nested_exception instances that may be raised by the Seastar HTTP client when using TLS. This allows for more precise error reporting and handling.
3. **Expanded TLS Error Handling**: We've added support for retryable TLS error codes within the std::system_error handler. This modification enables the client to detect and recover from transient TLS issues by retrying the affected operations.

Together, these enhancements improve overall client robustness by ensuring smoother recovery from both credential and TLS-related errors.

No backport needed since it is an enhancement

Closes scylladb/scylladb#22150

* github.com:scylladb/scylladb:
  aws_error: Add GNU TLS codes
  s3_client: Handle nested std::system_error exceptions
  s3_client: Start using new retry strategy
  retry_strategy: Add custom retry strategy for S3 client
  retry_strategy: Make `should_retry` awaitable
2025-03-20 16:52:20 +03:00
Ernest Zaslavsky
367140a9c5 s3_client: Start using new retry strategy
* Previously, token expiration was considered a fatal error. With this change,
the `s3_client` uses new retry strategy that is trying to renew expired
creds
* Added related test to the `s3_proxy`
2025-03-17 16:38:14 +02:00
Pavel Emelyanov
6217124d1d s3/client: Make "expected" reply status truly optional
Currently when a client::make_request() is called it can pass
std::optional<status> argument indicating which status it expects from
server. In case status doesn't match, the request body handler won't be
called, the request will fail with unexpected status exception.

However, disengaged expected implicitly means, that the requestor
expects the OK (200) status. This makes it impossible to make a query
which return status is not known in advance and it's up to the handler
to check it.

Lower level http client allows disengaged expected with the described
semantics -- handler will check status its own. This behavios for s3
client is needed for GET request. Server can respond with OK or partial
content status depending on the Range header. If the header is absent or
is large enough for the requested object to fit into it, the status
would be OK, if the object is "trimmed" the status is partial content.
In the end of the day, requestor cannot "guess" the returning status in
advance and should check it upon response arrival.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#23243
2025-03-17 15:34:58 +02:00
Ernest Zaslavsky
7c49ee4520 s3_client: enhance retryable_http_client functionality
Enhanced `retryable_http_client` by allowing the injection of a custom error handler through its constructor.
2025-03-10 09:01:47 +02:00
Ernest Zaslavsky
b589a882bb s3_client: isolate retryable_http_client
Relocated `retryable_http_client` into its own dedicated file for improved clarity and maintainability.
2025-03-10 09:01:47 +02:00
Ernest Zaslavsky
5eff83af95 s3_client: Prepare for retryable_http_client relocation
Expose `map_s3_client_exception` outside the S3 client class to facilitate moving `retryable_http_client` to a separate file.
2025-03-10 09:01:47 +02:00
Ernest Zaslavsky
2b3abba10a s3_client: Remove is_redirect_status function
Eliminate the `is_redirect_status` function in favor of the equivalent functionality provided by Seastar's HTTP client.
2025-03-10 09:01:47 +02:00
Ernest Zaslavsky
5b7d4a4136 s3_client: Move retryable functionality out of s3 client
This commit moves the retryable HTTP client functionality out of the S3 client implementation. Since this functionality is also required for other services, such as AWS STS, it has been separated to ensure broader applicability.
2025-03-10 09:01:47 +02:00
Robert Bindar
27f2d64725 Remove object storage config credentials provider
During development of #22428 we decided that we have
no need for `object-storage.yaml`, and we'd rather store
the endpoints in `scylla.yaml` and get a REST api to exopose
the endpoints for free.
This patch removes the credentials provider used to read the
aws keys from this yaml file.
Followup work will remove the `object-storage.yaml` file
altogether and move the endpoints to `scylla.yaml`.

Signed-off-by: Robert Bindar <robert.bindar@scylladb.com>

Closes scylladb/scylladb#22951
2025-03-07 10:40:58 +03:00
Pavel Emelyanov
b52d1a3d99 s3/client: Make http client connections limit configurable
It's now calculated based on sched group shares, but for tests explicit
value is needed.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-02-14 16:27:25 +03:00
Ernest Zaslavsky
5a266926e5 s3_client: Increase default part size for optimal performance
Set the `upload_file` part size to 50MiB, as this value provides the best performance based on tests conducted using `perf_s3_client` on an i4i.4xlarge instance.

./perf_s3_client --smp 1 --upload --object_name ./1G-test-file --sockets 1 --part_size_mb 5
INFO  2025-02-06 10:34:08,007 [shard 0:main] perf - Uploaded 1024MB in 27.768863962s, speed 36.87583335786734MB/s

./perf_s3_client --smp 1 --upload --object_name ./1G-test-file --sockets 1 --part_size_mb 10
INFO  2025-02-06 10:35:07,161 [shard 0:main] perf - Uploaded 1024MB in 28.175412552s, speed 36.34374467845414MB/s

./perf_s3_client --smp 1 --upload --object_name ./1G-test-file --sockets 1 --part_size_mb 20
INFO  2025-02-06 10:35:55,530 [shard 0:main] perf - Uploaded 1024MB in 14.483539631s, speed 70.700949221575MB/s

./perf_s3_client --smp 1 --upload --object_name ./1G-test-file --sockets 1 --part_size_mb 30
INFO  2025-02-06 10:36:35,466 [shard 0:main] perf - Uploaded 1024MB in 11.486155799s, speed 89.15080188004683MB/s

./perf_s3_client --smp 1 --upload --object_name ./1G-test-file --sockets 1 --part_size_mb 40
INFO  2025-02-06 10:37:46,642 [shard 0:main] perf - Uploaded 1024MB in 10.236196424s, speed 100.03715809898961MB/s

/perf_s3_client --smp 1 --upload --object_name ./1G-test-file --sockets 1 --part_size_mb 50
INFO  2025-02-06 10:38:34,777 [shard 0:main] perf - Uploaded 1024MB in 9.490644522s, speed 107.895728011548MB/s

./perf_s3_client --smp 1 --upload --object_name ./1G-test-file --sockets 1 --part_size_mb 60
INFO  2025-02-06 10:39:08,832 [shard 0:main] perf - Uploaded 1024MB in 9.767783693s, speed 104.83442633295012MB/s

./perf_s3_client --smp 1 --upload --object_name ./1G-test-file --sockets 1 --part_size_mb 70
INFO  2025-02-06 10:39:47,916 [shard 0:main] perf - Uploaded 1024MB in 10.166116742s, speed 100.72675988162482MB/s

Closes scylladb/scylladb#22732
2025-02-07 13:49:54 +03:00
Ernest Zaslavsky
97d789043a s3_client: Fix buffer offset reset on request retry
This patch addresses an issue where the buffer offset becomes incorrect when a request is retried. The new request uses an offset that has already been advanced, causing misalignment. This fix ensures the buffer offset is correctly reset, preventing such errors.

Closes scylladb/scylladb#22729
2025-02-07 08:52:08 +03:00
Ernest Zaslavsky
dee4fc7150 aws creds: add STS and Instance Metadata service credentials providers
This commit introduces two new credentials providers: STS and Instance Metadata Service. The S3 client's provider chain has been updated to incorporate these new providers. Additionally, unit tests have been added to ensure coverage of the new functionality.
2025-02-05 14:57:19 +02:00
Ernest Zaslavsky
d534051bea aws creds: add env. and file credentials providers
This commit entirely removes credentials from the endpoint configuration. It also eliminates all instances of manually retrieving environment credentials. Instead, the construction of file and environment credentials has been moved to their respective providers. Additionally, a new aws_credentials_provider_chain class has been introduced to support chaining of multiple credential providers.
2025-02-05 14:57:19 +02:00
Ernest Zaslavsky
c911fc4f34 s3 creds: move credentials out of endpoint config
This commit refactors the way AWS credentials are managed in Scylla. Previously, credentials were included in the endpoint configuration. However, since credentials and endpoint configurations serve different purposes and may have different lifetimes, it’s more logical to manage them separately. Moving forward, credentials will be completely removed from the endpoint_config to ensure clear separation of concerns.
2025-02-04 16:45:23 +02:00
Kefu Chai
7215d4bfe9 utils: do not include unused headers
these unused includes were identifier by clang-include-cleaner. after
auditing these source files, all of the reports have been confirmed.

please note, because quite a few source files relied on
`utils/to_string.hh` to pull in the specialization of
`fmt::formatter<std::optional<T>>`, after removing
`#include <fmt/std.h>` from `utils/to_string.hh`, we have to
include `fmt/std.h` directly.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2025-01-14 07:56:39 -05:00
Pavel Emelyanov
bb094cc099 Merge 'Make restore task abortable' from Calle Wilund
Fixes #20717

Enables abortable interface and propagates abort_source to all s3 objects used for reading the restore data.

Note: because restore is done on each shard, we have to maintain a per-shard abort source proxy for each, and do a background per-shard abort on abort call. This is synced at the end of "run()".

Abort source is added as an optional parameter to s3 storage and the s3 path in distributed loader.

There is no attempt to "clean up" an aborted restore. As we read on a mutation level from remote sstables, we should not cause incomplete sstables as such, even though we might end up of course with partial data restored.

Closes scylladb/scylladb#21567

* github.com:scylladb/scylladb:
  test_backup: Add restore abort test case
  sstables_loader: Make restore task abortable
  distributed_loader: Add optional abort_source to get_sstables_from_object_store
  s3_storage: Add optional abort_source to params/object
  s3::client: Make "readable_file" abortable
2024-12-19 12:23:33 +03:00
Avi Kivity
f3eade2f62 treewide: relicense to ScyllaDB-Source-Available-1.0
Drop the AGPL license in favor of a source-available license.
See the blog post [1] for details.

[1] https://www.scylladb.com/2024/12/18/why-were-moving-to-a-source-available-license/
2024-12-18 17:45:13 +02:00
Calle Wilund
af4dd1f2cb s3::client: Make "readable_file" abortable
Adds optional abortable source to "readable_file" interface.
Note: the abortable aspect is not preserved across a "dup()" call
however, since these objects are generally not used in a cross-shard
fashion, it should be ok.
2024-12-02 12:30:24 +00:00