Introduce tests that validate the corrected multipart part-size
calculation, including boundary conditions and error cases.
(cherry picked from commit 6280cb91ca)
The previous calculation could produce more than 10,000 parts for large
uploads because we mixed values in bytes and MiB when determining the
part size. This could result in selecting a part size that still
exceeded the AWS multipart upload limit. The updated logic now ensures
the number of parts never exceeds the allowed maximum.
This change also aligns the implementation with the code comment: we
prefer a 50 MiB part size because it provides the best performance, and
we use it whenever it fits within the 10,000-part limit. If it does not,
we increase the part size (in bytes, aligned to MiB) to stay within the
limit.
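A minimal sketch of the corrected calculation, assuming a 50 MiB preferred part size and the 10,000-part AWS limit (constant and function names are illustrative, not the actual ScyllaDB code):
```
#include <cstdint>

// Illustrative sketch: pick a multipart part size so the number of parts
// never exceeds the AWS limit of 10,000.
constexpr uint64_t MiB = 1024 * 1024;
constexpr uint64_t preferred_part_size = 50 * MiB;  // best observed performance
constexpr uint64_t max_parts = 10'000;

uint64_t choose_part_size(uint64_t object_size) {
    // 50 MiB keeps us within the limit for objects up to ~488 GiB.
    if (object_size <= preferred_part_size * max_parts) {
        return preferred_part_size;
    }
    // Otherwise grow the part size: compute the minimum in bytes, then round
    // it up to a whole MiB, so object_size / part_size stays <= 10,000.
    uint64_t min_part_size = (object_size + max_parts - 1) / max_parts;
    return ((min_part_size + MiB - 1) / MiB) * MiB;
}
```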
(cherry picked from commit 289e910cec)
The loop that unwraps nested exceptions rethrows each nested exception and saves a pointer to the temporary bound to `std::exception& inner` on the stack, then continues. That pointer therefore ends up pointing at a released temporary.
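A minimal sketch of the problematic pattern in standard C++ (an illustration of the bug, not the exact code):
```
#include <exception>

// Walk a chain of std::nested_exception, keeping a pointer to each unwrapped
// exception. The object bound to `inner` only lives while its catch handler
// runs, so `current` dangles as soon as the loop continues.
void walk_exception_chain(const std::exception& top) {
    const std::exception* current = &top;
    while (true) {
        try {
            std::rethrow_if_nested(*current);
            break;                      // no nested exception left
        } catch (const std::exception& inner) {
            current = &inner;           // BUG: points to a released temporary
        }
    }
}
```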
Closes scylladb/scylladb#28143
(cherry picked from commit 829bd9b598)
Closes scylladb/scylladb#28239
Change all logging related to errors in the `chunked_download_source` background download fiber to the `info` level so they are visible right away in the logs.
(cherry picked from commit fdd0d66f6e)
Refactor the wrapping exception used in `chunked_download_source` to
prevent the retry strategy from reattempting failed requests. The new
implementation preserves the original `exception_ptr`, making the root
cause clearer and easier to diagnose.
(cherry picked from commit 1d34657b14)
To prevent the client from retrying time-skew and authentication errors indefinitely, add `max_attempts` to `client::make_request`.
(cherry picked from commit 43acc0d9b9)
It never worked as intended, so credentials handling moves to the same place where we handle time skew, since we have to reauthenticate the request in both cases.
(cherry picked from commit 116823a6bc)
Add an option to retry S3 requests at the highest level, including
reinitializing headers and reauthenticating. This addresses cases
where retrying the same request fails, such as when the S3 server
rejects a timestamp older than 15 minutes.
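A hedged sketch of the idea (function and parameter names are illustrative, not the real `s3::client` API): every attempt rebuilds and re-signs the request so its timestamp is fresh, and the number of attempts is bounded by `max_attempts`.
```
#include <stdexcept>

// Illustrative only: retry a request at the highest level, rebuilding headers
// and reauthenticating on each attempt, up to max_attempts.
template <typename BuildRequest, typename SendRequest>
void make_request_with_reauth(BuildRequest build, SendRequest send, unsigned max_attempts) {
    for (unsigned attempt = 1; ; ++attempt) {
        auto req = build();   // reinitialize headers, sign with the current time
        try {
            send(req);        // issue the HTTP request
            return;
        } catch (const std::exception&) {
            if (attempt >= max_attempts) {
                throw;        // time-skew/auth errors must not retry forever
            }
        }
    }
}
```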
(cherry picked from commit 185d5cd0c6)
Refactor `make_request` to use a single core implementation that
handles authentication and issues the HTTP request. All overloads now
delegate to this unified method.
(cherry picked from commit 55fb2223b6)
Introduce a counter metric to monitor instances where the background
filling fiber is blocked due to insufficient memory in the S3 client.
Closes scylladb/scylladb#26466
(cherry picked from commit 413739824f)
Closes scylladb/scylladb#26553
The chunked download source sends large GET requests and then consumes data
as it arrives. Sometimes it can stop reading from the socket early and drop the
in-flight data. The existing read-bytes metrics show only the number of
consumed bytes, so we also want to know the number of requested bytes.
Refs #25770 (accounting of read-bytes)
Fixes #25876
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#25877
(cherry picked from commit 6fb66b796a)
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#26070
In the S3 client, both read and write metrics have three counters -- the number
of requests made, the number of bytes processed, and the request latency. In
most cases all three counters are updated at once -- upon response arrival.
However, in the case of the chunked download source this way of accounting
metrics is misleading. In this code the request is made once, and then
the obtained bytes are consumed eventually as the data arrive.
Currently, each time a new portion of data is read from the socket, the
number of read requests is incremented. That's wrong: the request is
made once, and this counter should also be incremented once, not for
every data buffer that arrives in response.
The same goes for read request latency -- it is "added" for every data buffer
that arrives, but receiving the data is a lengthy process, and the _request_
latency should be accounted once per response. Maybe later we'll want "data
latency" metrics as well, but for what we have now it's request latency.
The number of read bytes is accounted properly, so not touched here.
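A minimal sketch of the fix (field and function names are illustrative stand-ins for the S3 client stats):
```
#include <chrono>
#include <cstddef>
#include <cstdint>

using clock_type = std::chrono::steady_clock;

struct read_stats {
    uint64_t read_requests = 0;
    uint64_t read_bytes = 0;
    std::chrono::nanoseconds read_latency{0};
};

// Called once, when the response to the GET request arrives.
void on_response(read_stats& st, clock_type::time_point sent_at) {
    st.read_requests += 1;                          // once per request
    st.read_latency += clock_type::now() - sent_at; // once per response
}

// Called for every data buffer read from the socket.
void on_buffer(read_stats& st, size_t bytes) {
    st.read_bytes += bytes;                         // per buffer, unchanged
}
```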
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#25770
(cherry picked from commit 9deea3655f)
Closes scylladb/scylladb#26145
Assume that any caller invoking `reload` intends to refresh credentials.
Remove conditional logic that checks for expiration before reloading.
(cherry picked from commit e4ebe6a309)
Prevent Seastar from retrying HTTP requests to avoid buffer double-feed
issues when an entire request is retried. This could cause data
corruption in `chunked_download_source`. The change is global for every
instance of `s3_client`, but it is still safe because:
* Seastar's `http_client` resets connections regardless of retry behavior
* `s3_client` retry logic handles all error types—exceptions, HTTP errors,
and AWS-specific errors—via `http_retryable_client`
(cherry picked from commit fc2c9dd290)
Inject exceptions not wrapped in `aws_error` from request callback
lambda to verify they are properly caught and handled.
(cherry picked from commit ba910b29ce)
Create aws_error from raised exceptions when possible and respond
appropriately. Previously, non-aws_exception types leaked from the
request handler and were treated as non-retryable, causing potential
data corruption during download.
(cherry picked from commit b7ae6507cd)
Move `aws_error` creation logic out of `retryable_http_client` and
into the `aws_error` class to support reuse across components.
(cherry picked from commit d53095d72f)
Introduce a test that injects a non-retryable error and verifies
that the chunked download source throws an exception as expected.
(cherry picked from commit acf15eba8e)
Handle case where the download loop exits after consuming all data,
but before receiving an empty buffer signaling EOF. Without this, the
next request is sent with a non-zero offset and zero length, resulting
in "Range request cannot be satisfied" errors. Now, an empty buffer is
pushed to indicate completion and exit the fiber properly.
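A simplified standalone sketch of the fix (plain containers stand in for the seastar queue of `temporary_buffer`): once the loop has handed out every requested byte, it enqueues an empty buffer so the consumer sees EOF instead of issuing a zero-length range request.
```
#include <cstdint>
#include <queue>
#include <string>

// Illustrative only: an empty buffer in the queue signals end-of-stream.
void finish_if_done(std::queue<std::string>& buffers, uint64_t bytes_remaining) {
    if (bytes_remaining == 0) {
        buffers.push(std::string{});   // empty buffer == EOF, fiber can exit
    }
}
```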
(cherry picked from commit 49e8c14a86)
Disable retries for S3 requests in the chunked download source to
prevent duplicate chunks from corrupting the buffer queue. The
response handler now throws an exception to bypass the retry
strategy, allowing the next range to be attempted cleanly.
This exception is only triggered for retryable errors; unretryable
ones immediately halt further requests.
(cherry picked from commit d2d69cbc8c)
Relocated logging to occur after determining the `current_range`,
ensuring more relevant output during S3 client operations.
(cherry picked from commit f1d0690194)
It just std::move-s a buffer and a semaphore_units object; both moves are
noexcept, so the constructor itself is as well.
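A minimal sketch of the reasoning with simplified stand-in types (not the actual classes):
```
#include <type_traits>
#include <utility>
#include <vector>

struct semaphore_units_t {      // stand-in for seastar::semaphore_units
    long count = 0;
};

struct chunk {
    std::vector<char> buf;      // stand-in for the data buffer
    semaphore_units_t units;

    // Only move-constructs its members; both moves are noexcept, so the
    // constructor can be noexcept too.
    chunk(std::vector<char> b, semaphore_units_t u) noexcept
        : buf(std::move(b)), units(std::move(u)) {}
};

static_assert(std::is_nothrow_constructible_v<chunk, std::vector<char>, semaphore_units_t>);
```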
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#24552
Revamped the `range` class to actively manage its state by enforcing validation on all modifications. This prevents overflows and invalid states, and ensures the object size does not exceed the 5 TiB limit in S3. This should address and prevent future problems related to this issue: https://github.com/minio/minio/issues/21333
No backport needed since this problem is related only to this change: https://github.com/scylladb/scylladb/pull/23880
Closes scylladb/scylladb#24312
* github.com:scylladb/scylladb:
s3_client: headers cleanup
s3_client: Refactor `range` class for state validation
Revamped the `range` class to actively manage its state by enforcing validation on all modifications. This prevents overflows and invalid states, and ensures the object size does not exceed the 5 TiB limit in S3.
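A hedged sketch of the approach (member names and the exact checks are illustrative, not the real `s3::range` interface):
```
#include <cstdint>
#include <stdexcept>

class range {
    static constexpr uint64_t max_object_size = 5ULL << 40;   // 5 TiB S3 limit
    uint64_t _off = 0;
    uint64_t _len = 0;

    void validate() const {
        // Reject overflow and anything that would describe an object > 5 TiB.
        if (_len > max_object_size || _off > max_object_size - _len) {
            throw std::out_of_range("range exceeds the 5 TiB S3 object limit");
        }
    }
public:
    range(uint64_t off, uint64_t len) : _off(off), _len(len) { validate(); }

    // Every modification goes through validation, so the object can never
    // reach an invalid state.
    void advance(uint64_t n) {
        if (n > _len) {
            throw std::out_of_range("advancing past the end of the range");
        }
        _off += n;
        _len -= n;
        validate();
    }

    uint64_t offset() const { return _off; }
    uint64_t length() const { return _len; }
};
```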
Revise how we report statistics for `chunked_download_source`. Ensure
metrics for downloaded but unconsumed data are visible, as they do not
contribute to read amplification, which is tracked separately.
Closes scylladb/scylladb#24491
The existing `download_source` implementation optimizes performance
by keeping the connection to S3 open and draining data directly from
the socket. While this eliminates the overhead (60-100ms) of repeatedly
establishing new connections, it leads to rapid exhaustion of client-
side connections.
On a single shard, two `mx_readers` for load and stream are enough to
trigger this issue. Since each client typically holds two connections,
readers keeping index and data sources open can cause deadlocks where
processes stall due to unavailable connections.
Introduce `chunked_download_source`, a new S3 download method built on
`download_source`, to dynamically manage connections:
- Buffers data in 5MiB chunks using a producer-consumer model
- Closes connections once buffers reach capacity, returning them to
the pool for other clients
- Uses a filling fiber that resumes fetching once buffers are
consumed from the queue
Performance remains comparable to `download_source`, achieving
95MiB/s for sequential 1GiB downloads from S3. However, preloading
large chunks may cause read amplification.
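A simplified standalone sketch of the producer-consumer idea (threads and standard containers stand in for the seastar fiber and `temporary_buffer` queue; the constants are illustrative):
```
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <string>

constexpr size_t chunk_size = 5 << 20;       // 5 MiB chunks
constexpr size_t max_buffered_chunks = 4;

struct chunk_queue {
    std::mutex m;
    std::condition_variable cv;
    std::queue<std::string> chunks;

    // Filling-fiber side: park once the bounded queue is full, which is when
    // the real code returns the connection to the pool.
    void push(std::string c) {
        std::unique_lock lk(m);
        cv.wait(lk, [&] { return chunks.size() < max_buffered_chunks; });
        chunks.push(std::move(c));
        cv.notify_all();
    }

    // Consumer side: draining a buffer wakes the filling fiber to fetch more.
    std::string pop() {
        std::unique_lock lk(m);
        cv.wait(lk, [&] { return !chunks.empty(); });
        auto c = std::move(chunks.front());
        chunks.pop();
        cv.notify_all();
        return c;
    }
};
```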
Fixes: https://github.com/scylladb/scylladb/issues/23785
Closes scylladb/scylladb#23880
Implement the CopyObject API to directly copy an S3 object from one location to another. This implementation incurs zero networking overhead on the client side since the object is copied internally by the S3 machinery.
Usage example: backup of tiered SSTables. The SSTables are already on S3, so CopyObject is the ideal way to copy them.
No need to backport since we are adding new functionality for future use.
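A hedged sketch of what the operation looks like on the wire (the struct is an illustrative stand-in, not the client's request type): CopyObject is a PUT to the destination key carrying an `x-amz-copy-source` header, so no object data flows through the client.
```
#include <string>

struct http_request {            // illustrative stand-in for the request type
    std::string method;
    std::string path;
    std::string copy_source;     // sent as the x-amz-copy-source header
};

http_request make_copy_object_request(const std::string& bucket,
                                      const std::string& src_key,
                                      const std::string& dst_key) {
    return http_request{
        .method = "PUT",
        .path = "/" + bucket + "/" + dst_key,
        .copy_source = "/" + bucket + "/" + src_key,
    };
}
```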
Closes scylladb/scylladb#23779
* github.com:scylladb/scylladb:
s3_client: implement S3 copy object
s3_client: improve exception message
s3_client: reposition local function for future use
This PR enhances S3 throughput by leveraging every available shard to upload backup files concurrently. By distributing the load across multiple shards, we significantly improve the upload performance. Each shard retrieves an SSTable and processes its files sequentially, ensuring efficient, file-by-file uploads.
To prevent uncontrolled fiber creation and potential resource exhaustion, the backup task employs a directory semaphore from the sstables_manager. This mechanism helps regulate concurrency at the directory level, ensuring stable and predictable performance during large-scale backup operations.
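A hedged standalone sketch of the concurrency control (std::thread and std::counting_semaphore stand in for seastar fibers and the sstables_manager directory semaphore; names and the slot count are illustrative):
```
#include <chrono>
#include <semaphore>
#include <string>
#include <thread>
#include <vector>

std::counting_semaphore<> upload_slots(8);   // capacity provided by the manager

void upload_file(const std::string& path) {
    // Stand-in for the real per-file S3 upload.
    std::this_thread::sleep_for(std::chrono::milliseconds(10));
}

void backup_files(const std::vector<std::string>& files) {
    std::vector<std::thread> workers;
    for (const auto& f : files) {
        upload_slots.acquire();              // bound the number of in-flight uploads
        workers.emplace_back([&f] {
            upload_file(f);
            upload_slots.release();
        });
    }
    for (auto& w : workers) {
        w.join();
    }
}
```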
Refs #22460
Fixes #22520
```
===========================================
Release build, master, smp-16, mem-32GiB
Bytes: 2342880184, backup time: 9.51 s
===========================================
Release build, this PR, smp-16, mem-32GiB
Bytes: 2342891015, backup time: 1.23 s
===========================================
```
Looks like it is at least 7.7x faster.
No backport needed since it (native backup) is still unused functionality
Closes scylladb/scylladb#23727
* github.com:scylladb/scylladb:
backup: Add test for invalid endpoint
backup_task: upload on all shards
backup_task: integrate sharded storage manager for upload
Use all shards to upload snapshot files to S3, via the sharded
sstables_manager_for_table infrastructure.
Refs #22460
Quick perf comparison
```
===========================================
Release build, master, smp-16, mem-32GiB
Bytes: 2342880184, backup time: 9.51 s
===========================================
Release build, this PR, smp-16, mem-32GiB
Bytes: 2342891015, backup time: 1.23 s
===========================================
```
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Co-authored-by: Ernest Zaslavsky <ernest.zaslavsky@scylladb.com>
Add support for the CopyObject API to enable direct copying of S3
objects between locations. This approach eliminates networking
overhead on the client side, as the operation is handled internally
by S3.
Name the gates and phased barriers we use
to make it easy to debug gate_closed_exception
Refs https://github.com/scylladb/seastar/pull/2688
* Enhancement only, no backport needed
Closes scylladb/scylladb#23329
* github.com:scylladb/scylladb:
utils: loading_cache: use named_gate
utils: flush_queue: use named_gate
sstables_manager: use named gate
sstables_loader: use named gate
utils: phased_barrier, pluggable: use named gate
utils: s3::client::multipart_upload: use named gate
utils: s3::client: use named_gate
transport: controller: use named gate
tracing: trace_keyspace_helper: use named gate
task_manager: module: use named gate
topology_coordinator: use named gate
storage_service: use named gate
storage_proxy: wait_for_hint_sync_point: use named gate
storage_proxy: remote: use named gate
service: session: use named gate
service: raft: raft_rpc: use named gate
service: raft: raft_group0: use named gate
service: raft: persistent_discovery: use named gate
service: raft: group0_state_machine: use named gate
service: migration_manager: use named gate
replica: table: use named gate
replica: compaction_group, storage_group: use named gate
redis: query_processor: use named gate
repair: repair_meta: use named gate
reader_concurrency_semaphore: use named gate
raft: server_impl: use named gate
querier_cache: use named gate
gms: gossiper: use named gate
generic_server: use named gate
db: sstables_format_listener: use named gate
db: snapshot: backup_task: use named gate
db: snapshot_ctl: use named gate
hints: hints_sender: use named gate
hints: manager: use named gate
hints: hint_endpoint_manager: use named gate
commitlog: segment_manager: use named gate
db: batchlog_manager: use named gate
query_processor: remote: use named gate
compaction: compaction_state: use named gate
alternator/server: use named_gate