scylladb

Author	SHA1	Message	Date
Pavel Emelyanov	961fc9e041	s3: Don't rearm credential timers when credentials are not refreshed The update_credentials_and_rearm() may get "empty" credentials from _creds_provider_chain.get_aws_credentials() -- it doesn't throw, but returns default-initialized value. In that case the expires_at will be set to time_point::min, and it's probably not a good idea to arm the refresh timer and, even worse idea, to subtract 1h from it. Fixes #29056 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#29057	2026-03-20 10:07:01 +02:00
Pavel Emelyanov	0a8dc4532b	s3: Fix missing upload ID in copy_part trace log The format string had two {} placeholders but three arguments, the _upload_id one is skipped from formatting Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#29053	2026-03-20 10:05:44 +02:00
Pavel Emelyanov	d6c01be09b	s3/client: Don't reconstruct regex on every parse_content_range call Make the pattern static const so it is compiled once at first call rather than on every Content-Range header parse. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#29054	2026-03-18 17:56:33 +02:00
Botond Dénes	5d868dcc55	Merge 's3_client: fix s3::range max value for object size' from Ernest Zaslavsky - fix s3::range max value for object size which is 50TiB and not 5. - refactor constants to make it accessible for all interested parties, also reuse these constants in tests No need to backport, doubt we will encounter an object larger than 5TiB Closes scylladb/scylladb#28601 * github.com:scylladb/scylladb: s3_client: reorganize tests in part_size_calculation_test s3_client: switch using s3 limits constants in tests s3_client: fix the s3::range max object size s3_client: remove "aws" prefix from object limits constants s3_client: make s3 object limits accessible	2026-03-17 16:34:42 +02:00
Ernest Zaslavsky	24e70b30c8	s3_client: remove "aws" prefix from object limits constants remove "aws" prefix from object limits constants since it is irrelevant and unnecessary when sitting under s3 namespace	2026-02-18 12:12:04 +02:00
Ernest Zaslavsky	329c156600	s3_client: make s3 object limits accessible make s3 limits constants publicly accessible to reuse it later	2026-02-18 12:12:04 +02:00
Pavel Emelyanov	89d8ae5cb6	Merge 'http: prepare http clients retry machinery refactoring' from Ernest Zaslavsky Today S3 client has well established and well testes (hopefully) http request retry strategy, in the rest of clients it looks like we are trying to achieve the same writing the same code over and over again and of course missing corner cases that already been addressed in the S3 client. This PR aims to extract the code that could assist other clients to detect the retryability of an error originating from the http client, reuse the built in seastar http client retryability and to minimize the boilerplate of http client exception handling No backport needed since it is only refactoring of the existing code Closes scylladb/scylladb#28250 * github.com:scylladb/scylladb: exceptions: add helper to build a chain of error handlers http: extract error classification code aws_error: extract `retryable` from aws_error	2026-02-18 10:06:37 +03:00
Pavel Emelyanov	2f10fd93be	Merge 's3_client: Fix s3 part size and number of parts calculation' from Ernest Zaslavsky - Correct `calc_part_size` function since it could return more than 10k parts - Add tests - Add more checks in `calc_part_size` to comply with S3 limits Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-640 Must be ported back to 2025.3/4 and 2026.1 since we may encounter this bug in production clusters Closes scylladb/scylladb#28592 * github.com:scylladb/scylladb: s3_client: add more constrains to the calc_part_size s3_client: add tests for calc_part_size s3_client: correct multipart part-size logic to respect 10k limit	2026-02-18 10:04:53 +03:00
Ernest Zaslavsky	034c6fbd87	s3_client: limit multipart upload concurrency Prevent launching hundreds or thousands of fibers during multipart uploads by capping concurrent part submissions to 16. Closes scylladb/scylladb#28554	2026-02-16 13:32:58 +03:00
Ernest Zaslavsky	960adbb439	s3_client: add more constrains to the calc_part_size Enforce more checks on part size and object size as defined in "Amazon S3 multipart upload limits", see https://docs.aws.amazon.com/AmazonS3/latest/userguide/qfacts.html and https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingObjects.html	2026-02-10 13:15:07 +02:00
Ernest Zaslavsky	6280cb91ca	s3_client: add tests for calc_part_size Introduce tests that validate the corrected multipart part-size calculation, including boundary conditions and error cases.	2026-02-10 13:13:26 +02:00
Ernest Zaslavsky	289e910cec	s3_client: correct multipart part-size logic to respect 10k limit The previous calculation could produce more than 10,000 parts for large uploads because we mixed values in bytes and MiB when determining the part size. This could result in selecting a part size that still exceeded the AWS multipart upload limit. The updated logic now ensures the number of parts never exceeds the allowed maximum. This change also aligns the implementation with the code comment: we prefer a 50 MiB part size because it provides the best performance, and we use it whenever it fits within the 10,000-part limit. If it does not, we increase the part size (in bytes, aligned to MiB) to stay within the limit.	2026-02-10 13:13:25 +02:00
Ernest Zaslavsky	5beb7a2814	aws_error: extract `retryable` from aws_error Move aws::retryable to common location to reuse it later in other http based clients	2026-02-09 08:48:41 +02:00
Avi Kivity	bd08b6e5b2	Merge 'Unify configuration of object storage endpoints (take 2)' from Pavel Emelyanov To configure S3 storage, one needs to do ``` object_storage_endpoints: - name: s3.us-east-1.amazonaws.com port: 443 https: true aws_region: us-east-1 ``` and for GCS it's ``` object_storage_endpoints: - name: https://storage.googleapis.com:433 type: gs credentials_file: <gcp account credentials json file> ``` This PR updates the S3 part to look like ``` object_storage_endpoints: - name: https://s3.us-east-1.amazonaws.com:443 aws_region: us-east-1 ``` fixes: #26570 This is 2nd attempt, previous one (#27360) was reverted because it reported endpoint configs in new format via API and CQL always, even if the endpoint was configured in the old way. This "broke" scylla manager and some dtests. This version has this bug fixed, and endpoints are reported in the same format as they were configured with. About correctness of the changes. No modifications to existing tests are made here, so old format is respected correctly (as far as it's covered by tests). To prove the new format works the the test_get_object_store_endpoints is extended to validate both options. Some preparations to this test to make this happen come on their own with the PR #28111 to show that they are valid and pass before changing the core code. Enhancing the way configuration is made, likely no need to backport. Closes scylladb/scylladb#28112 * github.com:scylladb/scylladb: test: Validate S3 endpoints new format works docs: Update docs according to new endpoints config option format object_storage: Create s3 client with "extended" endpoint name s3/storage: Tune config updating sstable: Shuffle args for s3_client_wrapper test: Rename badconf variable into objconf test: Split the object_store/test_get_object_store_endpoints test	2026-01-14 18:29:03 +02:00
Pavel Emelyanov	e57ee84662	util: Re-use seastar::util::memory_data_sink A data_sink that stores buffers into an in-memory collection had appeared in seastar recently. In Scylla there's similar thing that uses memory_data_sink_buffer as a container, so it's possible to drop the data_sink_impl iself in favor of seastar implementation. For that to work there should be append_buffers() overload for the aforementioned container. For its nice implementation the container, in turn, needs to get push_back() method and value_type trait. The method already exists, but is called put(), so just rename it. There's one more user of it this method in S3 client, and it can enjoy the added append_buffers() helper. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#28124	2026-01-14 08:54:00 +02:00
Pavel Emelyanov	f227de24b2	object_storage: Create s3 client with "extended" endpoint name For this, add the s3::client::make(endpoint, ...) overload that accepts endpoint in proto://host:port format. Then it parses the provided url and calls the legacy one, that accepts raw host string and config with port, https bit, etc. The generic object_storage_endpoint_param no longer needs to carry the internal s3::endpoint_config, the config option parsing changes respectively. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-01-13 13:24:06 +03:00
Pavel Emelyanov	8f97e6b3de	s3/storage: Tune config updating Don't prepare s3::endpoint_config from generic code, jut pass the region and iam_role_arn (those that can potentially change) to the callback. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-01-13 13:24:06 +03:00
Avi Kivity	0df85c8ae8	Revert "Merge 'Unify configuration of object storage endpoints' from Pavel Emelyanov" This reverts commit `1bb897c7ca`, reversing changes made to `954f2cbd2f`. It makes incompatible changes to the object storage configuration format, breaking tests [1]. It's likely that it doesn't break any production configuration, but we can't be sure. Fixes #27966 Closes scylladb/scylladb#27969	2026-01-05 08:53:41 +02:00
Pavel Emelyanov	a3ca4fccef	object_storage: Create s3 client with "extended" endpoint name For this, add the s3::client::make(endpoint, ...) overload that accepts endpoint in proto://host:port format. Then it parses the provided url and calls the legacy one, that accepts raw host string and config with port, https bit, etc. The generic object_storage_endpoint_param no longer needs to carry the internal s3::endpoint_config, the config option parsing changes respectively. Tests, that generate the config files, and docs are updated. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2025-12-10 15:33:47 +03:00
Pavel Emelyanov	932b008107	s3/storage: Tune config updating Don't prepare s3::endpoint_config from generic code, jut pass the region and iam_role_arn (those that can potentially change) to the callback. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2025-12-10 15:33:46 +03:00
Ernest Zaslavsky	e8ce49dadf	s3_client: remove unnecessary `co_await` in `make_request` Eliminates a redundant `co_await` by directly returning the `future`, simplifying the control flow without affecting behavior.	2025-10-23 15:58:11 +03:00
Ernest Zaslavsky	d44bbb1b10	s3_client: remove unused `filler_exception` Eliminate the now-obsolete `filler_exception`, rendered redundant by earlier refactors that streamlined error handling in the S3 client.	2025-10-23 15:58:11 +03:00
Ernest Zaslavsky	d3c6338de6	s3_client: fix indentation Fix indentation in background download fiber in `chunked_download_source`	2025-10-23 15:58:11 +03:00
Ernest Zaslavsky	47704deb1e	s3_client: simplify chunked download error handling using `make_request` Refactor `chunked_download_source` to eliminate redundant exception handling by leveraging the new `make_request` override with custom retry strategy. This streamlines the download fiber logic, improving readability and maintainability.	2025-10-23 15:58:11 +03:00
Ernest Zaslavsky	2bc9b205b6	s3_client: reformat `make_request` functions for readability Reformats `make_request` functions with long argument lists to improve readability and comply with formatting guidelines.	2025-10-23 15:58:11 +03:00
Ernest Zaslavsky	bf39412f4a	s3_client: eliminate duplication in `make_request` by using overload Removes redundant code in the `make_request` function by invoking the appropriate overload, simplifying logic and improving maintainability.	2025-10-23 15:58:11 +03:00
Ernest Zaslavsky	3d51124cb0	s3_client: add `make_request` override with custom retry and error handler Introduce an override for `make_request` in `s3_client` to support custom retry strategies and error handlers, enabling flexibility beyond the default client behavior and improving control over request handling	2025-10-23 15:58:10 +03:00
Ernest Zaslavsky	bdb3979456	s3_client: migrate s3_client to Seastar HTTP client Eliminate use of `retryable_http_client` in `s3_client` and adopt Seastar's native HTTP client.	2025-10-23 15:58:10 +03:00
Ernest Zaslavsky	2025760e75	s3_client: fix crash in `copy_s3_object` due to dangling stream In the `copy_part` method, move the `input_stream<char>` argument into a local variable before use. Failing to do so can lead to a SIGSEGV or trigger an abort under address sanitizer.	2025-10-23 15:58:10 +03:00
Ernest Zaslavsky	0983c791e9	s3_client: coroutinize `copy_s3_object` response callback coroutinize `copy_s3_object` response callback for a bugfix in the following commit to prevent failing on dangling stream	2025-10-23 15:58:10 +03:00
Avi Kivity	ab488fbb3f	Merge 'Switch to seastar API level 9 (no more packet-s in output_stream/data_sink API)' from Pavel Emelyanov Other than patching Scylla sinks to implement new data_sink_impl::put(std::span<temporary_buffer>) overload, the PR changes transport write_response() method to stop using output_stream::write(scattered_message) because it's also gone. Using newer seastar API, no need to backport Closes scylladb/scylladb#26592 * github.com:scylladb/scylladb: code: Fix indentation after previous patch code: Switch to seastar API level 9 transport: Open-code invoke_with_counting into counting_data_sink::put transport: Don't use scattered_message utils: Implement memory_data_sink::put(net::packet)	2025-10-22 01:51:43 +03:00
Ernest Zaslavsky	fdd0d66f6e	s3_client: tune logging level Change all logging related to errors in `chunked_download_source` background download fiber to `info` to make it visible right away in logs.	2025-10-20 17:12:59 +03:00
Ernest Zaslavsky	4497325cd6	s3_client: add logging Add logging for the case when we encounter expired credentials, shouldnt happen but just in case	2025-10-20 17:12:59 +03:00
Ernest Zaslavsky	1d34657b14	s3_client: improve exception handling for chunked downloads Refactor the wrapping exception used in `chunked_download_source` to prevent the retry strategy from reattempting failed requests. The new implementation preserves the original `exception_ptr`, making the root cause clearer and easier to diagnose.	2025-10-20 17:12:59 +03:00
Ernest Zaslavsky	58a1cff3db	s3_client: fix indentation Reformat `client::make_request` to fix the indentation of `if` block	2025-10-20 17:12:59 +03:00
Ernest Zaslavsky	43acc0d9b9	s3_client: add max for client level retries To prevent client retrying indefinitely time skew and authentication errors add `max_attempts` to the `client::make_request`	2025-10-20 17:12:59 +03:00
Ernest Zaslavsky	116823a6bc	s3_client: remove `s3_retry_strategy` It never worked as intended, so the credentials handling is moving to the same place where we handle time skew, since we have to reauthenticate the request	2025-10-20 17:12:59 +03:00
Ernest Zaslavsky	185d5cd0c6	s3_client: support high-level request retries Add an option to retry S3 requests at the highest level, including reinitializing headers and reauthenticating. This addresses cases where retrying the same request fails, such as when the S3 server rejects a timestamp older than 15 minutes.	2025-10-20 17:12:59 +03:00
Ernest Zaslavsky	db1ca8d011	s3_client: just reformat `make_request` Just reformat previously changed methods to improve readability	2025-10-20 10:44:37 +03:00
Pavel Emelyanov	a88a36f5b5	code: Switch to seastar API level 9 In the new API the biggest change is to implement the only data_sink_impl::put(span<temporary_buffer>) overload. Encrypted file impl and sstables compress sink use fallback_put() helper that generates a chain of continuations each holding a buffer. The counting_data_sink in transport had mostly been patched to correct implementation by the previous patch, the change here is to replace vector argument with span one. Most other sinks just re-implement their put(vector<temporary_buffer>) overload by iterating over span and non-preemptively grabbing buffers from it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2025-10-17 10:26:50 +03:00
Ernest Zaslavsky	55fb2223b6	s3_client: unify `make_request` implementation Refactor `make_request` to use a single core implementation that handles authentication and issues the HTTP request. All overloads now delegate to this unified method.	2025-10-16 15:51:28 +03:00
Ernest Zaslavsky	413739824f	s3_client: track memory starvation in background filling fiber Introduce a counter metric to monitor instances where the background filling fiber is blocked due to insufficient memory in the S3 client. Closes scylladb/scylladb#26466	2025-10-14 11:22:54 +03:00
Ernest Zaslavsky	c2bab430d7	s3_client: fix `when` condition to prevent infinite locking Refine condition variable predicate in filling fiber to avoid indefinite waiting when `close` is invoked. Closes scylladb/scylladb#26449	2025-10-09 15:55:37 +03:00
Botond Dénes	1ac7b4c35e	treewide: move away from accessing httpd::request::query_parameters Acecssing this member directly is deprecated, migrate code to use {get,set}_query_param() and friends instead. Fixes: https://github.com/scylladb/scylladb/issues/26023	2025-09-24 11:52:15 +03:00
Pavel Emelyanov	6fb66b796a	s3: Add metrics to show S3 prefetch bytes The chunked download source sends large GET requests and then consumes data as it arrives. Sometimes it can stop reading from socket early and drop the in-flight data. The existing read-bytes metrics show only the number of consumed bytes, we we also want to know the number of requested bytes Refs #25770 (accounting of read-bytes) Fixes #25876 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#25877	2025-09-16 23:40:47 +03:00
Pavel Emelyanov	9deea3655f	s3: Fix chunked download source metrics calculations In S3 client both read and write metrics have three counters -- number of requests made, number of bytes processed and request latency. In most of the cases all three counters are updated at once -- upon response arrival. However, in case of chunked download source this way of accounting metrics is misleading. In this code the request is made once, and then the obtained bytes are consumed eventually as the data arrive. Currently, each time a new portion of data is read from the socket the number of read requests is incremented. That's wrong, the request is made once, and this counter should also be incremented once, not for every data buffer that arrived in response. Same for read request latency -- it's "added" for every data buffer that arrives, but it's a lenghy process, the _request_ latency should be accounted once per responce. Maybe later we'll want to have "data latency" metrics as well, but for what we have now it's request latency. The number of read bytes is accounted properly, so not touched here. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#25770	2025-09-08 09:49:03 +03:00
Ernest Zaslavsky	a0016bd0cc	s3_client: relocate `req` creation closer to usage Move the creation of the `req` object to the point where it is actually used, improving code clarity and reducing premature initialization.	2025-08-14 16:18:43 +03:00
Ernest Zaslavsky	6ef2b0b510	s3_client: reformat long logging lines for readability Break up excessively long logging statements to improve readability and maintain consistent formatting across the codebase.	2025-08-14 16:18:43 +03:00
Ernest Zaslavsky	dd51e50f60	s3_client: add memory fallback in `chunked_download_source` Introduce fallback logic in `chunked_download_source` to handle memory exhaustion. When memory is low, feed the `deque` with only one uncounted buffer at a time. This allows slow but steady progress without getting stuck on the memory semaphore. Fixes: https://github.com/scylladb/scylladb/issues/25453 Fixes: https://github.com/scylladb/scylladb/issues/25262 Closes scylladb/scylladb#25452	2025-08-14 09:52:10 +03:00
Ernest Zaslavsky	380c73ca03	s3_client: make memory semaphore acquisition abortable Add `abort_source` to the `get_units` call for the memory semaphore in the S3 client, allowing the acquisition process to be aborted. Fixes: https://github.com/scylladb/scylladb/issues/25454 Closes scylladb/scylladb#25469	2025-08-13 08:48:55 +03:00

1 2 3 4

192 Commits