scylladb

Author	SHA1	Message	Date
Avi Kivity	bd08b6e5b2	Merge 'Unify configuration of object storage endpoints (take 2)' from Pavel Emelyanov To configure S3 storage, one needs to do ``` object_storage_endpoints: - name: s3.us-east-1.amazonaws.com port: 443 https: true aws_region: us-east-1 ``` and for GCS it's ``` object_storage_endpoints: - name: https://storage.googleapis.com:433 type: gs credentials_file: <gcp account credentials json file> ``` This PR updates the S3 part to look like ``` object_storage_endpoints: - name: https://s3.us-east-1.amazonaws.com:443 aws_region: us-east-1 ``` fixes: #26570 This is the 2nd attempt; the previous one (#27360) was reverted because it always reported endpoint configs in the new format via API and CQL, even if the endpoint was configured in the old way. This "broke" scylla manager and some dtests. This version has this bug fixed, and endpoints are reported in the same format as they were configured with. About correctness of the changes. No modifications to existing tests are made here, so the old format is respected correctly (as far as it's covered by tests). To prove the new format works, the test_get_object_store_endpoints test is extended to validate both options. Some preparations to this test to make this happen come on their own with the PR #28111 to show that they are valid and pass before changing the core code. Enhancing the way configuration is made, likely no need to backport. Closes scylladb/scylladb#28112 * github.com:scylladb/scylladb: test: Validate S3 endpoints new format works docs: Update docs according to new endpoints config option format object_storage: Create s3 client with "extended" endpoint name s3/storage: Tune config updating sstable: Shuffle args for s3_client_wrapper test: Rename badconf variable into objconf test: Split the object_store/test_get_object_store_endpoints test	2026-01-14 18:29:03 +02:00
Pavel Emelyanov	e57ee84662	util: Re-use seastar::util::memory_data_sink A data_sink that stores buffers into an in-memory collection had appeared in seastar recently. In Scylla there's a similar thing that uses memory_data_sink_buffer as a container, so it's possible to drop the data_sink_impl itself in favor of the seastar implementation. For that to work there should be an append_buffers() overload for the aforementioned container. For its nice implementation the container, in turn, needs to get a push_back() method and a value_type trait. The method already exists, but is called put(), so just rename it. There's one more user of this method in the S3 client, and it can enjoy the added append_buffers() helper. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#28124	2026-01-14 08:54:00 +02:00
Pavel Emelyanov	f227de24b2	object_storage: Create s3 client with "extended" endpoint name For this, add the s3::client::make(endpoint, ...) overload that accepts endpoint in proto://host:port format. Then it parses the provided url and calls the legacy one, that accepts raw host string and config with port, https bit, etc. The generic object_storage_endpoint_param no longer needs to carry the internal s3::endpoint_config, the config option parsing changes respectively. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-01-13 13:24:06 +03:00
Pavel Emelyanov	8f97e6b3de	s3/storage: Tune config updating Don't prepare s3::endpoint_config from generic code, just pass the region and iam_role_arn (those that can potentially change) to the callback. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-01-13 13:24:06 +03:00
Avi Kivity	0df85c8ae8	Revert "Merge 'Unify configuration of object storage endpoints' from Pavel Emelyanov" This reverts commit `1bb897c7ca`, reversing changes made to `954f2cbd2f`. It makes incompatible changes to the object storage configuration format, breaking tests [1]. It's likely that it doesn't break any production configuration, but we can't be sure. Fixes #27966 Closes scylladb/scylladb#27969	2026-01-05 08:53:41 +02:00
Pavel Emelyanov	a3ca4fccef	object_storage: Create s3 client with "extended" endpoint name For this, add the s3::client::make(endpoint, ...) overload that accepts endpoint in proto://host:port format. Then it parses the provided url and calls the legacy one, that accepts raw host string and config with port, https bit, etc. The generic object_storage_endpoint_param no longer needs to carry the internal s3::endpoint_config, the config option parsing changes respectively. Tests, that generate the config files, and docs are updated. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2025-12-10 15:33:47 +03:00
Pavel Emelyanov	932b008107	s3/storage: Tune config updating Don't prepare s3::endpoint_config from generic code, just pass the region and iam_role_arn (those that can potentially change) to the callback. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2025-12-10 15:33:46 +03:00
Ernest Zaslavsky	e8ce49dadf	s3_client: remove unnecessary `co_await` in `make_request` Eliminates a redundant `co_await` by directly returning the `future`, simplifying the control flow without affecting behavior.	2025-10-23 15:58:11 +03:00
Ernest Zaslavsky	d44bbb1b10	s3_client: remove unused `filler_exception` Eliminate the now-obsolete `filler_exception`, rendered redundant by earlier refactors that streamlined error handling in the S3 client.	2025-10-23 15:58:11 +03:00
Ernest Zaslavsky	d3c6338de6	s3_client: fix indentation Fix indentation in background download fiber in `chunked_download_source`	2025-10-23 15:58:11 +03:00
Ernest Zaslavsky	47704deb1e	s3_client: simplify chunked download error handling using `make_request` Refactor `chunked_download_source` to eliminate redundant exception handling by leveraging the new `make_request` override with custom retry strategy. This streamlines the download fiber logic, improving readability and maintainability.	2025-10-23 15:58:11 +03:00
Ernest Zaslavsky	2bc9b205b6	s3_client: reformat `make_request` functions for readability Reformats `make_request` functions with long argument lists to improve readability and comply with formatting guidelines.	2025-10-23 15:58:11 +03:00
Ernest Zaslavsky	bf39412f4a	s3_client: eliminate duplication in `make_request` by using overload Removes redundant code in the `make_request` function by invoking the appropriate overload, simplifying logic and improving maintainability.	2025-10-23 15:58:11 +03:00
Ernest Zaslavsky	3d51124cb0	s3_client: add `make_request` override with custom retry and error handler Introduce an override for `make_request` in `s3_client` to support custom retry strategies and error handlers, enabling flexibility beyond the default client behavior and improving control over request handling.	2025-10-23 15:58:10 +03:00
Ernest Zaslavsky	bdb3979456	s3_client: migrate s3_client to Seastar HTTP client Eliminate use of `retryable_http_client` in `s3_client` and adopt Seastar's native HTTP client.	2025-10-23 15:58:10 +03:00
Ernest Zaslavsky	2025760e75	s3_client: fix crash in `copy_s3_object` due to dangling stream In the `copy_part` method, move the `input_stream<char>` argument into a local variable before use. Failing to do so can lead to a SIGSEGV or trigger an abort under address sanitizer.	2025-10-23 15:58:10 +03:00
Ernest Zaslavsky	0983c791e9	s3_client: coroutinize `copy_s3_object` response callback Coroutinize the `copy_s3_object` response callback in preparation for the bugfix in the following commit, which prevents a failure on a dangling stream.	2025-10-23 15:58:10 +03:00
Avi Kivity	ab488fbb3f	Merge 'Switch to seastar API level 9 (no more packet-s in output_stream/data_sink API)' from Pavel Emelyanov Other than patching Scylla sinks to implement new data_sink_impl::put(std::span<temporary_buffer>) overload, the PR changes transport write_response() method to stop using output_stream::write(scattered_message) because it's also gone. Using newer seastar API, no need to backport Closes scylladb/scylladb#26592 * github.com:scylladb/scylladb: code: Fix indentation after previous patch code: Switch to seastar API level 9 transport: Open-code invoke_with_counting into counting_data_sink::put transport: Don't use scattered_message utils: Implement memory_data_sink::put(net::packet)	2025-10-22 01:51:43 +03:00
Ernest Zaslavsky	fdd0d66f6e	s3_client: tune logging level Change all logging related to errors in `chunked_download_source` background download fiber to `info` to make it visible right away in logs.	2025-10-20 17:12:59 +03:00
Ernest Zaslavsky	4497325cd6	s3_client: add logging Add logging for the case when we encounter expired credentials. This shouldn't happen, but just in case.	2025-10-20 17:12:59 +03:00
Ernest Zaslavsky	1d34657b14	s3_client: improve exception handling for chunked downloads Refactor the wrapping exception used in `chunked_download_source` to prevent the retry strategy from reattempting failed requests. The new implementation preserves the original `exception_ptr`, making the root cause clearer and easier to diagnose.	2025-10-20 17:12:59 +03:00
Ernest Zaslavsky	58a1cff3db	s3_client: fix indentation Reformat `client::make_request` to fix the indentation of `if` block	2025-10-20 17:12:59 +03:00
Ernest Zaslavsky	43acc0d9b9	s3_client: add max for client level retries To prevent the client from retrying indefinitely on time skew and authentication errors, add `max_attempts` to `client::make_request`.	2025-10-20 17:12:59 +03:00
Ernest Zaslavsky	116823a6bc	s3_client: remove `s3_retry_strategy` It never worked as intended, so the credentials handling is moving to the same place where we handle time skew, since we have to reauthenticate the request	2025-10-20 17:12:59 +03:00
Ernest Zaslavsky	185d5cd0c6	s3_client: support high-level request retries Add an option to retry S3 requests at the highest level, including reinitializing headers and reauthenticating. This addresses cases where retrying the same request fails, such as when the S3 server rejects a timestamp older than 15 minutes.	2025-10-20 17:12:59 +03:00
Ernest Zaslavsky	db1ca8d011	s3_client: just reformat `make_request` Just reformat previously changed methods to improve readability	2025-10-20 10:44:37 +03:00
Pavel Emelyanov	a88a36f5b5	code: Switch to seastar API level 9 In the new API the biggest change is to implement the only data_sink_impl::put(span<temporary_buffer>) overload. Encrypted file impl and sstables compress sink use fallback_put() helper that generates a chain of continuations each holding a buffer. The counting_data_sink in transport had mostly been patched to correct implementation by the previous patch, the change here is to replace vector argument with span one. Most other sinks just re-implement their put(vector<temporary_buffer>) overload by iterating over span and non-preemptively grabbing buffers from it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2025-10-17 10:26:50 +03:00
Ernest Zaslavsky	55fb2223b6	s3_client: unify `make_request` implementation Refactor `make_request` to use a single core implementation that handles authentication and issues the HTTP request. All overloads now delegate to this unified method.	2025-10-16 15:51:28 +03:00
Ernest Zaslavsky	413739824f	s3_client: track memory starvation in background filling fiber Introduce a counter metric to monitor instances where the background filling fiber is blocked due to insufficient memory in the S3 client. Closes scylladb/scylladb#26466	2025-10-14 11:22:54 +03:00
Ernest Zaslavsky	c2bab430d7	s3_client: fix `when` condition to prevent infinite locking Refine condition variable predicate in filling fiber to avoid indefinite waiting when `close` is invoked. Closes scylladb/scylladb#26449	2025-10-09 15:55:37 +03:00
Botond Dénes	1ac7b4c35e	treewide: move away from accessing httpd::request::query_parameters Accessing this member directly is deprecated, migrate code to use {get,set}_query_param() and friends instead. Fixes: https://github.com/scylladb/scylladb/issues/26023	2025-09-24 11:52:15 +03:00
Pavel Emelyanov	6fb66b796a	s3: Add metrics to show S3 prefetch bytes The chunked download source sends large GET requests and then consumes data as it arrives. Sometimes it can stop reading from the socket early and drop the in-flight data. The existing read-bytes metrics show only the number of consumed bytes, but we also want to know the number of requested bytes. Refs #25770 (accounting of read-bytes) Fixes #25876 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#25877	2025-09-16 23:40:47 +03:00
Pavel Emelyanov	9deea3655f	s3: Fix chunked download source metrics calculations In S3 client both read and write metrics have three counters -- number of requests made, number of bytes processed and request latency. In most of the cases all three counters are updated at once -- upon response arrival. However, in case of chunked download source this way of accounting metrics is misleading. In this code the request is made once, and then the obtained bytes are consumed eventually as the data arrive. Currently, each time a new portion of data is read from the socket the number of read requests is incremented. That's wrong, the request is made once, and this counter should also be incremented once, not for every data buffer that arrived in response. Same for read request latency -- it's "added" for every data buffer that arrives, but it's a lengthy process; the _request_ latency should be accounted once per response. Maybe later we'll want to have "data latency" metrics as well, but for what we have now it's request latency. The number of read bytes is accounted properly, so not touched here. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#25770	2025-09-08 09:49:03 +03:00
Ernest Zaslavsky	a0016bd0cc	s3_client: relocate `req` creation closer to usage Move the creation of the `req` object to the point where it is actually used, improving code clarity and reducing premature initialization.	2025-08-14 16:18:43 +03:00
Ernest Zaslavsky	6ef2b0b510	s3_client: reformat long logging lines for readability Break up excessively long logging statements to improve readability and maintain consistent formatting across the codebase.	2025-08-14 16:18:43 +03:00
Ernest Zaslavsky	dd51e50f60	s3_client: add memory fallback in `chunked_download_source` Introduce fallback logic in `chunked_download_source` to handle memory exhaustion. When memory is low, feed the `deque` with only one uncounted buffer at a time. This allows slow but steady progress without getting stuck on the memory semaphore. Fixes: https://github.com/scylladb/scylladb/issues/25453 Fixes: https://github.com/scylladb/scylladb/issues/25262 Closes scylladb/scylladb#25452	2025-08-14 09:52:10 +03:00
Ernest Zaslavsky	380c73ca03	s3_client: make memory semaphore acquisition abortable Add `abort_source` to the `get_units` call for the memory semaphore in the S3 client, allowing the acquisition process to be aborted. Fixes: https://github.com/scylladb/scylladb/issues/25454 Closes scylladb/scylladb#25469	2025-08-13 08:48:55 +03:00
Ernest Zaslavsky	fc2c9dd290	s3_client: Disable Seastar-level retries in HTTP client creation Prevent Seastar from retrying HTTP requests to avoid buffer double-feed issues when an entire request is retried. This could cause data corruption in `chunked_download_source`. The change is global for every instance of `s3_client`, but it is still safe because: * Seastar's `http_client` resets connections regardless of retry behavior * `s3_client` retry logic handles all error types—exceptions, HTTP errors, and AWS-specific errors—via `http_retryable_client`	2025-07-21 17:03:23 +03:00
Ernest Zaslavsky	ba910b29ce	s3_test: Validate handling of non-`aws_error` exceptions Inject exceptions not wrapped in `aws_error` from request callback lambda to verify they are properly caught and handled.	2025-07-21 16:52:43 +03:00
Ernest Zaslavsky	b7ae6507cd	s3_client: Improve error handling in chunked_download_source Create aws_error from raised exceptions when possible and respond appropriately. Previously, non-aws_exception types leaked from the request handler and were treated as non-retryable, causing potential data corruption during download.	2025-07-21 16:49:47 +03:00
Ernest Zaslavsky	342e94261f	s3_client: parse multipart response XML defensively Ensure robust handling of XML responses when initiating multipart uploads. Check for the existence of required nodes before access, and throw an exception if the XML is empty or malformed. Refs: https://github.com/scylladb/scylladb/issues/24676 Closes scylladb/scylladb#24990	2025-07-17 10:55:04 +03:00
Ernest Zaslavsky	acf15eba8e	s3_test: Add s3_client test for non-retryable error handling Introduce a test that injects a non-retryable error and verifies that the chunked download source throws an exception as expected.	2025-07-01 18:45:17 +03:00
Ernest Zaslavsky	49e8c14a86	s3_client: Fix edge case when the range is exhausted Handle case where the download loop exits after consuming all data, but before receiving an empty buffer signaling EOF. Without this, the next request is sent with a non-zero offset and zero length, resulting in "Range request cannot be satisfied" errors. Now, an empty buffer is pushed to indicate completion and exit the fiber properly.	2025-07-01 18:45:17 +03:00
Ernest Zaslavsky	e50f247bf1	s3_client: Fix indentation in try..catch block Correct indentation in the `try..catch` block to improve code readability and maintain consistent formatting.	2025-07-01 18:45:17 +03:00
Ernest Zaslavsky	d2d69cbc8c	s3_client: Stop retries in chunked download source Disable retries for S3 requests in the chunked download source to prevent duplicate chunks from corrupting the buffer queue. The response handler now throws an exception to bypass the retry strategy, allowing the next range to be attempted cleanly. This exception is only triggered for retryable errors; unretryable ones immediately halt further requests.	2025-07-01 18:45:17 +03:00
Ernest Zaslavsky	6d9cec558a	s3_client: Fix missing negation Restore a missing `not` in a conditional check that caused incorrect behavior during S3 client execution.	2025-07-01 18:45:17 +03:00
Ernest Zaslavsky	e73b83e039	s3_client: Refine logging Fix typo in log message to improve clarity and accuracy during S3 operations.	2025-07-01 18:45:17 +03:00
Ernest Zaslavsky	f1d0690194	s3_client: Improve logging placement for current_range output Relocated logging to occur after determining the `current_range`, ensuring more relevant output during S3 client operations.	2025-07-01 18:45:17 +03:00
Pavel Emelyanov	dc166be663	s3: Mark claimed_buffer constructor noexcept It just std::move-s a buffer and a semaphore_units objects, both moves are noexcept, so is the constructor itself. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#24552	2025-06-18 20:36:45 +03:00
Pavel Emelyanov	b0766d1e73	Merge 's3_client: Refactor `range` class for state validation' from Ernest Zaslavsky Revamped the `range` class to actively manage its state by enforcing validation on all modifications. This prevents overflow, invalid states, and ensures the object size does not exceed the 5TiB limit in S3. This should address and prevent future problems related to this issue https://github.com/minio/minio/issues/21333 No backport needed since this problem related only to this change https://github.com/scylladb/scylladb/pull/23880 Closes scylladb/scylladb#24312 * github.com:scylladb/scylladb: s3_client: headers cleanup s3_client: Refactor `range` class for state validation	2025-06-17 10:34:55 +03:00
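
The "extended" endpoint name introduced by the object_storage commits above (an endpoint given as `proto://host:port` that is parsed and handed to the legacy host/port/https path) can be illustrated with a minimal sketch. This is not Scylla's C++ implementation: `LegacyEndpoint`, `parse_extended_endpoint`, and the default-port fallback are hypothetical names and assumptions made only for this illustration.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class LegacyEndpoint:
    """Hypothetical stand-in for the legacy (host, port, https) triple."""
    host: str
    port: int
    use_https: bool


# Assumption: fall back to the scheme's well-known port when none is given.
DEFAULT_PORTS = {"http": 80, "https": 443}


def parse_extended_endpoint(name: str) -> LegacyEndpoint:
    """Split a proto://host[:port] name into the legacy fields."""
    parts = urlsplit(name)
    if parts.scheme not in DEFAULT_PORTS or not parts.hostname:
        raise ValueError(f"expected proto://host[:port], got {name!r}")
    port = parts.port if parts.port is not None else DEFAULT_PORTS[parts.scheme]
    return LegacyEndpoint(parts.hostname, port, parts.scheme == "https")


if __name__ == "__main__":
    print(parse_extended_endpoint("https://s3.us-east-1.amazonaws.com:443"))
```

A bare host name such as `s3.us-east-1.amazonaws.com` is rejected by this sketch, so a caller that must keep accepting the old format would have to try the legacy fields first and remember which form the endpoint was configured with.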

179 Commits