Add a test demonstrating that renewing credentials does not update
their expiration. After requesting credentials again, the expiration
remains unchanged, indicating no actual update occurred.
Disable retries for S3 requests in the chunked download source to
prevent duplicate chunks from corrupting the buffer queue. The
response handler now throws an exception to bypass the retry
strategy, allowing the next range to be attempted cleanly.
This exception is only triggered for retryable errors; unretryable
ones immediately halt further requests.
Revamped the `range` class to actively manage its state by enforcing validation on all modifications. This prevents overflow and invalid states, and ensures the object size does not exceed the 5TiB limit in S3.
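A minimal sketch of the validation idea, assuming a range is an (offset, length) pair. The class below and its `advance` method are hypothetical and are not the actual `range` implementation; Python integers do not overflow, so only the bounds checks are illustrated.

```python
MAX_OBJECT_SIZE = 5 * 1024**4  # the 5TiB S3 object size limit mentioned above


class Range:
    """Hypothetical stand-in: every state change goes through validation."""

    def __init__(self, offset, length):
        self._offset, self._length = self._checked(offset, length)

    @staticmethod
    def _checked(offset, length):
        # Reject states that could never describe part of a valid S3 object.
        if offset < 0 or length < 0:
            raise ValueError("offset and length must be non-negative")
        if offset + length > MAX_OBJECT_SIZE:
            raise ValueError("range exceeds the 5TiB object size limit")
        return offset, length

    @property
    def offset(self):
        return self._offset

    @property
    def length(self):
        return self._length

    def advance(self, n):
        """Consume n bytes from the front of the range (assumed operation)."""
        if n < 0 or n > self._length:
            raise ValueError("cannot advance past the end of the range")
        self._offset, self._length = self._checked(self._offset + n, self._length - n)
```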
The existing `download_source` implementation optimizes performance
by keeping the connection to S3 open and draining data directly from
the socket. While this eliminates the overhead (60-100ms) of repeatedly
establishing new connections, it leads to rapid exhaustion of client-
side connections.
On a single shard, two `mx_readers` for load and stream are enough to
trigger this issue. Since each client typically holds two connections,
readers keeping index and data sources open can cause deadlocks where
processes stall due to unavailable connections.
Introduce `chunked_download_source`, a new S3 download method built on
`download_source`, to dynamically manage connections:
- Buffers data in 5MiB chunks using a producer-consumer model
- Closes connections once buffers reach capacity, returning them to
the pool for other clients
- Uses a filling fiber that resumes fetching once buffers are
consumed from the queue
Performance remains comparable to `download_source`, achieving
95MiB/s for sequential 1GiB downloads from S3. However, preloading
large chunks may cause read amplification.
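The buffering scheme can be sketched roughly as follows. This is a simplified, synchronous illustration and not the actual `chunked_download_source`: the class name, the `fetch_range` callable, and the `max_buffered` capacity are assumptions, and refilling happens only when the queue is empty rather than in a separate filling fiber.

```python
from collections import deque

CHUNK_SIZE = 5 * 1024 * 1024  # 5MiB chunks, as described above


class ChunkedDownloader:
    def __init__(self, object_size, fetch_range, chunk_size=CHUNK_SIZE, max_buffered=2):
        self._object_size = object_size
        self._fetch_range = fetch_range  # hypothetical: (offset, length) -> bytes
        self._chunk_size = chunk_size
        self._max_buffered = max_buffered  # hypothetical queue capacity
        self._next_offset = 0
        self._queue = deque()
        self.connections_opened = 0

    def _fill(self):
        # Producer side: fetch chunks until the queue is full or the object ends.
        if self._next_offset >= self._object_size or len(self._queue) >= self._max_buffered:
            return
        self.connections_opened += 1  # one "connection" per filling burst
        while self._next_offset < self._object_size and len(self._queue) < self._max_buffered:
            length = min(self._chunk_size, self._object_size - self._next_offset)
            self._queue.append(self._fetch_range(self._next_offset, length))
            self._next_offset += length
        # Leaving the loop models returning the connection to the pool.

    def get(self):
        # Consumer side: refill only when the buffered chunks are used up.
        if not self._queue:
            self._fill()
        return self._queue.popleft() if self._queue else b""
```

An empty result from `get()` signals end of object in this sketch.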
Fixes: https://github.com/scylladb/scylladb/issues/23785
Closes scylladb/scylladb#23880
Refactored the copy object test to enhance readability and maintainability.
The test was simplified and split into smaller, more focused parts.
Additionally, a "proxied" variant of the test was introduced to expand
coverage.
Add support for the CopyObject API to enable direct copying of S3
objects between locations. This approach eliminates networking
overhead on the client side, as the operation is handled internally
by S3.
Added utility functions to handle S3 Fully Qualified Names (FQN). These
functions enable parsing, splitting, and identification of S3 paths,
enhancing our ability to work with S3 object storage more effectively.
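A rough sketch of what such helpers might look like, assuming an FQN has the form `s3://bucket/key`. The exact format and the function names below are assumptions, not the utilities added by this change.

```python
from typing import Tuple

S3_SCHEME = "s3://"  # assumed prefix identifying an S3 FQN


def is_s3_fqn(path: str) -> bool:
    """Identify whether a path looks like an S3 FQN."""
    return path.startswith(S3_SCHEME)


def split_s3_fqn(fqn: str) -> Tuple[str, str]:
    """Split an S3 FQN into (bucket, object key)."""
    if not is_s3_fqn(fqn):
        raise ValueError("not an S3 FQN: " + fqn)
    bucket, _, key = fqn[len(S3_SCHEME):].partition("/")
    if not bucket:
        raise ValueError("missing bucket name: " + fqn)
    return bucket, key
```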
During development of #22428 we decided that we have
no need for `object-storage.yaml`, and we'd rather store
the endpoints in `scylla.yaml` and get a REST api to expose
the endpoints for free.
This patch removes the credentials provider used to read the
aws keys from this yaml file.
Followup work will remove the `object-storage.yaml` file
altogether and move the endpoints to `scylla.yaml`.
Signed-off-by: Robert Bindar <robert.bindar@scylladb.com>
Closes scylladb/scylladb#22951
This commit introduces two new credentials providers: STS and Instance Metadata Service. The S3 client's provider chain has been updated to incorporate these new providers. Additionally, unit tests have been added to ensure coverage of the new functionality.
This commit entirely removes credentials from the endpoint configuration. It also eliminates all instances of manually retrieving environment credentials. Instead, the construction of file and environment credentials has been moved to their respective providers. Additionally, a new aws_credentials_provider_chain class has been introduced to support chaining of multiple credential providers.
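The chaining idea can be illustrated with a short sketch. The Python class below is a hypothetical analogue and not the actual aws_credentials_provider_chain interface; the assumed behaviour is that providers are asked in order and the first one that yields credentials wins.

```python
import os
from typing import Callable, List, Mapping, Optional, Tuple

Credentials = Tuple[str, str]  # (access key id, secret access key)
Provider = Callable[[], Optional[Credentials]]


def environment_provider(env: Mapping[str, str] = os.environ) -> Optional[Credentials]:
    """Return credentials from the standard AWS variables, or None if unset."""
    key = env.get("AWS_ACCESS_KEY_ID")
    secret = env.get("AWS_SECRET_ACCESS_KEY")
    return (key, secret) if key and secret else None


class CredentialsProviderChain:
    def __init__(self) -> None:
        self._providers: List[Provider] = []

    def add(self, provider: Provider) -> "CredentialsProviderChain":
        self._providers.append(provider)
        return self

    def get(self) -> Credentials:
        # Ask each provider in registration order; the first answer wins.
        for provider in self._providers:
            creds = provider()
            if creds is not None:
                return creds
        raise LookupError("no provider returned credentials")
```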
This commit refactors the way AWS credentials are managed in Scylla. Previously, credentials were included in the endpoint configuration. However, since credentials and endpoint configurations serve different purposes and may have different lifetimes, it’s more logical to manage them separately. Moving forward, credentials will be completely removed from the endpoint_config to ensure clear separation of concerns.
Add variants of existing S3 tests that route through a proxy instead of connecting directly to MinIO. The proxy allows injecting errors to validate error handling and recovery mechanisms under failure conditions.
Switch `s3_test` to use the S3 proxy which is used to randomly inject retryable S3 errors to test the "retry" part of the S3 client.
Fix `put_object` to make it retryable
Directory lister comes with a filter function that tells lister which
entries to skip by its .get() method. For uniformity, add the same to
S3 bucket_lister.
After this change the lister reports a shorter name in the returned
directory entry (with the prefix cut), so the unit test needs to be
tuned up accordingly.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
assert() is traditionally disabled in release builds, but not in
scylladb. This hasn't caused problems so far, but the latest abseil
release includes a commit [1] that causes a 1000 insn/op regression when
NDEBUG is not defined.
Clearly, we must move towards a build system where NDEBUG is defined in
release builds. But we can't just define it blindly without vetting
all the assert() calls, as some were written with the expectation that
they are enabled in release mode.
To solve the conundrum, change all assert() calls to a new SCYLLA_ASSERT()
macro in utils/assert.hh. This macro is always defined and is not conditional
on NDEBUG, so we can later (after vetting Seastar) enable NDEBUG in release
mode.
[1] 66ef711d68
Closes scylladb/scylladb#20006
this member function prepares for the backup feature, where the
object to be stored in the object storage is already persisted as a
file on local filesystem. this brings us two benefits:
- with the file, we don't need to accumulate the payloads in memory
and send them in batch, as we do in upload_sink and in
upload_jumbo_sink. this puts less pressure on the memory subsystem.
- with the file, we can read multiple parts in parallel if multipart
upload applies to it. this helps to improve the throughput.
so, this new helper is introduced to help upload an sstable from local
filesystem to the object storage.
Fixes #16287
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
get0() dates back from the days where Seastar futures carried tuples, and
get0() was a way to get the first (and usually only) element. Now
it's a distraction, and Seastar is likely to deprecate and remove it.
Replace with seastar::future::get(), which does the same thing.
There's a test case that validates uploading sink by getting random
portions of the uploaded object. The portions are generated as
len = random % chunk_size
off = random % file_size - len
The latter may render a negative value, which translates into a huge
64-bit range offset which, in turn, results in an invalid http range
specifier, and getting the object part fails with status OK
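The arithmetic can be reproduced with a small sketch. The function names are hypothetical, and `fixed_portion` is only one possible way to keep the offset in bounds (assuming chunk_size <= file_size), not necessarily the fix applied to the test.

```python
def buggy_portion(rnd_len, rnd_off, chunk_size, file_size):
    length = rnd_len % chunk_size
    # Parsed as (rnd_off % file_size) - length, so it can go below zero.
    offset = rnd_off % file_size - length
    return offset, length


def fixed_portion(rnd_len, rnd_off, chunk_size, file_size):
    length = rnd_len % chunk_size
    # Take the modulo over the valid start positions only.
    offset = rnd_off % (file_size - length)
    return offset, length


def as_uint64(value):
    """Show what a negative value becomes when stored in an unsigned 64-bit integer."""
    return value & 0xFFFFFFFFFFFFFFFF
```

With `rnd_len=900`, `rnd_off=100`, `chunk_size=1000` and `file_size=10000`, the buggy variant yields offset -800, which `as_uint64` turns into 2**64 - 800.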
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
If S3 readable file is used inside file input stream, the latter may
call its read methods with position that is above file size. In that
case server replies with generic http error and the fact that the range
was invalid is encoded into reply body's xml.
It's not great to catch this via a wrong reply status exception and xml
parsing, all the more so because we can know in advance that the read
is out-of-bound.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
When an http request resolves with an exception it makes sense to translate
the network exception into a storage exception to make upper layers think
that it was some sort of IO error, not suddenly an http one.
The translation is, for now, pretty simple:
- 404 and 3xx -> ENOENT
- 403(forbidden) and 401(unauthorized) -> EACCES
- anything else -> EIO
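The mapping above can be written down as a small sketch. The function name is hypothetical; the access-denied code is spelled `errno.EACCES` in Python's `errno` module.

```python
import errno


def http_status_to_errno(status: int) -> int:
    """Translate an HTTP reply status into a storage-style error code."""
    if status == 404 or 300 <= status < 400:
        return errno.ENOENT
    if status in (401, 403):
        return errno.EACCES
    return errno.EIO
```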
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The test case creates a non-jumbo upload sink and puts some bytes into it,
then flushes. In order to make sure the fallback took place, the
multipart memory tracker semaphore is broken in advance.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
- s/aws_key/aws_access_key_id/
- s/aws_secret/aws_secret_access_key/
- s/aws_token/aws_session_token/
rename them to more popular names, these names are also used by
boto's API. this should improve the readability and consistency.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
The S3 uploading sink needs to collect buffers internally before sending them out, because the minimal upload-able part size is 5MiB. When the necessary amount of bytes is accumulated, the part uploading fiber starts in the background. On flush the sink waits for all the fibers to complete and handles failure of any.
Uploading parallelism is nowadays limited by the means of the http client max-connections parameter. However, when a part uploading fiber waits for its connection it keeps the 5MiB+ buffers on the request's body, so even though the number of uploading parts is limited, the number of _waiting_ parts is effectively not.
This PR adds a shard-wide limiter on the number of background buffers S3 clients (and their http clients) may use.
Closes scylladb/scylladb#15497
* github.com:scylladb/scylladb:
s3::client: Track memory in client uploads
code: Configure s3 clients' memory usage
s3::client: Construct client with shared semaphore
sstables::storage_manager: Introduce config
when accessing AWS resources, users are allowed to use long-term security
credentials; they can also use temporary credentials. but if the latter
are used, we have to pass a session token along with the keys.
see also https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_temp_use-resources.html
so, if we want to programmatically get authenticated, we need to
set the "x-amz-security-token" header,
see
https://docs.aws.amazon.com/AmazonS3/latest/userguide/RESTAuthentication.html#UsingTemporarySecurityCredentials
so, in this change, we
1. add another member named `token` in `s3::endpoint_config::aws_config`
for storing "AWS_SESSION_TOKEN".
2. populate the setting from "object_storage.yaml" and
"$AWS_SESSION_TOKEN" environment variable.
3. set "x-amz-security-token" header if
`s3::endpoint_config::aws_config::token` is not empty.
this should allow us to test s3 client and s3 object store backend
with S3 bucket, with the temporary credentials.
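step 3 can be sketched as below. the helper name is hypothetical and only illustrates the "set the header if the token is not empty" rule; it is not the client's request-signing code.

```python
def with_session_token(headers: dict, token: str) -> dict:
    """Return a copy of headers, adding the session token header when one is set."""
    result = dict(headers)
    if token:
        # Temporary credentials must be accompanied by this header.
        result["x-amz-security-token"] = token
    return result
```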
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#15486
This sets the real limits on the memory semaphore.
- scylla sets it to 1% of total memory, 10Mb min, 100Mb max
- tests set it to 16Mb
- perf test sets it to all available memory
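The scylla rule amounts to a clamp, which can be sketched as follows. The function name is hypothetical, and the "Mb" figures above are assumed here to mean binary megabytes.

```python
MB = 1024 * 1024  # assumption: "Mb" above is read as 2**20 bytes


def s3_memory_limit(total_memory: int) -> int:
    """1% of total memory, clamped to the [10Mb, 100Mb] interval."""
    return min(max(total_memory // 100, 10 * MB), 100 * MB)
```

For example, 512Mb of total memory hits the 10Mb floor, while 64Gb hits the 100Mb ceiling.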
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The semaphore will be used to cap memory consumption by client. This
patch makes sure the reference to a semaphore exists as an argument to
client's constructor, not more than that.
In scylla binary, the semaphore sits on storage_manager. In tests the
semaphore is some local object. For now the semaphore is unused and is
initialized locked as this patch just pushes the needed argument all the
way around, next patches will make use of it.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The added metrics include:
- http client metrics, which include the number of connections, the number of active connections and the number of new connections made so far
- IO metrics that mimic those for traditional IO -- total number of object read/write ops, total number of get/put/uploaded bytes and individual IO request delay (round-trip, including body transfer time)
fixes: #13369
Closes #14494
* github.com:scylladb/scylladb:
s3/client: Add IO stats metrics
s3/client: Add HTTP client metrics
s3/client: Split make_request()
s3/client: Wrap http client with struct group_client
s3/client: Move client::stats to namespace scope
s3/client: Keep part size local variable
Now that the keys and region can be configured with "standard"
environment variables, the old custom one can be removed. No automation
uses it; it was purely a support for manual testing of a client against
AWS's S3 server
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Currently minio applies anonymous public policy for the test bucket and
all tests just use unsigned S3 requests. This patch generates a policy
for the temporary minio user and removes the anon public one. All tests
are updated respectively to use the provided key:secret pair.
The use-https bit is off by default as minio still starts with plain
http. That's OK for now, all tests are local and have no secret data
anyway
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The bucket is going to stop being public, rename the env variable in
advance to make the essential patch smaller
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The stats are about an object, not about the client, so it's better if
the struct lives in namespace scope. This will also avoid conflicts with
client stats that will be reported as metrics (later patch)
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
let's use RAII to remove the object use as a fixture, so we don't
leave some object in the bucket for testing. this might interfere
with other tests which share the same minio server with the test
which fails to do its clean up if an exception is thrown.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
let's use RAII to tear down the client and the input file, so we can
always perform the cleanups even if the test throws.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
test.py with --x-log2-compaction-groups option rotted a little bit.
Some boost tests added later didn't use the correct header which
parses the option or they didn't adjust suite.yaml.
Perhaps it's time to set up a weekly (or bi-weekly) job to verify
there are no regressions with it. It's important as it stresses
the data plane for tablets reusing the existing tests available.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closes #14732
with tagging ops, we will be able to attach kv pairs to an object.
this will allow us to mark sstable components with taggings, and
filter them based on them.
* test/pylib/minio_server.py: enable anonymous user to perform
more actions. because the tagging related ops are not enabled by
"mc anonymous set public", we have to enable them using "set-json"
subcommand.
* utils/s3/client: add methods to manipulate taggings.
* test/boost/s3_test: add a simple test accordingly.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes #14486
* seastar afe39231...99d28ff0 (16):
> file/util: Include seastar.hh
> http/exception: Use http::reply explicitly
> http/client: Include lost condition-variable.hh
> util: file: drop unnecessary include of reactor.hh
> tests: perf: add a markdown printer
> http/client: Introduce unexpected_status_error for client requests
> sharded: avoid #include <seastar/core/reactor.hh> for run_in_background()
> code: Use std::is_invocable_r_v instead of InvokeReturns
> http/client: Add ability to change pool size on the fly
> http/client: Add getters for active/idle connections counts
> http/client: Count and limit the number of connections
> http/client: Add connection->client RAII backref
> build: use the user-specified compiler when building DPDK
> build: use proper toolchain based on specified compiler
> build: only pass CMAKE_C_COMPILER when building ingredients
> build: use specified compiler when building liburing
Two changes are folded into the commit:
1. missing seastar/core/coroutine.hh include in one .cc file that
got it indirectly included before seastar reactor.hh drop from
file.hh
2. http client now returns unexpected_status_error instead of
std::runtime_error, so s3 test is updated respectively
Closes #14168
Currently the test uses a sequence of 1024-byte buffers. This lets the
minio server actively de-duplicate those blocks by page boundary (it's a
guess, but it's likely true because minio reports back equivalent ETags
for lots of uploading parts). Make the buffer size not be a power of two so
that when squashed together the resulting 2^X buffers don't get equal.
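The effect can be reproduced with a small sketch, assuming the test repeats one buffer with the same content; the buffer content and the use of SHA-256 as a stand-in for the ETag are illustrative choices, not taken from the test.

```python
import hashlib


def part_digests(buffer_size, part_size, parts):
    """Digest each part of a stream made by repeating one buffer."""
    buffer = bytes(i % 251 for i in range(buffer_size))
    total = part_size * parts
    data = (buffer * (total // buffer_size + 1))[:total]
    return [
        hashlib.sha256(data[i * part_size:(i + 1) * part_size]).hexdigest()
        for i in range(parts)
    ]
```

When the buffer size divides the part size, every part starts at the same position within the buffer and all digests match; with a size such as 1000 the parts start at different positions and the digests differ.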
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>