Commit Graph

45 Commits

Author SHA1 Message Date
Ernest Zaslavsky
a369dda049 s3_client: implement S3 copy object
Add support for the CopyObject API to enable direct copying of S3
objects between locations. This approach eliminates networking
overhead on the client side, as the operation is handled internally
by S3.
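The destination of a CopyObject request is the request URI; the source is named by the `x-amz-copy-source` header. A minimal sketch of building that header (the helper name is illustrative, not part of the client):

```cpp
#include <map>
#include <string>

// Build the extra headers for an S3 CopyObject request: the target
// object comes from the request URI, while the source is identified by
// the x-amz-copy-source header ("/bucket/key"). Illustrative only.
std::map<std::string, std::string> make_copy_headers(const std::string& src_bucket,
                                                     const std::string& src_object) {
    return {{"x-amz-copy-source", "/" + src_bucket + "/" + src_object}};
}
```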
2025-04-17 09:47:47 +03:00
Pavel Emelyanov
1f301b1c5d s3/client: Introduce data_source_impl for object downloading
The new data source implementation runs a single GET for the whole
specified range and lends the body input_stream to the upper
input_stream's get()-s. Eventually, getting data from the body stream
EOFs or fails. In either case, the existing body is closed and a new
GET is spawned with an updated Range header so as not to include the
bytes read so far.
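The updated Range header can be computed from the original range and the byte count already consumed; a small illustrative helper (not the actual implementation):

```cpp
#include <cstdint>
#include <string>

// After the body stream EOFs or fails mid-way, the next GET must ask
// only for the bytes not yet consumed. Given the original inclusive
// range and the count of bytes already read, produce the updated
// Range header value.
std::string next_range_header(uint64_t start, uint64_t end_inclusive, uint64_t bytes_read) {
    return "bytes=" + std::to_string(start + bytes_read) + "-" + std::to_string(end_inclusive);
}
```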

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-03-21 12:01:06 +03:00
Ernest Zaslavsky
367140a9c5 s3_client: Start using new retry strategy
* Previously, token expiration was considered a fatal error. With this change,
the `s3_client` uses a new retry strategy that tries to renew expired
credentials
* Added related test to the `s3_proxy`
2025-03-17 16:38:14 +02:00
Ernest Zaslavsky
7c49ee4520 s3_client: enhance retryable_http_client functionality
Enhanced `retryable_http_client` by allowing the injection of a custom error handler through its constructor.
2025-03-10 09:01:47 +02:00
Ernest Zaslavsky
b589a882bb s3_client: isolate retryable_http_client
Relocated `retryable_http_client` into its own dedicated file for improved clarity and maintainability.
2025-03-10 09:01:47 +02:00
Ernest Zaslavsky
5eff83af95 s3_client: Prepare for retryable_http_client relocation
Expose `map_s3_client_exception` outside the S3 client class to facilitate moving `retryable_http_client` to a separate file.
2025-03-10 09:01:47 +02:00
Ernest Zaslavsky
5b7d4a4136 s3_client: Move retryable functionality out of s3 client
This commit moves the retryable HTTP client functionality out of the S3 client implementation. Since this functionality is also required for other services, such as AWS STS, it has been separated to ensure broader applicability.
2025-03-10 09:01:47 +02:00
Ernest Zaslavsky
d534051bea aws creds: add env. and file credentials providers
This commit entirely removes credentials from the endpoint configuration. It also eliminates all instances of manually retrieving environment credentials. Instead, the construction of file and environment credentials has been moved to their respective providers. Additionally, a new aws_credentials_provider_chain class has been introduced to support chaining of multiple credential providers.
2025-02-05 14:57:19 +02:00
Kefu Chai
7215d4bfe9 utils: do not include unused headers
these unused includes were identified by clang-include-cleaner. after
auditing these source files, all of the reports have been confirmed.

please note, because quite a few source files relied on
`utils/to_string.hh` to pull in the specialization of
`fmt::formatter<std::optional<T>>`, after removing
`#include <fmt/std.h>` from `utils/to_string.hh`, we have to
include `fmt/std.h` directly.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2025-01-14 07:56:39 -05:00
Pavel Emelyanov
bb094cc099 Merge 'Make restore task abortable' from Calle Wilund
Fixes #20717

Enables abortable interface and propagates abort_source to all s3 objects used for reading the restore data.

Note: because restore is done on each shard, we have to maintain a per-shard abort source proxy and do a background per-shard abort when abort is called. This is synced at the end of "run()".

Abort source is added as an optional parameter to s3 storage and the s3 path in distributed loader.

There is no attempt to "clean up" an aborted restore. Since we read at the mutation level from remote sstables, we should not produce incomplete sstables as such, even though we might of course end up with partial data restored.

Closes scylladb/scylladb#21567

* github.com:scylladb/scylladb:
  test_backup: Add restore abort test case
  sstables_loader: Make restore task abortable
  distributed_loader: Add optional abort_source to get_sstables_from_object_store
  s3_storage: Add optional abort_source to params/object
  s3::client: Make "readable_file" abortable
2024-12-19 12:23:33 +03:00
Avi Kivity
f3eade2f62 treewide: relicense to ScyllaDB-Source-Available-1.0
Drop the AGPL license in favor of a source-available license.
See the blog post [1] for details.

[1] https://www.scylladb.com/2024/12/18/why-were-moving-to-a-source-available-license/
2024-12-18 17:45:13 +02:00
Calle Wilund
af4dd1f2cb s3::client: Make "readable_file" abortable
Adds an optional abort source to the "readable_file" interface.
Note: the abortable aspect is not preserved across a "dup()" call;
however, since these objects are generally not used in a cross-shard
fashion, it should be ok.
2024-12-02 12:30:24 +00:00
Ernest Zaslavsky
dc6e4c0d97 client: Add retries
Add retries to the s3 client; all retries are coordinated by an instance of `retry_strategy`. In case of an error, also parse the response body in an attempt to retrieve additional, more focused error information, as suggested by AWS. See https://docs.aws.amazon.com/AmazonS3/latest/API/ErrorResponses.html.

Also move the expected http status check into `make_s3_error_handler`, since the http::client::make_request call is done with `nullopt` -- we want to manage all AWS error handling in the s3 client, to prevent the http client from validating the status and failing before we have a chance to analyze the error properly
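The error body is XML with a `<Code>` element carrying the specific error name; a naive extraction sketch (a real client would use a proper XML parser, and this helper is purely illustrative):

```cpp
#include <string>

// AWS error responses carry an XML body whose <Code> element is more
// specific than the HTTP status (e.g. "ExpiredToken"). Extract it with
// plain string searches; returns "" when no code is present.
std::string extract_error_code(const std::string& body) {
    auto open = body.find("<Code>");
    if (open == std::string::npos) return "";
    open += 6; // skip past "<Code>"
    auto close = body.find("</Code>", open);
    if (close == std::string::npos) return "";
    return body.substr(open, close - open);
}
```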
2024-11-07 21:01:25 +02:00
Calle Wilund
3321820c67 s3::client: Make operations (individually) abortable
Refs #20716

Adds an optional abort_source to all s3 client operations. If
provided, it will propagate to the actual HTTP client and allow
aborting the actual net op.

Note: this uses an abort source per call, not a client-local one.
This is for two reasons:

1.) The usage pattern of the client object is to create it outside the
    eventual owning object (task) that hosts the relevant abort source
2.) It is quite possible to want a different abort source, or none at
    all, for some operations.
2024-11-05 14:23:24 +00:00
Pavel Emelyanov
51e03b1025 s3/client: Introduce upload_progress
This is a structure with "total" and "uploaded" counters that's passed
by the user to the client::upload_file() method so that the client
updates it with the progress.
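A plain-C++ sketch of such a structure (the field names follow the commit message; the `percent()` convenience is illustrative, not part of the real API):

```cpp
#include <cstdint>

// Progress counters the client updates while upload_file() runs:
// "total" is the object size, "uploaded" grows as parts complete.
struct upload_progress {
    uint64_t total = 0;
    uint64_t uploaded = 0;
    // Illustrative helper for displaying the progress.
    double percent() const { return total ? 100.0 * uploaded / total : 0.0; }
};
```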

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-10-29 08:38:39 +03:00
Pavel Emelyanov
f9a5e02b53 s3: Extract client_fwd.hh
This is to export some simple structures to users without the need to
include client.hh itself (which is rather large already)

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-10-29 08:38:39 +03:00
Pavel Emelyanov
14b741afc9 s3/client: Split upload_sink_base class into two
This class implements two facilities -- multipart upload protocol itself
plus some common parts of upload_sink_impl (in fact -- only close()
and a plug for put(packet)).

This patch splits those two facilities into two classes. One of them
will be re-used later.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-09-12 18:00:19 +03:00
Pavel Emelyanov
86bc5b11fe s3-client: Add support for lister::filter
The directory lister comes with a filter function that tells the
lister which entries its .get() method should skip. For uniformity,
add the same to the S3 bucket_lister.

After this change the lister reports a shorter name in the returned
directory entry (with the prefix cut), so the unit test also needs to
be adjusted accordingly.
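The prefix trimming can be sketched as (illustrative helper, not the lister's actual code):

```cpp
#include <string>

// An object listed under prefix "dir/" as "dir/sub/obj" is reported
// as "sub/obj": strip the listing prefix when the key starts with it.
std::string cut_prefix(const std::string& key, const std::string& prefix) {
    if (key.compare(0, prefix.size(), prefix) == 0) {
        return key.substr(prefix.size());
    }
    return key;
}
```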

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-08-27 16:15:40 +03:00
Pavel Emelyanov
113d2449f8 utils: Introduce abstract (directory) lister
This patch hides directory_lister and bucket_lister behind a common
facade. The intention is to provide a uniform API for sstable_directory
that it could use to list sstables' components wherever they are.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-08-27 16:15:40 +03:00
Pavel Emelyanov
a02e65c649 s3_client: Add bucket lister
The lister resembles the directory_lister from util -- it returns
entries upon its .get() invocation, and should be .close()d at the end.

Internally the lister issues a ListObjectsV2 request with the provided
prefix and limits the number of entries the server returns, so as not
to consume too much local memory (we don't have a streaming XML parser
for the response). If the result is indeed truncated, the subsequent
calls include the continuation token as per [1]

[1] https://docs.aws.amazon.com/AmazonS3/latest/API/API_ListObjectsV2.html
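The continuation-token loop can be sketched in plain C++, with the HTTP request replaced by a callback (all names here are illustrative):

```cpp
#include <functional>
#include <string>
#include <vector>

// One page of a ListObjectsV2 response: keys, a truncation flag, and
// the token to pass back when the listing is truncated.
struct list_page {
    std::vector<std::string> keys;
    bool truncated = false;
    std::string continuation_token; // valid only when truncated
};

// Keep fetching pages, feeding the continuation token back, until the
// server reports the result is no longer truncated. fetch_page stands
// in for the actual ListObjectsV2 HTTP request.
std::vector<std::string> list_all(const std::function<list_page(const std::string&)>& fetch_page) {
    std::vector<std::string> out;
    std::string token;
    for (;;) {
        list_page p = fetch_page(token);
        out.insert(out.end(), p.keys.begin(), p.keys.end());
        if (!p.truncated) break;
        token = p.continuation_token;
    }
    return out;
}
```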

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-08-13 21:15:43 +03:00
Kefu Chai
061def001d s3/client: add client::upload_file()
this member function prepares for the backup feature, where the
object to be stored in the object storage is already persisted as a
file on the local filesystem. this brings us two benefits:

- with the file, we don't need to accumulate the payloads in memory
  and send them in batch, as we do in upload_sink and in
  upload_jumbo_sink. this puts less pressure on the memory subsystem.
- with the file, we can read multiple parts in parallel if multipart
  upload applies to it; this helps to improve the throughput.

so, this new helper is introduced to help upload an sstable from the
local filesystem to the object storage.

Fixes #16287
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-07-23 14:39:30 +08:00
Patryk Wrobel
a89e3d10af code-cleanup: add missing header guards
The following command had been executed to get the
list of headers that did not contain '#pragma once':
'grep -rnw . -e "#pragma once" --include *.hh -L'

This change adds missing include guards to headers
that did not contain any.

Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>

Closes scylladb/scylladb#19626
2024-07-09 18:31:35 +03:00
Pavel Emelyanov
fc5306c5e8 s3::client: Track memory in client uploads
When uploading an object part, the client spawns a background fiber
that keeps the data buffers alive in the http request's write_body()
lambda capture. This results in unbounded memory usage for uploaded
buffers, which is not nice. Even though the s3 client is limited by
the http client's max-connections parallelism, waiting for an
available connection still happens with the buffers held in memory.

This patch makes the client claim the background memory from the
provided semaphore (which, in turn, sits on the shard-wide storage
manager instance). Once body writing is complete, the claimed units are
returned back to the semaphore allowing for more background writes.
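A synchronous stand-in for the claim/return scheme (Seastar's real semaphore is asynchronous and would make the caller wait instead of throwing; this sketch is illustrative only):

```cpp
#include <cstdint>
#include <stdexcept>

// Simplified stand-in for the shard-wide memory semaphore: background
// uploads claim units for the buffers they hold and return them once
// the body write completes, capping total buffered memory.
struct memory_semaphore {
    uint64_t available;
    void claim(uint64_t units) {
        // The real (Seastar) semaphore would suspend the caller here
        // instead of throwing.
        if (units > available) throw std::runtime_error("would block");
        available -= units;
    }
    void release(uint64_t units) { available += units; }
};
```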

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-09-20 17:50:29 +03:00
Pavel Emelyanov
b299757884 s3::client: Construct client with shared semaphore
The semaphore will be used to cap memory consumption by the client.
This patch only makes sure a reference to a semaphore exists as an
argument to the client's constructor, nothing more.

In the scylla binary, the semaphore sits on storage_manager. In tests
the semaphore is some local object. For now the semaphore is unused
and is initialized locked, as this patch just pushes the needed
argument all the way around; the next patches will make use of it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-09-20 17:50:07 +03:00
Pavel Emelyanov
308db51306 s3/client: Add IO stats metrics
These metrics mimic the existing IO ones -- total number of read
operations, total number of read bytes and total read delay. And the
same for writing.

This patch makes no distinction between writing an object with a plain
PUT vs putting it with multipart uploading. Instead, it "measures"
individual IO writes.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-09-07 09:25:00 +03:00
Pavel Emelyanov
91235a84cd s3/client: Add HTTP client metrics
Currently the http client exports several "numbers" regarding the
number of transport connections it uses. This patch exports those via
the S3 client's per-sched-group metrics and prepares the ground for
more metrics in the next patch

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-09-07 09:25:00 +03:00
Pavel Emelyanov
08a12cd4a6 s3/client: Split make_request()
There will appear another make_request() helper that'll do mostly the
same. This split will help avoid code duplication

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-09-07 09:25:00 +03:00
Pavel Emelyanov
4b548dd240 s3/client: Wrap http client with struct group_client
The http client is per-sched-group. The next patch will need to keep
metrics per-sched-group too, and this sched-group -> http-client map
is a good place to put them. The wrapping struct will allow extending
it with metrics

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-09-07 09:25:00 +03:00
Pavel Emelyanov
627c1932e4 s3/client: Move client::stats to namespace scope
The stats are about the object, not about the client, so it's better
if they live at namespace scope. This also avoids conflicts with the
client stats that will be reported as metrics (later patch)

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-09-07 09:25:00 +03:00
Kefu Chai
ef78b31b43 s3/client: add tagging ops
with tagging ops, we will be able to attach kv pairs to an object.
this will allow us to mark sstable components with tags, and
filter them based on those tags.

* test/pylib/minio_server.py: enable anonymous user to perform
  more actions. because the tagging related ops are not enabled by
  "mc anonymous set public", we have to enable them using "set-json"
  subcommand.
* utils/s3/client: add methods to manipulate taggings.
* test/boost/s3_test: add a simple test accordingly.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #14486
2023-07-11 09:30:46 +03:00
Pavel Emelyanov
81d1bfce2a s3/client: Maintain several http clients on-board
The intent is to isolate workloads from different sched groups from
each other and not let one sched group consume all sockets from the
http client, thus affecting requests made by other sched groups.

The contention happens on the maximum number of sockets an http client
may have (see scylladb/seastar#1652). If requests take time and the
client is asked to make more and more of them, it will eventually stop
spawning new connections and get blocked internally, waiting for
running requests to complete and put a socket back into the pool. If
one sched group's workload (e.g. -- memtable flush) consumes all the
available sockets, then the workload from another group (e.g. --
query) would be blocked, thus spoiling its latency (which is poor on
its own, but still)

After this change S3 client maintains a sched_group:http_client map
thus making sure different sched groups don't clash with each other so
that e.g. query requests don't wait for flush/compaction to release a
socket.
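The per-sched-group lookup can be sketched with plain std types standing in for scheduling groups and http clients (illustrative only):

```cpp
#include <map>
#include <string>

// Stand-in for the per-sched-group http client: each scheduling group
// gets its own client, so one group's traffic cannot exhaust another
// group's connection pool.
struct group_client {
    std::string group_name;
    // ... the real struct wraps an http client (and later, metrics)
};

// Find the client for a group, creating it on first use.
group_client& client_for(std::map<std::string, group_client>& clients,
                         const std::string& group) {
    auto it = clients.find(group);
    if (it == clients.end()) {
        it = clients.emplace(group, group_client{group}).first;
    }
    return it->second;
}
```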

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-06-08 18:28:55 +03:00
Pavel Emelyanov
b9ee0d385b s3/client: Add make_request() method
This helper call will serve several purposes.

First, make the necessary preparations to the request before sending
it, in particular -- calling authorize()

Second, there's the need to re-make requests that failed with a
"connection closed" error (see #13736)

Third, one S3 client is shared between different scheduling groups. In
order to isolate groups' workloads from each other, different http
clients should be used, and this helper will be in charge of selecting
one

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-06-08 18:19:19 +03:00
Pavel Emelyanov
f9686926c2 s3/client: Implement jumbo upload sink
The sink is also in charge of uploading large objects in parts, but
this time each part is put with the help of the upload-part-copy API
call, not the regular upload-part one.

To make it work the new sink inherits from the uploading base class,
but instead of keeping memory_data_sink_buffers with parts it keeps a
sink to upload a temporary intermediate object with parts. When the
object is "full", i.e. the number of parts in it hits the limit, the
object is flushed, then copied into the target object with the S3 API
call, and then the intermediate object is deleted.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-16 12:23:18 +03:00
Pavel Emelyanov
a88629227f s3/client: Rename upload_sink -> upload_sink_base
There will appear another sink that implements multipart upload with
the help of the copy-part functionality. The current uploading code is
going to be partially re-used, so this patch moves all of it into the
base class in advance. Next patches will pick the needed parts.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-16 12:19:50 +03:00
Pavel Emelyanov
613acba5d0 s3: Pick client from manager via handle
Add a global factory onto the client that is

- cross-shard copyable
- generates a client from the local storage_manager by the given
  endpoint

With that the s3 file handle is fixed and also picks up shared s3
clients from the storage manager instead of creating its own.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-11 19:39:01 +03:00
Pavel Emelyanov
8ed9716f59 s3: Generalize s3 file handle
Currently the s3 file handle tries to carry the client's info via an
explicit host name and an endpoint config pointer. This is buggy: the
latter pointer is shard-local and cannot be transferred across shards.

This patch prepares the fix by abstracting the client handle part.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-11 19:39:01 +03:00
Pavel Emelyanov
63ff6744d8 s3: Live-update clients' configs
Now that the client is accessible directly via the storage_manager,
when the latter is requested to update its endpoint config, it can
kick the client to do the same.

The client, in turn, can only update the AWS creds info for now. The
endpoint port and https usage are immutable.

Also, updating the endpoint address is not possible, but for another
reason -- the endpoint itself is part of the keyspace configuration
and updating it in object_storage.yaml will have no effect on it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-11 19:39:01 +03:00
Raphael S. Carvalho
57661f0392 s3: Introduce get_object_stats()
get_object_stats() will be used for retrieving the content size and
also the last-modified time.

The latter is required for filling st_mtim, etc., in the
s3::client::readable_file::stat() method.

Refs #13649.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-05-07 19:51:10 -03:00
Raphael S. Carvalho
da2ccc44a4 s3: introduce get_object_header()
This allows other functions to reuse the code to retrieve the
object header.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-05-07 19:49:52 -03:00
Pavel Emelyanov
98b9c205bb s3/client: Sign requests if configured
If the endpoint config specifies an AWS key, secret and region, all
S3 requests get signed. The signature should have all the x-amz-...
headers included and should contain at least three of them. This patch
includes the x-amz-date, x-amz-content-sha256 and host headers in the
signing list. The content can be left unsigned when sent over HTTPS,
which is what this patch does.
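For reference, the SignedHeaders component of a SigV4 Authorization header is the list of signed header names, lowercased, sorted, and joined with ';'. A sketch of building that component (illustrative helper, not the client's code):

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Build the SignedHeaders component of a SigV4 Authorization header:
// lowercase each header name, sort them, and join with ';'.
std::string signed_headers_component(std::vector<std::string> names) {
    for (auto& n : names) {
        std::transform(n.begin(), n.end(), n.begin(),
                       [](unsigned char c) { return std::tolower(c); });
    }
    std::sort(names.begin(), names.end());
    std::string out;
    for (size_t i = 0; i < names.size(); ++i) {
        if (i) out += ';';
        out += names[i];
    }
    return out;
}
```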

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-03 20:23:37 +03:00
Pavel Emelyanov
85f06ca556 s3/client: Construct it with config
Similar to the previous patch -- extend the s3::client constructor to
take the endpoint config value next to the endpoint string. For now
the configs are likely empty, but they are also still unused.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-03 20:19:43 +03:00
Pavel Emelyanov
caf9e357c8 s3/client: Construct it with sstring endpoint
Currently the client is constructed with a socket_address which is
prepared by the caller from the endpoint string. That's not flexible
enough, because the s3 client needs to know the original endpoint
string for two reasons.

First, it needs to look up the endpoint config for potential AWS
creds. Second, it needs this exact value as the Host: header in its
http requests.

So this patch just relaxes the client constructor to accept the
endpoint string and hard-codes the 9000 port. The latter is temporary
-- this is how the local tests' minio is started -- but the next patch
will make it configurable.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-03 20:19:43 +03:00
Pavel Emelyanov
033fa107f8 utils: Add S3 readable file impl for random reads
Sometimes an sstable is used for random reads, sometimes -- for
streamed reads using the input stream. For both cases the file API can
be provided, because the S3 API allows random reads of arbitrary
lengths.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-10 16:43:01 +03:00
Pavel Emelyanov
a4a64149a6 utils: Add S3 data sink for multipart upload
Putting a large object into S3 using a plain PUT is a bad choice --
one needs to collect the whole object in memory, then send it as a
content-length request with a plain body. Multipart upload puts less
stress on memory, but it has its limitation -- each part should be at
least 5Mb in size. For that reason the file API doesn't work -- the
file IO API operates with external memory buffers and the file impl
would only have raw pointers to them. In order to collect a 5Mb chunk
in RAM the impl would have to copy the memory, which is not good.
Unlike the file API, the data_sink API is more flexible, as it has
temporary buffers at hand and can cache them in a zero-copy manner.

Having said that, the S3 data_sink implementation is like this:

* put(buffer):
  move the buffer into the local cache; once the local cache grows
  above 5Mb, send out the part

* flush:
  send out whatever is in the cache, then send the upload completion
  request

* close:
  check that the upload finished (in flush), abort the upload
  otherwise

The user of the API may (actually should) wrap the sink with an
output_stream and use it as any other output_stream.
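The put/flush behaviour above can be sketched in plain C++ (the 5Mb part threshold is shrunk here, and actual part uploads are replaced by recording part sizes; this is an illustration, not the client's real sink):

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Buffer incoming chunks; whenever the buffer crosses the threshold,
// "send out" a part (recorded here as its size); flush() emits any
// remainder. A real sink would issue UploadPart requests and finish
// with CompleteMultipartUpload.
struct multipart_buffer {
    std::size_t threshold;
    std::string cache;
    std::vector<std::size_t> part_sizes; // stands in for UploadPart calls

    void put(const std::string& buf) {
        cache += buf;
        if (cache.size() >= threshold) {
            part_sizes.push_back(cache.size());
            cache.clear();
        }
    }
    void flush() {
        if (!cache.empty()) {
            part_sizes.push_back(cache.size());
            cache.clear();
        }
        // the real sink would now send the upload completion request
    }
};
```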

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-10 16:43:01 +03:00
Pavel Emelyanov
3745b5c715 utils: Add S3 client with basic ops
Those include -- HEAD to get the size, PUT to upload an object in one
go, GET to read the object as a contiguous buffer and DELETE to drop
one.

The client uses the http client from seastar and just implements the
S3 protocol on top of it.
protocol using it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-10 16:43:01 +03:00