Commit Graph

38 Commits

Author SHA1 Message Date
Pavel Emelyanov
86bc5b11fe s3-client: Add support for lister::filter
Directory lister comes with a filter function that tells lister which
entries to skip by its .get() method. For uniformity, add the same to
S3 bucket_lister.

After this change the lister reports shorter name in the returned
directory entry (with the prefix cut), so also need to tune up the unit
test respectively.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-08-27 16:15:40 +03:00
Pavel Emelyanov
05adee4c82 test: Add test for s3::client::bucket_lister
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-08-13 21:15:43 +03:00
Avi Kivity
aa1270a00c treewide: change assert() to SCYLLA_ASSERT()
assert() is traditionally disabled in release builds, but not in
scylladb. This hasn't caused problems so far, but the latest abseil
release includes a commit [1] that causes a 1000 insn/op regression when
NDEBUG is not defined.

Clearly, we must move towards a build system where NDEBUG is defined in
release builds. But we can't just define it blindly without vetting
all the assert() calls, as some were written with the expectation that
they are enabled in release mode.

To solve the conundrum, change all assert() calls to a new SCYLLA_ASSERT()
macro in utils/assert.hh. This macro is always defined and is not conditional
on NDEBUG, so we can later (after vetting Seastar) enable NDEBUG in release
mode.

[1] 66ef711d68

Closes scylladb/scylladb#20006
2024-08-05 08:23:35 +03:00
Kefu Chai
061def001d s3/client: add client::upload_file()
this member function prepares for the backup feature, where the
object to be stored in the object storage is already persisted as a
file on local filesystem. this brings us two benefits:

- with the file, we don't need to accumulate the payloads in memory
  and send them in batch, as we do in upload_sink and in
  upload_jumbo_sink. this puts less pressure on the memory subsystem.
- with the file, we can read multiple parts in parallel if multpart
  upload applies to it, this helps to improve the throughput.

so, this new helper is introduced to help upload an sstable from local
filesystem to the object storage.

Fixes #16287
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-07-23 14:39:30 +08:00
Avi Kivity
7cb1c10fed treewide: replace seastar::future::get0() with seastar::future::get()
get0() dates back from the days where Seastar futures carried tuples, and
get0() was a way to get the first (and usually only) element. Now
it's a distraction, and Seastar is likely to deprecate and remove it.

Replace with seastar::future::get(), which does the same thing.
2024-02-02 22:12:57 +08:00
Pavel Emelyanov
76705b6ba2 test/s3: Avoid object range overflow
There's a test case the validates uploading sink by getting random
portions of the uploaded object. The portions are generated as

   len = random % chunk_size
   off = random % file_size - len

The latter may apparently render negative value which will translate
into huuuuge 64-bit range offset which, in turn, would result in invalid
http range specifier and getting object part fails with status OK

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-07 10:54:54 +03:00
Pavel Emelyanov
c5d85bdf79 s3/client: Don't GET object contents on out-of-bound reads
If S3 readable file is used inside file input stream, the latter may
call its read methods with position that is above file size. In that
case server replies with generic http error and the fact that the range
was invalid is encoded into reply body's xml.

That's not great to catch this via wrong reply status exception and xml
parsing all the more so we can know that the read is out-of-bound in
advance.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-29 12:09:52 +03:00
Pavel Emelyanov
855626f7de s3/client: Map http exceptions into storage_io_error
When http request resolves with excpetion it makes sense to translate
the network exception into storage exceptio to make upper layers think
that it was some sort of IO error, not SUDDENLY and http one.

The translation is, for now, pretty simple:

- 404 and 3xx -> ENOENT
- 403(forbidden) and 401(unauthorized) -> EACCESS
- anything else -> EIO

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-21 16:47:50 +03:00
Pavel Emelyanov
caa3e751f7 test: Add s3_client test for upload PUT fallback
The test case creates non-jumbo upload simk and puts some bytes into it,
then flushes. In order to make sure the fallback did took place the
multipar memory tracker sempahore is broken in advance.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-24 15:03:53 +03:00
Kefu Chai
f3f31f0c65 main.cc: rename aws option
- s/aws_key/aws_access_key_id/
- s/aws_secret/aws_secret_access_key/
- s/aws_token/aws_session_token/

rename them to more popular names, these names are also used by
boto's API. this should improve the readability and consistency.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-09-23 14:31:32 +08:00
Kefu Chai
ac3406e537 utils/s3/creds: rename aws_config member variables
- s/key/access_key_id/
- s/secret/secret_access_key/
- s/token/session_token/

so they are more aligned with the AWS document.
for instance, in
https://docs.aws.amazon.com/AmazonS3/latest/userguide/RESTAuthentication.html#ConstructingTheAuthenticationHeader
AWSAccessKeyId is used in the "Authorization" header.

this would help with the readability and maintainability.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-09-23 14:28:07 +08:00
Avi Kivity
1da6a939fe Merge 'Track memory usage of S3 object uploads' from Pavel Emelyanov
The S3 uploading sink needs to collect buffers internally before sending them out, because the minimal upload-able part size is 5Mb. When the necessary amount of bytes is accumulated, the part uploading fibers starts in the background. On flush the sink waits for all the fibers to complete and handles failure of any.

Uploading parallelism is nowadays limited by the means of the http client max-connections parameter. However, when a part uploading fibers waits for it connection it keeps the 5Mb+ buffers on the request's body, so even though the number of uploading parts is limited, the number of _waiting_ parts is effectively not.

This PR adds a shard-wide limiter on the number of background buffers S3 clients (and theirs http clients) may use.

Closes scylladb/scylladb#15497

* github.com:scylladb/scylladb:
  s3::client: Track memory in client uploads
  code: Configure s3 clients' memory usage
  s3::client: Construct client with shared semaphore
  sstables::storage_manager: Introduce config
2023-09-21 18:24:42 +03:00
Kefu Chai
c364efb998 utils/s3: auth using AWS_SESSION_TOKEN
when accessing AWS resources, uses are allowed to long-term security
credentials, they can also the temporary credentials. but if the latter
are used, we have to pass a session token along with the keys.
see also https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_temp_use-resources.html
so, if we want to programatically get authenticated, we need to
set the "x-amz-security-token" header,
see
https://docs.aws.amazon.com/AmazonS3/latest/userguide/RESTAuthentication.html#UsingTemporarySecurityCredentials

so, in this change, we

1. add another member named `token` in `s3::endpoint_config::aws_config`
   for storing "AWS_SESSION_TOKEN".
2. populate the setting from "object_storage.yaml" and
  "$AWS_SESSION_TOKEN" environment variable.
3. set "x-amz-security-token" header if
   `s3::endpoint_config::aws_config::token` is not empty.

this should allow us to test s3 client and s3 object store backend
with S3 bucket, with the temporary credentials.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#15486
2023-09-21 13:26:11 +03:00
Pavel Emelyanov
182a5348d4 code: Configure s3 clients' memory usage
This sets the real limits on the memory semaphore.

- scylla sets it to 1% of total memory, 10Mb min, 100Mb max
- tests set it to 16Mb
- perf test sets it to all available memory

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-09-20 17:50:29 +03:00
Pavel Emelyanov
b299757884 s3::client: Construct client with shared semaphore
The semaphore will be used to cap memory consumption by client. This
patch makes sure the reference to a semaphore exists as an argument to
client's constructor, not more than that.

In scylla binary, the semaphore sits on storage_manager. In tests the
semaphore is some local object. For now the semaphore is unused and is
initialized locked as this patch just pushes the needed argument all the
way around, next patches will make use of it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-09-20 17:50:07 +03:00
Botond Dénes
cc16502691 Merge 'Add metrics to S3 client' from Pavel Emelyanov
The added metrics include:

- http client metrics, which include the number of connections, the number of active connections and the number of new connections made so far
- IO metrics that mimic those for traditional IO -- total number of object read/write ops, total number of get/put/uploaded bytes and individual IO request delay (round-trip, including body transfer time)

fixes: #13369

Closes #14494

* github.com:scylladb/scylladb:
  s3/client: Add IO stats metrics
  s3/client: Add HTTP client metrics
  s3/client: Split make_request()
  s3/client: Wrap http client with struct group_client
  s3/client: Move client::stats to namespace scope
  s3/client: Keep part size local variable
2023-09-14 09:49:08 +03:00
Pavel Emelyanov
4dc4f65b18 test/s3: Remove AWS_S3_EXTRA usage
Now when the keys and region can be configured with "standard"
environment variables, the old custom one can be removed. No automation
uses that it was purely a support for manual testing of a client against
AWS's S3 server

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-09-07 11:16:13 +03:00
Pavel Emelyanov
1d00cc5baa test/s3: Run tests over non-anonymous bucket
Currently minio applies anonymous public policy for the test bucket and
all tests just use unsigned S3 requests. This patch generates a policy
for the temporary minio user and removes the anon public one. All tests
are updated respectively to use the provided key:secret pair.

The use-https bit is off by default as minio still starts with plain
http. That's OK for now, all tests are local and have no secret data
anyway

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-09-07 11:16:13 +03:00
Pavel Emelyanov
e8e8539c7c code: Rename S3_PUBLIC_BUCKET_FOR_TEST
The bucket is going to stop being public, rename the env variable in
advance to make the essential patch smaller

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-09-07 10:25:53 +03:00
Pavel Emelyanov
627c1932e4 s3/client: Move client::stats to namespace scope
The stats is stats about object, not about client, so it's better if it
lives in namespace scope. Also it will avoid conflicts with client stats
that will be reported as metrics (later patch)

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-09-07 09:25:00 +03:00
Kefu Chai
77faec4f38 s3/test: use seastar::deferred() to perform cleanup
let's use RAII to remove the object use as a fixture, so we don't
leave some object in the bucket for testing. this might interfere
with other tests which share the same minio server with the test
which fails to do its clean up if an exception is thrown.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-07-20 10:04:54 +08:00
Kefu Chai
7a9c802fc3 s3/test: close using deferred_close()
let's use RAII to tear down the client and the input file, so we can
always perform the cleanups even if the test throws.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-07-20 10:04:54 +08:00
Raphael S. Carvalho
da18a9badf Fix test.py with compaction groups
test.py with --x-log2-compaction-groups option rotted a little bit.
Some boost tests added later didn't use the correct header which
parses the option or they didn't adjust suite.yaml.
Perhaps it's time to set up a weekly (or bi-weekly) job to verify
there are no regressions with it. It's important as it stresses
the data plane for tablets reusing the existing tests available.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #14732
2023-07-18 16:57:11 +03:00
Kefu Chai
ef78b31b43 s3/client: add tagging ops
with tagging ops, we will be able to attach kv pairs to an object.
this will allow us to mark sstable components with taggings, and
filter them based on them.

* test/pylib/minio_server.py: enable anonymous user to perform
  more actions. because the tagging related ops are not enabled by
  "mc anonymous set public", we have to enable them using "set-json"
  subcommand.
* utils/s3/client: add methods to manipulate taggings.
* test/boost/s3_test: add a simple test accordingly.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #14486
2023-07-11 09:30:46 +03:00
Pavel Emelyanov
ce6a1ca13b Update seastar submodule
* seastar afe39231...99d28ff0 (16):
  > file/util: Include seastar.hh
  > http/exception: Use http::reply explicitly
  > http/client: Include lost condition-variable.hh
  > util: file: drop unnecessary include of reactor.hh
  > tests: perf: add a markdown printer
  > http/client: Introduce unexpected_status_error for client requests
  > sharded: avoid #include <seastar/core/reactor.hh> for run_in_background()
  > code: Use std::is_invocable_r_v instead of InvokeReturns
  > http/client: Add ability to change pool size on the fly
  > http/client: Add getters for active/idle connections counts
  > http/client: Count and limit the number of connections
  > http/client: Add connection->client RAII backref
  > build: use the user-specified compiler when building DPDK
  > build: use proper toolchain based on specified compiler
  > build: only pass CMAKE_C_COMPILER when building ingredients
  > build: use specified compiler when building liburing

Two changes are folded into the commit:

1. missing seastar/core/coroutine.hh include in one .cc file that
   got it indirectly included before seastar reactor.hh drop from
   file.hh

2. http client now returns unexpected_status_error instead of
   std::runtime_error, so s3 test is updated respectively

Closes #14168
2023-06-07 20:25:49 +03:00
Pavel Emelyanov
b3df2d0db0 s3/test: Tune-up multipart upload test alignment
Currently the test uses a sequence of 1024-bytes buffers. This lets
minio server actively de-duplicate those blocks by page boundary (it's a
guess, but it it's truish because minio reports back equivalent ETags
for lots of uploading parts). Make the buffer not be power of two so
that when squashed together the resulting 2^X buffers don't get equal.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-16 12:23:18 +03:00
Pavel Emelyanov
fffa04fa67 s3/test: Add jumbo upload test
It re-uses most of the existing upload sink test, but configures the
jumbo sink with at most 3 parts in each intermediate object not to
upload 50Gb part to switch to the next one.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-16 12:23:18 +03:00
Raphael S. Carvalho
57661f0392 s3: Introduce get_object_stats()
get_object_stats() will be used for retrieving content size and
also last modified.

The latter is required for filling st_mtim, etc, in the
s3::client::readable_file::stat() method.

Refs #13649.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-05-07 19:51:10 -03:00
Pavel Emelyanov
e00d3188ed s3/test: Add ability to run boost test over real s3
Support the AWS_S3_EXTRA environment vairable that's :-split and the
respective substrings are set as endpoint AWS configuration. This makes
it possible to run boost S3 test over real S3.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-03 20:23:38 +03:00
Pavel Emelyanov
3bec5ea2ce s3/client: Keep server port on config
Currently the code temporarily assumes that the endpoint port is 9000.
This is what tests' local minio is started with. This patch keeps the
port number on endpoint config and makes test get the port number from
minio starting code via environment.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-03 20:19:43 +03:00
Pavel Emelyanov
85f06ca556 s3/client: Construct it with config
Similar to previous patch -- extent the s3::client constructor to get
the endpoint config value next to the endpoint string. For now the
configs are likely empty, but they are yet unused too.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-03 20:19:43 +03:00
Pavel Emelyanov
caf9e357c8 s3/client: Construct it with sstring endpoint
Currently the client is constructed with socket_address which's prepared
by the caller from the endpoint string. That's not flexible engouh,
because s3 client needs to know the original endpoint string for two
reasons.

First, it needs to lookup endpoint config for potential AWS creds.
Second, it needs this exact value as Host: header in its http requests.

So this patch just relaxes the client constructor to accept the endpoint
string and hard-code the 9000 port. The latter is temporary, this is how
local tests' minio is started, but next patch will make it configurable.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-03 20:19:43 +03:00
Pavel Emelyanov
a77ca69360 s3/test: Rename MINIO_SERVER_ADDRESS environment variable
Using it the pylib minio code export minio address for tests. This
creates unneeded WTFs when running the test over AWS S3, so it's better
to rename to variable not to mention MINIO at all.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-19 12:51:12 +03:00
Pavel Emelyanov
12c4e7d605 s3/test: Keep public bucket name in environment
Local test.py runs minio with the public 'testbucket' bucket and all
test cases know that. This series adds an ability to run tests over real
S3 so the bucket name should be configurable.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-19 12:51:12 +03:00
Pavel Emelyanov
91674da982 s3/test: Fix upload stream closure
If multipart upload fails for some reason the output stream remains not
closed and the respective assertion masquerades the original failure.
Fix that by closing the stream in all cases.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-19 12:51:12 +03:00
Pavel Emelyanov
033fa107f8 utils: Add S3 readable file impl for random reads
Sometimes an sstable is used for random read, sometimes -- for streamed
read using the input stream. For both cases the file API can be
provided, because S3 API allows random reads of arbitrary lengths.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-10 16:43:01 +03:00
Pavel Emelyanov
a4a64149a6 utils: Add S3 data sink for multipart upload
Putting a large object into S3 using plain PUT is bad choice -- one need
to collect the whole object in memory, then send it as a content-length
request with plain body. Less memory stress is by using multipart
upload, but multipart upload has its limitation -- each part should be
at least 5Mb in size. For that reason using file API doesn't work --
file IO API operates with external memory buffers and the file impl
would only have raw pointers to it. In order to collect 5Mb of chunk in
RAM the impl would have to copy the memory which is not good. Unlike the
file API data_sink API is more flexible, as it has temporary buffers at
hand and can cache them in zero-copy manner.

Having sad that, the S3 data_sink implementation is like this:

* put(buffer):
  move the buffer into local cache, once the local cache grows above 5Mb
  send out the part

* flush:
  send out whatever is in cache, then send upload completion request

* close:
  check that the upload finihsed (in flush), abort the upload otherwise

User of the API may (actually should) wrap the sink with output_stream
and use it as any other output_stream.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-10 16:43:01 +03:00
Pavel Emelyanov
3745b5c715 utils: Add S3 client with basic ops
Those include -- HEAD to get size, PUT to upload object in one go, GET
to read the object as contigious buffer and DELETE to drop one.

The client uses http client from seastar and just implements the S3
protocol using it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-10 16:43:01 +03:00