
14 Commits

Author SHA1 Message Date
Pavel Emelyanov
ce6a1ca13b Update seastar submodule
* seastar afe39231...99d28ff0 (16):
  > file/util: Include seastar.hh
  > http/exception: Use http::reply explicitly
  > http/client: Include lost condition-variable.hh
  > util: file: drop unnecessary include of reactor.hh
  > tests: perf: add a markdown printer
  > http/client: Introduce unexpected_status_error for client requests
  > sharded: avoid #include <seastar/core/reactor.hh> for run_in_background()
  > code: Use std::is_invocable_r_v instead of InvokeReturns
  > http/client: Add ability to change pool size on the fly
  > http/client: Add getters for active/idle connections counts
  > http/client: Count and limit the number of connections
  > http/client: Add connection->client RAII backref
  > build: use the user-specified compiler when building DPDK
  > build: use proper toolchain based on specified compiler
  > build: only pass CMAKE_C_COMPILER when building ingredients
  > build: use specified compiler when building liburing
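The http/client changes above count connections, cap their number, change the pool size on the fly and expose active/idle counts. The bookkeeping this implies can be modelled with a small class. This is a hypothetical sketch in plain C++ and not seastar's http::client API. `connection_pool_model` and its members are invented for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model of a client-side connection pool that caps the total
// number of connections (active + idle). Not seastar's implementation.
class connection_pool_model {
    std::size_t _max;
    std::size_t _active = 0; // connections currently serving a request
    std::size_t _idle = 0;   // open connections waiting to be reused
public:
    explicit connection_pool_model(std::size_t max) : _max(max) {}

    // Reuse an idle connection if there is one; otherwise open a new one
    // only if the cap allows it. On false the caller would have to wait.
    bool try_acquire() {
        if (_idle > 0) {
            --_idle;
            ++_active;
            return true;
        }
        if (_active + _idle < _max) {
            ++_active;
            return true;
        }
        return false;
    }

    // Return a connection: keep it idle while under the cap, else close it.
    void release() {
        --_active;
        if (_active + _idle < _max) {
            ++_idle;
        }
    }

    // Change the cap on the fly. Surplus idle connections are closed now;
    // surplus active ones are closed when they are released.
    void set_max(std::size_t max) {
        _max = max;
        while (_idle > 0 && _active + _idle > _max) {
            --_idle;
        }
    }

    std::size_t active() const { return _active; }
    std::size_t idle() const { return _idle; }
};
```

Closing surplus idle connections immediately, but letting busy ones finish, is one reasonable policy for resizing; the real client may choose differently.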

Two changes are folded into the commit:

1. a missing seastar/core/coroutine.hh include in one .cc file that
   used to get it indirectly, before seastar dropped the reactor.hh
   include from file.hh

2. the http client now fails requests with unexpected_status_error
   instead of std::runtime_error, so the s3 test is updated accordingly
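A dedicated exception type lets a test check the exact HTTP status instead of matching a generic std::runtime_error. The sketch below is hypothetical. `unexpected_status_error_model` and the helpers only illustrate the idea and are not seastar's or Scylla's classes.

```cpp
#include <cassert>
#include <exception>
#include <string>

// Hypothetical stand-in for an HTTP client exception that carries the
// unexpected response status. Not seastar's actual class.
class unexpected_status_error_model : public std::exception {
    int _status;
    std::string _msg;
public:
    explicit unexpected_status_error_model(int status)
        : _status(status), _msg("unexpected HTTP status " + std::to_string(status)) {}
    int status() const noexcept { return _status; }
    const char* what() const noexcept override { return _msg.c_str(); }
};

// Hypothetical request helper: throws when the reply status is not the
// expected one.
void check_reply_status(int got, int expected) {
    if (got != expected) {
        throw unexpected_status_error_model(got);
    }
}

// Returns the status carried by the exception, or 0 if nothing was thrown.
int status_of_failed_request(int got, int expected) {
    try {
        check_reply_status(got, expected);
    } catch (const unexpected_status_error_model& e) {
        return e.status(); // a test can now assert on the exact status
    }
    return 0;
}
```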

Closes #14168
2023-06-07 20:25:49 +03:00
Pavel Emelyanov
b3df2d0db0 s3/test: Tune-up multipart upload test alignment
Currently the test uses a sequence of 1024-byte buffers. This lets the
minio server actively de-duplicate those blocks by page boundary (it's a
guess, but likely a correct one, because minio reports back equivalent
ETags for many of the uploaded parts). Make the buffer size not a power
of two so that, when the buffers are squashed together, the resulting
2^X-sized blocks are not equal.
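The reasoning can be checked with a small model. It assumes that the test repeats one and the same buffer and that the server compares fixed-size, power-of-two blocks. The sizes are scaled down and `distinct_parts` is a hypothetical helper. It is not how minio actually works internally.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>

// Repeat one buffer of `buf_size` bytes (filled with i % 256) into a stream,
// cut the stream into `part_count` parts of `part_size` bytes each and count
// how many distinct parts there are. Identical parts would hash to identical
// ETags, which is what would let a server de-duplicate them.
std::size_t distinct_parts(std::size_t buf_size, std::size_t part_size,
                           std::size_t part_count) {
    std::string buf(buf_size, '\0');
    for (std::size_t i = 0; i < buf_size; ++i) {
        buf[i] = static_cast<char>(i % 256);
    }
    std::string stream;
    while (stream.size() < part_size * part_count) {
        stream += buf;
    }
    std::set<std::string> parts;
    for (std::size_t k = 0; k < part_count; ++k) {
        parts.insert(stream.substr(k * part_size, part_size));
    }
    return parts.size();
}
```

With a 1024-byte buffer every 4096-byte part starts at a multiple of the buffer size, so all parts are identical. With 1023 bytes each part starts at a different offset within the buffer, so all parts differ.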

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-16 12:23:18 +03:00
Pavel Emelyanov
fffa04fa67 s3/test: Add jumbo upload test
It reuses most of the existing upload sink test, but configures the
jumbo sink with at most 3 parts in each intermediate object, so that it
doesn't have to upload 50Gb of parts before switching to the next one.
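Capping the part count lets the test reach the "switch to the next intermediate object" path with little data. Below is a hypothetical model of that rollover. It assumes the sink opens a new intermediate object once the current one holds `max_parts` parts. The function name is invented.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model: distribute `total_parts` uploaded parts over
// intermediate objects holding at most `max_parts` parts each. Returns the
// number of parts placed into each intermediate object.
std::vector<std::size_t> spread_parts(std::size_t total_parts, std::size_t max_parts) {
    std::vector<std::size_t> objects;
    for (std::size_t i = 0; i < total_parts; ++i) {
        if (objects.empty() || objects.back() == max_parts) {
            objects.push_back(0); // start a new intermediate object
        }
        ++objects.back();
    }
    return objects;
}
```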

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-16 12:23:18 +03:00
Raphael S. Carvalho
57661f0392 s3: Introduce get_object_stats()
get_object_stats() will be used for retrieving the content size and
also the last modification time.

The latter is required for filling st_mtim, etc., in the
s3::client::readable_file::stat() method.
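The sketch below shows how the two values could map onto POSIX `struct stat`. It assumes a Linux/glibc build where `st_mtim` is a `struct timespec`. `object_stats_model` and `to_stat` are hypothetical names, not the actual Scylla types.

```cpp
#include <cassert>
#include <cstdint>
#include <ctime>
#include <sys/stat.h>

// Hypothetical shape of the values get_object_stats() retrieves.
struct object_stats_model {
    std::uint64_t size;        // object content length in bytes
    std::time_t last_modified; // seconds since the Unix epoch
};

// Fill the stat fields a readable_file::stat()-like method needs.
struct stat to_stat(const object_stats_model& os) {
    struct stat st{};
    st.st_size = static_cast<off_t>(os.size);
    st.st_mtim.tv_sec = os.last_modified;
    st.st_mtim.tv_nsec = 0; // the HTTP Last-Modified date has 1-second resolution
    return st;
}
```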

Refs #13649.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-05-07 19:51:10 -03:00
Pavel Emelyanov
e00d3188ed s3/test: Add ability to run boost test over real s3
Support the AWS_S3_EXTRA environment variable, whose value is split on
':' and the resulting substrings are set as the endpoint's AWS
configuration. This makes it possible to run the boost S3 test over real S3.
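This is a minimal sketch of the ':'-splitting. The order of the three fields (access key, secret, region) is an assumption, not taken from the Scylla sources. `aws_config_model` and the helpers are hypothetical.

```cpp
#include <cassert>
#include <cstdlib>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical AWS settings for an endpoint; the field order is assumed.
struct aws_config_model {
    std::string access_key_id;
    std::string secret_access_key;
    std::string region;
};

// Split a value such as "KEY:SECRET:REGION" on ':'.
std::vector<std::string> split_colon(const std::string& s) {
    std::vector<std::string> out;
    std::stringstream ss(s);
    std::string item;
    while (std::getline(ss, item, ':')) {
        out.push_back(item);
    }
    return out;
}

// Returns the AWS config if AWS_S3_EXTRA is set and has three fields.
std::optional<aws_config_model> aws_config_from_env() {
    const char* extra = std::getenv("AWS_S3_EXTRA");
    if (extra == nullptr) {
        return std::nullopt; // not set: keep testing against local minio
    }
    auto fields = split_colon(extra);
    if (fields.size() != 3) {
        return std::nullopt;
    }
    return aws_config_model{fields[0], fields[1], fields[2]};
}
```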

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-03 20:23:38 +03:00
Pavel Emelyanov
3bec5ea2ce s3/client: Keep server port on config
Currently the code temporarily assumes that the endpoint port is 9000.
This is what tests' local minio is started with. This patch keeps the
port number on the endpoint config and makes the test get the port
number from the minio starting code via the environment.
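The sketch below reads the port from the environment and falls back to the previously hard-coded 9000. The name of the variable exported by the minio starting code is not given here, so the caller passes it in. `endpoint_config_model` is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <string>

// Hypothetical endpoint configuration that now carries the port.
struct endpoint_config_model {
    std::uint16_t port = 9000; // the previously hard-coded default
};

// Read the port from the environment variable `var`, if it is set.
endpoint_config_model endpoint_config_from_env(const char* var) {
    endpoint_config_model cfg;
    if (const char* v = std::getenv(var)) {
        cfg.port = static_cast<std::uint16_t>(std::stoul(v));
    }
    return cfg;
}
```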

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-03 20:19:43 +03:00
Pavel Emelyanov
85f06ca556 s3/client: Construct it with config
Similar to the previous patch -- extend the s3::client constructor to
accept the endpoint config value next to the endpoint string. For now the
configs are likely empty, but they are not used yet either.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-03 20:19:43 +03:00
Pavel Emelyanov
caf9e357c8 s3/client: Construct it with sstring endpoint
Currently the client is constructed with a socket_address, which is
prepared by the caller from the endpoint string. That's not flexible
enough, because the s3 client needs to know the original endpoint string
for two reasons.

First, it needs to look up the endpoint config for potential AWS creds.
Second, it needs this exact value as the Host: header in its http requests.

So this patch just relaxes the client constructor to accept the endpoint
string and hard-codes port 9000. The latter is temporary: this is the port
local tests' minio is started with, and the next patch will make it
configurable.
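Below is a hypothetical sketch of why the raw endpoint string is needed. It is used once as the key for the config lookup and once as the Host: value. `s3_client_model` and `endpoint_config_model` are invented names, not the real s3::client interface.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Hypothetical per-endpoint settings (credentials would also live here).
struct endpoint_config_model {
    std::uint16_t port = 9000;
};

// Hypothetical client that keeps the original endpoint string instead of a
// pre-resolved socket address.
class s3_client_model {
    std::string _endpoint;
    endpoint_config_model _cfg;
public:
    s3_client_model(std::string endpoint,
                    const std::map<std::string, endpoint_config_model>& configs)
        : _endpoint(std::move(endpoint)) {
        // Reason 1: the endpoint string is the key for its config lookup.
        if (auto it = configs.find(_endpoint); it != configs.end()) {
            _cfg = it->second;
        }
    }

    // Reason 2: the same string is sent as the Host: header.
    std::string host_header() const { return "Host: " + _endpoint; }
    std::uint16_t port() const { return _cfg.port; }
};
```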

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-05-03 20:19:43 +03:00
Pavel Emelyanov
a77ca69360 s3/test: Rename MINIO_SERVER_ADDRESS environment variable
The pylib minio code uses it to export the minio address to tests. This
creates unneeded WTFs when running the test over AWS S3, so it's better
to rename the variable so that it doesn't mention MINIO at all.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-19 12:51:12 +03:00
Pavel Emelyanov
12c4e7d605 s3/test: Keep public bucket name in environment
Local test.py runs minio with the public 'testbucket' bucket and all
test cases know that. This series adds the ability to run tests over real
S3, so the bucket name should be configurable.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-19 12:51:12 +03:00
Pavel Emelyanov
91674da982 s3/test: Fix upload stream closure
If the multipart upload fails for some reason, the output stream is left
unclosed and the respective assertion masks the original failure.
Fix that by closing the stream in all cases.
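The fix follows the "capture the exception, close, rethrow" pattern. Below is a synchronous sketch of that pattern in plain C++. It uses a hypothetical `output_stream_model`, not the actual test code or seastar's output_stream.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical output stream that records whether it was closed.
struct output_stream_model {
    bool closed = false;
    void write(const std::string& data) {
        if (data == "bad") {
            throw std::runtime_error("part upload failed"); // simulated failure
        }
    }
    void close() { closed = true; }
};

// Write a chunk, then close the stream whether or not the write failed.
// The original exception is rethrown afterwards, so it is not masked by a
// later "stream not closed" assertion.
void upload(output_stream_model& out, const std::string& chunk) {
    std::exception_ptr ex;
    try {
        out.write(chunk);
    } catch (...) {
        ex = std::current_exception();
    }
    out.close();
    if (ex) {
        std::rethrow_exception(ex);
    }
}
```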

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-19 12:51:12 +03:00
Pavel Emelyanov
033fa107f8 utils: Add S3 readable file impl for random reads
Sometimes an sstable is used for random reads, sometimes for streamed
reads using the input stream. In both cases the file API can be
provided, because the S3 API allows random reads of arbitrary lengths.
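Random reads map naturally onto ranged GET requests, and S3 GetObject accepts the standard HTTP Range header. The sketch below builds the header a read of `len` bytes at offset `pos` could use. `range_header` is a hypothetical helper, not the Scylla function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Build an HTTP Range header for reading `len` bytes at offset `pos`.
// The last offset in "bytes=first-last" is inclusive, hence the -1.
// Assumes len > 0.
std::string range_header(std::uint64_t pos, std::size_t len) {
    return "Range: bytes=" + std::to_string(pos) + "-" +
           std::to_string(pos + len - 1);
}
```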

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-10 16:43:01 +03:00
Pavel Emelyanov
a4a64149a6 utils: Add S3 data sink for multipart upload
Putting a large object into S3 using plain PUT is a bad choice -- one needs
to collect the whole object in memory, then send it as a content-length
request with a plain body. Multipart upload puts less stress on memory,
but it has its own limitation -- each part except the last one should be
at least 5Mb in size. For that reason using the file API doesn't work --
the file IO API operates on external memory buffers and the file impl
would only have raw pointers to them. In order to collect a 5Mb chunk in
RAM the impl would have to copy the memory, which is not good. Unlike the
file API, the data_sink API is more flexible, as it has temporary buffers
at hand and can cache them in a zero-copy manner.

Having said that, the S3 data_sink implementation looks like this:

* put(buffer):
  move the buffer into the local cache; once the local cache grows above
  5Mb, send out the part

* flush:
  send out whatever is in the cache, then send the upload completion request

* close:
  check that the upload finished (in flush), abort the upload otherwise
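The put/flush/close logic above can be modelled synchronously. In the sketch below, `min_part` stands in for the 5Mb threshold so the model can be exercised with small buffers. Strings moved into a vector play the role of zero-copy cached temporary buffers. `multipart_sink_model` is hypothetical, not Scylla's data_sink implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, synchronous model of the multipart-upload sink.
class multipart_sink_model {
    std::size_t _min_part;
    std::vector<std::string> _cache; // buffers are moved in, not copied
    std::size_t _cached = 0;
    bool _completed = false;
public:
    std::vector<std::size_t> sent_parts; // sizes of uploaded parts
    bool aborted = false;

    explicit multipart_sink_model(std::size_t min_part) : _min_part(min_part) {}

    // Cache the buffer; send a part once the cache reaches the threshold.
    void put(std::string buf) {
        _cached += buf.size();
        _cache.push_back(std::move(buf));
        if (_cached >= _min_part) {
            send_part();
        }
    }

    // Send the remainder (the last part may be small), then "complete".
    void flush() {
        if (_cached > 0) {
            send_part();
        }
        _completed = true; // stands for the upload completion request
    }

    // Abort the upload if flush() never completed it.
    void close() {
        if (!_completed) {
            aborted = true; // stands for the abort request
        }
    }
private:
    void send_part() {
        sent_parts.push_back(_cached);
        _cache.clear();
        _cached = 0;
    }
};
```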

Users of the API may (actually should) wrap the sink with an output_stream
and use it as any other output_stream.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-10 16:43:01 +03:00
Pavel Emelyanov
3745b5c715 utils: Add S3 client with basic ops
These include HEAD to get the size, PUT to upload an object in one go,
GET to read the object as a contiguous buffer and DELETE to drop one.

The client uses the http client from seastar and just implements the S3
protocol on top of it.
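The four basic operations correspond to plain HTTP methods on the object's path. Below is a hypothetical sketch, assuming path-style addressing (/bucket/object) as commonly used with minio. The enum and helpers are invented, not the Scylla client's code.

```cpp
#include <cassert>
#include <string>

// The basic object operations and the HTTP method each one uses.
enum class s3_op { get_size, put_object, get_object, delete_object };

const char* http_method(s3_op op) {
    switch (op) {
    case s3_op::get_size:      return "HEAD";   // size from Content-Length
    case s3_op::put_object:    return "PUT";    // whole object as the body
    case s3_op::get_object:    return "GET";    // whole object in the reply
    case s3_op::delete_object: return "DELETE";
    }
    return "";
}

// Path-style request line: METHOD /<bucket>/<object> HTTP/1.1
std::string request_line(s3_op op, const std::string& bucket, const std::string& object) {
    return std::string(http_method(op)) + " /" + bucket + "/" + object + " HTTP/1.1";
}
```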

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-10 16:43:01 +03:00