If the endpoint config specifies AWS key, secret and region, all the
S3 requests get signed. Signature should have all the x-amz-... headers
included and should contain at least three of them. This patch includes
x-ams-date, x-amz-content-sha256 and host headers into the signing list.
The content can be unsigned when sent over HTTPS, this is what this
patch does.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Existing seastar's factories work on socket_address, but in S3 we have
endpoint name which's a DNS name in case of real S3. So this patch
creates the http client for S3 with the custom connection factory that
does two things.
First, it resolves the provided endpoint name into address.
Second, it loads trust-file from the provided file path (or sets system
trust if configured that way).
Since s3 client creation is no-waiting code currently, the above
initialization is spawned in afiber and before creating the connection
this fiber is waited upon.
This code probably deserves living in seastar, but for now it can land
next to utils/s3/client.cc.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Currently the code temporarily assumes that the endpoint port is 9000.
This is what tests' local minio is started with. This patch keeps the
port number on endpoint config and makes test get the port number from
minio starting code via environment.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Similar to previous patch -- extent the s3::client constructor to get
the endpoint config value next to the endpoint string. For now the
configs are likely empty, but they are yet unused too.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Currently the client is constructed with socket_address which's prepared
by the caller from the endpoint string. That's not flexible engouh,
because s3 client needs to know the original endpoint string for two
reasons.
First, it needs to lookup endpoint config for potential AWS creds.
Second, it needs this exact value as Host: header in its http requests.
So this patch just relaxes the client constructor to accept the endpoint
string and hard-code the 9000 port. The latter is temporary, this is how
local tests' minio is started, but next patch will make it configurable.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
In order to access real S3 bucket, the client should use signed requests
over https. Partially this is due to security considerations, partially
this is unavoidable, because multipart-uploading is banned for unsigned
requests on the S3. Also, signed requests over plain http require
signing the payload as well, which is a bit troublesome, so it's better
to stick to secure https and keep payload unsigned.
To prepare signed requests the code needs to know three things:
- aws key
- aws secret
- aws region name
The latter could be derived from the endpoint URL, but it's simpler to
configure it explicitly, all the more so there's an option to use S3
URLs without region name in them we could want to use some time.
To keep the described configuration the proposed place is the
object_storage.yaml file with the format
endpoints:
- name: a.b.c
port: 443
aws_key: 12345
aws_secret: abcdefghijklmnop
...
When loaded, the map gets into db::config and later will be propagated
down to sstables code (see next patch).
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
when compiling using GCC-13, it warns that:
```
/home/kefu/dev/scylladb/utils/s3/client.cc:224:9: error: stack usage might be 66352 bytes [-Werror=stack-usage=]
224 | sstring parse_multipart_upload_id(sstring& body) {
| ^~~~~~~~~~~~~~~~~~~~~~~~~
```
so it turns out that `rapidxml::xml_document<>` could be very large,
let's allocate it on heap instead of on the stack to address this issue.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes#13722
The s3::readable_file::stat() call returns a hand-crafted stat structure
with some fields set to some sane values, most are constants. However,
other fields remain not initialized which leads to troubles sometimes.
Better to fill the stat with zeroes and later revisit it for more sane
values.
fixes: #13645
refs: #13649
Using designated initializers is not an option here, see PR #13499
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes#13650
The upload_sink::_upload_id remains empty until upload starts, remains
non-empty while it proceeds, then becomes empty again after it
completes. The upload_started() method cheks that and on .close()
started upload is aborted.
The final switch to empty is done by std::move()ing the upload id into
completion requrest, but it's better to use std::exchange() to emphasize
the fact the the _upload_id becomes empty at that point for a reason.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes#13570
The message length is pre-calculated in advance to provide correct
content-length request header. This math is not obvious and deserves a
comment.
Also, the final message preparation code is also implicitly checking
if any part failed to upload. There's a comment in the upload_sink's
upload_part() method about it, but the finalization place deserves one
too.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
When all parts upload complete the final message is prepared and sent
out to the server. The preparation code is also responsible for checking
if all parts uploaded OK by checking the part etag to be non-empty. In
that check a misprint crept in -- the whole list is checked to be empty,
not the individual etag itself.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Docs say, that part numbers should start from 1, while the code follows
the tradition and starts from 0. Minio is conveniently incompatible in
this sense so test had been passing so far. On real S3 part number 0
ends up with failed request.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
It makes compiler complan about mis-ordered initialization of st_nlink
vs st_mode on different arches. Current code (st_nlink before st_mode)
compiled fine on x86, but fails on ARM which wants st_mode to come
before st_nlink. Changing the order would, apparently, break x86 build
with similar message.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes#13499
Sometimes an sstable is used for random read, sometimes -- for streamed
read using the input stream. For both cases the file API can be
provided, because S3 API allows random reads of arbitrary lengths.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Putting a large object into S3 using plain PUT is bad choice -- one need
to collect the whole object in memory, then send it as a content-length
request with plain body. Less memory stress is by using multipart
upload, but multipart upload has its limitation -- each part should be
at least 5Mb in size. For that reason using file API doesn't work --
file IO API operates with external memory buffers and the file impl
would only have raw pointers to it. In order to collect 5Mb of chunk in
RAM the impl would have to copy the memory which is not good. Unlike the
file API data_sink API is more flexible, as it has temporary buffers at
hand and can cache them in zero-copy manner.
Having sad that, the S3 data_sink implementation is like this:
* put(buffer):
move the buffer into local cache, once the local cache grows above 5Mb
send out the part
* flush:
send out whatever is in cache, then send upload completion request
* close:
check that the upload finihsed (in flush), abort the upload otherwise
User of the API may (actually should) wrap the sink with output_stream
and use it as any other output_stream.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Those include -- HEAD to get size, PUT to upload object in one go, GET
to read the object as contigious buffer and DELETE to drop one.
The client uses http client from seastar and just implements the S3
protocol using it.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>