TOC file is read and parsed in several places in the code. All do it differently, and it's worth generalizing this place.
To make it happen also fix the S3 readable_file so that it could be used inside file_input_stream.
Closesscylladb/scylladb#16175
* github.com:scylladb/scylladb:
sstable: Generalize toc file read and parse
s3/client: Don't GET object contents on out-of-bound reads
s3/client: Cache stats on readable_file
If S3 readable file is used inside file input stream, the latter may
call its read methods with position that is above file size. In that
case server replies with generic http error and the fact that the range
was invalid is encoded into reply body's xml.
That's not great to catch this via wrong reply status exception and xml
parsing all the more so we can know that the read is out-of-bound in
advance.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
S3-based sstables components are immutable, so every time stat is called
there's no need to ping server again.
But the main intention of this patch is to provide stats for read calls
in the next patch.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Now its plain updateable_value, but without the ..._source object the
updateable_value is just a no-op value holder. In order for the
observers to operate there must be the value source, updating it would
update the attached updateable values _and_ notify the observers.
In order for the config to be the u.v._source, config entries should be
comparable to each other, thus the <=> operator for it
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
When http request resolves with excpetion it makes sense to translate
the network exception into storage exceptio to make upper layers think
that it was some sort of IO error, not SUDDENLY and http one.
The translation is, for now, pretty simple:
- 404 and 3xx -> ENOENT
- 403(forbidden) and 401(unauthorized) -> EACCESS
- anything else -> EIO
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
vla (variable length array) is an extension in GCC and Clang. and
it is not part of the C++ standard.
so let's avoid using it if possible, for better standard compliant.
it's also more consistent with other places where we calculate the size
of an array of T in the same source file.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#16084
instead of linking against cryptopp, we should link against
crytopp::crytopp. the latter is the target exposed by
Findcryptopp.cmake, while the former is but a library name which
is not even exposed by any find_package() call.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#16060
The jumbo sink is there to upload files that can be potentially larger
than 50Gb (10000*5Mb). For that the sink uploads a set of so called
"pieces" -- files up to 50Gb each -- then uses the copy-upload APi call
to squash the pieces together. After copying the piece is removed. In
case of a crash while uploading pieces remain in the bucket forever
which is not great.
This patch tags pieces with 'kind=piece' tag in order to tell pieces
from regular objects. This can be used, for example, by setting up the
lifecycle tag-based policy and collect dangling pieces eventually.
fixes: #13670
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#16023
to have feature parity with `configure.py`. we won't need this
once we migrate to C++20 modules. but before that day comes, we
need to stick with C++ headers.
we generate a rule for each .hh files to create a corresponding
.cc and then compile it, in order to verify the self-containness of
that header. so the number of rule is quite large, to avoid the
unnecessary overhead. the check-header target is enabled only if
`Scylla_CHECK_HEADERS` option is enabled.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#15913
This PR implements the following new nodetool commands:
* snapshot
* drain
* flush
* disableautocompaction
* enableautocompaction
All commands come with tests and all tests pass with both the new and the current nodetool implementations.
Refs: https://github.com/scylladb/scylladb/issues/15588Closesscylladb/scylladb#15939
* github.com:scylladb/scylladb:
test/nodetool: add README.md
tools/scylla-nodetool: implement enableautocompaction command
tools/scylla-nodetool: implement disableautocompaction command
tools/scylla-nodetool: implement the flush command
tools/scylla-nodetool: extract keyspace/table parsing
tools/scylla-nodetool: implement the drain command
tools/scylla-nodetool: implement the snapshot command
test/nodetool: add support for matching aproximate query parameters
utils/http: make dns_connection_factory::initialize() static
Said method can out-live the factory instance. This was not a problem
because the method takes care to keep all its need from `this` alive, by
copying them to the coroutine stack. However, this fact that this method
can out-live the instance is not obvious, and an unsuspecting developer
(me) added a new member (_logger) which was not kept alive.
This can cause a use-after-free in the factory. Fix by making
initialize() static, forcing the instance to pass all parameters
explicitely and add a comment explaining that this method can out-live
the instance.
on debian derivatives librapidxml-dev installs rapidxml.h as
rapixml/rapidxml.hpp, so let's use it as a fallback.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#15814
While it may not be explicitly documented DynamoDB sometimes enchriches error
message by additional fields. For instance when ConditionalCheckFailedException
occurs while ReturnValuesOnConditionCheckFailure is set it will add Item object,
similarly for TransactionCanceledException it will add CancellationReasons object.
There may be more cases like this so generic json field is added to our error class.
The change will be used by future commit implementing ReturnValuesOnConditionCheckFailure
feature.
When upload-sink is flushed, it may notice that the upload had not yet been started and fall-back to plain PUT in that case. This will make small files uploading much nicer, because multipart upload would take 3 API calls (start, part, complete) in this case
fixes: #13014Closesscylladb/scylladb#15824
* github.com:scylladb/scylladb:
test: Add s3_client test for upload PUT fallback
s3/client: Add PUT fallback to upload sink
When the non-jumbo sink is flushed and notices that the real upload is
not started yet, it may just go ahead and PUT the buffers into the
object with the single request.
For jumbo sink the fallback is not implemented as it likely doesn't make
and any sense -- jumbo sinks are unlikely to produce less than 5Mb of
data so it's going to be dead code anyway.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The log-structured allocator maintains memory reserves to so that
operations using log-strucutured allocator memory can have some
working memory and can allocate. The reserves start small and are
increased if allocation failures are encountered. Before starting
an operation, the allocator first frees memory to satisfy the reserves.
One problem is that if the reserves are set to a high value and
we encounter a stall, then, first, we have no idea what value
the reserves are set to, and second, we have no idea what operation
caused the reserves to be increased.
We fix this problem by promoting the log reports of reserve increases
from DEBUG level to INFO level and by attaching a stack trace to
those reports. This isn't optimal since the messages are used
for debugging, not for informing the user about anything important
for the operation of the node, but I see no other way to obtain
the information.
Ref #13930.
Closesscylladb/scylladb#15153
For JSON objects represented as map<ascii, int>, don't treat ASCII keys
as a nested JSON string. We were doing that prior to the patch, which
led to parsing errors.
Included the error offset where JSON parsing failed for
rjson::parse related functions to help identify parsing errors
better.
Fixes: #7949
Signed-off-by: Michael Huang <michaelhly@gmail.com>
Closesscylladb/scylladb#15499
Consider the following code snippet:
```c++
future<> foo() {
semaphore.consume(1024);
}
future<> bar() {
return _allocating_section([&] {
foo();
});
}
```
If the consumed memory triggers the OOM kill limit, the semaphore will throw `std::bad_alloc`. The allocating section will catch this, bump std reserves and retry the lambda. Bumping the reserves will not do anything to prevent the next call to `consume()` from triggering the kill limit. So this cycle will repeat until std reserves are so large that ensuring the reserve fails. At this point LSA gives up and re-throws the `std::bad_alloc`. Beyond the useless time spent on code that is doomed to fail, this also results in expensive LSA compaction and eviction of the cache (while trying to ensure reserves).
Prevent this situation by throwing a distinct exception type which is derived from `std::bad_alloc`. Allocating section will not retry on seeing this exception.
A test reproducing the bug is also added.
Fixes: #15278Closesscylladb/scylladb#15581
* github.com:scylladb/scylladb:
test/boost/row_cache_test: add test_cache_reader_semaphore_oom_kill
utils/logalloc: handle utils::memory_limit_reached in with_reclaiming_disabled()
reader_concurrency_semaphore: use utils::memory_limit_reached exception
utils: add memory_limit_reached exception
It may happen that wrapping up multipart upload fails too. However,
before sending the request the driver clears the _upload_id field thus
marking the whole process as "all is OK". So in case the finalization
method fails and thrown, the upload context remains on the server side
forever.
Fix this by keeping the _upload_id set, so even if finalization throws,
closing the uploader notices this and calls abort.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#15521
Said method catches bad-allocs and retries the passed-in function after
raising the reserves. This does nothing to help the function succeed if
the bad alloc was throw from the semaphore, because the kill limit was
reached. In this case the read should be left to fail and terminate.
Now that the semaphore is throwing utils::memory_limit_reached in this
case, we can distinguish this case and just re-throw the exception.
A distinct exception derived from std::bad_alloc, used in cases when
memory didn't really run out, but the process or task reached the memory
limit alloted for it. Using a distinct type for this case allows for LSA
to correctly react to this case.
The S3 uploading sink needs to collect buffers internally before sending them out, because the minimal upload-able part size is 5Mb. When the necessary amount of bytes is accumulated, the part uploading fibers starts in the background. On flush the sink waits for all the fibers to complete and handles failure of any.
Uploading parallelism is nowadays limited by the means of the http client max-connections parameter. However, when a part uploading fibers waits for it connection it keeps the 5Mb+ buffers on the request's body, so even though the number of uploading parts is limited, the number of _waiting_ parts is effectively not.
This PR adds a shard-wide limiter on the number of background buffers S3 clients (and theirs http clients) may use.
Closesscylladb/scylladb#15497
* github.com:scylladb/scylladb:
s3::client: Track memory in client uploads
code: Configure s3 clients' memory usage
s3::client: Construct client with shared semaphore
sstables::storage_manager: Introduce config
This new exception type inherits from std::bad_alloc and allows logalloc
code to add additional information about why the allocation failed. We
currently have 3 different throw sites for std::bad_alloc in logalloc.cc
and when investigating a coredump produced by --abort-on-lsa-bad-alloc,
it is impossible to determine, which throw-site activated last,
triggering the abort.
This patch fixes that by disambiguating the throw-sites and including it
in the error message printed, right before abort.
Refs: #15373Closesscylladb/scylladb#15503
when accessing AWS resources, uses are allowed to long-term security
credentials, they can also the temporary credentials. but if the latter
are used, we have to pass a session token along with the keys.
see also https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_temp_use-resources.html
so, if we want to programatically get authenticated, we need to
set the "x-amz-security-token" header,
see
https://docs.aws.amazon.com/AmazonS3/latest/userguide/RESTAuthentication.html#UsingTemporarySecurityCredentials
so, in this change, we
1. add another member named `token` in `s3::endpoint_config::aws_config`
for storing "AWS_SESSION_TOKEN".
2. populate the setting from "object_storage.yaml" and
"$AWS_SESSION_TOKEN" environment variable.
3. set "x-amz-security-token" header if
`s3::endpoint_config::aws_config::token` is not empty.
this should allow us to test s3 client and s3 object store backend
with S3 bucket, with the temporary credentials.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#15486
When a piece is uploaded it's first flushed, then upload-copy is issued.
Both happen in the background and if piece flush calls resolves with
exception the exception remains unhandled. That's OK, since upload
finalization code checks that some pieces didn't complete (for whatever
reason) and fails the whole uploading, however, the ignored exception is
reported in logs. Not nice.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#15491
as the size of `rapidxml::xml_document` size quite large, let's
allocate it on the heap. otherwise GCC 13.2.1 warns us like:
```
utils/s3/client.cc: In function ‘seastar::sstring s3::parse_multipart_copy_upload_etag(seastar::sstring&)’:
utils/s3/client.cc:455:9: warning: stack usage is 66208 bytes [-Wstack-usage=]
455 | sstring parse_multipart_copy_upload_etag(sstring& body) {
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
```
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#15472
When uploading an object part, client spawns a background fiber that
keeps the buffers with data on the http request's write_body() lambda
capture. This generates unbound usage of memory with uploaded buffers
which is not nice. Even though s3 client is limited with http's client
max-connections parallelism, waiting for the available connection still
happens with buffers held in memory.
This patch makes the client claim the background memory from the
provided semaphore (which, in turn, sits on the shard-wide storage
manager instance). Once body writing is complete, the claimed units are
returned back to the semaphore allowing for more background writes.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The semaphore will be used to cap memory consumption by client. This
patch makes sure the reference to a semaphore exists as an argument to
client's constructor, not more than that.
In scylla binary, the semaphore sits on storage_manager. In tests the
semaphore is some local object. For now the semaphore is unused and is
initialized locked as this patch just pushes the needed argument all the
way around, next patches will make use of it.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This series introduces a scylla-native nodetool. It is invokable via the main scylla executable as the other native tools we have. It uses the seastar's new `http::client` to connect to the specified node and execute the desired commands.
For now a single command is implemented: `nodetool compact`, invokable as `scylla nodetool compact`. Once all the boilerplate is added to create a new tool, implementing a single command is not too bad, in terms of code-bloat. Certainly not as clean as a python implementation would be, but good enough. The advantages of a C++ implementation is that all of us in the core team know C++ and that it is shipped right as part of the scylla executable..
Closes#14841
* github.com:scylladb/scylladb:
test: add nodetool tests
test.py: add ToolTestSuite and ToolTest
tools/scylla-nodetool: implement compact operation
tools/scylla-nodetool: implement basic scylla_rest_api_client
tools: introduce scylla-nodetool
utils: export dns_connection_factory from s3/client.cc to http.hh
utils/s3/client: pass logger to dns_connection_factory in constructor
tools/utils: tool_app_template::run_async(): also detect --help* as --help
We want to publish this class in a header so it can be used by others,
but it uses the s3 logger. We don't want future users to pollute the s3
logs, so allow users to pass their own loggers to the factory.
The added metrics include:
- http client metrics, which include the number of connections, the number of active connections and the number of new connections made so far
- IO metrics that mimic those for traditional IO -- total number of object read/write ops, total number of get/put/uploaded bytes and individual IO request delay (round-trip, including body transfer time)
fixes: #13369Closes#14494
* github.com:scylladb/scylladb:
s3/client: Add IO stats metrics
s3/client: Add HTTP client metrics
s3/client: Split make_request()
s3/client: Wrap http client with struct group_client
s3/client: Move client::stats to namespace scope
s3/client: Keep part size local variable
to be aligned with seastar's coding-style.md: scylladb uses seastar's
coding-style.md. so let's adhere to it.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes#15345
seastar has deprecated the overload which accepts `server_name`,
let's use the one which accepts `tls::tls_options`.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes#15324
These metrics mimic the existing IO ones -- total number of read
operation, total number of read bytes and total read delay. And the same
for writing.
This patch makes no difference between wrting object with plain PUT vs
putting it with multipart uploading. Instead, it "measures" individual
IO writes.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Currently an http client has several exported "numbers" regarding the
number of transport connections the client uses. This patch exports
those via S3 client's per-sched-group metrics and prepares the ground
for more metrics in next patch
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
There will appear another make_request() helper that'll do mostly the
same. This split will help to avoid code duplication
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The http-client is per-sched-group. Next patch will need to keep metrics
per-sched-group too and this sched-group -> http-client map is the good
place to put them on. Wrapping struct will allow extending it with
metrics
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The stats is stats about object, not about client, so it's better if it
lives in namespace scope. Also it will avoid conflicts with client stats
that will be reported as metrics (later patch)
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This serves two purposes. First, it fixes potential use-after-move since
the bufs are moved on lambda and bufs.size() are called in the same
statement with no defined evaluation order.
Second, this makes 'size' varable alive up to the time request is
complete thus making it possible to update stats with it (later patch).
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Right now, the function allows for passing the path to a file as a seastar::sstring,
which is then converted to std::filesystem::path -- implicitly to the caller.
However, the function performs I/O, and there is no reason to accept any other type
than std::filesystem::path, especially because the conversion is straightforward.
Callers can perform it on their own.
This commit introduces the more constrained API.
Closes#15266