Commit Graph

1564 Commits

Author SHA1 Message Date
Kefu Chai
a1dcddd300 utils: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16833
2024-01-18 12:50:06 +02:00
Kefu Chai
f11a53856d utils/pretty_printers: add "I" specifier support
this is to mimic the formatting of `human_readable_value`, and
to prepare for consolidating these two formatters, so we don't have
two pretty printers in the tree.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-01-11 14:33:47 +08:00
Kefu Chai
7d627b328f utils/pretty_printers: use the formatting of to_hr_size()
keep the precision of 4 digits, for instance, so that we format
"8191" as "8191" instead of as "8 Ki". this is modeled after
the behavior of `to_hr_size()`. for better user experience.
and also prepares to consolidate these two formatters.

tests are updated to exercise both IEC and SI notations.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-01-11 14:33:03 +08:00
Patryk Wrobel
a64eb92369 utils: specialize fmt::formatter for utils::tagged_integer
This change introduces a specialization of fmt::formatter
for utils::tagged_integer. This enables the usage of this
type with FMTv10, which dropped the default generated formatter.

Usage of utils::tagged_integer without the defined formatter
resulted in compilation error when compiled with FMTv10.

Refs: #13245

Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>

Closes scylladb/scylladb#16715
2024-01-10 18:32:43 +03:00
Kefu Chai
c1beba1f7d utils: config_file: throw bpo::invalid_option_value() when seeing invalid option
before this change, `std::invalid_argument` is thrown by
`bpo::notify(configuration)` in `app_template::run_deprecated()` when
invalid option is passed in via command line. `utils::named_value`
throws `std::invalid_argument` if the given value is not listed in
`_allowed_values`. but we don't handle `std::invalid_argument` in
`app_template::run_deprecated()`. so the application aborts with
unhandled exception if the specified argument is not allowed.

in this change, we convert the `std::invalid_argument` to a
derived class of `bpo::error` in the customized notify handler,
so that it can be handled in `app_template::run_deprecated()`.

because `name_value::operator()` is also used otherwhere, we
should not throw a bpo::error there. so its exception type
is preserved.

Fixes #16687
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16688
2024-01-09 11:49:06 +02:00
Kefu Chai
db9e314965 treewide: apply codespell to the comments in source code
for less spelling errors in comment.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16408
2023-12-20 10:25:03 +02:00
Botond Dénes
a6200e99e6 Merge 'Handle S3 partial read overflows' from Pavel Emelyanov
The test case that validates upload-sink works does this by getting several random ranges from the uploaded object and checks that the content is what it should be. The range boundaries are generated like this:

```
    uint64_t len = random(1, chunk_size);
    uint64_t offset = random(file_size) - len;
```

The 2nd line is not correct, if random number happens less than the len the offset befomes "negative", i.e. -- very large 64-bit unsigned value.

Next, this offset:len gets into s3 client's get_object_contiguous() helper which in turn converts them into http range header's bytes-specifier format which is "first_bytet-last_byte" one. The math here is

```
    first_byte = offset;
    last_byte = offset + len - 1;
```

Here the overflow of the offset thing results in underflow of the last_byte -- it becomes less than the first_byte. According to RFC this range-specifier is invalid and (!) can be ignored by the server. This is what minio does -- it ignores invalid range and returns back full object.

But that's not all. When returning object portion the http request status code is PartialContent, but when the range is ignored and full object is returned, the status is OK. This makes s3 client's request fail with unexpected_status_error in the middle of the test. Then the object is removed with deferred action and actual error is printed into logs. In the end of the day logs look as if deletion of an object failed with OK status %)

fixes: #16133

Closes scylladb/scylladb#16324

* github.com:scylladb/scylladb:
  test/s3: Avoid object range overflow
  s3/client: Handle GET-with-Range overflows correctly
2023-12-18 10:00:32 +02:00
Kefu Chai
c485644303 utils: bit_cast: drop unused #includes
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-12-12 21:09:51 +08:00
Avi Kivity
9c0f05efa1 Merge 'Track tablet streaming under global sessions to prevent side-effects of failed streaming' from Tomasz Grabiec
Tablet streaming involves asynchronous RPCs to other replicas which transfer writes. We want side-effects from streaming only within the migration stage in which the streaming was started. This is currently not guaranteed on failure. When streaming master fails (e.g. due to RPC failing), it can be that some streaming work is still alive somewhere (e.g. RPC on wire) and will have side-effects at some point later.

This PR implements tracking of all operations involved in streaming which may have side-effects, which allows the topology change coordinator to fence them and wait for them to complete if they were already admitted.

The tracking and fencing is implemented by using global "sessions", created for streaming of a single tablet. Session is globally identified by UUID. The identifier is assigned by the topology change coordinator, and stored in system.tablets. Sessions are created and closed based on group0 state (tablet metadata) by the barrier command sent to each replica, which we already do on transitions between stages. Also, each barrier waits for sessions which have been closed to be drained.

The barrier is blocked only if there is some session with work which was left behind by unsuccessful streaming. In which case it should not be blocked for long, because streaming process checks often if the guard was left behind and stops if it was.

This mechanism of tracking is fault-tolerant: session id is stored in group0, so coordinator can make progress on failover. The barriers guarantee that session exists on all replicas, and that it will be closed on all replicas.

Closes scylladb/scylladb#15847

* github.com:scylladb/scylladb:
  test: tablets: Add test for failed streaming being fenced away
  error_injection: Introduce poll_for_message()
  error_injection: Make is_enabled() public
  api: Add API to kill connection to a particular host
  range_streamer: Do not block topology change barriers around streaming
  range_streamer, tablets: Do not keep token metadata around streaming
  tablets: Fail gracefully when migrating tablet has no pending replica
  storage_service, api: Add API to disable tablet balancing
  storage_service, api: Add API to migrate a tablet
  storage_service, raft topology: Run streaming under session topology guard
  storage_service, tablets: Use session to guard tablet streaming
  tablets: Add per-tablet session id field to tablet metadata
  service: range_streamer: Propagate topology_guard to receivers
  streaming: Always close the rpc::sink
  storage_service: Introduce concept of a topology_guard
  storage_service: Introduce session concept
  tablets: Fix topology_metadata_guard holding on to the old erm
  docs: Document the topology_guard mechanism
2023-12-07 16:29:02 +02:00
Pavel Emelyanov
3e9309caf4 s3/client: Handle GET-with-Range overflows correctly
The get_object_contiguous() accepts optional range argument in a form of
offset:lengh and then converts it into first_byte:last_byte pair to
satisfy http's Range header range-specifier.

If the lat_byte, which is offset + lenght - 1, overflows 64-bits the
range specifier becomes invalid. According to RFC9110 servers may ignore
invalid ranges if they want to and this is what minio does.

The result is pretty interesting. Since the range is specified, client
expect PartialContent response, but since the range is ignored by server
the result is OK, as if the full object was requested. So instead of
some sane "overflow" error, the get_object_contiguous() fails with
status "success".

The fix is in pre-checking provided ranges and failing early

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-07 10:50:55 +03:00
Tomasz Grabiec
083a0279a9 error_injection: Introduce poll_for_message()
To allow more complex waiting, which involves other exit conditions.
2023-12-06 18:36:17 +01:00
Tomasz Grabiec
ce0dc9e940 error_injection: Make is_enabled() public 2023-12-06 18:36:17 +01:00
Botond Dénes
d2a88cd8de Merge 'Typos: fix typos in code' from Yaniv Kaul
Fixes some more typos as found by codespell run on the code. In this commit, there are more user-visible errors.

Refs: https://github.com/scylladb/scylladb/issues/16255

Closes scylladb/scylladb#16289

* github.com:scylladb/scylladb:
  Update unified/build_unified.sh
  Update main.cc
  Update dist/common/scripts/scylla-housekeeping
  Typos: fix typos in code
2023-12-06 07:36:41 +02:00
Benny Halevy
0bcce35abd treewide: get rid of now unused fb_utilities
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-05 16:22:49 +02:00
Yaniv Kaul
ae2ab6000a Typos: fix typos in code
Fixes some more typos as found by codespell run on the code.
In this commit, there are more user-visible errors.

Refs: https://github.com/scylladb/scylladb/issues/16255
2023-12-05 15:18:11 +02:00
Tomasz Grabiec
d3d83869ce storage_service: Introduce session concept 2023-12-05 14:09:34 +01:00
Avi Kivity
60af2f3cb2 Merge 'New commitlog file format using tagged pages' from Calle Wilund
Prototype implementation of format suggested/requested by @avikivity:

Divides segments into disk-write-alignment sized pages, each tagged with segment ID + CRC of data content.
When read, we both verify sector integrity (CRC) to detect corruption, as well as matching ID read with expected one.

If the latter mismatches we have a prematurely terminated segment (read truncation), which, depending on whether the CL is
written in batch or periodic mode, as well as explicit sync, can mean data loss.

Note: all-zero pages are treated as kosher, both to align with newly allocated segments, as well as fully terminated (zero-page) ones.

Note: This is a preview/RFC - the rest of the file format is not modified. At least parts of entry CRC could probably be removed, but I have not done so yet (needs some thinking).

Note: Some slight abstraction breaks in impl. and probably less than maximal efficiency.

v2:
* Removed entry CRC:s in file format.
* Added docs on format v3
* Added one more test for recycling-truncation

v3:
* Fixed typos in size calc and docs
* Changed sect metadata order
* Explicit iter type

Closes scylladb/scylladb#15494

* github.com:scylladb/scylladb:
  commitlog_test: Add test for replaying large-ish mutation
  commitlog_test: Add additional test for segmnent truncation
  docs: Add docs on commitlog format 3
  commitlog: Remove entry CRC from file format
  commitlog: Implement new format using CRC:ed sectors
  commitlog: Add iterator adaptor for doing buffer splitting into sub-page ranges
  fragmented_temporary_buffer: Add const iterator access to underlying buffers
  commitlog_replayer: differentiate between truncated file and corrupt entries
2023-12-04 13:31:13 +01:00
Yaniv Kaul
c658bdb150 Typos: fix typos in comments
Fixes some typos as found by codespell run on the code.
In this commit, I was hoping to fix only comments, not user-visible alerts, output, etc.
Follow-up commits will take care of them.

Refs: https://github.com/scylladb/scylladb/issues/16255
Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>
2023-12-02 22:37:22 +02:00
Piotr Smaroń
5fd30578d7 config: introduce value_status::Deprecated
Current mechanism to deprecate config options is implemented in a hacky
way in `main.cpp` and doesn't account for existing
`db::config/boost::po` API controlling lifetime of config options, hence
it's being replaced in this PR by adding yet another `value_status`
enumerator: `Deprecated`, so that deprecation of config options is
controlled in one place in `config.cc`,i.e. when specifying config options.
Motivation: https://docs.google.com/document/d/18urPG7qeb7z7WPpMYI2V_lCOkM5YGKsEU78SDJmt8bM/edit?usp=sharing

With this change, if a `Deprecated` config option is specified as
1. a command line parameter, scylla will run and log:
```
WARN  2023-11-25 23:37:22,623 [shard 0:main] init - background-writer-scheduling-quota option ignored (deprecated)
```
(Previously it was only a message printed to standard output, not a
scylla log of warn level).
2. an option in `scylla.yaml`, scylla will run and log:
```
WARN  2023-11-27 23:55:13,534 [shard 0:main] init - Option is deprecated : background_writer_scheduling_quota
```

Fixes #15887
Incorporates dropped https://github.com/scylladb/scylladb/pull/15928

Closes scylladb/scylladb#16184
2023-11-30 08:52:57 +03:00
Avi Kivity
cccd2e7fa7 Merge 'Generalize sstables TOC file reading' from Pavel Emelyanov
TOC file is read and parsed in several places in the code. All do it differently, and it's worth generalizing this place.
To make it happen also fix the S3 readable_file so that it could be used inside file_input_stream.

Closes scylladb/scylladb#16175

* github.com:scylladb/scylladb:
  sstable: Generalize toc file read and parse
  s3/client: Don't GET object contents on out-of-bound reads
  s3/client: Cache stats on readable_file
2023-11-29 19:18:31 +02:00
Kefu Chai
c40da20092 utils/pretty_printers: stop using undocumented fmt api
format_parse_context::on_error() is an undocumented API in fmt v9
and in fmt v10, see

- https://fmt.dev/9.1.0/api.html#_CPPv4I0EN3fmt16basic_format_argE
- https://fmt.dev/10.0.0/api.html#_CPPv4I0EN3fmt26basic_format_parse_contextE

despite that this API was once used in its document for fmt v10.0.0, see
https://fmt.dev/10.0.0/api.html#formatting-user-defined-types. it's
still, well, undocumented.

so, to have better compatibility, let's use the documented API in place
of undocumented one. please note, `throw_format_error()` was still
not a public API before 10.1.0, so before that release we have to
throw `fmt::format_error` explicitly. so we cannot use it yet during
the transitional period.

because the class of `fmt::format_error` is defined in `fmt/format.h`,
we need to include this header for using it.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16212
2023-11-29 12:49:04 +02:00
Pavel Emelyanov
c5d85bdf79 s3/client: Don't GET object contents on out-of-bound reads
If S3 readable file is used inside file input stream, the latter may
call its read methods with position that is above file size. In that
case server replies with generic http error and the fact that the range
was invalid is encoded into reply body's xml.

That's not great to catch this via wrong reply status exception and xml
parsing all the more so we can know that the read is out-of-bound in
advance.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-29 12:09:52 +03:00
Pavel Emelyanov
339182287f s3/client: Cache stats on readable_file
S3-based sstables components are immutable, so every time stat is called
there's no need to ping server again.

But the main intention of this patch is to provide stats for read calls
in the next patch.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-29 12:06:54 +03:00
Pavel Emelyanov
210b01a5ce config: Make object storage config updateable_value_source
Now its plain updateable_value, but without the ..._source object the
updateable_value is just a no-op value holder. In order for the
observers to operate there must be the value source, updating it would
update the attached updateable values _and_ notify the observers.

In order for the config to be the u.v._source, config entries should be
comparable to each other, thus the <=> operator for it

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-21 16:47:50 +03:00
Pavel Emelyanov
855626f7de s3/client: Map http exceptions into storage_io_error
When http request resolves with excpetion it makes sense to translate
the network exception into storage exceptio to make upper layers think
that it was some sort of IO error, not SUDDENLY and http one.

The translation is, for now, pretty simple:

- 404 and 3xx -> ENOENT
- 403(forbidden) and 401(unauthorized) -> EACCESS
- anything else -> EIO

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-21 16:47:50 +03:00
Pavel Emelyanov
0e9428ab4a exceptions: Extend storage_io_error construction options
To make it possible to construct it with plain errno value and a string

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-11-21 13:37:52 +03:00
Calle Wilund
560364d278 fragmented_temporary_buffer: Add const iterator access to underlying buffers
Breaks abstraction a bit, but some (me) might need something like it...
2023-11-21 08:42:33 +00:00
Kefu Chai
691f7f6edb util: do not use variable length array
vla (variable length array) is an extension in GCC and Clang. and
it is not part of the C++ standard.

so let's avoid using it if possible, for better standard compliant.
it's also more consistent with other places where we calculate the size
of an array of T in the same source file.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16084
2023-11-20 23:02:41 +02:00
Kefu Chai
12f4f9f481 build: cmake: link against cryptopp::cryptopp
instead of linking against cryptopp, we should link against
crytopp::crytopp. the latter is the target exposed by
Findcryptopp.cmake, while the former is but a library name which
is not even exposed by any find_package() call.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16060
2023-11-15 17:14:04 +02:00
Pavel Emelyanov
f4fd5c7207 s3/client: Tag pieces of jumbo uploader
The jumbo sink is there to upload files that can be potentially larger
than 50Gb (10000*5Mb). For that the sink uploads a set of so called
"pieces" -- files up to 50Gb each -- then uses the copy-upload APi call
to squash the pieces together. After copying the piece is removed. In
case of a crash while uploading pieces remain in the bucket forever
which is not great.

This patch tags pieces with 'kind=piece' tag in order to tell pieces
from regular objects. This can be used, for example, by setting up the
lifecycle tag-based policy and collect dangling pieces eventually.

fixes: #13670

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#16023
2023-11-15 15:32:30 +02:00
Kefu Chai
efd65aebb2 build: cmake: add check-header target
to have feature parity with `configure.py`. we won't need this
once we migrate to C++20 modules. but before that day comes, we
need to stick with C++ headers.

we generate a rule for each .hh files to create a corresponding
.cc and then compile it, in order to verify the self-containness of
that header. so the number of rule is quite large, to avoid the
unnecessary overhead. the check-header target is enabled only if
`Scylla_CHECK_HEADERS` option is enabled.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#15913
2023-11-13 10:27:06 +02:00
Nadav Har'El
284534f489 Merge 'Nodetool additional commands 4/N' from Botond Dénes
This PR implements the following new nodetool commands:
* snapshot
* drain
* flush
* disableautocompaction
* enableautocompaction

All commands come with tests and all tests pass with both the new and the current nodetool implementations.

Refs: https://github.com/scylladb/scylladb/issues/15588

Closes scylladb/scylladb#15939

* github.com:scylladb/scylladb:
  test/nodetool: add README.md
  tools/scylla-nodetool: implement enableautocompaction command
  tools/scylla-nodetool: implement disableautocompaction command
  tools/scylla-nodetool: implement the flush command
  tools/scylla-nodetool: extract keyspace/table parsing
  tools/scylla-nodetool: implement the drain command
  tools/scylla-nodetool: implement the snapshot command
  test/nodetool: add support for matching aproximate query parameters
  utils/http: make dns_connection_factory::initialize() static
2023-11-08 11:18:35 +02:00
Nadav Har'El
a3621dbd3e Merge 'Alternator: Support new ReturnValuesOnConditionCheckFailure feature' from Marcin Maliszkiewicz
alternator: add support for ReturnValuesOnConditionCheckFailure feature

As announced in https://aws.amazon.com/about-aws/whats-new/2023/06/amazon-dynamodb-cost-failed-conditional-writes/, DynamoDB added a new option for write operations (PutItem, UpdateItem, or DeleteItem), ReturnValuesOnConditionCheckFailure, which if set to ALL_OLD returns the current value of the item - but only if a condition check failed.

Fixes https://github.com/scylladb/scylladb/issues/14481

Closes scylladb/scylladb#15125

* github.com:scylladb/scylladb:
  alternator: add support for ReturnValuesOnConditionCheckFailure feature
  alternator: add ability to send additional fields in api_error
2023-11-07 23:19:51 +02:00
Botond Dénes
b61822900b utils/http: make dns_connection_factory::initialize() static
Said method can out-live the factory instance. This was not a problem
because the method takes care to keep all its need from `this` alive, by
copying them to the coroutine stack. However, this fact that this method
can out-live the instance is not obvious, and an unsuspecting developer
(me) added a new member (_logger) which was not kept alive.
This can cause a use-after-free in the factory. Fix by making
initialize() static, forcing the instance to pass all parameters
explicitely and add a comment explaining that this method can out-live
the instance.
2023-11-07 04:39:33 -05:00
Kefu Chai
ef023dae44 s3: use rapixml/rapidxml.hpp as a fallback
on debian derivatives librapidxml-dev installs rapidxml.h as
rapixml/rapidxml.hpp, so let's use it as a fallback.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#15814
2023-11-01 10:25:40 +03:00
Marcin Maliszkiewicz
b4c77a373d alternator: add ability to send additional fields in api_error
While it may not be explicitly documented DynamoDB sometimes enchriches error
message by additional fields. For instance when ConditionalCheckFailedException
occurs while ReturnValuesOnConditionCheckFailure is set it will add Item object,
similarly for TransactionCanceledException it will add CancellationReasons object.
There may be more cases like this so generic json field is added to our error class.

The change will be used by future commit implementing ReturnValuesOnConditionCheckFailure
feature.
2023-10-30 15:13:06 +01:00
Botond Dénes
ceb866fa2e Merge 'Make s3 upload sink PUT small objects' from Pavel Emelyanov
When upload-sink is flushed, it may notice that the upload had not yet been started and fall-back to plain PUT in that case. This will make small files uploading much nicer, because multipart upload would take 3 API calls (start, part, complete) in this case

fixes: #13014

Closes scylladb/scylladb#15824

* github.com:scylladb/scylladb:
  test: Add s3_client test for upload PUT fallback
  s3/client: Add PUT fallback to upload sink
2023-10-25 10:03:46 +03:00
Kefu Chai
f8104b92f8 build: cmake: detect rapidxml
we use rapidxml for parsing XML, so let's detect it before using it.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#15813
2023-10-24 15:12:04 +03:00
Pavel Emelyanov
63f2bdca01 s3/client: Add PUT fallback to upload sink
When the non-jumbo sink is flushed and notices that the real upload is
not started yet, it may just go ahead and PUT the buffers into the
object with the single request.

For jumbo sink the fallback is not implemented as it likely doesn't make
and any sense -- jumbo sinks are unlikely to produce less than 5Mb of
data so it's going to be dead code anyway.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-10-24 10:59:46 +03:00
Avi Kivity
ee9cc450d4 logalloc: report increases of reserves
The log-structured allocator maintains memory reserves to so that
operations using log-strucutured allocator memory can have some
working memory and can allocate. The reserves start small and are
increased if allocation failures are encountered. Before starting
an operation, the allocator first frees memory to satisfy the reserves.

One problem is that if the reserves are set to a high value and
we encounter a stall, then, first, we have no idea what value
the reserves are set to, and second, we have no idea what operation
caused the reserves to be increased.

We fix this problem by promoting the log reports of reserve increases
from DEBUG level to INFO level and by attaching a stack trace to
those reports. This isn't optimal since the messages are used
for debugging, not for informing the user about anything important
for the operation of the node, but I see no other way to obtain
the information.

Ref #13930.

Closes scylladb/scylladb#15153
2023-10-23 13:37:50 +02:00
Botond Dénes
9231454acd mutation/json: extract generic streaming writer into utils/rjson.hh
This writer is generally useful, not just for writing mutations as json.
Make it generally available as well.
2023-10-20 10:04:56 -04:00
Michael Huang
75109e9519 cql3: Fix invalid JSON parsing for JSON objects with ASCII keys
For JSON objects represented as map<ascii, int>, don't treat ASCII keys
as a nested JSON string. We were doing that prior to the patch, which
led to parsing errors.

Included the error offset where JSON parsing failed for
rjson::parse related functions to help identify parsing errors
better.

Fixes: #7949

Signed-off-by: Michael Huang <michaelhly@gmail.com>

Closes scylladb/scylladb#15499
2023-10-05 22:26:08 +03:00
Avi Kivity
e600f35d1e Merge 'logalloc, reader_concurrency_semaphore: cooperate on OOM kills' from Botond Dénes
Consider the following code snippet:
```c++
future<> foo() {
    semaphore.consume(1024);
}

future<> bar() {
    return _allocating_section([&] {
        foo();
    });
}
```

If the consumed memory triggers the OOM kill limit, the semaphore will throw `std::bad_alloc`. The allocating section will catch this, bump std reserves and retry the lambda. Bumping the reserves will not do anything to prevent the next call to `consume()` from triggering the kill limit. So this cycle will repeat until std reserves are so large that ensuring the reserve fails. At this point LSA gives up and re-throws the `std::bad_alloc`. Beyond the useless time spent on code that is doomed to fail, this also results in expensive LSA compaction and eviction of the cache (while trying to ensure reserves).
Prevent this situation by throwing a distinct exception type which is derived from `std::bad_alloc`. Allocating section will not retry on seeing this exception.
A test reproducing the bug is also added.

Fixes: #15278

Closes scylladb/scylladb#15581

* github.com:scylladb/scylladb:
  test/boost/row_cache_test: add test_cache_reader_semaphore_oom_kill
  utils/logalloc: handle utils::memory_limit_reached in with_reclaiming_disabled()
  reader_concurrency_semaphore: use utils::memory_limit_reached exception
  utils: add memory_limit_reached exception
2023-10-05 19:47:21 +03:00
Pavel Emelyanov
c4f1929eea s3: Abort multipart upload if finalize request fails
It may happen that wrapping up multipart upload fails too. However,
before sending the request the driver clears the _upload_id field thus
marking the whole process as "all is OK". So in case the finalization
method fails and thrown, the upload context remains on the server side
forever.

Fix this by keeping the _upload_id set, so even if finalization throws,
closing the uploader notices this and calls abort.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#15521
2023-10-03 09:47:33 +03:00
Botond Dénes
c0da6bcfb8 utils/logalloc: handle utils::memory_limit_reached in with_reclaiming_disabled()
Said method catches bad-allocs and retries the passed-in function after
raising the reserves. This does nothing to help the function succeed if
the bad alloc was throw from the semaphore, because the kill limit was
reached. In this case the read should be left to fail and terminate.
Now that the semaphore is throwing utils::memory_limit_reached in this
case, we can distinguish this case and just re-throw the exception.
2023-09-27 10:28:00 -04:00
Botond Dénes
721ffa319d utils: add memory_limit_reached exception
A distinct exception derived from std::bad_alloc, used in cases when
memory didn't really run out, but the process or task reached the memory
limit alloted for it. Using a distinct type for this case allows for LSA
to correctly react to this case.
2023-09-27 10:26:41 -04:00
Kefu Chai
ac3406e537 utils/s3/creds: rename aws_config member variables
- s/key/access_key_id/
- s/secret/secret_access_key/
- s/token/session_token/

so they are more aligned with the AWS document.
for instance, in
https://docs.aws.amazon.com/AmazonS3/latest/userguide/RESTAuthentication.html#ConstructingTheAuthenticationHeader
AWSAccessKeyId is used in the "Authorization" header.

this would help with the readability and maintainability.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-09-23 14:28:07 +08:00
Avi Kivity
1da6a939fe Merge 'Track memory usage of S3 object uploads' from Pavel Emelyanov
The S3 uploading sink needs to collect buffers internally before sending them out, because the minimal upload-able part size is 5Mb. When the necessary amount of bytes is accumulated, the part uploading fibers starts in the background. On flush the sink waits for all the fibers to complete and handles failure of any.

Uploading parallelism is nowadays limited by the means of the http client max-connections parameter. However, when a part uploading fibers waits for it connection it keeps the 5Mb+ buffers on the request's body, so even though the number of uploading parts is limited, the number of _waiting_ parts is effectively not.

This PR adds a shard-wide limiter on the number of background buffers S3 clients (and theirs http clients) may use.

Closes scylladb/scylladb#15497

* github.com:scylladb/scylladb:
  s3::client: Track memory in client uploads
  code: Configure s3 clients' memory usage
  s3::client: Construct client with shared semaphore
  sstables::storage_manager: Introduce config
2023-09-21 18:24:42 +03:00
Botond Dénes
a0c5dee2aa utils/logalloc: introduce logalloc::bad_alloc
This new exception type inherits from std::bad_alloc and allows logalloc
code to add additional information about why the allocation failed. We
currently have 3 different throw sites for std::bad_alloc in logalloc.cc
and when investigating a coredump produced by --abort-on-lsa-bad-alloc,
it is impossible to determine, which throw-site activated last,
triggering the abort.
This patch fixes that by disambiguating the throw-sites and including it
in the error message printed, right before abort.

Refs: #15373

Closes scylladb/scylladb#15503
2023-09-21 17:43:53 +03:00
Kefu Chai
0819788207 utils/s3: use structured binding when appropriate
and use `sstring::starts_with()`, for better readability.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#15487
2023-09-21 13:26:49 +03:00