Optimize the various constructors a little, and add an std::from_range_t
constructor.
Minor improvement, so no backports.
Closesscylladb/scylladb#21399
* github.com:scylladb/scylladb:
utils: chunked_vector: add from_range_t constructor
utils: chunked_vector: optimize initializer_list constructor
utils: chunked_vector: iterator constructor: copy spanwise
utils: chunked_vector: reserve for forward iterators, not just random access iterators, on construction
Task manager GET /status method returns two counters that reflect task progress -- total and completed. To make caller reason about their meaning, additionally there's progress_units field next to those counters.
This patch implements this progress report for backup task. The units are bytes, the total counter is total size of files that are being uploaded, and the completed counter is total amount of bytes successfully sent with PUT requests. To get the counters, the client::upload_file() is extended to calculate those.
fixes#20653Closesscylladb/scylladb#21144
* github.com:scylladb/scylladb:
backup_task: Report uploading progress
s3/client: Account upload progress for real
s3/client: Introduce upload_progress
s3: Extract client_fwd.hh
std::ranges::to<> has a little protocol with containers. Implement it
to get optimized construction.
Similar to the iterator pair constructor, if the range's size can be
obtained (even with an O(N) algorithm), favor that to avoid reallocations.
Copy elements spanwise to promote optimization to memcpy when possible.
Instead of copying element-by-element, copy contiguous spans. This
is much faster if the input is a span and the constructor is trivial,
since the whole thing translates to a memcpy.
Make the two branches constexpr to reduce work for the compiler in
optimizing the other branch away.
For a forward iterator, prefer a two pass algorithm to first count
the number of elements, reserver, then copy the elements, to a single
pass algorithm that involves reallocation and copying.
The wrapper object denotes that injection should run a handler and
wait_for_message() on it. Wrapper carries the timeout used to call the
mentioned method. It's currently unused, next patches will start enjoing
it.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Otherwise, the read will be considered as on-cpu during promoted index
search, which will severely underutlize the disk because by default
on-cpu concurrency is 1.
I verified this patch on the worst case scenario, where the workload
reads missing rows from a large partition. So partition index is
cached (no IO) and there is no data file IO (relies on https://github.com/scylladb/scylladb/pull/20522).
But there is IO during promoted index search (via cached_file).
Before the patch this workload was doing 4k req/s, after the patch it does 30k req/s.
The problem is much less pronounced if there is data file or partition index IO involved
because that IO will signal read concurrency semaphore to invite more concurrency.
Fixes#21325Closesscylladb/scylladb#21323
* github.com:scylladb/scylladb:
utils: cached_file: Mark permit as awaiting on page miss
utils: cached_file: Push resource_unit management down to cached_file
Before upload starts file size is checked, so this is the place that
updates progress.total counter.
Uploading a file happens by reading unit_size bytes from file input
stream and writing the buffer into http body writer stream. This is the
place to update progress.uploaded counter.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This is a structure with "total" and "uploaded" counters that's passed
by user to client::upload_file() method so that client would update it
with the progress.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This is to export some simple structures to users without the need to
include client.hh itself (rather large already)
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Most of inject() overloads check if the injection is enabled, then
optionally clear the one-shot one, then do the injection. Everything
but doing the injection is implemented in the enter() method, it's
perfectly worth re-using one.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#21285
Otherwise, the read will be considered as on-cpu during promoted index
search, which will severely underutlize the disk because by default
on-cpu concurrency is 1.
I verified this patch on the worst case scenario, where the workload
reads missing rows from a large partition. So partition index is
cached (no IO) and there is no data file IO. But there is IO during
promoted index search (via cached_file). Before the patch this
workload was doing 4k req/s, after the patch it does 30k req/s.
The problem is much less pronounced if there is data file or index
file IO involved because that IO will signal read concurrency
semaphore to invite more concurrency.
File uploading code spawns all parts uploading into background. If this "spawning" fails (not the uploading code itself), any fiber that was spawned before is orphaned. It will eventually stop on its own, by while it's alive it may use(-after-free) the do_upload_file object.
Another issue with not handling spawn exception, is that multipart upload object is not aborted in this case. So it's leaked until garbage collector picks it up, which is not critical, but unpleasant.
Closesscylladb/scylladb#21139
* github.com:scylladb/scylladb:
s3/client: Restore indentation after previous patch
s3/client: Catch do_upload_file::upload_part() exceptions
Rewrite
_begin + n > _capacity_end
as
n > _capacity_end - _begin
and then as
n > capacity()
for two reasons:
- The last form is easier to read than the first form.
- Per N4950 (the final C++23 working draft), [expr.add] paragraph 4, the
expression
_begin + n (i.e., P + J)
is defined only if
0 ≤ 0 + n ≤ _capacity_end - _begin (i.e., 0 ≤ i + j ≤ n)
equivalently, only if
_begin ≤ _begin + n ≤ _capacity_end
Therefore, the expression
_begin + n
invokes undefined behavior exactly when we'd expect our check
_begin + n > _capacity_end
to evaluate to true.
gcc and clang have been aggressively equating undefined behavior to "never
happens"; let's prevent that here.
Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
Closesscylladb/scylladb#21213
the log.hh under the root of the tree was created keep the backward
compatibility when seastar was extracted into a separate library.
so log.hh should belong to `utils` directory, as it is based solely
on seastar, and can be used all subsystems.
in this change, we move log.hh into utils/log.hh to that it is more
modularized. and this also improves the readability, when one see
`#include "utils/log.hh"`, it is obvious that this source file
needs the logging system, instead of its own log facility -- please
note, we do have two other `log.hh` in the tree.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
when compiling date.h, clang 20 complains:
```
/home/kefu/.local/bin/clang++ -DDEBUG -DDEBUG_LSA_SANITIZER -DSANITIZE -DSCYLLA_BUILD_MODE=debug -DSCYLLA_ENABLE_ERROR_INJECTION -DXXH_PRIVATE_API -DCMAKE_INTDIR=\"Debug\" -I/home/kefu/dev/scylladb -I/home/kefu/dev/scylladb/build/gen -isystem /home/kefu/dev/scylladb/build/rust -isystem /home/kefu/dev/scylladb/seastar/include -isystem /home/kefu/dev/scylladb/build/Debug/seastar/gen/include -isystem /usr/include/p11-kit-1 -isystem /home/kefu/dev/scylladb/abseil -g -Og -g -gz -std=gnu++23 -fvisibility=hidden -Wall -Werror -Wextra -Wno-error=deprecated-declarations -Wimplicit-fallthrough -Wno-c++11-narrowing -Wno-deprecated-copy -Wno-mismatched-tags -Wno-missing-field-initializers -Wno-overloaded-virtual -Wno-unsupported-friend -Wno-unused-parameter -ffile-prefix-map=/home/kefu/dev/scylladb/build=. -march=westmere -Xclang -fexperimental-assignment-tracking=disabled -std=c++23 -Werror=unused-result -fstack-clash-protection -fsanitize=address -fsanitize=undefined -DSEASTAR_API_LEVEL=7 -DSEASTAR_BUILD_SHARED_LIBS -DSEASTAR_SSTRING -DSEASTAR_LOGGER_COMPILE_TIME_FMT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_DEBUG -DSEASTAR_DEFAULT_ALLOCATOR -DSEASTAR_SHUFFLE_TASK_QUEUE -DSEASTAR_DEBUG_SHARED_PTR -DSEASTAR_DEBUG_PROMISE -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_TYPE_ERASE_MORE -DFMT_SHARED -DWITH_GZFILEOP -MD -MT lang/CMakeFiles/lang.dir/Debug/lua.cc.o -MF lang/CMakeFiles/lang.dir/Debug/lua.cc.o.d -o lang/CMakeFiles/lang.dir/Debug/lua.cc.o -c /home/kefu/dev/scylladb/lang/lua.cc
In file included from /home/kefu/dev/scylladb/lang/lua.cc:18:
/home/kefu/dev/scylladb/utils/date.h:836:34: error: identifier '_d' preceded by whitespace in a literal operator declaration is deprecated [-Werror,-Wdeprecated-literal-operator]
836 | CONSTCD11 date::day operator "" _d(unsigned long long d) NOEXCEPT;
| ~~~~~~~~~~~~^~
| operator""_d
```
because, in
[CWG2521](https://wg21.link/CWG2521), it proposes that compiler should
consider
```c++
string operator "" _i18n(const char*, std::size_t); // OK, deprecated
```
as "OK, deprecated".
and Clang implemented this proposal, as it was accepted by C++23. since
scylladb uses C++23 standard. let's remove the space between `"` and
`_` to be more compliant to the C++23 standard and to silence the
warning, which is taken as an error.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#21194
std::ranges::to<>() has a little protocol with containers to
allow them to optimize their construction from ranges. Implement it
for small_vector. It optimizes ranges that can have their size determined
quickly, or that can be traversed twice to determine the size by reserving
up front. Single-pass ranges (std::ranges::input_range) use the less
efficient push_back method.
A unit test (which fails without the new constructor) is added.
Closesscylladb/scylladb#21094
in main.cc, we use yaml-cpp library directly. so we are obliged to
detect this library in scylla and link against it instead of relying
on other library to do this. currently, Seastar detects it and pulls
in yaml-cpp for us, but we should not take this for granted and rely
on this.
in this change, we detect and link against yaml-cpp to make this
dependency explicit.
the same applies to the "utils" library.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
`typeof` is a GNU extension, and is part of C23, but it is not included
by C++23.
if we compile the tree with c++23 instead of gnu++23, the compilation
fails like:
```
FAILED: repair/CMakeFiles/repair.dir/RelWithDebInfo/repair.cc.o
/home/kefu/.local/bin/clang++ -DSCYLLA_BUILD_MODE=release -DXXH_PRIVATE_API -DCMAKE_INTDIR=\"RelWithDebInfo\" -I/home/kefu/dev/scylladb -I/home/kefu/dev/scylladb/build/gen -isystem /home/kefu/dev/scylladb/abseil -isystem /home/kefu/dev/scylladb/seastar/include -isystem /home/kefu/dev/scylladb/build/RelWithDebInfo/seastar/gen/include -isystem /usr/include/p11-kit-1 -ffunction-sections -fdata-sections -O3 -g -gz -std=gnu++23 -fvisibility=hidden -Wall -Werror -Wextra -Wno-error=deprecated-declarations -Wimplicit-fallthrough -Wno-c++11-narrowing -Wno-deprecated-copy -Wno-mismatched-tags -Wno-missing-field-initializers -Wno-overloaded-virtual -Wno-unsupported-friend -Wno-unused-parameter -ffile-prefix-map=/home/kefu/dev/scylladb/build=. -march=westmere -Xclang -fexperimental-assignment-tracking=disabled -mllvm -inline-threshold=2500 -fno-slp-vectorize -std=c++23 -Werror=unused-result -DSEASTAR_API_LEVEL=7 -DSEASTAR_SSTRING -DSEASTAR_LOGGER_COMPILE_TIME_FMT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_LOGGER_TYPE_STDOUT -DFMT_SHARED -DWITH_GZFILEOP -MD -MT repair/CMakeFiles/repair.dir/RelWithDebInfo/repair.cc.o -MF repair/CMakeFiles/repair.dir/RelWithDebInfo/repair.cc.o.d -o repair/CMakeFiles/repair.dir/RelWithDebInfo/repair.cc.o -c /home/kefu/dev/scylladb/repair/repair.cc
In file included from /home/kefu/dev/scylladb/repair/repair.cc:21:
In file included from /home/kefu/dev/scylladb/service/storage_service.hh:19:
In file included from /home/kefu/dev/scylladb/service/qos/service_level_controller.hh:19:
In file included from /home/kefu/dev/scylladb/auth/service.hh:23:
In file included from /home/kefu/dev/scylladb/auth/permissions_cache.hh:22:
/home/kefu/dev/scylladb/utils/loading_cache.hh:754:66: error: use of undeclared identifier 'typeof'; did you mean 'typeid'?
754 | static_assert(SectionHitThreshold <= std::numeric_limits<typeof(_touch_count)>::max() / 2, "SectionHitThreshold value is too big");
| ^
/home/kefu/dev/scylladb/utils/loading_cache.hh:754:66: error: template argument for template type parameter must be a type
754 | static_assert(SectionHitThreshold <= std::numeric_limits<typeof(_touch_count)>::max() / 2, "SectionHitThreshold value is too big");
| ^~~~~~~~~~~~~~~~~~~~
/usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/limits:311:21: note: template parameter is declared here
311 | template<typename _Tp>
| ^
2 errors generated.
```
in this change, we trade `typeof` for a more standard compliant
`decltype`.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#21116
This method spawns part uploading in the background, but still may
throw, e.g. preparing http request or claiming memory. In this case any
outstanding part upload fibers are not waited on, and the whole
do_upload_file object can be freed from under their feet. Also, the
multipart upload is not aborted, thus losing track of it until g.c.
happens.
To fix it, catch any exception from upload_part() too, and if it
happens, do what the regular upload_sink would do -- close the gate thus
picking up any outstanding activity that may happen there and abort the
multipart upload.
Indentation is deliberately left broken
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
To avoid depending on two similar libraries (boost ranges and std \<ranges), replace
uses of the former with the latter. This series tackles the utils/ directory.
Code cleanup, no backport.
Closesscylladb/scylladb#20997
* github.com:scylladb/scylladb:
utils: logalloc: replace boost with std
utils: lsa: chunked_managed_vector: replace boost with std
utils: config_file: replace boost with std
utils: loading_cache: replace boost with std
utils: fragment_range: replace boost with std
utils: error_injector: replace boost with std
utils: crc: replace boost for_each with built-in range for
utils: class_registrator: replace boost with std
utils: chunked_vector: replace boost with std
utils: observable: replace boost with std
fixes#20517
Adds `aws_error` which possibly can contain errors from the S3 response body. Adds to the multipart upload completion a check for possible error and issues a retry if the error is retryable
Closesscylladb/scylladb#20518
* github.com:scylladb/scylladb:
test: add complete_multipart_upload completion tests
code: s3 client error handling
code: add response parsing and error handling to the complete_multipart_upload
code: Introduce AWS errors parsing
Unfortunately, the replacement for boost::range::join(),
std::views::concat(), is in C++26 (and not implemented in libstdc++ 14).
We use array/transform/join to simulate it.
unconst is a small help that converts a const iterator to a non-const
iterator with the help of the container. Currently it is using the
boost iterator/range libraries.
Convert it to <ranges> as part of an effort to standardize on a single
range library. Its only user in mutation_partition is converted as well.
Due to more iteroperability problems between <range> and boost, some
calls to boost::adaptors::reversed have to be converted as well.
The <ranges> library checks that an iterator's operator++() returns
a reference to the same type. intrusive_btree's iterator do not; instead
they return some base type and rely on implicit conversion to the real
iterator type. This causes interoperatibility problems with <range>.
Fix by using the CRTP pattern to inform iterator_base about what
type we really are, and cast to it. Enforce it with static_assert.
Note we can't static_assert in class scope since it is checked too
early and fails. Checking in function scope delays the check.
these unused includes are identified by clang-include-cleaner.
after auditing the source files, all of the reports have been
confirmed.
please note, since we have `using seastar::shared_ptr` in
`seastarx.h`, this renders `#include <seastar/core/shared_ptr.hh>`
unnecessary if we don't need the full definition of `seastar::shared_ptr`.
so, in this change, all the unused includes are removed.
---
it's a cleanup, hence no need to backport.
Closesscylladb/scylladb#20963
* github.com:scylladb/scylladb:
.github: add db to iwyu's CLEANER_DIR
db: remove unused includes
when compiling with clang-19 and the standard library from GCC-14.2,
we have:
```
/usr/bin/cmake -E __run_co_compile --tidy="clang-tidy;--checks=-*,bugprone-use-after-move;--extra-arg-before=--driver-mode=g++" --source=/__w/scylladb/scylladb/utils/bloom_filter.cc -- /usr/bin/clang++ -DBOOST_REGEX_DYN_LINK -DBOOST_REGEX_NO_LIB -DFMT_SHARED -DSCYLLA_BUILD_MODE=release -DSEASTAR_API_LEVEL=7 -DSEASTAR_LOGGER_COMPILE_TIME_FMT -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_SSTRING -DXXH_PRIVATE_API -I/__w/scylladb/scylladb -I/__w/scylladb/scylladb/seastar/include -I/__w/scylladb/scylladb/build/seastar/gen/include -I/__w/scylladb/scylladb/build/seastar/gen/src -ffunction-sections -fdata-sections -O3 -g -gz -std=gnu++23 -fvisibility=hidden -Wall -Werror -Wextra -Wno-error=deprecated-declarations -Wimplicit-fallthrough -Wno-c++11-narrowing -Wno-deprecated-copy -Wno-mismatched-tags -Wno-missing-field-initializers -Wno-overloaded-virtual -Wno-unsupported-friend -Wno-enum-constexpr-conversion -Wno-unused-parameter -ffile-prefix-map=/__w/scylladb/scylladb/build=. -march=wes
Error: /__w/scylladb/scylladb/utils/bloom_filter.cc:81:1: error: unknown type name 'filter_ptr' [clang-diagnostic-error]
81 | filter_ptr create_filter(int hash, large_bitset&& bitset, filter_format format) {
| ^
Error: /__w/scylladb/scylladb/utils/bloom_filter.cc:82:12: error: no viable conversion from returned value of type '__detail::__unique_ptr_t<murmur3_bloom_filter>' (aka 'unique_ptr<utils::filter::murmur3_bloom_filter>') to function return type 'int' [clang-diagnostic-error]
82 | return std::make_unique<murmur3_bloom_filter>(hash, std::move(bitset), format);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Error: /__w/scylladb/scylladb/utils/bloom_filter.cc:85:1: error: unknown type name 'filter_ptr' [clang-diagnostic-error]
85 | filter_ptr create_filter(int hash, int64_t num_elements, int buckets_per, filter_format format) {
| ^
Error: /__w/scylladb/scylladb/utils/bloom_filter.cc:86:12: error: no viable conversion from returned value of type '__detail::__unique_ptr_t<murmur3_bloom_filter>' (aka 'unique_ptr<utils::filter::murmur3_bloom_filter>') to function return type 'int' [clang-diagnostic-error]
86 | return std::make_unique<murmur3_bloom_filter>(hash, large_bitset(get_bitset_size(num_elements, buckets_per)), format);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Error: /__w/scylladb/scylladb/utils/bloom_filter.hh:93:1: error: unknown type name 'filter_ptr' [clang-diagnostic-error]
93 | filter_ptr create_filter(int hash, large_bitset&& bitset, filter_format format);
| ^
Error: /__w/scylladb/scylladb/utils/bloom_filter.hh:94:1: error: unknown type name 'filter_ptr' [clang-diagnostic-error]
94 | filter_ptr create_filter(int hash, int64_t num_elements, int buckets_per, filter_format format);
| ^
Error: /__w/scylladb/scylladb/utils/i_filter.hh:17:25: error: no template named 'unique_ptr' in namespace 'std' [clang-diagnostic-error]
17 | using filter_ptr = std::unique_ptr<i_filter>;
| ~~~~~^
Error: /__w/scylladb/scylladb/utils/i_filter.hh:54:12: error: unknown type name 'filter_ptr' [clang-diagnostic-error]
54 | static filter_ptr get_filter(int64_t num_elements, double max_false_pos_prob, filter_format format);
| ^
4 warnings and 8 errors generated.
```
apparently, the definition of `std::unique_ptr` is missing where it is
used. so let's include `<memory>`, so that `i_filter.hh` is more
self-contained.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#20971
Commit aa1270a00c changed most uses
of `assert` in the codebase to `SCYLLA_ASSERT`.
But the comment fixed in this patch is talking specifically about
`assert`, and shouldn't have been changed. It doesn't make sense
after the change.
Closesscylladb/scylladb#20967
Currently, `cached_file::stream` (currently used only by index_reader,
to read index pages), works as follows.
Assume that the caller requested a read of the range [pos, pos + size).
Then:
- If the first page of the requested range is uncached,
the entire [pos, pos + size) range is read from disk (even if some
later pieces of it are cached), the resulting pages are added to the cache,
and the read completes (most likely) from the cached pages.
- If the first page of the read is cached, then the rest of the read
is handled page-by-page, in a sequential loop, serving each page
either from cache (if present) or from disk.
For example, assume that pages 0, 1, 2, 3, 4 are requested.
If exactly pages 1, 2 are cached, then `stream` will read the entire [0, 4] range
from disk and insert the missing 0, 3, 4, and then it will continue serving the
read from cache.
If exactly pages 0 and 3 are cached, then it will serve 0 from cache,
then it will read 1 from disk and insert it into cache,
then it will read 2 from disk and insert it into cache,
then it will serve 3 from cache,
then it will read 4 from disk and insert it into cache.
If exactly the first page is cached, a 128 kiB read turns
into 31 I/O sequential read ops.
This is weird, and doesn't look intended. In one case, we are reading even pages
we already have, just to avoid fragmenting the read, and in the other case
we are reading pages one-by-one (sequentially!) even if they are neighbours.
I'm not sure if cached_file should minimize IOPS or byte throughput,
but the current state is surely suboptimal. Even if its read strategy
is somehow optimal, it should still at least coalesce contiguous reads
and perform the non-contiguous reads in parallel.
This patch leans into minimizing IOPS. After the patch, we serve
as many front pages from the cache as we can, but when we see
an uncached page, we read the entire remainder of the read from disk.
As if we trimmed the read request by the longest cached prefix,
and then performed the rest using the logic from before the patch.
For example, if exactly pages 0 and 3 are cached,
then we serve 0 from cache,
then we read [1, 4] from disk and insert everything into cache.
For partially-cached files, this will result in more bytes read
from disk, but less IOPS. This might be a bad thing. But if so,
then we should lean the other way in a more explicit and efficient
way than we currently do.
Closesscylladb/scylladb#20935
these unused includes are identified by clang-include-cleaner.
after auditing the source files, all of the reports have been
confirmed.
please note, since we have `using seastar::shared_ptr` in
`seastarx.h`, this renders `#include <seastar/core/shared_ptr.hh>`
unnecessary if we don't need the full definition of `seastar::shared_ptr`.
so, in this change, all the unused includes are removed. but there are
some headers which are actually used, while still being identified by
this tool. these includes are marked with "IWYU pragma: keep".
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
'static inline' is always wrong in headers - if the same header is
included multiple times, and the function happens not to be inlined,
then multiple copies of it will be generated.
Fix by mechanically changing '^static inline' to 'inline'.
Handle the `finalize_upload` possible exception to abort the upload (which also can throw) and show the right error originated from the `finalize_upload`
Instead of ignoring the response for multipart upload completion start parsing it and look for a possible errors in the response body. If the error is found throw an exception