this change extracts the storage class and its derived classes
out into storage.cc and storage.hh. for couple reasons:
- for better readability. the sstables.hh is over 1005 lines.
and sstables.cc 3602 lines. it's a little bit difficult to figure
out how the different parts in these sources interact with each
other. for instance, with this change, it's clear some of helper
functions are only used by file_system_storage.
- probably less inter-source dependency. by extracting the sources
files out, they can be compiled individually, so changing one .cc
file does not impact others. this could speed up the compilation
time.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
since #13452, we switched most of the caller sites from std::regex
to boost::regex. in this change, all occurences of `#include <regex>`
are dropped unless std::regex is used in the same source file.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes#13765
functinoality wise, `uint64_t_tri_compare()` is identical to the
three-way comparison operator, so no need to keep it. in this change,
it is dropped in favor of <=>.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes#13794
The code was incorrectly passing a data_value of type bytes due to
implicit conversion of the result of serialize() (bytes_opt) to a
data_value object of type bytes_type via:
data_value(std::optional<NativeType>);
mutation::set_static_cell() accepts a data_value object, which is then
serialized using column's type in abstract_type::decompose(data_value&):
bytes b(bytes::initialized_later(), serialized_size(*this, value._value));
auto i = b.begin();
value.serialize(i);
Notice that serialized_size() is taken from the column type, but
serialization is done using data_value's type. The two types may have
a compatible CQL binary representation, but may differ in native
types. serialized_size() may incorrectly interpret the native type and
come up with the wrong size. If the size is too smaller, we end up with
stack or heap corruption later after serialize().
For example, if the column type is utf8 but value holds bytes, the
size will be wrong because even though both use the basic_sstring
type, they have a different layout due to max_size (15 vs 31).
Fixes#13717Closes#13787
When requesting memory via `reader_permit::request_memory()`, the
requested amount is added to `_requested_memory` member of the permit
impl. This is because multiple concurrent requests may be blocked and
waiting at the same time. When the requests are fulfilled, the entire
amount is consumed and individual requests track their requested amount
with `resource_units` to release later.
There is a corner-case related to this: if a reader permit is registered
as inactive while it is waiting for memory, its active requests are
killed with `std::bad_alloc`, but the `_requested_memory` fields is not
cleared. If the read survives because the killed requests were part of
a non-vital background read-ahead, a later memory request will also
include amount from the failed requests. This extra amount wil not be
released and hence will cause a resource leak when the permit is
destroyed.
Fix by detecting this corner case and clearing the `_requested_memory`
field. Modify the existing unit test for the scenario of a permit
waiting on memory being registered as inactive, to also cover this
corner case, reproducing the bug.
Fixes: #13539Closes#13679
For some reason Scylla crashes on `aarch64` in release mode when calling
`fmt::format` in `raft_removenode` and `raft_decommission`. E.g. on this
line:
```
group0_command g0_cmd = _group0->client().prepare_command(std::move(change), guard, fmt::format("decomission: request decomission for {}", raft_server.id()));
```
I found this in our configure.py:
```
def get_clang_inline_threshold():
if args.clang_inline_threshold != -1:
return args.clang_inline_threshold
elif platform.machine() == 'aarch64':
# we see miscompiles with 1200 and above with format("{}", uuid)
# also coroutine miscompiles with 600
return 300
else:
return 2500
```
but reducing it to `0` didn't help.
I managed to get the following backtrace (with inline threshold 0):
```
void boost::intrusive::list_impl<boost::intrusive::mhtraits<seastar::thread_context, boost::intrusive::list_member_hook<>, &seastar::thread_context::_all_link>, unsigned long, false, void>::clear_and_dispose<boost::intrusive::detail::null_disposer>(boost::intrusive::detail::null_disposer) at /usr/include/boost/intrusive/list.hpp:751
(inlined by) boost::intrusive::list_impl<boost::intrusive::mhtraits<seastar::thread_context, boost::intrusive::list_member_hook<>, &seastar::thread_context::_all_link>, unsigned long, false, void>::clear() at /usr/include/boost/intrusive/list.hpp:728
(inlined by) ~list_impl at /usr/include/boost/intrusive/list.hpp:255
void fmt::v9::detail::buffer<wchar_t>::append<wchar_t>(wchar_t const*, wchar_t const*) at ??:?
void fmt::v9::detail::vformat_to<char>(fmt::v9::detail::buffer<char>&, fmt::v9::basic_string_view<char>, fmt::v9::basic_format_args<fmt::v9::basic_format_context<std::conditional<std::is_same<fmt::v9::type_identity<char>::type, char>::value, fmt::v9::appender, std::back_insert_iterator<fmt::v9::detail::buffer<fmt::v9::type_identity<char>::type> > >::type, fmt::v9::type_identity<char>::type> >, fmt::v9::detail::locale_ref) at ??:?
fmt::v9::vformat[abi:cxx11](fmt::v9::basic_string_view<char>, fmt::v9::basic_format_args<fmt::v9::basic_format_context<fmt::v9::appender, char> >) at ??:?
std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > fmt::v9::format<utils::tagged_uuid<raft::server_id_tag>&>(fmt::v9::basic_format_string<char, fmt::v9::type_identity<utils::tagged_uuid<raft::server_id_tag>&>::type>, utils::tagged_uuid<raft::server_id_tag>&) at /usr/include/fmt/core.h:3206
(inlined by) service::storage_service::raft_removenode(utils::tagged_uuid<locator::host_id_tag>) at ./service/storage_service.cc:3572
```
Maybe it's a bug in `fmt` library?
In any case replacing the call with `::format` (i.e. `seastar::format`
from seastar/core/print.hh) helps.
Do it for the entire file for consistency (and avoiding this bug).
Also, for the future, replace `format` calls with `::format` - now it's
the same thing, but the latter won't clash with `std::format` once we
switch to libstdc++13.
Fixes#13707Closes#13711
This is a follow-up to #13399, the patch
addresses the issues mentioned there:
* linesep can be split between blocks;
* linesep can be part of UTF-8 sequence;
* avoid excessively long lines, limit to 256 chars;
* the logic of the function made simpler and more maintainable.
Closes#13427
* github.com:scylladb/scylladb:
pylib_test: add tests for read_last_line
pytest: add pylib_test directory
scylla_cluster.py: fix read_last_line
scylla_cluster.py: move read_last_line to util.py
to avoid the FTBFS after we bump up the Seastar submodule which bumped up its API level to v7. and API v7 is a breaking change. so, in order to unbreak the build, we have to hardwire the API level to 6. `configure.py` also does this.
Closes#13780
* github.com:scylladb/scylladb:
build: cmake: disable deprecated warning
build: cmake: use Seastar API level 6
we introduced the linkage to Boost::unit_test_framework in
fe70333c19, this library is used by
test/lib/test_utils.cc, so update CMake accordingly.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes#13781
This is a follow-up to #13399, the patch
addresses the issues mentioned there:
* linesep can be split between blocks;
* linesep can be part of UTF-8 sequence;
* avoid excessively long lines, limit to 512 chars;
* the logic of the function made simpler and more
maintainable.
There are two of them currently with slightly different declaration. Better to leave only one.
Closes#13772
* github.com:scylladb/scylladb:
test: Deduplicate test::filename() static overload
test: Make test::filename return fs::path
The method in question suffers from scylladb/seastar#1298. The PR fixes it and makes a bit shorter along the way
Closes#13776
* github.com:scylladb/scylladb:
sstable: Close file at the end
sstables: Use read_entire_stream_cont() helper
This commit changes the configuration in the conf.py
file to make branch-5.2 the latest version and
remove it from the list of unstable versions.
As a result, the docs for version 5.2 will become
the default for users accessing the ScyllaDB Open Source
documentation.
This commit should be merged as soon as version 5.2
is released.
Closes#13681
One is unused, the other one is not really required in public
Closes#13771
* github.com:scylladb/scylladb:
file_writer: Remove static make() helper
sstable: Use toc_filename() to print TOC file path
The test case consists of two internal sub-test-cases. Making them
explicit kills three birds with one stone
- improves parallelizm
- removes env's tempdir wiping
- fixes code indentation
refs: #12707
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes#13768
since Seastar now deprecates a bunch of APIs which accept io_priority_class,
we started to have deprecated warnings. before migrating to V7 API,
let's disable this warning.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
to avoid the FTBFS after we bump up the Seastar submodule
which bumped up its API level to v7. and API v7 is a breaking
change. so, in order to unbreak the build, we have to hardwire
the API level to 6. `configure.py` also does this.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
* seastar 02d5a0d7c...f94b1bb9c (12):
> Merge 'Unify CPU scheduling groups and IO priority classes' from Pavel Emelyanov
> scripts: addr2line: relax regular expression for matching kernel traces
> add dirs for clangd to .gitignore
> http::client: Log failed requests' body
> build: always quote the ENVIRONMENT with quotes
> exception_hacks: Change guard check order to work around static init fail
> shared_future: remove support for variadic futures
> iotune: Don't close file that wasn't opened
Fixes#13439
> Merge 'Relax per tick IO grab threshold' from Pavel Emelyanov
> future: simplify constraint on then() a little
> Merge 'coroutine: generator: initialize const member variable and enable generator tests' from Kefu Chai
> future: drop libc++ std::tuple compatibility hack
Closes#13777
The thing is than when closing file input stream the underlying file is
not .close()-d (see scylladb/seastar#1298). The remove_by_toc_name() is
buggy in this sense. Using with_closeable() fixes it and makes the code
shorter.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The remove_by_toc_name() wants to read the whole stream into a sstring.
There's a convenience helper to facilitate that.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
In https://github.com/scylladb/scylladb/pull/13482 we renamed the reader permit states to more descriptive names. That PR however only covered only the states themselves and their usages, as well as the documentation in `docs/dev`.
This PR is a followup to said PR, completing the name changes: renaming all symbols, names, comments etc, so all is consistent and up-to-date.
Closes#13573
* github.com:scylladb/scylladb:
reader_concurrency_semaphore: misc updates w.r.t. recent permit state name changes
reader_concurrency_semaphore: update permit members w.r.t. recent permit state name changes
reader_concurrency_semaphore: update RAII state guard classes w.r.t. recent permit state name changes
reader_concurrency_semaphore: update API w.r.t. recent permit state name changes
reader_concurrency_semaphore: update stats w.r.t. recent permit state name changes
e2c9cdb576 moved the validation of the range tombstone change to the place where it is actually consumed, so we don't attempt to pass purged or discarded range tombstones to the validator. In doing so however, the validate pass was moved after the consume call, which moves the range tombstone change, the validator having been passed a moved-from range tombstone. Fix this by moving he validation to before the consume call.
Refs: #12575Closes#13749
* github.com:scylladb/scylladb:
test/boost/mutation_test: add sanity test for mutation compaction validator
mutation/mutation_compactor: add validation level to compaction state query constructor
mutation/mutation_compactor: validate range tombstone change before it is moved
Current S3 client was tested over minio and it takes few more touches to work with amazon S3.
The main challenge here is to support singed requests. The AWS S3 server explicitly bans unsigned multipart-upload requests, which in turn is the essential part of the sstables S3 backend, so we do need signing. Signing a request has many options and requirements, one of them is -- request _body_ can be or can be not included into signature calculations. This is called "(un)signed payload". Requests sent over plain HTTP require payload signing (i.e. -- request body should be included into signature calculations), which can a bit troublesome, so instead the PR uses unsigned payload (i.e. -- doesn't include the request body into signature calculation, only necessary headers and query parameters), but thus also needs HTTPS.
So what this set does is makes the existing S3 client code sign requests. In order to sign the request the code needs to get AWS key and secret (and region) from somewhere and this somewhere is the conf/object_storage.yaml config file. The signature generating code was previously merged (moved from alternator code) and updated to suit S3 client needs.
In order to properly support HTTPS the PR adds special connection factory to be used with seastar http client. The factory makes DNS resolving of AWS endpoint names and configures gnutls systemtrust.
fixes: #13425Closes#13493
* github.com:scylladb/scylladb:
doc: Add a document describing how to configure S3 backend
s3/test: Add ability to run boost test over real s3
s3/client: Sign requests if configured
s3/client: Add connection factory with DNS resolve and configurable HTTPS
s3/client: Keep server port on config
s3/client: Construct it with config
s3/client: Construct it with sstring endpoint
sstables: Make s3_storage with endpoint config
sstables_manager: Keep object storage configs onboard
code: Introduce conf/object_storage.yaml configuration file
Commit ecbd112979
`distributed_loader: reshard: consider sstables for cleanup`
caused a regression in loading new sstables using the `upload`
directory, as seen in e.g. https://jenkins.scylladb.com/view/master/job/scylla-master/job/dtest-daily-release/230/testReport/migration_test/TestMigration/Run_Dtest_Parallel_Cloud_Machines___FullDtest___full_split000___test_migrate_sstable_without_compression_3_0_md_/
```
query = "SELECT COUNT(*) FROM cf"
statement = SimpleStatement(query)
s = self.patient_cql_connection(node, 'ks')
result = list(s.execute(statement))
> assert result[0].count == expected_number_of_rows, \
"Expected {} rows. Got {}".format(expected_number_of_rows, list(s.execute("SELECT *
FROM ks.cf")))
E AssertionError: Expected 1 rows. Got []
E assert 0 == 1
E +0
```
The reason for the regression is that the call to `do_for_each_sstable` in `collect_all_shared_sstables` to search for sstables that need cleanup caused the list of sstables in the sstable directory to be moved and cleared.
parallel_for_each_restricted moves the container passed to it into a `do_with` continuation. This is required for parallel_for_each_restricted.
However, moving the container is destructive and so, the decision whether to move or not needs to be the caller's, not the callee.
This patch changes the signature of parallel_for_each_restricted to accept a container rather than a rvalue reference, allowing the callers to decide whether to move or not.
Most callers are converted to move the container, except for `do_for_each_sstable` that copies `_unshared_local_sstables`, allowing callers to call `dir.do_for_each_sstable` multiple times without moving the list contents.
Closes#13526
* github.com:scylladb/scylladb:
sstable_directory: coroutinize parallel_for_each_restricted
sstable_directory: parallel_for_each_restricted: use std::ranges for template definition
sstable_directory: parallel_for_each_restricted: do not move container
There are two of them currently, both returning fs::path for sstable
components. One is static and can be dropped, callers are patched to use
the non-static one making the code tiny bit shorter.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The sstable::filename() is private and is not supposed to be used as a
path to open any files. However, tests are different and they sometimes
know it is. For that they use test wrapper that has access to private
members and may make assumptions about meaning of sstable::filename().
Said that, the test::filename() should return fs::path, not sstring.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Commit 1cb95b8cf caused a small regression in the debug printer.
After that commit, range tombstones are printed to stdout,
instead of the target stream.
In practice, this causes range tombstones to appear in test logs
out of order with respect to other parts of the debug message.
Fix that.
Closes#13766
The sstable::write_toc() gets TOC filename from file writer, while it
can get it from itself. This makes the file_writer::get_filename()
private and actually improves logging, as the writer is not required
to have the filename onboard, while sstable always has it.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
storage_service uses raft_group0 but the during shutdown the later is
destroyed before the former is stopped. This series move raft_group0
destruction to be after storage_service is stopped already. For the
move to work some existing dependencies of raft_group0 are dropped
since they do not really needed during the object creation.
Fixes#13522
In case an sstable unit test case is run individually, it would fail
with exception saying that S3_... environment is not set. It's better to
skip the test-case rather than fail. If someone wants to run it from
shell, it will have to prepare S3 server (minio/AWS public bucket) and
provide proper environment for the test-case.
refs: #13569
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes#13755
Raft replication doesn't guarantee that all replicas see
identical Raft state at all times, it only guarantees the
same order of events on all replicas.
When comparing raft state with gossip state on a node, first
issue a read barrier to ensure the node has the latest raft state.
To issue a read barrier it is sufficient to alter a non-existing
state: in order to validate the DDL the node needs to sync with the
leader and fetch its latest group0 state.
Fixes#13518 (flaky topology test).
Closes#13756
raft_group0 does not really depends on cdc::generation_service, it needs
it only transiently, so pass it to appropriate methods of raft_group0
instead of during its creation.
Commit ecbd112979
`distributed_loader: reshard: consider sstables for cleanup`
caused a regression in loading new sstables using the `upload`
directory, as seen in e.g. https://jenkins.scylladb.com/view/master/job/scylla-master/job/dtest-daily-release/230/testReport/migration_test/TestMigration/Run_Dtest_Parallel_Cloud_Machines___FullDtest___full_split000___test_migrate_sstable_without_compression_3_0_md_/
```
query = "SELECT COUNT(*) FROM cf"
statement = SimpleStatement(query)
s = self.patient_cql_connection(node, 'ks')
result = list(s.execute(statement))
> assert result[0].count == expected_number_of_rows, \
"Expected {} rows. Got {}".format(expected_number_of_rows, list(s.execute("SELECT * FROM ks.cf")))
E AssertionError: Expected 1 rows. Got []
E assert 0 == 1
E +0
E -1
```
The reason for the regression is that the call to `do_for_each_sstable`
in `collect_all_shared_sstables` to search for sstables that need
cleanup caused the list of sstables in the sstable directory to be
moved and cleared.
parallel_for_each_restricted moves the container passed to it
into a `do_with` continuation. This is required for
parallel_for_each_restricted.
However, moving the container is destructive and so,
the decision whether to move or not needs to be the
caller's, not the callee.
This patch changes the signature of parallel_for_each_restricted
to accept a lvalue reference to the container rather than a rvalue reference,
allowing the callers to decide whether to move or not.
Most callers are converted to move the container, as they effectively do
today, and a new method, `filter_sstables` was added for the
`collect_all_shared_sstables` us case, that allows the `func` that
processes each sstable to decide whether the sstable is kept
in `_unshared_local_sstables` or not.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
it turns out the only places where we have compiler warnings of -W-parentheses-equality is the source code generated by ANTLR. strictly speaking, this is valid C++ code, just not quite readable from the hygienic point of view. so let's enable this warning in the source tree, but only disable it when compiling the sources generated by ANTLR.
please note, this warning option is supported by both GCC and Clang, so no need to test if it is supported.
for a sample of the warnings, see:
```
/home/kefu/dev/scylladb/build/cmake/cql3/CqlLexer.cpp:21752:38: error: equality comparison with extraneous parentheses [-Werror,-Wparentheses-equality]
if ( (LA4_0 == '$'))
~~~~~~^~~~~~
/home/kefu/dev/scylladb/build/cmake/cql3/CqlLexer.cpp:21752:38: note: remove extraneous parentheses around the comparison to silence this warning
if ( (LA4_0 == '$'))
~ ^ ~
```
Closes#13762
* github.com:scylladb/scylladb:
build: only apply -Wno-parentheses-equality to ANTLR generated sources
compaction: disambiguate format_to()
it turns out the only places where we have compiler warnings of
-W-parentheses-equality is the source code generated by ANTLR. strictly
speaking, this is valid C++ code, just not quite readable from the
hygienic point of view. so let's enable this warning in the source tree,
but only disable it when compiling the sources generated by ANTLR.
please note, this warning option is supported by both GCC and Clang,
so no need to test if it is supported.
for a sample of the warnings, see:
```
/home/kefu/dev/scylladb/build/cmake/cql3/CqlLexer.cpp:21752:38: error: equality comparison with extraneous parentheses [-Werror,-Wparentheses-equality]
if ( (LA4_0 == '$'))
~~~~~~^~~~~~
/home/kefu/dev/scylladb/build/cmake/cql3/CqlLexer.cpp:21752:38: note: remove extraneous parentheses around the comparison to silence this warning
if ( (LA4_0 == '$'))
~ ^ ~
```
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
we should always qualify `format_to` with its namespace. otherwise
we'd have following failure when compiling with libstdc++ from GCC-13:
```
/home/kefu/dev/scylladb/compaction/table_state.hh:65:16: error: call to 'format_to' is ambiguous
return format_to(ctx.out(), "{}.{} compaction_group={}", s->ks_name(), s->cf_name(), t.get_group_id());
^~~~~~~~~
```
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes#13760
Support the AWS_S3_EXTRA environment vairable that's :-split and the
respective substrings are set as endpoint AWS configuration. This makes
it possible to run boost S3 test over real S3.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
If the endpoint config specifies AWS key, secret and region, all the
S3 requests get signed. Signature should have all the x-amz-... headers
included and should contain at least three of them. This patch includes
x-ams-date, x-amz-content-sha256 and host headers into the signing list.
The content can be unsigned when sent over HTTPS, this is what this
patch does.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Existing seastar's factories work on socket_address, but in S3 we have
endpoint name which's a DNS name in case of real S3. So this patch
creates the http client for S3 with the custom connection factory that
does two things.
First, it resolves the provided endpoint name into address.
Second, it loads trust-file from the provided file path (or sets system
trust if configured that way).
Since s3 client creation is no-waiting code currently, the above
initialization is spawned in afiber and before creating the connection
this fiber is waited upon.
This code probably deserves living in seastar, but for now it can land
next to utils/s3/client.cc.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Currently the code temporarily assumes that the endpoint port is 9000.
This is what tests' local minio is started with. This patch keeps the
port number on endpoint config and makes test get the port number from
minio starting code via environment.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Similar to previous patch -- extent the s3::client constructor to get
the endpoint config value next to the endpoint string. For now the
configs are likely empty, but they are yet unused too.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>