Fixes #22106
Move the shared compress components to sstables, and rename them to
match the class type.
Adjust includes, removing redundant/unneeded ones where possible.
Closes scylladb/scylladb#25103
Convert all necessary methods to be awaitable. Start using `make_data_or_index_source`
when creating data_source for data and index components.
For proper working of compressed/checksummed input streams, start passing
stream creator functors to `make_(checksummed/compressed)_file_(k_l/m)_format_input_stream`.
The latter class is invented to let tests access private fields of an
sstable (mostly methods). The former is in fact an extended version of
that also does some checks. However, they don't inherit from each
other, and the sstable_assertions partially duplicates some functionality
of the test one.
Add the inheritance, remove the duplicated methods from the child class,
update the callers (the test class returns future<>s, the assertions one
"knows" it runs in seastar thread) and mark sstable::read_toc() private.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#23697
Cleanup patch. After we moved the ownership of compressors
to sstables, compressor objects never have shared lifetime.
`unique_ptr` is more appropriate for them than `shared_ptr` now.
(And besides expressing the intent better, using `unique_ptr`
prevents an accidental cross-shard `shared_ptr` copy).
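The ownership argument above can be illustrated with a minimal sketch. The `compressor`, `noop_compressor`, `compression_state` and `make_state` names below are hypothetical stand-ins, not ScyllaDB's real classes; the only point is that a `unique_ptr` member makes an accidental copy a compile-time error.

```cpp
#include <cstddef>
#include <memory>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for a compressor interface (not the real ScyllaDB class).
struct compressor {
    virtual ~compressor() = default;
    virtual std::size_t compress_max_size(std::size_t input_len) const = 0;
};

// Hypothetical trivial implementation used only for this illustration.
struct noop_compressor final : compressor {
    std::size_t compress_max_size(std::size_t input_len) const override {
        return input_len;
    }
};

using compressor_ptr = std::unique_ptr<compressor>;

// A unique_ptr can be moved but not copied, so ownership cannot be
// duplicated by accident (for example, into another thread or shard).
static_assert(!std::is_copy_constructible_v<compressor_ptr>);
static_assert(std::is_move_constructible_v<compressor_ptr>);

// Hypothetical single owner of the compressor.
struct compression_state {
    compressor_ptr owned;
    explicit compression_state(compressor_ptr c) : owned(std::move(c)) {}
    const compressor& get() const { return *owned; }
};

inline compression_state make_state() {
    return compression_state(std::make_unique<noop_compressor>());
}
```

With a `shared_ptr` member the same `compression_state` would be copyable, and nothing would stop a copy from escaping to a place where it does not belong.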
SSTable readers and writers use `compressor` objects to compress and
decompress chunks of SSTable data files.
`compressor` objects are read-only, so only one of them is needed
for each SSTable. Before this commit, each reader and writer has
its own `compressor` object. This isn't necessary, but it's okay.
But later in this series it will stop being okay, because the creation
of a `compressor` will become an expensive cross-shard
operation (because it might require sharing a compression dictionary
from another shard). So we have to adjust the code so that there is
only one `compressor` per sstable, not one per reader/writer.
We stuff the ownership of this compressor into `sstable::compression`.
To make the ownership clear, we remove `compression_ptr` shared
pointers from readers and writers, and make them access the
compressor via the `sstable::compression` instead.
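A rough sketch of the resulting shape, under the assumption that readers only need read-only access. The `compression`, `compressor` and `chunk_reader` types here are simplified, hypothetical models written for illustration; the real classes have different members.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical, simplified compressor: read-only after construction,
// so a single instance can serve every reader and writer of one sstable.
struct compressor {
    std::string name;
    std::size_t max_compressed_size(std::size_t n) const { return n + 16; }
};

// Hypothetical model of the per-sstable owner of the compressor.
class compression {
    std::unique_ptr<compressor> _compressor;
public:
    void set_compressor(std::unique_ptr<compressor> c) { _compressor = std::move(c); }
    const compressor& get_compressor() const { return *_compressor; }
};

// A reader keeps no compressor pointer of its own; it borrows the one
// owned by `compression`, which must outlive the reader.
class chunk_reader {
    const compression& _compression;
public:
    explicit chunk_reader(const compression& c) : _compression(c) {}
    std::size_t buffer_size_for(std::size_t chunk_len) const {
        return _compression.get_compressor().max_compressed_size(chunk_len);
    }
    const compressor* compressor_address() const {
        return &_compression.get_compressor();
    }
};
```

Any number of `chunk_reader` objects built from the same `compression` see the same compressor instance, so the potentially expensive creation happens once per sstable.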
Replace explicit `statistics` type with `auto` in sstable_test to
resolve name collision. This addresses ambiguity introduced by commit
87c221cb which added `struct statistics` in
`seastar/include/seastar/net/api.hh`, conflicting with the existing
definition in `scylladb/sstables/types.hh` when the `seastar` namespace
is opened.
The `auto` keyword avoids the need to explicitly reference either type,
cleanly resolving the collision while maintaining functionality.
This change prepares for the upcoming change to bump up seastar
submodule.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#23249
now that we are allowed to use C++23, we have the luxury of using
`std::views::transform`.
in this change, we:
- replace `boost::adaptors::transformed` with `std::views::transform`
- use `fmt::join()` when appropriate where `boost::algorithm::join()`
is not applicable to a range view returned by `std::views::transform`.
- use `std::ranges::fold_left()` to accumulate the range returned by
`std::views::transform`
- use `std::ranges::fold_left()` to get the maximum element in the
range returned by `std::views::transform`
- use `std::ranges::min()` to get the minimal element in the range
returned by `std::views::transform`
- use `std::ranges::equal()` to compare the range views returned
by `std::views::transform`
- remove unused `#include <boost/range/adaptor/transformed.hpp>`
- use `std::ranges::subrange()` instead of `boost::make_iterator_range()`,
to feed `std::views::transform()` a view range.
to reduce the dependency on boost for better maintainability, and
leverage standard library features for better long-term support.
this change is part of our ongoing effort to modernize our codebase
and reduce external dependencies where possible.
limitations:
there are still a couple of places where we use
`boost::adaptors::transformed` due to the lack of a C++23 alternative
for `boost::join()` and `boost::adaptors::uniqued`.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#21700
No other usages of the former helper other than immediately followed by
the latter, no point in keeping it around.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
No other usages of the former helper other than immediately followed by
the latter, no point in keeping it around.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The statistics_rewrite test case copies an sstable from resources two
times:
- first time -- explicitly by listing resource components and copying
files to the test temp dir
- second time -- implicitly, by calling create_links() linking copied
files by new set in the staging/ subdirectory
The 2nd step is not needed and the history of changes justifies that.
The test itself appeared with 70b793e4d3 and it only contained the 2nd
"copying" -- test linked files from resource directory and then worked
in the newly created set.
Later, commit 59c57861ae added the first step and copied the files
from resource into test temp dir. At this point linking copied files
became pointless, but was preserved. Let's remove it now.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#21097
This PR builds upon the PR for checksum validation (#20207) to further enhance scrub's corruption detection capabilities by validating digests as well. The digest (full checksum) is the checksum over the entire data, as opposed to per-chunk checksums which apply to individual chunks. Until now, digests were not examined on any code paths. This PR integrates digest checking into the compressed/checksummed data sources as an optional feature and enables it only through the validation path of the sstable layer (`sstable::validate()`). The validation path is used by the following tools:
* scrub in validate mode
* `sstable validate`
All other reads, including normal user reads, are unaffected by this change.
The PR consists of:
* Extensions to the compressed and checksummed data sources to support digest checking. The data sources receive the expected digest as a parameter and calculate the actual digest incrementally across multiple get() calls. The check happens on the get() call that reaches EOF and results in an exception if the digest is invalid. A digest check requires reading the whole file range. Therefore, a partial read or skip() is treated as an internal error.
* A new shareable digest component loaded on demand by the validation code. No lifecycle management.
* Grouping of old scrub/validate tests for compressed and uncompressed SSTables to reduce code duplication.
* scrub/validate tests for SSTables with valid checksums but invalid digests, and SSTables with no digests at all.
* scrub/validate tests with 3.x Cassandra SSTables to ensure compatibility.
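The incremental digest check described in the first bullet can be sketched as follows. This is a simplified, synchronous model and not the actual data source: `digest_checking_source` and `adler32_update` are hypothetical names, the file is modelled as an in-memory list of chunks, Adler-32 is used purely as an example digest, and the check is placed on the get() call that returns EOF.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Feed more bytes into a running Adler-32 value (initial value is 1).
inline std::uint32_t adler32_update(std::uint32_t adler, std::string_view data) {
    constexpr std::uint32_t mod = 65521;
    std::uint32_t a = adler & 0xffff;
    std::uint32_t b = (adler >> 16) & 0xffff;
    for (unsigned char c : data) {
        a = (a + c) % mod;
        b = (b + a) % mod;
    }
    return (b << 16) | a;
}

// Hypothetical model of a data source with an optional whole-file digest check.
class digest_checking_source {
    std::vector<std::string> _chunks;               // stands in for the file
    std::size_t _next = 0;
    std::optional<std::uint32_t> _expected_digest;  // nullopt: check disabled
    std::uint32_t _actual_digest = 1;
public:
    digest_checking_source(std::vector<std::string> chunks,
                           std::optional<std::uint32_t> expected_digest)
        : _chunks(std::move(chunks)), _expected_digest(expected_digest) {}

    // Returns the next chunk; an empty string signals EOF.
    std::string get() {
        if (_next < _chunks.size()) {
            const std::string& chunk = _chunks[_next++];
            if (_expected_digest) {
                _actual_digest = adler32_update(_actual_digest, chunk);
            }
            return chunk;
        }
        // EOF: every byte has been seen, so the digest can be compared.
        if (_expected_digest && *_expected_digest != _actual_digest) {
            throw std::runtime_error("digest mismatch");
        }
        return {};
    }

    // Skipping would leave bytes out of the digest, so it is rejected
    // as an internal error whenever the check is enabled.
    void skip(std::size_t chunks_to_skip) {
        if (_expected_digest) {
            throw std::logic_error("skip() is not allowed with digest checking");
        }
        _next = std::min(_next + chunks_to_skip, _chunks.size());
    }
};
```

Passing `std::nullopt` as the expected digest models the normal read path, where no digest is computed and skipping remains legal.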
Refs #19058.
New feature, no backport is needed.
Closes scylladb/scylladb#20720
* github.com:scylladb/scylladb:
test: Test scrub/validate with SSTables from Cassandra
compaction: Make quarantine optional for perform_sstable_scrub()
test: Make random schema optional in scrub_test_framework
test: Add tests for invalid digests
test: Merge scrub/validate tests for compressed and uncompressed cases
sstables: Verify digests on validation path
sstables: Check if digest component exists
sstables: Add digest in the SSTable components
sstables: Add digest check in compressed data source
sstables: Add digest check in checksummed data source
Following the addition of digest check in the checksummed data source,
add the same feature to the compressed data source as well. This ensures
consistent behavior across any type of SSTable.
This is added as an optional feature so that we can preserve the current
behavior, that is verify only the per-chunk checksums during normal user
reads. To ensure zero cost at runtime when disabled, we introduce the
on/off switch as a template parameter.
The digest calculation for compressed SSTables depends on the SSTable
format, hence the new template argument for the checksum mode. This is
consistent with the compressed data sink.
Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>
That's the most mysterious wrapper in this set as it doesn't need
sstable itself at all, it just duplicates the existing non-class
function out there.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Same as previous patch -- callers can come with const reference to
summary, so they can live with existing public sstable::get_summary().
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Just call the public sstable::get_statistics(). The callers would get
const reference on it, but they don't need more than that.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The wrapper just changes the order of arguments for a public method.
Drop it, and call the wrappee directly.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Everything but the perf test is a straightforward switch.
The perf-test generated regular columns dynamically via vector, with
builder the vector goes away.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
All lambdas passed to test_using_reusable_sst() conform to the prototype
void (test_env&, sstable_ptr)
All lambdas passed to test_using_reusable_sst_returning() conform to the
prototype
NON_VOID (test_env&, sstable_ptr)
The common parameter list of both prototypes can be expressed with the
concept
std::invocable<test_env&, sstable_ptr>
Once a "Func" template parameter (i.e., function type) satisfying this
concept is taken, then "Func"'s void or non-void return type can be
commonly expressed with
std::invoke_result_t<Func, test_env&, sstable_ptr>
In turn, test_env::do_with_async_returning<...> can be instantiated with
this return type, even if it happens to be "void".
([stmt.return] specifies, "[a] return statement with an operand of type
void shall be used only in a function that has a cv void return type",
meaning that
return func(env)
will do the right thing in the body of
test_env::do_with_async_returning<void>().)
Merge test_using_reusable_sst() and test_using_reusable_sst_returning()
into one. Preserve the function name from the former, and the
test_env::do_with_async_returning<...>() call from the latter.
Suggested-by: Avi Kivity <avi@scylladb.com>
Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
Closes scylladb/scylladb#20090
The large_partition_schema() call returns a copy of the "schema_ptr"
object that points to an effectively statically initialized thread_local
"schema" object. The large_partition_schema() call has no bearing on
whether, or when, the "schema" object is constructed, and has no side
effects (other than copying an "lw_shared_ptr" object). Furthermore, the
return value of large_partition_schema() is not used for anything in
promoted_index_read().
This redundant call seems to date back to original commit 3dd079fb7a
("tests: add test for reading parts of a large partition", 2016-08-07).
Remove the call and the variable.
Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
All lambdas passed to test_using_reusable_sst() and
test_using_reusable_sst_returning() have been converted to future::get()
calls (according to the seastar::thread context that they are now executed
in). None of the lambdas return futures anymore; they all directly return
void or non-void. Therefore, drop futurize_invoke(...).get() around the
lambda invocations in test_using_reusable_sst*().
Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
For better readability, replace the future::then() chaining (and the
associated manual fiddling with object lifecycles) with future::get() (and
rely on seastar::thread's stack). We're already in seastar::thread
context.
Similarly, replace the future::finally() underlying with_closeable() with
deferred_close(); with the assumption that mutation_reader::close() never
fails (and is therefore safe to call in the "deferred_close" destructor).
This is actually guaranteed, as mutation_reader::close() is marked
"noexcept".
Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
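The reasoning in the commit message above, that closing from a destructor is safe only when close() cannot throw, can be sketched with a small scope guard. This is a synchronous stand-in written for illustration: `reader`, `close_on_scope_exit` and `read_all` are hypothetical, whereas Seastar's own deferred_close() operates on a close() that returns a future.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical reader whose close() is guaranteed not to throw.
struct reader {
    std::vector<std::string>* log;
    void close() noexcept { log->push_back("closed"); }
};

// Scope guard that closes the object when the enclosing scope exits,
// whether normally or through an exception.
template <typename T>
class close_on_scope_exit {
    T& _obj;
public:
    explicit close_on_scope_exit(T& obj) noexcept : _obj(obj) {}
    close_on_scope_exit(const close_on_scope_exit&) = delete;
    close_on_scope_exit& operator=(const close_on_scope_exit&) = delete;
    ~close_on_scope_exit() {
        // Destructors must not throw, so insist on a noexcept close().
        static_assert(noexcept(_obj.close()), "close() must be noexcept");
        _obj.close();
    }
};

inline void read_all(std::vector<std::string>& log, bool fail) {
    reader r{&log};
    close_on_scope_exit guard(r);
    log.push_back("reading");
    if (fail) {
        throw std::runtime_error("read failed");
    }
    log.push_back("done");
}
```

In both the normal and the failing path the reader is closed exactly once, without any explicit cleanup chained after the read.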
According to the earlier patch "test/sstable: rewrite test_using_reusable_sst()
with async" in this series, lambdas passed to test_using_reusable_sst()
are invoked:
(a) less importantly here, in seastar::thread context,
(b) more importantly here, futurized (temporarily so).
The test case not_find_key_composite_bucket0() doesn't chain futures;
therefore it needs no conversion to future::get() for purpose (a);
however, we can eliminate its empty future return. Fact (b) will cover for
that, until all such lambdas are converted to direct "void" returns (at
which point we can remove the futurization from
test_using_reusable_sst()).
Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
For better readability, replace future::then() chaining with
future::get(). (We're already in seastar::thread context.)
This patch is best viewed with "git show -b".
Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
According to the earlier patch "test/sstable: rewrite test_using_reusable_sst()
with async" in this series, lambdas passed to test_using_reusable_sst()
are invoked:
(a) less importantly here, in seastar::thread context,
(b) more importantly here, futurized (temporarily so).
The test cases find_key_map(), find_key_set(), find_key_list(),
find_key_composite(), all_in_place() don't chain futures; therefore they
need no conversion to future::get() for purpose (a); however, we can
eliminate their empty future returns. Fact (b) will cover for that, until
all such lambdas are converted to direct "void" returns (at which point we
can remove the futurization from test_using_reusable_sst()).
Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
All three lambdas passed to write_and_validate_sst() now use future::get()
rather than future::then() chaining; in other words, the future::get()
calls inside all these seastar::thread contexts have been pushed down to
the lambdas. Change all these lambdas' return types from future<> to void.
Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
The lambda passed to write_and_validate_sst() already runs in
seastar::thread context; replace future::then() chaining with
future::get() calls.
We're going to eliminate the trailing "return make_ready_future<>()"
later.
This patch is best viewed with "git show -W -b".
Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
The lambda passed to write_and_validate_sst() already runs in
seastar::thread context; replace future::then() chaining with
future::get() calls.
We're going to eliminate the trailing "return make_ready_future<>()"
later.
This patch is best viewed with "git show -W -b".
Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
The lambda passed to write_and_validate_sst() already runs in
seastar::thread context; replace future::then() chaining with
future::get() calls.
We're going to eliminate the trailing "return make_ready_future<>()"
later.
This patch is best viewed with "git show -W -b".
Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
check_component_integrity() does not rely on any deferred close or stop
operations; turn it into a coroutine therefore, for best readability.
This conversion demonstrates particularly well how much the stack eases
coding. We no longer need to artificially extend the lifetime of "tmp"
with a final
.then([tmp] {})
future. Consequently, "tmp" no longer needs to be a shared pointer to an
on-heap "tmpdir" object; "tmp" can just be a "tmpdir" object on the stack.
While at it, eliminate the single-use local objects "s" and "gen", for
movability's sake. (We could use std::move() on these variables, but it
seems easier to just flatten the function calls that produce the
corresponding rvalues into the write_sst_info() argument list.)
Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
The lambda passed to test_using_reusable_sst() is now invoked --
futurized, transitorily -- in seastar::thread context; stop returning an
explicit make_ready_future<>() from the lambda.
Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
summary_query_fail() does not rely on any deferred close or stop
operations; turn it into a coroutine therefore, for best readability.
Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
simple_index_read() and composite_index_read() do not rely on any deferred
close or stop operations; turn them into coroutines therefore, for best
readability.
Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
Improve the readability of test_using_reusable_sst() by replacing
future::then() chaining with test_env::do_with_async() and future::get().
Unlike seastar::async(), test_env::do_with_async() restricts its input
lambda to returning "void". Because of this, introduce the variant
test_using_reusable_sst_returning(), based on
test_env::do_with_async_returning(), for lambdas returning non-void. Put
the latter to use in index_read() at once.
Subsequently, we'll gradually convert the lambdas passed to
test_using_reusable_sst() and test_using_reusable_sst_returning() from
returning futures to returning direct values. In order for
test_using_reusable_sst() and test_using_reusable_sst_returning() to cope
with both types of lambdas, wrap the lambdas into futurize_invoke().get().
In the seastar::thread context, future::get() will gracefully block on
genuine futures, and return immediately on direct values that were
futurized on the spot.
Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
Make test_using_working_sst() easier to read by:
(1) replacing test_env::do_with() with seastar::async(),
seastar::defer(), and future::get();
(2) replacing seastar::async() and seastar::defer() with
test_env::do_with_async().
Technically speaking, this change does not perfectly preserve exceptional
behavior. Namely, test_env::do_with() uses future::finally() to link
test_env::stop() to the chain of futures, and future::finally() permits
test_env::stop() itself to throw an exception -- potentially leading to a
seastar::nested_exception being thrown, which would carry both the
original exception and the one thrown by test_env::stop().
Contrarily, the test_env::stop() deferred with seastar::defer() runs in a
destructor, and therefore test_env::stop() had better not throw there.
However, we will assume that test_env::stop() does not throw, albeit not
marked "noexcept". Prior commits 8d704f2532 ("sstable_test_env:
Coroutinize and move to .cc test_env::stop()", 2023-10-31) and
2c78b46c78 ("sstables::test_env: Carry compaction manager on board",
2023-10-31) show that we've considered individual actions in
test_env::stop() not to throw before.
The 128KB stack of seastar::thread (which underlies seastar::async())
should be a tolerable cost in a test case, in exchange for the improved
readability.
Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
Currently change_dir_for_test() is synchronous. Make it return a future,
so that we can use async operations in change_dir_for_test() overrides.
Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
flat_mutation_reader_v2 was introduced in a pair of commits in 2021:
e3309322c3 "Clone flat_mutation_reader related classes into v2 variants"
08b5773c12 "Adapt flat_mutation_reader_v2 to the new version of the API"
as a replacement for flat_mutation_reader, using range_tombstone_change
instead of range_tombstone to represent range tombstones. See
those commits for more information.
The transition was incremental; the last use of the original
flat_mutation_reader was removed in 2022 in commit
026f8cc1e7 "db: Use mutation_partition_v2 in mvcc"
In turn, flat_mutation_reader was introduced in 2017 in commit
748205ca75 "Introduce flat_mutation_reader"
To transition from a mutation_reader that nested rows within
a partition in a separate stream, to a flat reader that streamed
partitions and rows in the same stream.
Here, we reclaim the original name and rename the awkward
flat_mutation_reader_v2 to mutation_reader.
Note that mutation_fragment_v2 remains since we still use the original
for compatibility, sometimes.
Some notes about the transition:
- files were also renamed. In one case (flat_mutation_reader_test.cc), the
rename target already existed, so we rename to
mutation_reader_another_test.cc.
- a namespace 'mutation_reader' with two definitions existed (in
mutation_reader_fwd.hh). Its contents were folded into the mutation_reader
class. As a result, a few #includes had to be adjusted.
Closes scylladb/scylladb#19356
since Boost.Test relies on operator<< or `boost_test_print_type()`
to print the value of variables being compared, instead of defining
the fallback formatter of `boost_test_print_type()` for each
individual test, let's define it in `test/lib/test_utils.hh`, so
that it can be shared across tests.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#18260
Some tests want to ignore the out_of_range exception in a continuation and go
the longer route for that.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#18216