these warnings are found by Clang-17 after removing
`-Wno-unused-lambda-capture` and '-Wno-unused-variable' from
the list of disabled warnings in `configure.py`.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
The sstable_test_env::reusable_sst() has default value for the format
argument. Patch the test cases that don't use one while at it.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The reusable_sst() is intented to be used to load the pre-existing
sstable from the test/resources directory and .load() them. Some test
cases, however, still do it "by hand".
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The former is a convenience wrapper over the latter. There's no real benefit in using it, but having two test_env-s is worse than just one.
Closes#12794
* github.com:scylladb/scylladb:
sstable_utils: Move the test_setup to perf/
sstable_utils: Remove unused wrappers over test_env
sstable_test: Open-code do_with_cloned_tmp_directory
sstable_test: Asynchronize statistics_rewrite case
tests: Replace test_setup::do_with_tmp_directory with test_env::do_with(_async)?
Schema related files are moved there. This excludes schema files that
also interact with mutations, because the mutation module depends on
the schema. Those files will have to go into a separate module.
Closes#12858
Move mutation-related files to a new mutation/ directory. The names
are kept in the global namespace to reduce churn; the names are
unambiguous in any case.
mutation_reader remains in the readers/ module.
mutation_partition_v2.cc was missing from CMakeLists.txt; it's added in this
patch.
This is a step forward towards librarization or modularization of the
source base.
Closes#12788
The former helper is just a wrapper over the _async version of the
latter and also creates a tempdir and calls the fn with tempdir as an
argument. The test_env already has its own temp dir on board, so callers
can can be switched to using it.
Some test cases use the do_with_tmp_directory but generate chain of
futures without in fact using the async context. This patch addresses
that, so the change is not 100% mechanical unfortunately.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
We currently have two method families to generate partition keys:
* make_keys() in test/lib/simple_schema.hh
* token_generation_for_shard() in test/lib/sstable_utils.hh
Both work only for schemas with a single partition key column of `text` type and both generate keys of fixed size.
This is very restrictive and simplistic. Tests, which wanted anything more complicated than that had to rely on open-coded key generation.
Also, many tests started to rely on the simplistic nature of these keys, in particular two tests started failing because the new key generation method generated keys of varying size:
* sstable_compaction_test.sstable_run_based_compaction_test
* sstable_mutation_test.test_key_count_estimation
These two tests seems to depend on generated keys all being of the same size. This makes some sense in the case of the key count estimation test, but makes no sense at all to me in the case of the sstable run test.
Closes#12657
* github.com:scylladb/scylladb:
test/lib/sstable_utils: remove now unused token_generation_for_shard() and friends
test/lib/simple_schema: remove now unused make_keys() and friends
test: migrate to tests::generate_partition_key[s]()
test/lib/test_services: add table_for_tests::make_default_schema()
test/lib: add key_utils.hh
test/lib/random_schema.hh: value_generator: add min_size_in_bytes
Makes sstable_set::all() interface robust, and introduces sstable_set::size() to avoid copies when retrieving set size.
Closes#12716
* github.com:scylladb/scylladb:
treewide: Use new sstable_set::size() wherever possible
sstables: Introduce sstable_set::size()
sstables: Fix fragility of sstable_set::all() interface
Preferred aternative to sstable_set->all()->size(), which may
involve of copy elements from a single set or multiple ones
if compound_sstable_set is used.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
We have enabled the command line options without changing a
single line of code, we only had to replace old include
with scylla_test_case.hh.
Next step is to add x-log-compaction-groups options, which will
determine the number of compaction groups to be used by all
instantiations of replica::table.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Use the newly introduced key generation facilities, instead of the the
old inflexible alternatives and hand-rolled code.
Most of the migrations are mechanic, but there are two tests that
were tricky to migrate:
* sstable_compaction_test.sstable_run_based_compaction_test
* sstable_mutation_test.test_key_count_estimation
These two tests seems to depend on generated keys all being of the same
size. This makes some sense in the case of the key count estimation
test, but makes no sense at all to me in the case of the sstable run
test.
The sstable::filename() is going to become private method. Lots of tests
call it, but tests do call a lot of other sstable private methods,
that's OK. Make the sstable::filename() yet another one of that kind in
advance.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This method is currently used in two places: sstable::snapshot() and
sstable::seal_sstable(). The latter additionally touches the target
backup/ subdir.
This patch moves the whole thing on storage and adds touch for all the
cases. For snapshots this might be excessive, but harmless.
Tests get their private-disclosure way to access sstable._storage in
few places to call create_links directly.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The method became unused since 70e5252a (table: no longer accept online
loading of SSTable files in the main directory) and the whole concept of
reshuffling sstables was dropped later by 7351db7c (Reshape upload files
and reshard+reshape at boot).
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes#12165
The deletion log concept uses the fact that files are on a POSIX
filesystem. Support for another storage type will have to reimplement
this place, so keep the FS-specific code in _directory.cc file.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This fixes a regression introduced by 1e7a444, where table::get_sstable_set() isn't exposing all sstables, but rather only the ones in the main set. That causes user of the interface, such as get_sstables_by_partition_key() (used by API to return sstable name list which contains a particular key), to miss files in the maintenance set.
Fixes https://github.com/scylladb/scylladb/issues/11681.
Closes#11682
* github.com:scylladb/scylladb:
replica: Return all sstables in table::get_sstable_set()
sstables: Fix cloning of compound_sstable_set
The intention was that its clone() would actually clone the content
of an existing set into a new one, but the current impl is actually
moving the sets instead of copying them. So the original set
becomes invalid. Luckily, this problem isn't triggered as we're
not exposing the compound set in the table's interface, so the
compound_sstable_set::clone() method isn't being called.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
This class exists for one purpose only: to serve as glue code between
dht::ring_position and boost::icl::interval_map. The latter requires
that keys in its intervals are:
* default constructible
* copyable
* have standalone compare operations
For this reason we have to wrap `dht::ring_position` in a class,
together with a schema to provide all this. This is
`compatible_ring_position`. There is one further requirement by code
using the interval map: it wants to do lookups without copying the
lookup key(s). To solve this, we came up with
`compatible_ring_position_or_view` which is a union of a key or a key
view + schema. As we recently found out, boost::icl copies its keys **a
lot**. It seems to assume these keys are cheap to copy and carelessly
copies them around even when iterating over the map. But
`compatible_ring_position_or_view` is not cheap to copy as it copies a
`dht::ring_position` which allocates, and it does that via an
`std::optional` and `std::variant` to add insult to injury.
This patch make said class cheap to copy, by getting rid of the variant
and storing the `dht::ring_position` via a shared pointer. The view is
stored separately and either points to the ring position stored in the
shared pointer or to an outside ring position (for lookups).
Fixes: #11669Closes#11670
After commit 0796b8c97a, sstable_run won't accept a fragment
that introduces key overlapping. But once we split large partitions,
fragments in the same run may store disjoint clustering ranges
of the same partition. So we're extending sstable_run to look
at clustering dimension, so fragments storing disjoint clustering
ranges of the same large partition can co-exist in the same run.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
With first partition's first position and last partition's last
partition, we'll be able to determine which fragments composing a
sstable run store a large partition that was split.
Then sstable run will be able to detect if all fragments storing
a given large partition are disjoint in the clustering level.
Fixes#10637.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
This new method allows sstable to load the first row of the first
partition and last row of last partition.
That's useful for incremental reading of sstable run which will
be split at clustering boundary.
To get the first row, it consumes the first row (which can be
either a clustering row or range tombstone change) and returns
its position_in_partition.
To get the last row, it does the same as above but in reverse
mode instead.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Said method currently emits a partition-end. This method is only called
when the last fragment in the stream is a range tombstone change with a
position after all clustered rows. The problem is that
consume_partition_end() is also called unconditionally, resulting in two
partition-end fragments being emitted. The fix is simple: make this
method a no-op, there is nothing to do there.
Also add two tests: one targeted to this bug and another one testing the
crawling reader with random mutations generated for random schema.
Fixes: #11421Closes#11422
Rather than defining generate_random,
and use respectively in unit tests.
(It was inherited from raft::internal::tagged_id.)
This allows us to shorten counter_id's definition
to just using utils::tagged_uuid<struct counter_id_tag>.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
`generation_type` is (supposed to be) conceptually different from
`int64_t` (even if physically they are the same), but at present
Scylla code still largely treats them interchangeably.
In addition to using `generation_type` in more places, we
provide (no-op) `generation_value()` and `generation_from_value()`
operations to make the smoke-and-mirrors more believable.
The churn is considerable, but all mechanical. To avoid even
more (way, way more) churn, unit test code is left untreated for
now, except where it uses the affected core APIs directly.
Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
Add column_index_auto_scale_threshold_in_kb to the configuration
(defaults to 10MB).
When the promoted index (serialized) size gets to this
threshold, it's halved by merging each two adjacent blocks
into one and doubling the desired_block_size.
Fixes#4217
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Dealing with the handful of tests that check range tombstones in
interesting ways and need more than search-and-replace.
Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
Attempting to call advance_to() on the index, after it is positioned at
EOF, can result in an assert failure, because the operation results in
an attempt to move backwards in the index-file (to read the last index
page, which was already read). This only happens if the index cache
entry belonging to the last index page is evicted, otherwise the advance
operation just looks-up said entry and returns it.
To prevent this, we add an early return conditioned on eof() to all the
partition-level advance-to methods.
A regression unit test reproducing the above described crash is also
added.
If SSTable write fails, it will leave a partial sst which contains
a temporary TOC in addition to other components partially written.
temporary TOC content is written upfront, to allow us from deleting
all partial components using the former content if write fails.
After commit e5fc4b6, partial sst cannot be deleted because deletion
procedure is incorrectly assuming all SSTs being deleted unconditionally
have TOC, but partial SSTs only have TMP TOC instead.
That happens because parent_path() requires all path components to
exist due to its usage of fs::path::canonical.
The consequence of this is that space of partial files cannot be
reclaimed, making it worse for Scylla to recover from ENOSPC,
which could happen by selecting a set of files for compaction with
higher chance of suceeeding given the free space.
This is fixed by only calling parent_path() on TMP TOC, which is
guaranteed to exist prior to calling fsync_directory().
Fixes#10410.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
"
This patchset adds two new operations to scylla-sstable:
* validate-checksums - helps identifying whether an sstable is intact or
not, but checking the digest and the per-chunk checksums against the
data on disk.
* decompress - helps when one wants to manually examine the content of a
compressed sstable.
Refs: #497
Tests: unit(dev)
"
* 'scylla-sstable-validate-checksums-decompress/v3' of https://github.com/denesb/scylla:
tools/scylla-sstable: consume_sstables(): s/no_skips/use_crawling_reader/
tools/scylla-sstable: add decompress operation
tools/scylla-sstables: add validate-checksums operation
sstables/sstable: add validate_checksums()
sstables/sstable: add raw_stream option to data_stream()
sstables/sstable: make data_stream() and data_read() public
utils/exceptions: add maybe_rethrow_exception()
Sstables have two kind of checksums: per-chunk checksums and
full-checksum (digest) calculated over the entire content of Data.db.
The full-checksum (digest) is stored in Digest.crc
(component_type::Digest).
When compression is used, the per-chunk checksum is stored directly
inside Data.db, after each compressed chunk. These are validated on
read, when decompressing the respective chunks.
When no compression is used, the per-chunk checksum is stored separately
in CRC.db (component_type::CRC). Chunk size is defined and stored in said
component as well.
In both compressed and uncompressed sstables, checksums are calculated
on the data that is actually written to disk, so in case of compressed
data, on the compressed data.
This method validates both the full checksum and the per-chunk checksum
for the entire Data.db.
The flat_mutation_reader files were conflated and contained multiple
readers, which were not strictly necessary. Splitting optimizes both
iterative compilation times, as touching rarely used readers doesn't
recompile large chunks of codebase. Total compilation times are also
improved, as the size of flat_mutation_reader.hh and
flat_mutation_reader_v2.hh have been reduced and those files are
included by many file in the codebase.
With changes
real 29m14.051s
user 168m39.071s
sys 5m13.443s
Without changes
real 30m36.203s
user 175m43.354s
sys 5m26.376s
Closes#10194
The sstables::sstable class has two methods for writing sstables:
1) sstable_writer get_writer(...);
2) future<> write_components(flat_mutation_reader, ...);
(1) directly exposes the writer type, so we have to update all users of
it (there is not that many) in this same patch. We defer updating
users of (2) to a follow-up commits.
Memtables are a replica-side entity, and so are moved to the
replica module and namespace.
Memtables are also used outside the replica, in two places:
- in some virtual tables; this is also in some way inside the replica,
(virtual readers are installed at the replica level, not the
cooordinator), so I don't consider it a layering violation
- in many sstable unit tests, as a convenient way to create sstables
with known input. This is a layering violation.
We could make memtables their own module, but I think this is wrong.
Memtables are deeply tied into replica memory management, and trying
to make them a low-level primitive (at a lower level than sstables) will
be difficult. Not least because memtables use sstables. Instead, we
should have a memtable-like thing that doesn't support merging and
doesn't have all other funky memtable stuff, and instead replace
the uses of memtables in sstable tests with some kind of
make_flat_mutation_reader_from_unsorted_mutations() that does
the sorting that is the reason for the use of memtables in tests (and
live with the layering violation meanwhile).
Test: unit (dev)
Closes#10120
Instead of lengthy blurbs, switch to single-line, machine-readable
standardized (https://spdx.dev) license identifiers. The Linux kernel
switched long ago, so there is strong precedent.
Three cases are handled: AGPL-only, Apache-only, and dual licensed.
For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0),
reasoning that our changes are extensive enough to apply our license.
The changes we applied mechanically with a script, except to
licenses/README.md.
Closes#9937
The database, keyspace, and table classes represent the replica-only
part of the objects after which they are named. Reading from a table
doesn't give you the full data, just the replica's view, and it is not
consistent since reconciliation is applied on the coordinator.
As a first step in acknowledging this, move the related files to
a replica/ subdirectory.
Add schema parameter so that:
* Caller has better control over schema -- especially relevant for
reverse reads where it is not possible to follow the convention of
passing the query schema which is reversed compared to that of the
mutations.
* Now that we don't depend on the mutations for the schema, we can lift
the restriction on mutations not being empty: this leads to safer
code. When the mutations parameter is empty, an empty reader is
created.
Add "make_" prefix to follow convention of similar reader factory
functions.
Tests: unit(dev)
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20211115155614.363663-1-bdenes@scylladb.com>
There are already four of them. Those working with the mutation reader
can be folded into one with some default args.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This patch is two-fold. First it changes the signature of the
local helper to facilitate next patching. Second, it makes more
relevant places in the test use this helper.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Currently all the code operates on the range_tombstone class.
and many of those places get the range tombstone in question
from the range_tombstone_list. Next patches will make that list
carry (and return) some new object called range_tombstone_entry,
so all the code that expects to see the former one there will
need to patched to get the range_tombstone from the _entry one.
This patch prepares the ground for that by introdusing the
range_tombstone& tombstone() { return *this; }
getter on the range_tombstone itself and patching all future
users of the _entry to call .tombstone() right now.
Next patch will remove those getters together with adding the new
range_tombstone_entry object thus automatically converting all
the patched places into using the entry in a proper way.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>