The reconcilable_result is built as it would be constructed for
forward read queries for tables with reversed order.
Mutations constructed for reversed queries are consumed forward.
Drop overloaded reversed functions that reverse read_command and
reconcilable_result directly and keep only those requiring smart
pointers. They are not used any more.
assert() is traditionally disabled in release builds, but not in
scylladb. This hasn't caused problems so far, but the latest abseil
release includes a commit [1] that causes a 1000 insn/op regression when
NDEBUG is not defined.
Clearly, we must move towards a build system where NDEBUG is defined in
release builds. But we can't just define it blindly without vetting
all the assert() calls, as some were written with the expectation that
they are enabled in release mode.
To solve the conundrum, change all assert() calls to a new SCYLLA_ASSERT()
macro in utils/assert.hh. This macro is always defined and is not conditional
on NDEBUG, so we can later (after vetting Seastar) enable NDEBUG in release
mode.
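A minimal sketch of what an always-enabled assertion macro could look like. The name `SCYLLA_ASSERT` and the header `utils/assert.hh` come from this message; the macro body, the helper `scylla_assert_fail()` and the example function `checked_div()` are assumptions made for illustration, not the actual contents of the header.

```cpp
#include <cstdio>
#include <cstdlib>

// Hypothetical failure handler: reports the failed expression and aborts.
[[noreturn]] inline void scylla_assert_fail(const char* expr, const char* file, int line) {
    std::fprintf(stderr, "%s:%d: assertion failed: %s\n", file, line, expr);
    std::abort();
}

// Unlike assert(), this expansion does not look at NDEBUG,
// so the condition is evaluated and checked in every build mode.
#define SCYLLA_ASSERT(expr) \
    do { \
        if (!(expr)) { \
            scylla_assert_fail(#expr, __FILE__, __LINE__); \
        } \
    } while (false)

// Example use: the check stays active even when compiled with -DNDEBUG.
inline int checked_div(int a, int b) {
    SCYLLA_ASSERT(b != 0);
    return a / b;
}
```

Because the expression is always evaluated, call sites that rely on side effects inside the assertion keep working once NDEBUG is defined.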
[1] 66ef711d68
Closes scylladb/scylladb#20006
Now that the origin is available inside the sstable object, no need to
pass it to the methods called in the write path.
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
Pass origin when opening the sstable from the writer and store it in the
sstable object. This will make the origin available for the entire write
path.
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
There are two schemas associated with an sstable writer:
the sstable's schema (i.e. the schema of the table at the time when the
sstable object was created), and the writer's schema (equal to the schema
of the reader which is feeding into the writer).
It's easy to mix up the two and break something as a result.
The writer's schema is needed to correctly interpret and serialize the data
passing through the writer, and to populate the on-disk metadata about the
on-disk schema.
The sstable's schema is used to configure some parameters for the newly created
sstable, such as the bloom filter false positive ratio, or compression.
The problem fixed by this patch is that the writer was wrongly creating
the compressor objects based on its own schema, but using them based
on the sstable's schema.
This patch forces the writer to use the sstable's schema for both.
There are two schemas associated with an sstable writer:
the sstable's schema (i.e. the schema of the table at the time when the
sstable object was created), and the writer's schema (equal to the schema
of the reader which is feeding into the writer).
It's easy to mix up the two and break something as a result.
The writer's schema is needed to correctly interpret and serialize the data
passing through the writer, and to populate the on-disk metadata about the
on-disk schema.
The sstable's schema is used to configure some parameters for the newly created
sstable, such as the bloom filter false positive ratio, or compression.
The problem fixed by this patch is that the writer was wrongly creating
the filter based on its own schema, while the layer outside the writer
was interpreting it as if it was created with the sstable's schema.
This patch forces the writer to pick the filter's parameters based on the
sstable's schema instead.
Log the sstable origin when its bloom filter is being rebuilt. The
origin has to be passed to the method by the caller as it is not
available in the sstable object when the filter is rebuilt.
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
Closes scylladb/scylladb#19601
The bloom filters are built with partition estimates, as the actual
partition count might not be available in all cases. If the estimate
was bad, the bloom filters might end up larger or smaller than
their optimal size. Rebuild such bloom filters with the actual partition
count before the filter is written to disk and the sstable is sealed.
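A rough sketch of why a bad estimate matters, using the textbook bloom filter sizing formula (bits = -n * ln(p) / ln(2)^2). This is an assumption for illustration only: the real code may size its filters differently, and `optimal_filter_bits()`, `should_rebuild()` and the `tolerance` parameter are hypothetical names, not part of the patch.

```cpp
#include <cmath>
#include <cstdint>

// Textbook optimal bloom filter size, in bits, for `partitions` keys and a
// target false positive rate `fp_rate` (0 < fp_rate < 1).
inline uint64_t optimal_filter_bits(uint64_t partitions, double fp_rate) {
    const double ln2 = std::log(2.0);
    const double bits = -static_cast<double>(partitions) * std::log(fp_rate) / (ln2 * ln2);
    return static_cast<uint64_t>(std::ceil(bits));
}

// Hypothetical policy: rebuild when the filter sized from the estimate
// deviates from the size implied by the actual count by more than `tolerance`.
inline bool should_rebuild(uint64_t estimated, uint64_t actual, double fp_rate,
                           double tolerance = 0.25) {
    const double built = static_cast<double>(optimal_filter_bits(estimated, fp_rate));
    const double wanted = static_cast<double>(optimal_filter_bits(actual, fp_rate));
    if (wanted == 0) {
        return built != 0;
    }
    return std::abs(built - wanted) / wanted > tolerance;
}
```

With a 1% false positive rate, an estimate of 1,000,000 partitions for an sstable that ends up holding 100,000 yields a filter roughly ten times larger than needed, which is the kind of case the rebuild targets.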
Fixes #19049
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
flat_mutation_reader_v2 was introduced in a pair of commits in 2021:
e3309322c3 "Clone flat_mutation_reader related classes into v2 variants"
08b5773c12 "Adapt flat_mutation_reader_v2 to the new version of the API"
as a replacement for flat_mutation_reader, using range_tombstone_change
instead of range_tombstone to represent range tombstones. See
those commits for more information.
The transition was incremental; the last use of the original
flat_mutation_reader was removed in 2022 in commit
026f8cc1e7 "db: Use mutation_partition_v2 in mvcc"
In turn, flat_mutation_reader was introduced in 2017 in commit
748205ca75 "Introduce flat_mutation_reader"
to transition from a mutation_reader that nested rows within
a partition in a separate stream, to a flat reader that streamed
partitions and rows in the same stream.
Here, we reclaim the original name and rename the awkward
flat_mutation_reader_v2 to mutation_reader.
Note that mutation_fragment_v2 remains since we still use the original
for compatibility, sometimes.
Some notes about the transition:
- files were also renamed. In one case (flat_mutation_reader_test.cc), the
rename target already existed, so we rename to
mutation_reader_another_test.cc.
- a namespace 'mutation_reader' with two definitions existed (in
mutation_reader_fwd.hh). Its contents were folded into the mutation_reader
class. As a result, a few #includes had to be adjusted.
Closes scylladb/scylladb#19356
When issuing warnings about partitions with the number of rows above a configured threshold,
the large partitions handler does not take into consideration the number of range tombstone
markers in the total rows count. This fix adds the number of range tombstone markers to the
total number of rows and saves this total in system.large_partitions.rows (if it is above
the threshold). It also adds a new column range_tombstones to the system.large_partitions
table which only contains the number of range tombstone markers for the given partition.
This PR fixes the first part of issue #13968
It does not cover distinguishing between live and dead rows. A subsequent PR will handle that.
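The counting rule described above can be sketched as follows. The struct and function names (`partition_stats`, `large_partition_record`, `maybe_record()`) are hypothetical stand-ins; only the rule itself (rows plus range tombstone markers compared against the threshold, with the marker count stored separately) is taken from this message.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical per-partition counters collected while writing an sstable.
struct partition_stats {
    uint64_t rows;
    uint64_t range_tombstones;
};

// Hypothetical shape of what ends up in system.large_partitions.
struct large_partition_record {
    uint64_t rows;              // total: rows + range tombstone markers
    uint64_t range_tombstones;  // markers only
};

// Returns a record only when the combined count is above the threshold.
inline std::optional<large_partition_record>
maybe_record(const partition_stats& s, uint64_t rows_threshold) {
    const uint64_t total = s.rows + s.range_tombstones;
    if (total <= rows_threshold) {
        return std::nullopt;
    }
    return large_partition_record{total, s.range_tombstones};
}
```

A partition with 90 rows and 20 markers crosses a threshold of 100 under this rule, although its row count alone would not.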
since we do not rely on FMT_DEPRECATED_OSTREAM to define the
fmt::formatter for us anymore, let's stop defining `FMT_DEPRECATED_OSTREAM`.
in this change,
* utils: drop the range formatters in to_string.hh and to_string.c, as
we don't use them anymore. and the tests for them in
test/boost/string_format_test.cc are removed accordingly.
* utils: use fmt to print chunk_vector and small_vector. as
we are not able to print the elements using operator<< anymore
after switching to {fmt} formatters.
* test/boost: specialize fmt::details::is_std_string_like<bytes>
due to a bug in {fmt} v9, {fmt} fails to format a range whose
element type is `basic_sstring<uint8_t>`, as it considers it
as a string-like type, but `basic_sstring<uint8_t>`'s char type
is signed char, not char. this issue does not exist in {fmt} v10,
so, in this change, we add a workaround to explicitly specialize
the type trait to assure that {fmt} format this type using its
`fmt::formatter` specialization instead of trying to format it
as a string. also, {fmt}'s generic ranges formatter calls the
pair formatter's `set_brackets()` and `set_separator()` methods
when printing the range, but operator<< based formatter does not
provide these methods, so we have to include this change in the change
switching to {fmt}, otherwise the change specializing
`fmt::details::is_std_string_like<bytes>` won't compile.
* test/boost: in tests, we use `BOOST_REQUIRE_EQUAL()` and its friends
for comparing values. but without the operator<< based formatters,
Boost.Test would not be able to print them. after removing
the homebrew formatters, we need to use the generic
`boost_test_print_type()` helper to do this job. so we are
including `test_utils.hh` in tests so that we can print
the formattable types.
* treewide: add `#include "utils/to_string.hh"` where
`fmt::formatter<optional<>>` is used.
* configure.py: do not define FMT_DEPRECATED_OSTREAM
* cmake: do not define FMT_DEPRECATED_OSTREAM
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
in {fmt} before v10, it provides the specialization of `fmt::formatter<..>`
for `std::string_view` as well as the specialization of `fmt::formatter<..>`
for `fmt::string_view` which is an implementation builtin in {fmt} for
compatibility of pre-C++17. and this type is used even if the code is
compiled with C++ standard greater or equal to C++17. also, before v10,
the `fmt::formatter<std::string_view>::format()` is defined so it accepts
`std::string_view`. after v10, `fmt::formatter<std::string_view>` still
exists, but it is now defined using `format_as()` machinery, so its
`format()` method does not actually accept `std::string_view`, it
accepts `fmt::string_view`, as the former can be converted to
`fmt::string_view`.
this is why we can inherit from `fmt::formatter<std::string_view>` and
use `formatter<std::string_view>::format(foo, ctx);` to implement the
`format()` method with {fmt} v9, but we cannot do this with {fmt} v10,
and we would have following compilation failure:
```
FAILED: service/CMakeFiles/service.dir/RelWithDebInfo/topology_state_machine.cc.o
/home/kefu/.local/bin/clang++ -DFMT_DEPRECATED_OSTREAM -DFMT_SHARED -DSCYLLA_BUILD_MODE=release -DSEASTAR_API_LEVEL=7 -DSEASTAR_LOGGER_COMPILE_TIME_FMT -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_SSTRING -DXXH_PRIVATE_API -DCMAKE_INTDIR=\"RelWithDebInfo\" -I/home/kefu/dev/scylladb -I/home/kefu/dev/scylladb/build/gen -I/home/kefu/dev/scylladb/seastar/include -I/home/kefu/dev/scylladb/build/seastar/gen/include -I/home/kefu/dev/scylladb/build/seastar/gen/src -ffunction-sections -fdata-sections -O3 -g -gz -std=gnu++20 -fvisibility=hidden -Wall -Werror -Wextra -Wno-error=deprecated-declarations -Wimplicit-fallthrough -Wno-c++11-narrowing -Wno-deprecated-copy -Wno-mismatched-tags -Wno-missing-field-initializers -Wno-overloaded-virtual -Wno-unsupported-friend -Wno-enum-constexpr-conversion -Wno-unused-parameter -ffile-prefix-map=/home/kefu/dev/scylladb=. -march=westmere -mllvm -inline-threshold=2500 -fno-slp-vectorize -U_FORTIFY_SOURCE -Werror=unused-result -MD -MT service/CMakeFiles/service.dir/RelWithDebInfo/topology_state_machine.cc.o -MF service/CMakeFiles/service.dir/RelWithDebInfo/topology_state_machine.cc.o.d -o service/CMakeFiles/service.dir/RelWithDebInfo/topology_state_machine.cc.o -c /home/kefu/dev/scylladb/service/topology_state_machine.cc
/home/kefu/dev/scylladb/service/topology_state_machine.cc:254:41: error: no matching member function for call to 'format'
254 | return formatter<std::string_view>::format(it->second, ctx);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~
/usr/include/fmt/core.h:2759:22: note: candidate function template not viable: no known conversion from 'seastar::basic_sstring<char, unsigned int, 15>' to 'const fmt::basic_string_view<char>' for 1st argument
2759 | FMT_CONSTEXPR auto format(const T& val, FormatContext& ctx) const
| ^ ~~~~~~~~~~~~
```
because the inherited `format()` method actually comes from
`fmt::formatter<fmt::string_view>`. to reduce the confusion, in this
change, we just inherit from `fmt::formatter<string_view>`, where
`string_view` is actually `fmt::string_view`. this follows
the document at
https://fmt.dev/latest/api.html#formatting-user-defined-types,
and since there is less indirection under the hood -- we do not
use the specialization created by `FMT_FORMAT_AS` which inherits
from `formatter<fmt::string_view>`, hopefully this can improve
the compilation speed a little bit. also, this change addresses
the build failure with {fmt} v10.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#18299
The message says "index-data" but when printing the position, the data
position is printed first, causing confusion. Fix this and while at it,
also print the position of the partition start.
The validate() consumes the content of partitions in a consume-loop.
Every time the consumer asks for a "break", the next PI block is loaded
and set on the validator, so it can validate that further clustering
elements are indeed from this block.
This loop assumed the consumer would only request interruption when the
current clustering block is finished. This is wrong, the consumer can
also request interruption when yielding is needed. When this is the
case, the next PI block doesn't have to be loaded yet, the current one
is not exhausted yet. Check this condition, before loading the next PI
block, to prevent false positive errors, due to mismatched PI block
and clustering elements from the sstable.
It is possible that the next partition has no PI and thus there won't be
a new PI block to overwrite the old one. This will result in
false-positive messages about rows being outside of the finished PI
block.
Promoted index entries can be written on any clustering elements,
including range tombstones. So the validating consumer also has to check
whether the current expected clustering block is finished, when
consuming a range tombstone. If it is, consumption has to be
interrupted, so that the outer-loop can load up the next promoted index
block, before moving on to the next clustering element.
For range tombstone end-bounds, the validate_fragment_order() should be
passed a null tombstone, not a disengaged optional. The latter means no
change in the current tombstone. This caused the end bound of range
tombstones to not make it to the validator and the latter complained
later on partition-end that the partition has unclosed range tombstone.
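The distinction between a null tombstone and a disengaged optional can be sketched like this. The `tombstone` struct and the `order_validator` class are simplified, hypothetical stand-ins for the real types; only the meaning of the two argument states is taken from this message.

```cpp
#include <cstdint>
#include <optional>

// Simplified stand-in: timestamp 0 means "no tombstone" (a null tombstone).
struct tombstone {
    int64_t timestamp = 0;
    explicit operator bool() const { return timestamp != 0; }
};

// Hypothetical validator that tracks the currently active range tombstone.
class order_validator {
    tombstone _current;
public:
    // Disengaged optional: the fragment does not change the active tombstone.
    // Engaged optional: the active tombstone becomes *change; a null
    // tombstone therefore closes the range.
    void on_fragment(std::optional<tombstone> change) {
        if (change) {
            _current = *change;
        }
    }
    // At partition end, a still-active tombstone means an unclosed range.
    bool on_partition_end() const { return !_current; }
};
```

Passing `std::nullopt` for an end bound leaves the start bound's tombstone active, which reproduces the spurious "unclosed range tombstone" complaint; passing `tombstone{}` closes it.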
key_view::explode() contains a blatant use-after-free:
unless the input is already linearized, it returns a view to a local temporary buffer.
This is rare, because partition keys are usually not large enough to be fragmented.
But for a sufficiently large key, this bug causes a corrupted partition_key down
the line.
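The bug class can be illustrated with a simplified sketch. `fragmented_bytes`, `linearize_bad()` and `linearize()` are hypothetical names, and returning an owning buffer is just one possible fix; this message does not say how key_view::explode() was actually corrected.

```cpp
#include <string>
#include <string_view>
#include <vector>

// Stand-in for a possibly fragmented key buffer.
using fragmented_bytes = std::vector<std::string>;

// The buggy pattern, shown only as a comment because using its result
// is undefined behaviour:
//
//   std::string_view linearize_bad(const fragmented_bytes& f) {
//       if (f.size() == 1) return f.front();  // fine: view into caller's data
//       std::string tmp;                       // local temporary buffer
//       for (const auto& frag : f) tmp += frag;
//       return tmp;                            // dangling: tmp dies on return
//   }

// One safe alternative: hand ownership of the linearized bytes to the caller.
inline std::string linearize(const fragmented_bytes& f) {
    std::string out;
    for (const auto& frag : f) {
        out += frag;
    }
    return out;
}
```

The single-fragment fast path is what hides the problem in practice: it only misbehaves once a key is large enough to span several fragments.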
Fixes #17625
Closes scylladb/scylladb#17626
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for `sstables::bound_kind_m`,
drop its operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define formatters for
`cached_promoted_index::promoted_index_block`, and drop its
operator<<.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17415
get0() dates back to the days when Seastar futures carried tuples, and
get0() was a way to get the first (and usually only) element. Now
it's a distraction, and Seastar is likely to deprecate and remove it.
Replace with seastar::future::get(), which does the same thing.
Currently, we pass an effective_replication_map_ptr to sstable_writer,
so that we can get a stable dht::sharder for writing the sharding metadata.
This is needed because with tablets, the sharder can change dynamically.
However, this is both bad and unnecessary:
- bad: holding on to an effective_replication_map_ptr is a barrier
for topology operations, preventing tablet migrations (etc) while
an sstable is being written
- unnecessary: tablets don't require sharding metadata at all, since
two tablets cannot overlap (unlike two sstables from different shards in
the same node). So the first/last key is sufficient to determine the
shard/tablet ownership.
Given that, just pass the sharder for vnode sstables, and don't generate
sharding metadata for tablet sstables.
Fixes some typos as found by codespell run on the code.
In this commit, I was hoping to fix only comments, not user-visible alerts, output, etc.
Follow-up commits will take care of them.
Refs: https://github.com/scylladb/scylladb/issues/16255
Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>
Validation scrub bypasses the usual compaction machinery, though it
still needs to be tracked with compaction_progress_monitor so that
we could reach its progress from compaction task executor.
Track sstable scrub in validate mode with read monitors.
when the local_deletion_time is too large and beyond the
epoch time of INT32_MAX, we cap it to INT32_MAX - 1.
this is a signal of bad configuration or a bug in scylla.
so let's add more information in the logging message to
help track back to the source of the problem.
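a minimal sketch of the capping rule, assuming the value arrives as a 64-bit count of seconds since the epoch. the function name `cap_local_deletion_time()`, the `context` parameter and the message format are hypothetical; only the cap value INT32_MAX - 1 comes from this message, and the exact comparison used by the real code is an assumption.

```cpp
#include <cstdint>
#include <cstdio>
#include <limits>

// Caps a deletion time at INT32_MAX - 1 and reports where it came from.
// `context` is a hypothetical free-form description (e.g. table and column).
inline int32_t cap_local_deletion_time(int64_t ldt_seconds, const char* context) {
    constexpr int64_t max_ldt = std::numeric_limits<int32_t>::max() - 1;
    if (ldt_seconds > max_ldt) {
        std::fprintf(stderr,
                     "local_deletion_time %lld exceeds %lld, capping (%s)\n",
                     static_cast<long long>(ldt_seconds),
                     static_cast<long long>(max_ldt),
                     context);
        return static_cast<int32_t>(max_ldt);
    }
    // Values below INT32_MIN are not handled in this sketch.
    return static_cast<int32_t>(ldt_seconds);
}
```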
Fixes #15015
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Today, SSTable cleanup skips to the next partition, one at a time, when it finds that the current partition is no longer owned by this node.
That's very inefficient because when a cluster is growing in size, existing nodes lose multiple sequential tokens in their owned ranges. Another inefficiency comes from fetching index pages spanning all unowned tokens, which was described in https://github.com/scylladb/scylladb/issues/14317.
To solve both problems, cleanup will now use multi range reader, to guarantee that it will only process the owned data and as a result skip unowned data. This results in cleanup scanning an owned range and then fast forwarding to the next one, until it's done with them all. This reduces significantly the amount of data in the index caching, as index will only be invoked at each range boundary instead.
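The range-driven approach can be sketched over plain sorted vectors. This is a simplified model, not the reader code: `token_range` and `read_owned()` are hypothetical, tokens are plain integers, and `std::lower_bound` stands in for the index-backed fast forward.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Inclusive token range owned by this node (hypothetical simplification).
struct token_range {
    int64_t first;
    int64_t last;
};

// `tokens`: partition tokens stored in the sstable, sorted ascending.
// `owned`: sorted, non-overlapping owned ranges.
// Returns the tokens that cleanup keeps.
inline std::vector<int64_t> read_owned(const std::vector<int64_t>& tokens,
                                       const std::vector<token_range>& owned) {
    std::vector<int64_t> out;
    auto pos = tokens.begin();
    for (const auto& r : owned) {
        // "Fast forward": jump straight to the first token of the owned
        // range instead of inspecting every unowned partition on the way.
        pos = std::lower_bound(pos, tokens.end(), r.first);
        while (pos != tokens.end() && *pos <= r.last) {
            out.push_back(*pos++);
        }
    }
    return out;
}
```

The work done between ranges is one seek per range boundary rather than one ownership check per partition, which mirrors the reduction in index usage described above.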
Without further ado,
before:
`INFO 2023-07-01 07:10:26,281 [shard 0] compaction - [Cleanup keyspace2.standard1 701af580-17f7-11ee-8b85-a479a1a77573] Cleaned 1 sstables to [./tmp/1/keyspace2/standard1-b490ee20179f11ee9134afb16b3e10fd/me-3g7a_0s8o_06uww24drzrroaodpv-big-Data.db:level=0]. 2GB to 1GB (~50% of original) in 26248ms = 81MB/s. ~9443072 total partitions merged to 4750028.`
after:
`INFO 2023-07-01 07:07:52,354 [shard 0] compaction - [Cleanup keyspace2.standard1 199dff90-17f7-11ee-b592-b4f5d81717b9] Cleaned 1 sstables to [./tmp/1/keyspace2/standard1-b490ee20179f11ee9134afb16b3e10fd/me-3g7a_0s4m_5hehd2rejj8w15d2nt-big-Data.db:level=0]. 2GB to 1GB (~50% of original) in 17424ms = 123MB/s. ~9443072 total partitions merged to 4750028.`
Fixes #12998.
Fixes #14317.
Closes#14469
* github.com:scylladb/scylladb:
test: Extend cleanup correctness test to cover more cases
compaction: Make SSTable cleanup more efficient by fast forwarding to next owned range
sstables: Close SSTable reader if index exhaustion is detected in fast forward call
sstables: Simplify sstable reader initialization
compaction: Extend make_sstable_reader() interface to work with mutation_source
test: Extend sstable partition skipping test to cover fast forward using token
When wiring multi range reader with cleanup, I found that cleanup
wouldn't be able to release disk space of input SSTables earlier.
The reason is that multi range reader fast forwards to the next range,
therefore it enables mutation_reader::forwarding, and as a result,
combined reader cannot release readers proactively as it cannot tell
for sure that the underlying reader is exhausted. It may have reached
EOS for the current range, but it may have data for the next one.
The concept of EOS actually only applies to the current range being
read. A reader that returned EOS will actually get out of this
state once the combined reader fast forward to the next range.
Therefore, only the underlying reader, i.e. the sstable reader,
can for certain know that the data source is completely exhausted,
given that tokens are read in monotonically increasing order.
For reversed reads, that's not true but fast forward to range
is not actually supported yet for it.
Today, the SSTable reader already knows that the underlying SSTable
was exhausted in fast_forward_to(), after it calls index_reader's
advance_to(partition_range), therefore it disables subsequent
reads. We can take a step further and also check that the index
was exhausted, i.e. reached EOF.
So if the index is exhausted, and there's no partition to read
after the fast_forward_to() call, we know that there's nothing
left to do in this reader, and therefore the reader can be
closed proactively, allowing the disk space of SSTable to be
reclaimed if it was already deleted.
We can see that the combined reader, under multi range reader,
will incrementally find a set of disjoint SSTables exhausted,
as it fast forwards to owned ranges:
1:
INFO 2023-07-05 10:51:09,570 [shard 0] mutation_reader - flat_multi_range_mutation_reader(): fast forwarding to range [{-4525396453480898112, start},{-4525396453480898112, end}]
INFO 2023-07-05 10:51:09,570 [shard 0] sstable - sstable /tmp/scylla-9831a31a-66f3-4541-8681-000ac8e21bbb/me-1-big-Data.db, start == *end, eof ? true
INFO 2023-07-05 10:51:09,570 [shard 0] sstable - closing reader 0x60100029d800 for /tmp/scylla-9831a31a-66f3-4541-8681-000ac8e21bbb/me-1-big-Data.db
INFO 2023-07-05 10:51:09,570 [shard 0] sstable - sstable /tmp/scylla-9831a31a-66f3-4541-8681-000ac8e21bbb/me-3-big-Data.db, start == *end, eof ? false
INFO 2023-07-05 10:51:09,570 [shard 0] sstable - sstable /tmp/scylla-9831a31a-66f3-4541-8681-000ac8e21bbb/me-4-big-Data.db, start == *end, eof ? false
INFO 2023-07-05 10:51:09,570 [shard 0] sstable - sstable /tmp/scylla-9831a31a-66f3-4541-8681-000ac8e21bbb/me-5-big-Data.db, start == *end, eof ? false
INFO 2023-07-05 10:51:09,570 [shard 0] sstable - sstable /tmp/scylla-9831a31a-66f3-4541-8681-000ac8e21bbb/me-6-big-Data.db, start == *end, eof ? false
INFO 2023-07-05 10:51:09,570 [shard 0] sstable - sstable /tmp/scylla-9831a31a-66f3-4541-8681-000ac8e21bbb/me-7-big-Data.db, start == *end, eof ? false
INFO 2023-07-05 10:51:09,570 [shard 0] sstable - sstable /tmp/scylla-9831a31a-66f3-4541-8681-000ac8e21bbb/me-8-big-Data.db, start == *end, eof ? false
INFO 2023-07-05 10:51:09,570 [shard 0] sstable - sstable /tmp/scylla-9831a31a-66f3-4541-8681-000ac8e21bbb/me-9-big-Data.db, start == *end, eof ? false
INFO 2023-07-05 10:51:09,570 [shard 0] sstable - sstable /tmp/scylla-9831a31a-66f3-4541-8681-000ac8e21bbb/me-10-big-Data.db, start == *end, eof ? false
2:
INFO 2023-07-05 10:51:09,572 [shard 0] mutation_reader - flat_multi_range_mutation_reader(): fast forwarding to range [{-2253424581619911583, start},{-2253424581619911583, end}]
INFO 2023-07-05 10:51:09,572 [shard 0] sstable - sstable /tmp/scylla-9831a31a-66f3-4541-8681-000ac8e21bbb/me-2-big-Data.db, start == *end, eof ? true
INFO 2023-07-05 10:51:09,572 [shard 0] sstable - closing reader 0x60100029d400 for /tmp/scylla-9831a31a-66f3-4541-8681-000ac8e21bbb/me-2-big-Data.db
INFO 2023-07-05 10:51:09,572 [shard 0] sstable - sstable /tmp/scylla-9831a31a-66f3-4541-8681-000ac8e21bbb/me-4-big-Data.db, start == *end, eof ? false
INFO 2023-07-05 10:51:09,572 [shard 0] sstable - sstable /tmp/scylla-9831a31a-66f3-4541-8681-000ac8e21bbb/me-5-big-Data.db, start == *end, eof ? false
INFO 2023-07-05 10:51:09,572 [shard 0] sstable - sstable /tmp/scylla-9831a31a-66f3-4541-8681-000ac8e21bbb/me-6-big-Data.db, start == *end, eof ? false
INFO 2023-07-05 10:51:09,572 [shard 0] sstable - sstable /tmp/scylla-9831a31a-66f3-4541-8681-000ac8e21bbb/me-7-big-Data.db, start == *end, eof ? false
INFO 2023-07-05 10:51:09,572 [shard 0] sstable - sstable /tmp/scylla-9831a31a-66f3-4541-8681-000ac8e21bbb/me-8-big-Data.db, start == *end, eof ? false
INFO 2023-07-05 10:51:09,572 [shard 0] sstable - sstable /tmp/scylla-9831a31a-66f3-4541-8681-000ac8e21bbb/me-9-big-Data.db, start == *end, eof ? false
INFO 2023-07-05 10:51:09,572 [shard 0] sstable - sstable /tmp/scylla-9831a31a-66f3-4541-8681-000ac8e21bbb/me-10-big-Data.db, start == *end, eof ? false
And so on.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
It's odd that we see things like:
    if (!is_initialized()) {
        return initialize().then([this] {
            if (!is_initialized()) {
and
    return ensure_initialized().then([this, &pr] {
        if (!is_initialized()) {
One might think initialize will actually initialize the reader by
setting up context, and ensure_initialized() will even have stronger
guarantees, meaning that the reader must be initialized by it.
But neither is true.
In the context of single-partition read, it can happen that initialize()
will not set up context, meaning is_initialized() returns false,
which is why initialization must be checked even after we call
ensure_initialized().
Let's merge ensure_initialized() and initialize() into a
maybe_initialize() which returns a boolean saying if the reader
is initialized.
It makes the code initializing the reader easier to understand.
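A synchronous sketch of the merged entry point. The real reader is future-based; `reader_sketch`, its constructor arguments and the "partition absent" reason for skipping the context are assumptions made for illustration. Only the contract (maybe_initialize() returns whether the reader is initialized) is taken from this message.

```cpp
#include <optional>

// Hypothetical, synchronous stand-in for the sstable reader.
class reader_sketch {
    bool _single_partition;
    bool _partition_present;
    std::optional<int> _context;  // stands in for the parsing context
public:
    reader_sketch(bool single_partition, bool partition_present)
        : _single_partition(single_partition)
        , _partition_present(partition_present) {}

    bool is_initialized() const { return _context.has_value(); }

    // Tries to set up the context and reports whether the reader is
    // initialized afterwards. For a single-partition read the context may
    // legitimately not be created (assumed here: the partition is absent).
    bool maybe_initialize() {
        if (is_initialized()) {
            return true;
        }
        if (_single_partition && !_partition_present) {
            return false;
        }
        _context = 0;
        return true;
    }
};
```

Callers then branch on the returned boolean once, instead of calling an initializer and re-checking is_initialized() afterwards.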
Prevent switch case statements from falling through without annotation
([[fallthrough]]) proving that this was intended.
Existing intended cases were annotated.
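A small example of the annotation, assuming the check is enforced with a warning such as `-Wimplicit-fallthrough`. The function `steps_from()` is hypothetical and exists only to show the pattern.

```cpp
// Counts how many steps run when starting from a given stage.
inline int steps_from(int stage) {
    int steps = 0;
    switch (stage) {
    case 0:
        ++steps;
        [[fallthrough]];  // intended: stage 0 also runs stage 1's step
    case 1:
        ++steps;
        break;
    case 2:  // adjacent empty labels share a body and need no annotation
    case 3:
        steps += 10;
        break;
    default:
        break;
    }
    return steps;
}
```

Removing the `[[fallthrough]];` line keeps the behaviour but makes the compiler flag `case 0` as an unannotated fall-through.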
Closes#14607
We need to keep sharding metadata consistent with tablet mapping to
shards in order for node restart to detect that those sstables belong
to a single shard and that resharding is not necessary. Resharding of
sstables based on tablet metadata is not implemented yet and will
abort after this series.
Keeping sharding metadata accurate for tablets is only necessary until
compaction group integration is finished. After that, we can use the
sstable token range to determine the owning tablet and thus the owning
shard. Before that, we can't, because a single sstable may contain
keys from different tablets, and the whole key range may overlap with
keys which belong to other shards.
Some headers include tracing/*.hh even though all they
need is a forward-declared trace_state_ptr.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes#14155
In that level no io_priority_class-es exist. Instead, all the IO happens
in the context of current sched-group. File API no longer accepts prio
class argument (and makes io_intent arg mandatory to impls).
So the change consists of
- removing all usage of io_priority_class
- patching file_impl's inheritors to updated API
- priority manager goes away altogether
- IO bandwidth update is performed on respective sched group
- tune-up scylla-gdb.py io_queues command
The first change is huge and was made semi-automatically by:
- grep io_priority_class | default_priority_class
- remove all calls, found methods' args and class' fields
Patching file_impl-s is smaller, but also mechanical:
- replace io_priority_class& argument with io_intent* one
- pass intent to lower file (if applicable)
Dropping the priority manager is:
- git-rm .cc and .hh
- sed out all the #include-s
- fix configure.py and cmakefile
The scylla-gdb.py update is a bit hairy -- it needs to use task queues
list for IO classes names and shares, but to detect whether it should, it
checks whether the "commitlog" group is present.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes#13963
data_consume_rows keeps an input_stream member that must be closed.
In particular, on the error path, when we destroy it possibly
with readaheads in flight.
Fixes#13836
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closes#13840
Working with the low-level sstable parser and index reader, this
validator also cross-checks the index with the data file, making sure
all partitions are located at the position and in the order the index
describes. Furthermore, if the index also has promoted index, the order
and position of clustering elements is checked against it.
This is on top of the usual fragment kind order, partition key order and
clustering order checks that we already had with the reader-level
validator.
Currently mp_row_consumer_m creates an alias to data_consumer::proceed.
Code in the rest of the file uses both unqualified name and
mp_row_consumer_m::proceed. Remove the alias and just use
`data_consumer::proceed` directly everywhere, which leads to cleaner code.
in C++20, the compiler generates operator!=() if the corresponding
operator==() is already defined, the language now understands
that the comparison is symmetric in the new standard.
fortunately, our operator!=() is always equivalent to
`! operator==()`, this matches the behavior of the default
generated operator!=(). so, in this change, all `operator!=`
are removed.
in addition to the defaulted operator!=, C++20 also brings to us
the defaulted operator==() -- the compiler is able to generate
operator==() as a member-wise lexicographical comparison.
under some circumstances, this is exactly what we need. so,
in this change, if the operator==() is also implemented as
a lexicographical comparison of all member variables of the
class/struct in question, it is implemented using the default
generated one by removing its body and marking the function as
`default`. moreover, if the class happens to have other comparison
operators which are implemented using lexicographical comparison,
the default generated `operator<=>` is used in place of
the defaulted `operator==`.
sometimes, we fail to mark the operator== with the `const`
specifier, in this change, to fulfil the need of C++ standard,
and to be more correct, the `const` specifier is added.
also, to generate the defaulted operator==, the operand should
be `const class_name&`, but it is not always the case, in the
class of `version`, we use `version` as the parameter type, to
fulfill the need of the C++ standard, the parameter type is
changed to `const version&` instead. this does not change
the semantic of the comparison operator. and is a more idiomatic
way to pass non-trivial struct as function parameters.
please note, because in C++20, both operator== and operator<=> are
symmetric, some of the operators in `multiprecision` are removed.
they are the symmetric form of another variant. if they were
not removed, compiler would, for instance, find ambiguous
overloaded operator '=='.
this change is a cleanup to modernize the code base with C++20
features.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes#13687
this is a part of a series migrating from `operator<<(ostream&, ..)`
based formatting to fmtlib based formatting. the goal here is to enable
fmtlib to print following classes without the help of `operator<<`.
- partition_key_view
- partition_key
- partition_key::with_schema_wrapper
- key_with_schema
- clustering_key_prefix
- clustering_key_prefix::with_schema_wrapper
the corresponding `operator<<()` are dropped in this change,
as all their callers are using fmtlib for formatting now. the helper
of `print_key()` is removed, as its only caller is
`operator<<(std::ostream&, const
clustering_key_prefix::with_schema_wrapper&)`.
the reason why all these operators are replaced in one go is that
we have a template function of `key_to_str()` in `db/large_data_handler.cc`.
this template function is actually the caller of operator<< of
`partition_key::with_schema_wrapper` and
`clustering_key_prefix::with_schema_wrapper`.
so, in order to drop either of these two operator<<, we need to remove
both of them, so that we can switch over to `fmt::to_string()` in this
template function.
Refs scylladb#13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
The method needs to create two data sinks -- for Data and for Index
files -- and then wrap it with more stuff (compression, checksums,
streams, etc.). With S3 backend using file-output-stream won't work,
because S3 storage cannot provide writable file API (it has data_sink
instead).
This patch extracts file_data_sink creation so that it could be
virtualized with storage API later.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>