This is to push the service towards general idea that each
component should have its own config and db::config to stay
in main.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
All of them are references taken from 'this', since the function is
the generation_service method it can use 'this' directly.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The generation service already has all it needs to do it. This
keeps storage_service smaller and less aware about cdc internals.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The recently introduced make_new_generation() method just calls
another one by passing more this->... stuff as arguments. Relax
the flow by teaching the latter to use 'this' directly.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
It has everything needed onboard. Only two arguments are required -- the
booststrap tokens and whether or not to inject a delay.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This patch adds a reproducer for issue #7586 - that Alternator queries
(Query) operating in reverse order (ScanIndexForward = false) are
artificially limited to 100 MB partitions because of their memory use.
This test generates a partition over 100 MB in size and then tries various
reverse queries on it - with or without Limit, starting at the end or
the middle of the partition. The test currently fails when a reverse query
refuses to operate on such a large partition - the log reports this:
ERROR ... Memory usage of reversed read exceeds hard limit of 104857600
(configured via max_memory_for_unlimited_query_hard_limit), while reading
partition K1H6ON3A1C
With yet-uncommitted reverse-scan improvements, the test proceeds further,
but still fails where we test that a reverse query with Limit not
explicitly specified should still be limited to a certain size (e.g. 1MB)
and cannot return the entire 100 MB partition in one response.
Please note that this is not a comprehensive test for Scylla's reverse
scan implementation: In particular we do not have separate tests for
reverse scan's implementation on different sources - memtables, sstables,
or the cache. Nor do we check all sorts of edge cases. We assume that
Scylla's reverse scan implementation will have its own unit tests
elsewhere that will check these things - and this test can focus on the
Alternator use case.
This test is marked "xfail" because it still fails on Alternator. It is
marked "veryslow" because it's a (relatively) slow test, taking multiple
seconds to set up the 100 MB partition. So run the test with the
pytest options "--runxfail --runveryslow" to see how it fails.
Refs #7586
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210930063700.407511-1-nyh@scylladb.com>
The cql_config_updater is a sharded<> service that exists in main and
whose goal is to make sure some db::config's values are propagated into
cql_config. There's a more handy updateable_value<> glue for that.
tests: unit(dev)
refs: #2795
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20210927090402.25980-1-xemul@scylladb.com>
Fixed some errors in .github/CODEOWNERS (which is used by Github to
recommend who should review which pull request), and also add a few
additional ownerships I thought of.
This file could still use more work - if you can think of specific
files or directories you'd like to review changes in, please send a
patch for this file to add yourself to the appropriate paths.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210929141118.378930-1-nyh@scylladb.com>
Recently we observed an OOM caused by the partition based splitting
writer going crazy, creating 1.7K buckets while scrubbing an especially
broken sstable. To avoid situations like that in the future, this patch
provides a max limit for the number of live buckets. When the number of
buckets reach this number, the largest bucket is closed and replaced by
a bucket. This will end up creating more output sstables during scrub
overall, but now they won't all be written at the same time causing
insane memory pressure and possibly OOM.
Scrub compaction sets this limit to 100, the same limit the TWCS's
timestamp based splitting writer uses (implemented through the
classifier -
time_window_compaction_strategy::max_data_segregation_window_count).
Fixes: #9400
Tests: unit(dev)
Closes#9401
There are now 231 translation units that indirectly include commitlog.hh
due to the need to have access to db::commitlog::force_sync.
Move that type to a new file commitlog_types.hh and make it available
without access to the commitlog class.
This reduces the number of translation units that depend on commitlog.hh
to 84, improving compile time.
Unfortunately, defining metrics in Scylla requires some code
duplication, with the metrics declared in one place but exported in a
different place in the code. When we duplicated this code in Alternator,
we accidentally dropped the first metric - for BatchGetItem. The metric
was accounted in the code, but not exported to Prometheus.
In addition to fixing the missing metric, this patch also adds a test
that confirms that the BatchGetItem metric increases when the
BatchGetItem operation is used. This test failed before this patch, and
passes with it. The test only currently tests this for BatchGetItem
(and BatchWriteItem) but it can be later expanded to cover all the other
operations as well.
Fixes#9406
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210929121611.373074-1-nyh@scylladb.com>
Currently no mutation-source supports reading in reverse natively but
we are working on changing that, adding native reverse read support to
memtable, cache and sstable readers. To ensure that all mutation
sources work in a correct and uniform manner when reading in reverse,
we add a reverse test to the mutation source test suite. This test
reverses the data that it passes to `populate()`, then reads in
forward order (in reverse compared to the data order). For this we use
the currently established reverse read API: reverse schema (schema
order == query order) and half-reversed (legacy) slice. All mutation
sources are prepared to work with reversed reads, using the
`make_reversing_reader()` adapter. As we progress with our native
reverse support, we will replace these adapters with native reversing
support. As part of this, we push down the reversing reader adapter
currently existing on the `query::consume_page()` level, to the
individual mutation sources.
Closes#9384
* github.com:scylladb/scylla:
test: mutation_reader_test: reversed version of test_clustering_order_merger_sstable_set
querier: consume_page(): remove now unused max_size parameter
test/lib: mutation_source_test: test reading in reverse
test: mutation_reader_test: clustering_combined_reader_mutation_source_test: prepare for reading in reverse
test: flat_mutation_reader_test: test_reverse_reader_is_mutation_source: prepare for reading in reverse
test: mutation_reader_test: test_manual_paused_evictable_reader_is_mutation_source: use query schema instead of table schema
treewide: move reversing to the mutation sources
mutation_query: reconcilable_result_builder: document reverse query preconditions
sstable_set: time_series_sstable_set: reverse mode
mutlishard_mutation_query: set max result size on used permits
db/virtual_table: streaming_virtual_table::as_mutation_source(): use query schema instead of table schema
flat_mutation_reader: make_reversing_reader(): add convenience stored slice
mutation_reader: evictable_reader: add reverse read support
flat_mutation_reader: make_flat_mutation_reader_from_fragments(): add reverse read support
flat_mutation_reader: flat_mutation_reader_from_mutations(): add reverse read support
flat_mutation_reader: flat_mutation_reader_from_mutations(): document preconditions
query-request: introduce `half_reverse_slice`
flat_mutation_reader_assertions: log what's expected
"
Backlog tracker isn't updated correctly when facing a schema change, and
may leak a SSTable if compaction strategy is changed, which causes
backlog to be computed incorrectly. Most of these problems happen because
sstable set and tracker are updated independently, so it could happen
that tracker lose track (pun intended) of changes applied to set.
The first patch will fix the leak when strategy is changed, and the third
patch will make sure that tracker is updated atomically with sstable set,
so these kind of problems will not happen anymore.
Fixes#9157
"
* 'fixes_to_backlog_tracker_v4' of github.com:raphaelsc/scylla:
compaction: Update backlog tracker correctly when schema is updated
compaction: Don't leak backlog of input sstable when compaction strategy is changed
compaction: introduce compaction_read_monitor_generator::remove_exhausted_sstables()
compaction: simplify removal of monitors
To ensure all mutation sources uniformly support the current API of
reverse reading: reversed schema and half-reversed slice. This test will
also ensure that once we switch to native-reverse slice, all
mutation-sources will keep on working.
For reversed reads we must adjust the lower/upper bounds used by the
`position_reader_queue` and `clustering_combined_reader`. The bounds are
calculated using the mutation schema, but we need bounds calculated
using the query schema which is reversed.
The mutation source test suite will soon test reads in reverse. Prepare
for this by checking the reversed flag on the slice and not reversing
the data when set. The test will have two modes effectively:
* Forward mode: data is reversed before read, the reversed again during
read.
* Reverse mode: data is already reversed and it is reversed back during
read.
The two might not be the same in case the schema was upgraded or if we
are reading in reverse. It is important to use the passed-in query
schema consistently during a read.
Push down reversing to the mutation-sources proper, instead of doing it
on the querier level. This will allow us to test reverse reads on the
mutation source level.
The `max_size` parameter of `consume_page()` is now unused but is not
removed in this patch, it will be removed in a follow-up to reduce
churn.
DynamoDB limits the number of items that a BatchWriteItem call can write
to 25. As noted in issue #5057, in Alternator we don't have this limit
or any limit on the number of items in a BatchWriteItem - which probably
isn't wise.
This patch adds a simple xfailing test for this.
Refs #5057
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210912140736.76995-1-nyh@scylladb.com>
The `expression` type (an std::variant) suffers from bad ergonomics:
- std::variant has poor/no constraints, so compiler error messages are long and uninformative
- it cannot be forward-declared (since std::variant does not support incomplete types)
- the type name is long, polluting compiler error messages and debug symbols
- it requires an artificial `nested_expression` when one expression is nested inside another
This series fixes those drawbacks by wrapping the variant in a class, adding constraints, and adding an extra indirection.
Test: unit (dev)
Closes#9402
* github.com:scylladb/scylla:
cql3: expr: drop nested_expression
cql3: expr: make expression forward declarable, easier to use
cql3: expr: construct column_value explicitly
cql3: expr: introduce as/as_if/is
cql3: expr: introduce expr::visit, replacing std::visit
On Ubuntu, scaling_governor becomes powersave after rebooted, even we configured cpufrequtils.
This is because ondemand.service, it unconditionally change scaling_governor to ondemand or powersave.
cpufrequtils will start before ondemand.service, scaling_governor overwrite by ondemand.service.
To configure scaling_governor correctly, we have to disable this service.
Fixes#9324Closes#9325
Now that expression can be nested in its component types
directly, we can remove nested_expression. Most of the patch
adjusts uses to drop the dereference that was needed for
nested_expression.
Make expression a class, holding a unique_ptr to a variant,
instead of just a variant.
This has some advantages:
- the constructor can be properly constrained
- the type can be forward-declared
- the type name is just "expression", rather than
a huge variant. This makes compiler error messages easier
to read.
- the internal indirection allows removal of nested_expression
(later in the series)
We have a few cases where a column_definition* is converted
directly to an expression without an explicit call to column_value{}.
The new expression implementation will not allow this, so make
these cases explicit. IMO this is better form than to rely
on the compiler picking the right expression subtype.
Simple wrappers for std::get, std::get_if, std::holds_alternative.
The new names are shorter and IMO more readable.
Call sites are updated.
We will later replace the implementation.
The new expr::visit() is just a wrapper around std::visit(),
but has better constraints. A call to expr::visit() with a
visitor that misses an overload will produce an error message
that points at the missing type. This is done using the new
invocable_on_expression concept. Note it lists the expression
types one by one rather than using template magic, since
otherwise we won't get the nice messages.
Later, we will change the implementation when expression becomes
our own type rather than std::variant.
Call sites are updated.
`time_series_sstable_set` uses `clustering_combined_reader` to implement
efficient single-partition reads. It provides a `position_reader_queue`
to the reader. This queue returns readers to the sstables from the set
in order of the sstables' lower bounds, and with each reader it provides
an upper bound for the positions-in-partition returned by the reader.
Until now we would assume non-reversed queries only. Reversed queries
were implemented by performing forward query in the lower layers
and reversing the results at the upper-most layer of the reader stack.
Before pushing the reversing down to the sources (in particular,
to sstable readers), we need to support the reverse mode in
`time_series_sstable_set` and the queue it provides to
`clustering_combined_reader`.
This requires using different lower and upper bounds in the queue.
For non-reversed reads we used `sstable::min_position()` as the lower
bound and `sstable::max_position()` as the upper bound. For reversed
reads all comparisons performed by `clustering_combined_reader` will be
reversed, as it will use a reversed schema. We can then use
`sstable::max_position().reversed()` for the lower bound and
`sstable::min_position().reversed()` for the upper bound.
08042c1688 added the query max result size
to the permit but only set it for single partition queries. This patch
does the same for range-scans in preparation of `query::consume_page()`
not propagating max size soon.
The two might not be the same in case the schema was upgraded (unlikely
for virtual tables) or if we are reading in reverse. It is important to
use the passed-in query schema consistently during a read.
This serves as a convenience slice storage for reads that have to
store an edited slice somewhere. This is common for reads that work
with a native-reversed slice and so have to convert the one used in the
query -- which is in half-reversed format.
Evictable reader has to be made aware of reverse reads as it checks/edits the
slice. This shouldn't require reverse awareness normally, it is only
required because we still use the half-reversed (legacy) slice format
for reversed reads. Once we switch to the native format this commit can
be reverted.
* scylla-dev/raft-misc-v4-docedit:
raft: document pre-voting and protection against disruptive leaders
raft: style edits of README.md.
raft: document snapshot API
Currently the following can happen:
1) there's ongoing compaction with input sstable A, so sstable set
and backlog tracker both contains A.
2) ongoing compaction replaces input sstable A by B, so sstable set
contains only B now.
3) schema is updated, so a new backlog tracker is built without A
because sstable set now contains only B.
4) ongoing compaction tries to remove A from tracker, but it was
excluded in step 3.
5) tracker can now have a negative value if table is decreasing in
size, which leads to log(<negative number>) == -NaN
This problem happens because backlog tracker updates are decoupled
from sstable set updates. Given that the essential content of
backlog tracker should be the same as one of sstable set, let's move
tracker management to table.
Whenever sstable set is updated, backlog tracker will be updated with
the same changes, making their management less error prone.
Fixes#9157
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
The generic backlog formula is: ALL + PARTIAL - COMPACTING
With transfer_ongoing_charges() we already ignore the effect of
ongoing compactions on COMPACTING as we judge them to be pointless.
But ongoing compactions will run to completion, meaning that output
sstables will be added to ALL anyway, in the formula above.
With stop_tracking_ongoing_compactions(), input sstables are never
removed from the tracker, but output sstables are added, which means
we end up with duplicate backlog in the tracker.
By removing this tracking mechanism, pointless ongoing compaction
will be ignored as expected and the leaks will be fixed.
Later, the intention is to force a stop on ongoing compactions if
strategy has changed as they're pointless anyway.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>