"
Same thing was done for compaction class some time ago, now
it's time for streaming to keep repair-generated IO in bounds.
This set mostly resembles the one for compaction IO class with
the exception that boot-time reshard/reshape currently runs in
streaming class, but that's nod great if the class is throttled,
so the set also moves boot-time IO into default IO class.
"
* 'br-streaming-class-throttling-2' of https://github.com/xemul/scylla:
distributed_loader: Populate keyspaces in default class
streaming: Maintain class bandwidth
streaming: Pass db::config& to manager constructor
config: Add stream_io_throughput_mb_per_sec option
sstables: Keep priority class on sstable_directory
This patch makes memtable_flush_static_shares liveupdateable
to avoid having to restart the cluster after updating
this config.
Signed-off-by: Igor Ribeiro Barbosa Duarte <igor.duarte@scylladb.com>
This patch makes compaction_static_shares liveupdateable
to avoid having to restart the cluster after updating
this config.
Signed-off-by: Igor Ribeiro Barbosa Duarte <igor.duarte@scylladb.com>
It's going to control the bandwidth for the streaming prio class.
For now it's jsut added but does't work for real
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Recently we noticed a regression where with certain versions of the fmt
library,
SELECT value FROM system.config WHERE name = 'experimental_features'
returns string numbers, like "5", instead of feature names like "raft".
It turns out that the fmt library keep changing their overload resolution
order when there are several ways to print something. For enum_option<T> we
happen to have to conflicting ways to print it:
1. We have an explicit operator<<.
2. We have an *implicit* convertor to the type held by T.
We were hoping that the operator<< always wins. But in fmt 8.1, there is
special logic that if the type is convertable to an int, this is used
before operator<<()! For experimental_features_t, the type held in it was
an old-style enum, so it is indeed convertible to int.
The solution I used in this patch is to replace the old-style enum
in experimental_features_t by the newer and more recommended "enum class",
which does not have an implicit conversion to int.
I could have fixed it in other ways, but it wouldn't have been much
prettier. For example, dropping the implicit convertor would require
us to change a bunch of switch() statements over enum_option (and
not just experimental_features_t, but other types of enum_option).
Going forward, all uses of enum_option should use "enum class", not
"enum". tri_mode_restriction_t was already using an enum class, and
now so does experimental_features_t. I changed the examples in the
comments to also use "enum class" instead of enum.
This patch also adds to the existing experimental_features test a
check that the feature names are words that are not numbers.
Fixes#11003.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes#11004
Currently, applying schema mutations involves flushing all schema
tables so that on restart commit log replay is performed on top of
latest schema (for correctness). The downside is that schema merge is
very sensitive to fdatasync latency. Flushing a single memtable
involves many syncs, and we flush several of them. It was observed to
take as long as 30 seconds on GCE disks under some conditions.
This patch changes the schema merge to rely on a separate commit log
to replay the mutations on restart. This way it doesn't have to wait
for memtables to be flushed. It has to wait for the commitlog to be
synced, but this cost is well amortized.
We put the mutations into a separate commit log so that schema can be
recovered before replaying user mutations. This is necessary because
regular writes have a dependency on schema version, and replaying on
top of latest schema satisfies all dependencies. Without this, we
could get loss of writes if we replay a write which depends on the
latest schema on top of old schema.
Also, if we have a separate commit log for schema we can delay schema
parsing for after the replay and avoid complexity of recognizing
schema transactions in the log and invoking the schema merge logic.
I reproduced bad behavior locally on my machine with a tired (high latency)
SSD disk, load driver remote. Under high load, I saw table alter (server-side part) taking
up to 10 seconds before. After the patch, it takes up to 200 ms (50:1 improvement).
Without load, it is 300ms vs 50ms.
Fixes#8272Fixes#8309Fixes#1459Closes#10333
* github.com:scylladb/scylla:
config: Introduce force_schema_commit_log option
config: Introduce unsafe_ignore_truncation_record
db: Avoid memtable flush latency on schema merge
db: Allow splitting initiatlization of system tables
db: Flush system.scylla_local on change
migration_manager: Do not drop system.IndexInfo on keyspace drop
Introduce SCHEMA_COMMITLOG cluster feature
frozen_mutation: Introduce freeze/unfreeze helpers for vectors of mutations
db/commitlog: Improve error messages in case of unknown column mapping
db/commitlog: Fix error format string to print the version
db: Introduce multi-table atomic apply()
The node now refuses to boot if schema tables were truncated.
This adds a config option to ignore truncation records as a
workaround if user truncated them manually.
The option is used, but is not implemented. If attaching implementation
to it right a once the compaction will slow down to 16MB/s on all nodes.
Make it zero (unbound) by default and mard live-updateable while at it.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Currently, for users who have permissions_cache configs set to very high
values (and thus can't wait for the configured times to pass) having to restart
the service every time they make a change related to permissions or
prepared_statements cache (e.g. Adding a user and changing their permissions)
can become pretty annoying.
This patch series make permissions_validity_in_ms, permissions_update_interval_in_ms
and permissions_cache_max_entries live updateable so that restarting the
service is not necessary anymore for these cases.
It also adds an API for flushing the cache to make it easier for users who
don't want to modify their permissions_cache config.
branch: https://github.com/igorribeiroduarte/scylla/tree/make_permissions_cache_live_updateable
CI: https://jenkins.scylladb.com/job/releng/job/Scylla-CI/1005/
dtests: https://github.com/igorribeiroduarte/scylla-dtest/tree/test_permissions_cache
* https://github.com/igorribeiroduarte/scylla/make_permissions_cache_live_updateable:
loading_cache_test: Test loading_cache::reset and loading_cache::update_config
api: Add API for resetting authorization cache
authorization_cache: Make permissions cache and authorized prepared statements cache live updateable
auth_prep_statements_cache: Make aut_prep_statements_cache accept a config struct
utils/loading_cache.hh: Add update_config method
utils/loading_cache.hh: Rename permissions_cache_config to loading_cache_config and move it to loading_cache.hh
utils/loading_cache.hh: Add reset method
Currently, for users who have permissions_cache configs set to very high
values (and thus can't wait for the configured times to pass) having to restart
the service every time they make a change related to permissions or
prepared_statements cache(e.g.: Adding a user) can become pretty annoying.
This patch make permissions_validity_in_ms, permissions_update_interval_in_ms
and permissions_cache_max_entries live updateable so that restarting the
service is not necessary anymore for these cases.
Signed-off-by: Igor Ribeiro Barbosa Duarte <igor.duarte@scylladb.com>
Add column_index_auto_scale_threshold_in_kb to the configuration
(defaults to 10MB).
When the promoted index (serialized) size gets to this
threshold, it's halved by merging each two adjacent blocks
into one and doubling the desired_block_size.
Fixes#4217
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Before this patch, the experimental TTL (expiration time) feature in
Alternator scans tables for expiration in a tight loop - starting the
next scan one second after the previous one completed.
In this patch we introduce a new configuration option,
alternator_ttl_period_in_seconds, which determines how frequently
to start the scan. The default is 24 hours - meaning that the next
scan is started 24 hours after the previous one started.
The tests (test/alternator/run) change this configuration back to one
second, so that expiration tests finish as quickly as possible.
Please note that the scan is *not* slowed down to fill this 24 hours -
if it finishes in one hour, it will then sleep for 23 hours. Additional
work would be needed to slow down the scan to not finish too quickly.
One idea not yet implemented is to move the expiration service from
the "maintenance" scheduling group which it uses today to a new
scheduling group, and modifying the number of shares that this group
gets.
Another thing worth noting about the configurable period (which defaults
to 24 hours) is that when TTL is enabled on an Alternator table, it can
take that amount of time until its scan starts and items start expiring
from it.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Initialize it to "md" until ME format support is
complete (i.e. storing originating host id in sstable stats metadata
is implemented), so at present there is no observable change by
default.
Also declare "enable_sstables_md_format" unused -- the idea, going
forward, being that only "sstable_format" controls the written sstable
file format and that no more per-format enablement config options
shall be added.
Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
This reverts commit 23da2b5879. It causes
the node to quickly run out of memory when many schema changes are made
within a small time window.
Fixes#10071.
The system.config virtual tables prints each configuration variable of
type T based on the JSON printer specified in the config_type_for<T>
in db/config.cc.
For two variable types - experimental_features and tri_mode_restriction,
the specified converter was wrong: We used value_to_json<string> or
value_to_json<vector<string>> on something which was *not* a string.
Unfortunately, value_to_json silently casted the given objects into
strings, and the result was garbage: For example as noted in #10047,
for experimental_features instead of printing a list of features *names*,
e.g., "raft", we got a bizarre list of one-byte strings with each feature's
number (which isn't documented or even guaranteed to not change) as well
as carriage-return characters (!?).
So solution is a new printable_to_json<T> which works on a type T that
can be printed with operator<< - as in fact the above two types can -
and the type is converted into a string or vector of strings using this
operator<<, not a cast.
Also added a cql-pytest test for reading system.config and in particular
options of the above two types - checking that they contain sensible
strings and not "garbage" like before this patch.
Fixes#10047.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220209090421.298849-1-nyh@scylladb.com>
If version is absent in cache, it will be fetched from the
coordinator. This is not expensive, but if the version is not known,
it must be also "synced". It means that the node will do a full schema
pull from the coordinator. This pull is expensive and can take seconds.
If the coordinator we pull from is at an old version, the pull will do
nothing and current node will soon forget the old version, initiating
another pull.
If some nodes stay at an old version for a long time for some reason,
this will make new coordinators initiate pulls frequently.
Increase the expiration period to 15 minutes to reduce the impact in
such scenarios.
Fixes#10042.
Message-Id: <20220207122317.674241-1-tgrabiec@scylladb.com>
The system.config table allows changing config parameters, but this
change doesn't survive restarts and is considered to be dangerous
(sometimes). Add an option to disable the table updates. The option
is LiveUpdate and can be set to false via CQL too (once).
fixes#9976
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20220201121114.32503-1-xemul@scylladb.com>
Instead of lengthy blurbs, switch to single-line, machine-readable
standardized (https://spdx.dev) license identifiers. The Linux kernel
switched long ago, so there is strong precedent.
Three cases are handled: AGPL-only, Apache-only, and dual licensed.
For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0),
reasoning that our changes are extensive enough to apply our license.
The changes we applied mechanically with a script, except to
licenses/README.md.
Closes#9937
Just in case the new algorithm turns out to be buggy, or give a
performance regression, add a flag to fall-back to the old algorithm for
use in the field.
The gc_grace_seconds is a very fragile and broken design inherited from
Cassandra. Deleted data can be resurrected if cluster wide repair is not
performed within gc_grace_seconds. This design pushes the job of making
the database consistency to the user. In practice, it is very hard to
guarantee repair is performed within gc_grace_seconds all the time. For
example, repair workload has the lowest priority in the system which can
be slowed down by the higher priority workload, so that there is no
guarantee when a repair can finish. A gc_grace_seconds value that is
used to work might not work after data volume grows in a cluster. Users
might want to avoid running repair during a specific period where
latency is the top priority for their business.
To solve this problem, an automatic mechanism to protect data
resurrection is proposed and implemented. The main idea is to remove the
tombstone only after the range that covers the tombstone is repaired.
In this patch, a new table option tombstone_gc is added. The option is
used to configure tombstone gc mode. For example:
1) GC a tombstone after gc_grace_seconds
cqlsh> ALTER TABLE ks.cf WITH tombstone_gc = {'mode':'timeout'} ;
This is the default mode. If no tombstone_gc option is specified by the
user. The old gc_grace_seconds based gc will be used.
2) Never GC a tombstone
cqlsh> ALTER TABLE ks.cf WITH tombstone_gc = {'mode':'disabled'};
3) GC a tombstone immediately
cqlsh> ALTER TABLE ks.cf WITH tombstone_gc = {'mode':'immediate'};
4) GC a tombstone after repair
cqlsh> ALTER TABLE ks.cf WITH tombstone_gc = {'mode':'repair'};
In addition to the 'mode' option, another option 'propagation_delay_in_seconds'
is added. It defines the max time a write could possibly delay before it
eventually arrives at a node.
A new gossip feature TOMBSTONE_GC_OPTIONS is added. The new tombstone_gc
option can only be used after the whole cluster supports the new
feature. A mixed cluster works with no problem.
Tests: compaction_test.py, ninja test
Fixes#3560
[avi: resolve conflicts vs data_dictionary]
This change makes row cache support reverse reads natively so that reversing wrappers are not needed when reading from cache and thus the read can be executed efficiently, with similar cost as the forward-order read.
The database is serving reverse reads from cache by default after this. Before, it was bypassing cache by default after 703aed3277.
Refs: #1413
Tests:
- unit [dev]
- manual query with build/dev/scylla and cache tracing on
Closes#9454
* github.com:scylladb/scylla:
tests: row_cache: Extend test_concurrent_reads_and_eviction to run reverse queries
row_cache: partition_snapshot_row_cursor: Print more details about the current version vector
row_cache: Improve trace-level logging
config: Use cache for reversed reads by default
config: Adjust reversed_reads_auto_bypass_cache description
row_cache: Support reverse reads natively
mvcc: partition_snapshot: Support slicing range tombstones in reverse
test: flat_mutation_reader_assertions: Consume expected range tombstones before end_of_partition
row_cache: Log produced range tombstones
test: Make produces_range_tombstone() report ck_ranges
tests: lib: random_mutation_generator: Extract make_random_range_tombstone()
partition_snapshot_row_cursor: Support reverse iteration
utils: immutable-collection: Make movable
intrusive_btree: Make default-initialized iterator cast to false
Consider
1) n1, n2, n3, n4, n5
2) n2 and n3 are both down
3) start n6 to replace n2
4) start n7 to replace n3
We want to replace the dead nodes n2 and n3 to fix the cluster to have 5
running nodes.
Replace operation in step 3 will fail because n3 is down.
We would see errors like below:
replace[25edeec0-57d4-11ec-be6b-7085c2409b2d]: Nodes={127.0.0.3} needed
for replace operation are down. It is highly recommended to fix the down
nodes and try again.
In the above example, currently, there is no way to replace any of the
dead nodes.
Users can either fix one of the dead nodes and run replace or run
removenode operation to remove one of the dead nodes then run replace
and run bootstrap to add another node.
Fixing dead nodes is always the best solution but it might not be
possible. Running removenode operation is not better than running
replace operation (with best effort by ignoring the other dead node) in
terms of data consistency. In addition, users have to run bootstrap
operation to add back the removed node. So, allowing replacing in such
case is a clear win.
This patch adds the --ignore-dead-nodes-for-replace option to allow run
replace operation with best effort mode. Please note, use this option
only if the dead nodes are completely broken and down, and there is no
way to fix the node and bring it back. This also means the user has to
make sure the ignored dead nodes specified are really down to avoid any
data consistency issue.
Fixes#9757Closes#9758
* seastar f8a038a0a2...8d15e8e67a (21):
> core/program_options: preserve defaultness of CLI arguments
> log: Silence logger when logging
> Include the core/loop.hh header inside when_all.hh header
> http: Fix deprecated wrappers
> foreign_ptr: Add concept
> util: file: add read_entire_file
> short_streams: move to util
> Revert "Merge: file: util: add read_entire_file utilities"
> foreign_ptr: declare destroy as a static method
> Merge: file: util: add read_entire_file utilities
> Merge "output_stream: handle close failure" from Benny
> net: bring local_address() to seastar::connected_socket.
> Merge "Allow programatically configuring seastar" from Botond
> Merge 'core: clean up memory metric definitions' from John Spray
> Add PopOS to debian list in install-dependencies.sh
> Merge "make shared_mutex functions exception safe and noexcept" from Benny
> on_internal_error: set_abort_on_internal_error: return current state
> Implementation of iterator-range version of when_any
> net: mark functions returning ethernet_address noexcept
> net: ethernet_address: mark functions noexcept
> shared_mutex: mark wake and unlock methods noexcept
Contains patch from Botond Dénes <bdenes@scylladb.com>:
db/config: configure logging based on app_template::seastar_options
Scylla has its own config file which supports configuring aspects of
logging, in addition to the built-in CLI logging options. When applying
this configuration, the CLI provided option values have priority over
the ones coming from the option file. To implement this scylla currently
reads CLI options belonging to seastar from the boost program options
variable map. The internal representation of CLI options however do not
constitute an API of seastar and are thus subject to change (even if
unlikely). This patch moves away from this practice and uses the new
shiny C++ api: `app_template::seastar_options` to obtain the current
logging options.
In the very recent commit 3c0e703 fixing issue #8757, we changed the
default prometheus_address setting in scylla.yaml to "localhost", to
match the default listen_address in the same file. We explained in that
commit how this helped developers who use an unchanged scylla.yaml, and
how it didn't hurt pre-existing users who already had their own scylla.yaml.
However, it was quickly noted by Tzach and Amnon that there is one use case
that was hurt by that fix:
Our existing documentation, such as the installation guide
https://www.scylladb.com/download/?platform=centos ask the user to take
our initial scylla.yaml, and modify listen_address, rpc_address, seeds,
and cluster_name - and that's it. That document - and others - don't
tell the user to also override prometheus_address, so users will likely
forget to do so - and monitoring will not work for them.
So this patch includes a different solution to #8757.
What it does is:
1. The setting of prometheus_address in scylla.yaml is commented out.
2. In config.cc, prometheus_address defaults to empty.
3. In main.cc, if prometheus_address is empty (i.e., was not explicitly
set by the user), the value of listen_address is used instead.
In other words, the idea is that prometheus_address, if not explicitly set
by the user, should default to listen_address - which is the address used
to listen to the internal Scylla inter-node protocol.
Because the documentation already tells the user to set listen_address
and to not leave it set to localhost, setting it will also open up
prometheus, thereby solving #9701. Meanwhile, developers who leave the
default listen_address=localhost will also get prometheus_address=localhost,
so the original #8757 is solved as well. Finally, for users who had an old
scylla.yaml where prometheus_address was explicitly set to something,
this setting will continue to be used. This was also a requirement of
issue #8757.
Fixes#9701.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20211129155201.1000893-1-nyh@scylladb.com>
The experimental_features_t has an all() method, supposedly returning
all values of the enum - but it's easy to forget to update it when
adding a new experimental feature - and it's currently out-of-sync
(it's missing the ALTERNATOR_TTL option).
We already have another method, map(), where a new experimental feature
must be listed otherwise it can't be used, so let's just take all()'s
values from map(), automatically, instead of forcing developers to keep
both lists up-to-date.
Note that using the all() function to enable all experimental features
is not recommended - the best practice is to enable specific experimental
features, not all of them. Nevertheless, this all() function is still used
in one place - in the cql_repl tool - which uses it to enable all
experimental features.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20211108135601.78460-1-nyh@scylladb.com>
Fixes#9630
Adds support for importing a CRL certificate reovcation list. This will be
monitored and reloaded like certs/keys. Allows blacklisting individual certs.
Closes#9655
Other than looking sane, this change continues the founded by the
--workdir option tradition of freeing the developer form annoying
necessity to type too many options when scylla is started by hand
for devel purposes.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20211116104815.31822-1-xemul@scylladb.com>
There soon will appear an updateable system.config table that
will push sstrings into names_value-s. Prepare for this change
by adding the respective .set_value() call. Since the update
only works for LiveUpdate-able options, and inability to do it
can be propagated back to the caller make this method return
true/false whether the update took place or not.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The helper works on named_value() and throws and exception containing
the option name for convenient error reporting.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Setting a value of "warn" will still allow the create or alter commands,
but will warn the user, with a message that will appear both at the log
and also at cqlsh for example.
This is another step towards deprecating DTCS. Users need to know we're
moving towards this direction, and setting the default value to warn
is needed for this. Next step is to set it to false, and finally remove
it from the code base.
Refs #8914.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20211029184503.102936-1-raphaelsc@scylladb.com>
When an experimental feature graduates from being experimental, we want
to continue allow the old "--experimental-features=..." option to work,
in case some user's configuration uses it - just do nothing. The way
we do it is to map in db::experimental_features_t::map() the feature's
name to the UNUSED value - this way the feature's name is accepted, but
doesn't change anything.
When the CDC feature graduated from being experimental, a new bit
UNUSED_CDC was introduced to do the same thing. This separate bit was
not actually necessary - if we ever check for UNUSED_CDC bit anywhere
in the code it means the flag isn't actually unused ;-) And we don't
check it.
So simplify the code by conflating UNUSED_CDC into UNUSED.
This will also make it easy to build from db::experimental_features_t::map()
a list of current experimental features - now it will simply be those that
do not map to UNUSED.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20211013105107.123544-1-nyh@scylladb.com>
Earlier we added experimental (and very incomplete) support for
Alternator's TTL feature, but forgot to set a *name* for this
experimental feature. As a result, this feature can be enabled only with
the blanket "--experimental" option and not with a specific
"--experimental-features=..." option.
Since issue #9467 deprecated the blanket "--experimental" option
and users are encouraged to only enable specific experimental
features, it is important that we have a name for it.
So the name chosen in this patch is "alternator-ttl".
Eventually this feature might evolve beyond Alternator-only,
but for now, I think it's a good name and we'll probably
graduate the experimental Alternator TTL feature before
supporting CQL, so it will be a new experimental feature
anyway.
Refs #9467.
db/config.cc
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20211012110312.719654-1-nyh@scylladb.com>
The help string from the "--experimental-features" command-line option
lists the available experimental features, to helping a user who might
want to enable them. But this help string was manually written, and has
since drifted from reality:
* Two of the listed "experimental" features, cdc and lwt, have actually
graduated from being experimental long ago. Although technically a user
may still use the words "cdc" and "lwt" in the "experimental-features"
parameter, doing so is pointless, and worse: This text in the help
string can mislead a user into thinking that these two features are
still experimental - while they are not!
* One experimental feature - alternator-ttl - is missing from this list.
Instead of updating the help string text now - and needing to do this
again and again in the future as we change experimental features - what
this patch does is to construct the list of features automatically from
the map of supported feature names - excluding any features which map
to UNUSED.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20211013122635.132582-1-nyh@scylladb.com>
The documentation for --experimental config option states
that it enables all experimental features, but this is no
longer true, i.e.: raft feature is not enabled with it and
should be explicitly enabled via `--experimental-features=raft`
switch (we don't want to enable it by default alongside
other features).
Since the flag doesn't do what it's intended to, we should
mark it as "deprecated", because documenting each exception
(there could be more than only raft in the future) will be
a burden and docs will constantly go out-of-sync with the
code.
Adjust the description for the option to reflect that, mark
it "deprecated" and suggest using --experimental-features, instead.
Fixes: #9467
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Message-Id: <20211012093005.20871-2-pa.solodovnikov@scylladb.com>
Until now reversed queries were implemented inside
`querier::consume_page` (more precisely, inside the free function
`consume_page` used by `querier::consume_page`) by wrapping the
passed-in reader into `make_reversing_reader` and then consuming
fragments from the resulting reversed reader.
The first couple of commits change that by pushing the reversing down below
the `make_combined_reader` call in `table::query`. This allows
working on improving reversing for memtables independently from
reversing for sstables.
We then extend the `index_reader` with functions that allow
reading the promoted index in reverse.
We introduce `partition_reversing_data_source`, which wraps an sstable data
file and returns data buffers with contents of a single chosen partition
as if the rows were stored in reverse order.
We use the reversing source and the extended index reader in
`mx_sstable_mutation_reader` to implement efficient (at least in theory)
reversed single-partition reads.
The patchset disables cache for reversed reads. Fast-forwarding
is not supported in the mx reader for reversed queries at this point.
Details in commit messages. Read the commits in topological order
for best review experience.
Refs: #9134
(not saying "Fixes" because it's only for single-partition queries
without forwarding)
Closes#9281
* github.com:scylladb/scylla:
table: add option to automatically bypass cache for reversed queries
test: reverse sstable reader with random schema and random mutations
sstables: mx: implement reversed single-partition reads
sstables: mx: introduce partition_reversing_data_source
sstables: index_reader: add support for iterating over clustering ranges in reverse
clustering_key_filter: clustering_key_filter_ranges owning constructor
flat_mutation_reader: mention reversed schema in make_reversing_reader docstring
clustering_key_filter: document clustering_key_filter_ranges::get_ranges
Currently the new reversing sstable algorithms do not support fast
forwarding and the cache does not yet handle reversed results. This
forced us to disable the cache for reversed queries if we want to
guarantee bounded memory. We introduce an option that does this
automatically (without specifying `bypass cache` in the query) and turn
it on by default.
If the user decides that they prefer to keep the cache at the
cost of fetching entire partitions into memory (which may be viable
if their partitions are small) during reversed queries, the option can
be turned off. It is live-updateable.
The --replace-token and --replace-node were added some time ago, but
have never been used since then, just parsed and immediatelly aborted.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20210930102222.16294-1-xemul@scylladb.com>