Implement the new data_dictionary interface using the existing
::database, ::keyspace, and ::table classes. The implementation
is straightforward. This will allow the coordinator code to access
the full schema without depending on the gnarly bits that compose
::database, like reader_concurrency_semaphore or the backlog
controller.
Add metadata-only counterparts to ::database, ::keyspace, and ::table.
Apart from being metadata-only objects suitable for the coordinator,
the new types are also type-erased and so they can be mocked without
being linked to ::database and friends.
We use a single abstract class to mediate between data_dictionary
objects and the objects they represent (data_dictionary::impl).
This makes the data_dictionary objects very lightweight - they
only contain a pointer to the impl object (of which only one
needs to be instantiated), and a reference to the object that
is represented. This allows these objects to be easily passed
by value.
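The impl-pointer-plus-reference layout described above can be sketched roughly as follows. This is a simplified, hypothetical model for illustration; the names (`impl`, `table`, `table_name`, `mock_table`) and signatures are illustrative and do not reflect Scylla's actual data_dictionary API.

```cpp
#include <string>

// Hypothetical sketch of the type-erasure pattern: a single abstract
// `impl` mediates between the lightweight handle and whatever object
// it represents. Only one impl instance per backend is needed.
namespace data_dictionary_sketch {

class impl {
public:
    virtual ~impl() = default;
    // The handle passes back the opaque object it wraps.
    virtual std::string table_name(const void* obj) const = 0;
};

// The handle is just two pointers: the (shared) impl and the
// represented object, so it is cheap to pass by value.
class table {
    const impl* _ops;
    const void* _obj;
public:
    table(const impl* ops, const void* obj) : _ops(ops), _obj(obj) {}
    std::string name() const { return _ops->table_name(_obj); }
};

// A concrete backend, e.g. a mock that never links against ::database.
struct mock_table { std::string name; };

class mock_impl final : public impl {
public:
    std::string table_name(const void* obj) const override {
        return static_cast<const mock_table*>(obj)->name;
    }
};

} // namespace data_dictionary_sketch
```

Because the handle holds no ownership and no heavy members, coordinator code can copy it freely, and tests can substitute a mock impl without linking the real classes.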
The abstraction is leaky: in one place it is outright breached
with data_dictionary::database::real_database() that returns
a ::database reference. This is used so that we can perform the
transition incrementally. In another place, one of the
methods returns a secondary_index_manager, which in turn grants
access to the real objects. This will be addressed later, probably
by type erasure as well.
This patch only contains the interface, and no implementation. It
is somewhat messy since it mimics the years-old evolution of the
real objects, but maybe it will be easier to improve it now.
The new module will contain all schema related metadata, detached from
actual data access (provided by the database class). User types are the
first contents to be moved to the new module.
These were thrown from the respective database::find_*
functions as nested exceptions since
d3fe0c5182.
Wrapping them in nested exceptions just makes them
harder to figure out and work with, and apparently serves
no purpose.
Without these nested exceptions we can correctly detect
internal errors when synchronously failing to find
a uuid returned by find_uuid(ks, cf).
Fixes #9753
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
find_uuid returns the uuid found for ks_name.table_name.
In some cases, we immediately and synchronously use that
uuid to look up other information, like the table
or the schema. Failing to find that uuid indicates
an internal error when no preemption is possible.
Note that yielding could allow deletion of the table
to sneak in and invalidate the uuid.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Rather than masquerading all errors as std::out_of_range("")
convert only the std::out_of_range error from _ks_cf_to_uuid.at()
to no_such_column_family(ks, cf). That relieves all callers of
find_uuid from doing that conversion themselves.
For example, get_uuid in api/column_family now only deals with converting
no_such_column_family to bad_param_exception, as it needs to do
at the api level, rather than generating a similar error from scratch.
Other call sites required no intervention.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
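The conversion described above can be sketched as follows. This is a minimal stand-alone model, not Scylla's actual classes: `database_sketch`, the `uuid` alias, and the exception's layout are assumptions for illustration; only the shape of the try/catch conversion mirrors the commit.

```cpp
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// Sketch: convert only the std::out_of_range thrown by the map lookup
// into a domain exception at the source, instead of wrapping it in a
// nested exception or leaving a bare std::out_of_range("") for callers.
struct no_such_column_family : std::runtime_error {
    no_such_column_family(const std::string& ks, const std::string& cf)
        : std::runtime_error("no such column family " + ks + "." + cf) {}
};

using uuid = unsigned long;  // stand-in for the real uuid type

struct database_sketch {
    std::map<std::pair<std::string, std::string>, uuid> _ks_cf_to_uuid;

    uuid find_uuid(const std::string& ks, const std::string& cf) const {
        try {
            return _ks_cf_to_uuid.at({ks, cf});
        } catch (const std::out_of_range&) {
            // Callers no longer need to translate the lookup failure
            // themselves; they see the domain exception directly.
            throw no_such_column_family(ks, cf);
        }
    }
};
```

Callers such as the API layer then only translate `no_such_column_family` into their own error type, rather than reconstructing the error from scratch.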
"
Currently stateful (readers being saved and resumed on page boundaries)
multi-range scans are broken in multiple ways. Trying to use them can
result in anything from use-after-free (#6716) or getting corrupt data
(#9718). Luckily no-one is doing such queries today, but this started to
change recently as code such as Alternator TTL and distributed
aggregate reads started using this.
This series fixes both problems and also adds a unit test exercising
this previously completely unused code path.
Fixes: #6716
Fixes: #9718
Tests: unit(dev, release, debug)
"
* 'fix-stateful-multi-range-scans/v1' of https://github.com/denesb/scylla:
test/boost/multishard_mutation_query_test: add multi-range test
test/boost/multishard_mutation_query_test: add multi-range support
multishard_mutation_query: don't drop data during stateful multi-range reads
multishard_combining_reader: reader_lifecycle_policy: allow saving read range on fast-forward
Define the "staging", "upload", and "snapshots" subdirectory
names as named constant expressions in the sstables namespace
rather than repeating their string representation inline,
which could lead to typos.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
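A sketch of the kind of definitions the commit describes; the exact identifiers (`staging_dir`, `upload_dir`, `snapshots_dir`) and types are assumptions, not the names actually used in the tree.

```cpp
#include <string_view>

// Sketch: define each subdirectory name once, so call sites reference
// a compiler-checked identifier instead of retyping a string literal.
namespace sstables {
inline constexpr std::string_view staging_dir = "staging";
inline constexpr std::string_view upload_dir = "upload";
inline constexpr std::string_view snapshots_dir = "snapshots";
}
```

A misspelled identifier then fails to compile, whereas a misspelled string literal would only fail at runtime, if at all.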
The reader_lifecycle_policy API was created around the idea of shard
readers (optionally) being saved and reused on the next page. To do
this, the lifecycle policy has to also be able to control the lifecycle
of by-reference parameters of readers: the slice and the range. This was
possible from day 1, as the readers are created through the lifecycle
policy, which can intercept and replace the said parameters with copies
that are created in stable storage. There was one hole in the design
though: fast-forwarding, which can change the range of the read, without
the lifecycle policy knowing about this. In practice this results in
fast-forwarded readers being saved together with the wrong range, their
range reference becoming stale. The only lifecycle implementation prone
to this is the one in `multishard_mutation_query.cc`, as it is the only
one actually saving readers. It will fast-forward its reader when the
query happens over multiple ranges. There were no problems related to
this so far because no one passes more than one range to said functions,
but this is incidental.
This patch solves this by adding an `update_read_range()` method to the
lifecycle policy, allowing the shard reader to update the read range
when being fast forwarded. To allow the shard reader to also have
control over the lifecycle of this range, a shared pointer is used. This
control is required because when an `evictable_reader` is the top-level
reader on the shard, it can invoke `create_reader()` with an edited
range after `update_read_range()`, replacing the fast-forwarded-to
range with a new one, yanking it out from under the feet of the
evictable reader itself. By using a shared pointer here, we can ensure
the range stays alive while it is the current one.
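The shared-pointer ownership argument above can be sketched as follows. This is a simplified model: `partition_range`, `reader_lifecycle_policy_sketch`, and `saving_policy` are illustrative stand-ins, not Scylla's actual signatures; only the idea that the saved range survives the caller dropping its reference mirrors the patch.

```cpp
#include <memory>
#include <utility>

struct partition_range {};  // stand-in for the real range type

// Sketch of the lifecycle-policy extension: the shard reader hands the
// fast-forwarded-to range to the policy via a shared pointer, so the
// saved reader is stored with the range it is actually reading.
class reader_lifecycle_policy_sketch {
public:
    virtual ~reader_lifecycle_policy_sketch() = default;
    virtual void update_read_range(std::shared_ptr<partition_range> pr) = 0;
};

class saving_policy final : public reader_lifecycle_policy_sketch {
    std::shared_ptr<partition_range> _saved_range;
public:
    void update_read_range(std::shared_ptr<partition_range> pr) override {
        // Shared ownership keeps the range alive even if the caller
        // later replaces its own copy (e.g. an evictable_reader's
        // create_reader() installing an edited range).
        _saved_range = std::move(pr);
    }
    const partition_range* saved() const { return _saved_range.get(); }
};
```

The key point is that neither side has to reason about who frees the range first; the last owner standing releases it.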
Don't shutdown the keyspaces just yet,
since they are needed during shutdown.
FIXME: restore when #8995 is fixed and no queries are issued
after the database shuts down.
Refs #8995
Fixes #9684
Test: unit(dev)
- scylla-gdb test fails locally with #9677
DTest: update_cluster_layout_tests.py:TestUpdateClusterLayout.simple_add_new_node_while_adding_info_{1,2}_test(dev)
- running now into #8995. dtest fails with unexpected error: "storage_proxy - Exception when communicating with
127.0.62.4, to read from system_distributed.service_levels:
seastar::gate_closed_exception (gate closed)"
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20211127083348.146649-2-bhalevy@scylladb.com>
release the keyspace effective_replication_map during
shutdown so that effective_replication_map_factory
can be stopped cleanly with no outstanding e_r_m:s.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
fmt 8 checks format strings at compile time, and requires that
non-compile-time format strings be wrapped with fmt::runtime().
Do that, and to allow coexistence with fmt 7, supply our own
do-nothing version of fmt::runtime() if needed. Strictly speaking
we shouldn't be introducing names into the fmt namespace, but this
is transitional only.
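The compatibility shim described above could look roughly like this. In this standalone sketch no fmt header is included, so `FMT_VERSION` is undefined and the fallback branch is always taken; in a real build the `#if` would see fmt's actual version macro. The overload set here is an assumption for illustration.

```cpp
#include <string>

// Sketch of the transitional shim: fmt 8 checks format strings at
// compile time and requires wrapping runtime strings in fmt::runtime();
// fmt 7 lacks that function, so supply a do-nothing pass-through.
// Injecting names into namespace fmt is normally off-limits; the
// commit itself notes this is transitional only.
#if !defined(FMT_VERSION) || FMT_VERSION < 80000
namespace fmt {
inline const char* runtime(const char* s) { return s; }
inline const std::string& runtime(const std::string& s) { return s; }
}
#endif

// Call sites can then uniformly write, under either fmt version:
//   fmt::print(fmt::runtime(user_supplied_format), args...);
```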
Closes #9640
"
Scylla can be configured via a bunch of config files plus
a bunch of commandline options. Collecting these altogether
can be challenging.
The proposed table solves a big portion of this by dumping
the db::config contents as a table. For convenience (and,
maybe, to facilitate Benny's CLI) it's possible to update
the 'value' column of the table with a CQL request.
There exists a PR with a table that exports loglevels in a
form of a table. The updating technique used in this set
is applicable to that table as well.
tests: compilation(dev, release, debug), unit(debug)
"
* 'br-db-config-virtual-table-3' of https://github.com/xemul/scylla:
tests: Unit test for system.config virtual table
system_keyspace: Table with config options
code: Push db::config down to virtual tables
storage_proxy: Propagate virtual table exceptions messages
table: Virtual writer hook (mutation applier)
table: Rewrap table::apply()
table: Mark virtual reader branch with unlikely
utils: Add config_src::source_name() method
utils: Ability to set_value(sstring) for an option
utils: Internal change of config option
utils: Mark some config_file methods noexcept
The main motivation is to have a future-returning apply() (to be used
by the next patches). As a side effect -- an indentation fix and a private
dirty_memory_region_group() method.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
"
Draining the database is currently scattered across the do_drain()
method of the storage_service, which also distinguishes shutdown
drain from API drain.
This set packs this logic into the database::drain() method.
tests: unit(dev), start-stop-drain(dev)
"
* 'br-database-drain' of https://github.com/xemul/scylla:
database, storage_service: Pack database::drain() method
storage_service: Shuffle drain sequence
storage_service, database: Move flush-on-drain code
storage_service: Remove bool from do_drain
The storage_service::do_drain() now ends up shutting down the
compaction manager, flushing CFs, and shutting down the commitlog.
All three belong to the database and deserve to be packed into
a single database::drain() method.
A note -- these steps are cross-shard synchronized, but database
already has a barrier for that.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Flushing all CFs on shutdown is currently fully managed in the storage
service, which looks weird. A better place for it seems to be the
database itself.
Moving the flushing code also implies moving the drain_progress thing
and patching the relevant API call.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
That's intended to fix a bad layer violation as table was given the
responsibility of disabling compaction for a given table T, but that
logic clearly belongs to compaction_manager instead.
Additionally, a gate will be used instead of a counter, as the former
provides the manager with a way to synchronize with functions running
under run_with_compaction_disabled, so remove() can wait for their
termination.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
We shouldn't be using Seastar as a text formatting library; that's
not its focus. Use fmt directly instead. fmt::print() doesn't return
the output stream which is a minor inconvenience, but that's life.
Closes #9556
Currently we update the effective_replication_map
only on non-system keyspaces, leaving the system keyspace,
which uses the local replication strategy, with the empty
replication_map, as it was first initialized.
This may lead to a crash when get_ranges is called later
as seen in #9494 where get_ranges was called from the
perform_sstable_upgrade path.
This change updates the effective_replication_map on all
keyspaces rather than just on the non-system ones
and adds a unit test that reproduces #9494 without
the fix and passes with it.
Fixes #9494
Test: unit(dev), database_test(debug)
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20211020143217.243949-1-bhalevy@scylladb.com>
And use it from cql3's check_restricted_replication_strategy and the
keyspace_metadata ctor, which defined their own `replication_class_strategy`.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
It is not used any more.
Methods either use the token_metadata_ptr in the
effective_replication_map, or receive an ad-hoc
token_metadata.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Provide a synchronous get_ranges method on effective_replication_map
that uses the precalculated map to get all token ranges owned by or
replicated on a given endpoint.
Reuse do_get_ranges as common infrastructure for all
3 cases: get_ranges, get_primary_ranges, and get_primary_ranges_within_dc.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
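The shared-infrastructure idea can be sketched as follows. This is an illustrative model only: `erm_sketch`, `token_range`, and the predicate-based `do_get_ranges` signature are assumptions, not Scylla's actual types; it shows one plausible way a single helper can back all three accessors.

```cpp
#include <algorithm>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Sketch: each public accessor supplies a different per-range predicate
// over a precalculated map, and do_get_ranges does the shared filtering.
using endpoint = std::string;
struct token_range { int first, last; };

struct erm_sketch {
    // Precalculated: each range with its replicas (front() = primary).
    std::vector<std::pair<token_range, std::vector<endpoint>>> _map;

    std::vector<token_range> do_get_ranges(
            std::function<bool(const std::vector<endpoint>&)> pred) const {
        std::vector<token_range> out;
        for (const auto& [r, replicas] : _map) {
            if (pred(replicas)) {
                out.push_back(r);
            }
        }
        return out;
    }

    // All ranges replicated on ep.
    std::vector<token_range> get_ranges(const endpoint& ep) const {
        return do_get_ranges([&](const auto& reps) {
            return std::find(reps.begin(), reps.end(), ep) != reps.end();
        });
    }

    // Ranges whose primary replica is ep.
    std::vector<token_range> get_primary_ranges(const endpoint& ep) const {
        return do_get_ranges([&](const auto& reps) {
            return !reps.empty() && reps.front() == ep;
        });
    }
};
```

Because the map is precalculated, the lookups are synchronous; no token_metadata recomputation is needed on the hot path.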
And functions that use it, like:
keyspace::update_from
database::update_keyspace
database::create_in_memory_keyspace
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
A switch statement where every case returns triggers a gcc
warning if the surrounding function doesn't return/abort.
Fix by adding an abort(). The abort() will never trigger since we
have a warning on unhandled switch cases.
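The pattern reads roughly like this (a generic example; the enum and function are illustrative, not the code the commit touched):

```cpp
#include <cstdlib>

enum class op { add, sub };

// Every case returns, yet gcc warns that control may reach the end of
// a non-void function. The trailing abort() silences the warning and
// is unreachable in practice: -Wswitch flags any unhandled enumerator,
// so new enum values cannot silently fall through to it.
int apply(op o, int a, int b) {
    switch (o) {
    case op::add: return a + b;
    case op::sub: return a - b;
    }
    abort();  // unreachable: all enumerators handled above
}
```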
Until now reversed queries were implemented inside
`querier::consume_page` (more precisely, inside the free function
`consume_page` used by `querier::consume_page`) by wrapping the
passed-in reader into `make_reversing_reader` and then consuming
fragments from the resulting reversed reader.
The first couple of commits change that by pushing the reversing down below
the `make_combined_reader` call in `table::query`. This allows
working on improving reversing for memtables independently from
reversing for sstables.
We then extend the `index_reader` with functions that allow
reading the promoted index in reverse.
We introduce `partition_reversing_data_source`, which wraps an sstable data
file and returns data buffers with contents of a single chosen partition
as if the rows were stored in reverse order.
We use the reversing source and the extended index reader in
`mx_sstable_mutation_reader` to implement efficient (at least in theory)
reversed single-partition reads.
The patchset disables cache for reversed reads. Fast-forwarding
is not supported in the mx reader for reversed queries at this point.
Details in commit messages. Read the commits in topological order
for best review experience.
Refs: #9134
(not saying "Fixes" because it's only for single-partition queries
without forwarding)
Closes #9281
* github.com:scylladb/scylla:
table: add option to automatically bypass cache for reversed queries
test: reverse sstable reader with random schema and random mutations
sstables: mx: implement reversed single-partition reads
sstables: mx: introduce partition_reversing_data_source
sstables: index_reader: add support for iterating over clustering ranges in reverse
clustering_key_filter: clustering_key_filter_ranges owning constructor
flat_mutation_reader: mention reversed schema in make_reversing_reader docstring
clustering_key_filter: document clustering_key_filter_ranges::get_ranges
Currently the new reversing sstable algorithms do not support fast
forwarding and the cache does not yet handle reversed results. This
forced us to disable the cache for reversed queries if we want to
guarantee bounded memory. We introduce an option that does this
automatically (without specifying `bypass cache` in the query) and turn
it on by default.
If the user decides that they prefer to keep the cache at the
cost of fetching entire partitions into memory (which may be viable
if their partitions are small) during reversed queries, the option can
be turned off. It is live-updateable.
This PR adds the function:
```c++
constant evaluate(const expression&, const query_options&);
```
which evaluates the given expression to a constant value.
It binds all the bound values, calls functions, and reduces the whole expression to just raw bytes and `data_type`, just like `bind()` and `get()` did for `term`.
The code is often similar to the original `bind()` implementation in `lists.cc`, `sets.cc`, etc.
* For some reason in the original code, when a collection contains `unset_value`, then the whole collection is evaluated to `unset_value`. I'm not sure why this is the case, considering it's impossible to have `unset_value` inside a collection, because we forbid bind markers inside collections. For example here: cc8fc73761/cql3/lists.cc (L134)
This seems to have been introduced by Pekka Enberg in 50ec81ee67, but he has left the company.
I didn't change the behaviour, maybe there is a reason behind it, although maybe it would be better to just throw `invalid_request_exception`.
* There was a strange limitation on map key size, it seems incorrect: cc8fc73761/cql3/maps.cc (L150), but I left it in.
* When evaluating a `user_type` value, the old code tolerated `unset_value` in a field, but it was later converted to NULL. This means that `unset_value` doesn't work inside a `user_type`, I didn't change it, will do in another PR.
* We can't fully get rid of `bind()` yet, because it's used in `prepare_term` to return a `terminal`. It will be removed in the next PR, where we finally get rid of `term`.
Closes #9353
* github.com:scylladb/scylla:
cql3: types: Optimize abstract_type::contains_collection
cql3: expr: Convert evaluate_IN_list to use evaluate(expression)
cql3: expr: Use only evaluate(expression) to evaluate term
cql3: expr: Implement evaluate(expr::function_call)
cql3: expr: Implement evaluate(expr::usertype_constructor)
cql3: expr: Implement evaluate(expr::collection_constructor)
cql3: expr: Implement evaluate(expr::tuple_constructor)
cql3: expr: Implement evaluate(expr::bind_variable)
cql3: Add contains_collection/set_or_map to abstract_type
cql3: expr: Add evaluate(expression, query_options)
cql3: Implement term::to_expression for function_call
cql3: Implement term::to_expression for user_type
cql3: Implement term::to_expression for collections
cql3: Implement term::to_expression for tuples
cql3: Implement term::to_expression for marker classes
cql3: expr: Add data_type to *_constructor structs
cql3: Add term::to_expression method
cql3: Reorganize term and expression includes
There's a circular dependency:
query processor needs database
database owns large_data_handler and compaction_manager
those two need qctx
qctx owns a query_processor
The latter, hidden dependency is not "tracked" by
constructor arguments -- the query processor is started after
the database and its stop is deferred to before the database's.
This works in scylla, because the query processor doesn't really
stop there, but in cql_test_env it's problematic, as it stops
everything, including the qctx.
Recent database start-stop sanitation revealed this problem --
on database stop, either l.d.h. or the compaction manager tries to
start (or continue) messing with the query processor. One problem
was faced immediately and plugged with the 75e1d7ea safety check
inside l.d.h., but cql_test_env tests still continue suffering
from use-after-free on the stopped query processor.
The fix is to partially revert 4b7846da by making the tests
stop some pieces of the database (including l.d.h. and the compaction
manager) as they used to before. In scylla this is probably not
needed, at least for now -- the database shutdown code was and still
is run right before the stopping one.
tests: unit(debug)
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20210924080248.11764-1-xemul@scylladb.com>
Make term.hh include expression.hh instead of the other way around.
expression can't be forward declared.
expression is needed in term.hh to declare term::to_expression().
Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>
Database no longer needs it. Since the only user of the old-style
notification is gone -- remove it as well.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
On start, the database is subscribed to the messaging-service
connection-drop notification in order to drop the hit-rate from
column families. However, the updater and reader of those hit-rates
is the storage_proxy, so it must be the _proxy_ who drops the hit-rate.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Tests don't have an sstable format selector and enforce the needed
format by hand with the help of a special database:: method. It's
more natural to provide it via config. Doing this makes database
initialization in main and cql_test_env closer to each other.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Setting the sstables format on the database and on the sstables_manager
is all plain assignments. Mark them as noexcept; the next patch will be
evidently exception safe after that.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
There are some large-data-handler-related helpers left over from
previous patches; they can be removed altogether.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
After stop_database() became shard-local, it's possible to merge
it with database::stop(), as they are both called one after the other
on scylla stop. In cql-test-env there are a few more steps in
between, but they don't rely on the database being partially
stopped.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The method needs to perform four steps, cross-shard synchronously:
first stop the compaction manager, then close the user tables and,
after them, the system tables, and finally shut down the large data
handler.
This patch reworks this synchronization with the help of cross-shard
barrier added to the database previously. The motivation is to merge
.stop_database() with .stop().
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Make sure a node-wide barrier exists on a database when scylla starts.
Also provide a barrier for cql_test_env. In all other cases keep a
solo-mode barrier so that single-shard db stop doesn't get blocked.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>