release the keyspace effective_replication_map during
shutdown so that effective_replication_map_factory
can be stopped cleanly with no outstanding e_r_m:s.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
"
Scylla can be configured via a bunch of config files plus
a bunch of commandline options. Collecting these altogether
can be challenging.
The proposed table solves a big portion of this by dupming
the db::config contents as a table. For convenience (and,
maybe, to facilitate Benny's CLI) it's possible to update
the 'value' column of the table with CQL request.
There exists a PR with a table that exports loglevels in a
form of a table. The updating technique used in this set
is applicable to that table as well.
tests: compilation(dev, release, debug), unit(debug)
"
* 'br-db-config-virtual-table-3' of https://github.com/xemul/scylla:
tests: Unit test for system.config virtual table
system_keyspace: Table with config options
code: Push db::config down to virtual tables
storage_proxy: Propagate virtual table exceptions messages
table: Virtual writer hook (mutation applier)
table: Rewrap table::apply()
table: Mark virtual reader branch with unlikely
utils: Add config_src::source_name() method
utils: Ability to set_value(sstring) for an option
utils: Internal change of config option
utils: Mark some config_file methods noexcept
The db::config reference is available on the database, which
can be get from the virtual_table itself. The problem is that
it's a const refernece, while system.config will be updateable
and will need non-const reference.
Adding non-const get_config() on the database looks wrong. The
database shouldn't be used as config provider, even the const
one.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Symmetrically to virtual reader one, add the virtual writer
callback on a table that will be in charge of applying the
provided mutation.
If a virtual table doesn't override this apply method the
dedicated exception is thrown. Next patch will catch it and
propagate back to caller, so it's a new exception type, not
existing/std one.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The main motivation is to have future returning apply (to be used
by next patches). As a side effect -- indentation fix and private
dirty_memory_region_group() method.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
"
Draining the database is now scattered across the do_drain()
method of the storage_service. Also it tells shutdown drain
from API drain.
This set packs this logic into the database::drain() method.
tests: unit(dev), start-stop-drain(dev)
"
* 'br-database-drain' of https://github.com/xemul/scylla:
database, storage_service: Pack database::drain() method
storage_service: Shuffle drain sequence
storage_service, database: Move flush-on-drain code
storage_service: Remove bool from do_drain
"
table_state is being introduced for compaction subsystem, to remove table dependency
from compaction interface, fix layer violations, and also make unit testing
easier as table_state is an abstraction that can be implemented even with no
actual table backing it.
In this series, compaction strategy interfaces are switching to table_state,
and eventually, we'll make compact_sstables() switch to it too. The idea is
that no compaction code will directly reference a table object, but only work
with the abstraction instead. So compaction subdirectory can stop
including database.hh altogether, which is a great step forward.
"
* 'table_state_v5' of https://github.com/raphaelsc/scylla:
sstable_compaction_test: switch to table_state
compaction: stop including database.hh for compaction_strategy
compaction: switch to table_state in estimated_pending_compactions()
compaction: switch to table_state in compaction_strategy::get_major_compaction_job()
compaction: switch to table_state in compaction_strategy::get_sstables_for_compaction()
DTCS: reduce table dependency for task estimation
LCS: reduce table dependency for task estimation
table: Implement table_state
compaction: make table param of get_fully_expired_sstables() const
compaction_manager: make table param of has_table_ongoing_compaction() const
Introduce table_state
The storage_service::do_drain() now ends up with shutting down
compaction manager, flushing CFs and shutting down commitlog.
All three belong to the database and deserve being packed into
a single database::drain() method.
A note -- these steps are cross-shard synchronized, but database
already has a barrier for that.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Flushing all CFs on shutdown is now fully managed in storage service
and it looks weird. Some better place for it seems to be the database
itself.
Moving the flushing code also imples moving the drain_progress thing
and patching the relevant API call.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This is the first implementation of table_state, intended to be used
within compaction. It contains everything needed for compaction
strategies. Subsequently, compaction strategy interface will replace
table by table_state, and later all compaction procedures.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
That's intended to fix a bad layer violation as table was given the
responsibility of disabling compaction for a given table T, but that
logic clearly belongs to compaction_manager instead.
Additionally, gate will be used instead of counter, as former provides
manager with a way to synchronize with functions running under
run_with_compaction_disabled. so remove() can wait for their
termination.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Currently we update the effective_replication_map
only on non-system keyspace, leaving the system keyspace,
that uses the local replication strategy, with the empty
replication_map, as it was first initialized.
This may lead to a crash when get_ranges is called later
as seen in #9494 where get_ranges was called from the
perform_sstable_upgrade path.
This change updates the effective_replication_map on all
keyspaces rather than just on the non-system ones
and adds a unit test that reproduces #9494 without
the fix and passes with it.
Fixes#9494
Test: unit(dev), database_test(debug)
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20211020143217.243949-1-bhalevy@scylladb.com>
It is not used any more.
Methods either use the token_metadata_ptr in the
effective_replication_map, or receive an ad-hoc
token_metadata.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Every time the token_metadata changes we need to update the
effective_replication_map on all non-system keyspaces.
Do that in replicate_to_all_cores after the updated token_metadata
has been replicated to all cores.
We first prepare and clone the token_metadata, then prepare
and clone the new effective_replication_maps. Any failure
at this stage is recoverable, handle via rollback and the exception
is returned.
Note that any failure to _apply_ the pending token_metadata or the
effective_replication_map will cause scylla to abort.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
And functions that use it, like:
keyspace::update_from
database::update_keyspace
database::create_in_memory_keyspace
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Enable creating shared_ptr<BaseClass> in nonstatic_class_registry
using BaseClass::ptr_type and use that for
abstract_replication_strategy.
While at it, also clean up compressor with that respect
to define compressor::ptr_type as shared_ptr<compressor>
thus simplifying compressor_registry.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Besides being more modern and more efficient for
the compiler, this #ifndef block confuses my editor
that greys out the whole block.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
There are two knobs here -- global and per-table one. Both were added
without any synchronisation, but the former one was later fixed to
become serialized and not to be available "too early".
This patch unifies both toggles to be serialized with each-other and
not be enabled too early.
The justification for this change is to move the global toggle from out
of the storage service, as it really belongs to the database, not the
storage service. Respectively, the current synchronization, that depends
on storage service internals, should be replaced with something else.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Until now reversed queries were implemented inside
`querier::consume_page` (more precisely, inside the free function
`consume_page` used by `querier::consume_page`) by wrapping the
passed-in reader into `make_reversing_reader` and then consuming
fragments from the resulting reversed reader.
The first couple of commits change that by pushing the reversing down below
the `make_combined_reader` call in `table::query`. This allows
working on improving reversing for memtables independently from
reversing for sstables.
We then extend the `index_reader` with functions that allow
reading the promoted index in reverse.
We introduce `partition_reversing_data_source`, which wraps an sstable data
file and returns data buffers with contents of a single chosen partition
as if the rows were stored in reverse order.
We use the reversing source and the extended index reader in
`mx_sstable_mutation_reader` to implement efficient (at least in theory)
reversed single-partition reads.
The patchset disables cache for reversed reads. Fast-forwarding
is not supported in the mx reader for reversed queries at this point.
Details in commit messages. Read the commits in topological order
for best review experience.
Refs: #9134
(not saying "Fixes" because it's only for single-partition queries
without forwarding)
Closes#9281
* github.com:scylladb/scylla:
table: add option to automatically bypass cache for reversed queries
test: reverse sstable reader with random schema and random mutations
sstables: mx: implement reversed single-partition reads
sstables: mx: introduce partition_reversing_data_source
sstables: index_reader: add support for iterating over clustering ranges in reverse
clustering_key_filter: clustering_key_filter_ranges owning constructor
flat_mutation_reader: mention reversed schema in make_reversing_reader docstring
clustering_key_filter: document clustering_key_filter_ranges::get_ranges
Currently the new reversing sstable algorithms do not support fast
forwarding and the cache does not yet handle reversed results. This
forced us to disable the cache for reversed queries if we want to
guarantee bounded memory. We introduce an option that does this
automatically (without specifying `bypass cache` in the query) and turn
it on by default.
If the user decides that they prefer to keep the cache at the
cost of fetching entire partitions into memory (which may be viable
if their partitions are small) during reversed queries, the option can
be turned off. It is live-updateable.
"This series removes layer violation in compaction, and also
simplifies compaction manager and how it interacts with compaction
procedure."
* 'compaction_manager_layer_violation_fix/v4' of github.com:raphaelsc/scylla:
compaction: split compaction info and data for control
compaction_manager: use task when stopping a given compaction type
compaction: remove start_size and end_size from compaction_info
compaction_manager: introduce helpers for task
compaction_manager: introduce explicit ctor for task
compaction: kill sstables field in compaction_info
compaction: kill table pointer in compaction_info
compaction: simplify procedure to stop ongoing compactions
compaction: move management of compaction_info to compaction_manager
compaction: move output run id from compaction_info into task
compaction_info must only contain info data to be exported to the
outside world, whereas compaction_data will contain data for
controlling compaction behavior and stats which change as
compaction progresses.
This separation makes the interface clearer, also allowing for
future improvements like removing direct references to table
in compaction.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Today, compaction is calling compaction manager to register / deregister
the compaction_info created by it.
This is a layer violation because manager sits one layer above
compaction, so manager should be responsible for managing compaction
info.
From now on, compaction_info will be created and managed by
compaction_manager. compaction will only have a reference to info,
which it can use to update the world about compaction progress.
This will allow compaction_manager to be simplified as info can be
coupled with its respective task, allowing duplication to be removed
and layer violation to be fixed.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
There are now 231 translation units that indirectly include commitlog.hh
due to the need to have access to db::commitlog::force_sync.
Move that type to a new file commitlog_types.hh and make it available
without access to the commitlog class.
This reduces the number of translation units that depend on commitlog.hh
to 84, improving compile time.
Currently the following can happen:
1) there's ongoing compaction with input sstable A, so sstable set
and backlog tracker both contains A.
2) ongoing compaction replaces input sstable A by B, so sstable set
contains only B now.
3) schema is updated, so a new backlog tracker is built without A
because sstable set now contains only B.
4) ongoing compaction tries to remove A from tracker, but it was
excluded in step 3.
5) tracker can now have a negative value if table is decreasing in
size, which leads to log(<negative number>) == -NaN
This problem happens because backlog tracker updates are decoupled
from sstable set updates. Given that the essential content of
backlog tracker should be the same as one of sstable set, let's move
tracker management to table.
Whenever sstable set is updated, backlog tracker will be updated with
the same changes, making their management less error prone.
Fixes#9157
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
There's a circular dependency:
query processor needs database
database owns large_data_handler and compaction_manager
those two need qctx
qctx owns a query_processor
Respectively, the latter hidden dependency is not "tracked" by
constructor arguments -- the query processor is started after
the database and is deferred to be stopped before it. This works
in scylla, because query processor doesn't really stop there,
but in cql_test_env it's problematic as it stops everything,
including the qctx.
Recent database start-stop sanitation revealed this problem --
on database stop either l.d.h. or compaction manager try to
start (or continue) messing with the query processor. One problem
was faced immediatelly and pluged with the 75e1d7ea safety check
inside l.d.h., but still cql_test_env tests continue suffering
from use after free on stopped query processor.
The fix is to partially revert the 4b7846da by making the tests
stop some pieces of the database (inclusing l.d.h. and compaction
manager) as it used to before. In scylla this is, probably, not
needed, at least now -- the database shutdown code was and still
is run right before the stopping one.
tests: unit(debug)
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20210924080248.11764-1-xemul@scylladb.com>
compaction_info must only contain info data to be exported to the
outside world, whereas compaction_data will contain data for
controlling compaction behavior and stats which change as
compaction progresses.
This separation makes the interface clearer, also allowing for
future improvements like removing direct references to table
in compaction.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Today, compaction is calling compaction manager to register / deregister
the compaction_info created by it.
This is a layer violation because manager sits one layer above
compaction, so manager should be responsible for managing compaction
info.
From now on, compaction_info will be created and managed by
compaction_manager. compaction will only have a reference to info,
which it can use to update the world about compaction progress.
This will allow compaction_manager to be simplified as info can be
coupled with its respective task, allowing duplication to be removed
and layer violation to be fixed.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
"
Backlog tracker isn't updated correctly when facing a schema change, and
may leak a SSTable if compaction strategy is changed, which causes
backlog to be computed incorrectly. Most of these problems happen because
sstable set and tracker are updated independently, so it could happen
that tracker lose track (pun intended) of changes applied to set.
The first patch will fix the leak when strategy is changed, and the third
patch will make sure that tracker is updated atomically with sstable set,
so these kind of problems will not happen anymore.
Fixes#9157
test: mode(debug)
"
* 'fixes_to_backlog_tracker_v3' of https://github.com/raphaelsc/scylla:
compaction: Update backlog tracker correctly when schema is updated
compaction: Don't leak backlog of input sstable when compaction strategy is changed
compaction: introduce compaction_read_monitor_generator::remove_exhausted_sstables()
compaction: simplify removal of monitors
Currently the following can happen:
1) there's ongoing compaction with input sstable A, so sstable set
and backlog tracker both contains A.
2) ongoing compaction replaces input sstable A by B, so sstable set
contains only B now.
3) schema is updated, so a new backlog tracker is built without A
because sstable set now contains only B.
4) ongoing compaction tries to remove A from tracker, but it was
excluded in step 3.
5) tracker can now have a negative value if table is decreasing in
size, which leads to log(<negative number>) == -NaN
This problem happens because backlog tracker updates are decoupled
from sstable set updates. Given that the essential content of
backlog tracker should be the same as one of sstable set, let's move
tracker management to table.
Whenever sstable set is updated, backlog tracker will be updated with
the same changes, making their management less error prone.
Fixes#9157
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Database no longer needs it. Since the only user of the old-style
notification is gone -- remove it as well.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Tests don't have sstable format selector and enforce the needed
format by hands with the help of special database:: method. It's
more natural to provide it via convig. Doing this makes database
initialization in main and cql_test_env closer to each other.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Setting sstables format into database and into sstables_manager is
all plain assignments. Mark them as noexcept, next patch will become
apparently exception safe after that.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
There are some large-data-handler-related helpers left after previous
patches, they can be removed altogehter.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
After stop_database() became shard-local, it's possible to merge
it with database::stop() as they are both called one after another
on scylla stop. In cql-test-env there are few more steps in
between, but they don't rely on the database being partially
stopped.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The method need to perform four steps cross-shard synchronously:
first stop compaction manager, then close user and, after it,
system tables, finally shutdown the large data handler.
This patch reworks this synchronization with the help of cross-shard
barrier added to the database previously. The motivation is to merge
.stop_database() with .stop().
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Make sure a node-wide barrier exists on a database when scylla starts.
Also provide a barrier for cql_test_env. In all other cases keep a
solo-mode barrier so that single-shard db stop doesn't get blocked.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The intention is to keep all database initialization code in one place.
The init_system_keyspace() is one the obstacles -- it initializes db's
commitlog as first step.
This patch moves the commitlog initialization out of the mentioned
helper. The result looks clumsy, but it's temporary, next patches will
brush it up.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>