After previous patches both, create_snitch() and stop_snitch() no look
like the classica sharded service start/stop sequence. Finally both
helpers can be removed and the rest of the user can just call start/stop
on locally obtained sharded references.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Currently snitch drivers register themselves in class-registry with all
sorts of construction options possible. All those different constuctors
are in fact "config options".
When later snitch will declare its dependencies (gossiper and system
keyspace), it will require patching all this registrations, which's very
inconvenient.
This patch introduces the snitch_config struct and replaces all the
snitch constructors with the snitch_driver(snitch_config cfg) one.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Colordiff is problematic when writing the diff into a file for later
examination. Use regular diff instead. One can still get syntax
highlighting by writing the output into `.diff` file (which most editors
will recognize).
Signed-off-by: Botond Dénes <bdenes@scylladb.com>
Message-Id: <20220407080944.324108-1-bdenes@scylladb.com>
There's a public call on replica::table to get back the compaction
manager reference. It's not needed, actually. The users of the call are
distributed loader which already has database at hand, and a test that
creates itw own instance of compaction manager for its testing tables
and thus also has it available.
tests: unit(dev)
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20220406171351.3050-1-xemul@scylladb.com>
"
First migrate all users to the v2 variant, all of which are tests.
However, to be able to properly migrate all tests off it, a v2 variant
of the restricted reader is also needed. All restricted reader users are
then migrated to the freshly introduced v2 variant and the v1 variant is
removed.
Users include:
* replica::table::make_reader_v2()
* streaming_virtual_table::as_mutation_source()
* sstables::make_reader()
* tests
This allows us to get rid of a bunch of conversions on the query path,
which was mostly v2 already.
With a few tests we did kick the can down the road by wrapping the v2
reader in `downgrade_to_v1()`, but this series is long enough already.
Tests: unit(dev), unit(boost/flat_mutation_reader_test:debug)
"
* 'remove-reader-from-mutations-v1/v3' of https://github.com/denesb/scylla:
readers: remove now unused v1 reader from mutations
test: move away from v1 reader from mutations
test/boost/mutation_reader_test: use fragment_scatterer
test/boost/mutation_fragment_test: extract fragment_scatterer into a separate hh
test/boost: mutation_fragment_test: refactor fragment_scatterer
readers: remove now unused v1 reversing reader
test/boost/flat_mutation_reader_test: convert to v2
frozen_mutation: fragment_and_freeze(): convert to v2
frozen_mutation: coroutinize fragment_and_freeze()
readers: migrate away from v1 reversing reader
db/virtual_table: use v2 variant of reversing and forwardable readers
replica/table: use v2 variant of reversing reader
sstables/sstable: remove unused make_crawling_reader_v1()
sstables/sstable: remove make_reader_v1()
readers: add v2 variant of reversing reader
readers/reversing: remove FIXME
readers: reader from mutations: use mutation's own schema when slicing
In most files it was unused. We should move these to the patch which
moved out the last interesting reader from mutation_reader.hh (and added
the corresponding new header include) but its probably not worth the
effort.
Some other files still relied on mutation_reader.hh to provide reader
concurrency semaphore and some other misc reader related definitions.
"
There's a static global sharded<local_cache> variable in system keyspace
the keeps several bits on board that other subsystems need to get from
the system keyspace, but what to have it in future<>-less manner.
Some time ago the system_keyspace became a classical sharded<> service
that references the qctx and the local cache. This set removes the global
cache variable and makes its instances be unique_ptr's sitting on the
system keyspace instances.
The biggest obstacle on this route is the local_host_id that was cached,
but at some point was copied onto db::config to simplify getting the value
from sstables manager (there's no system keyspace at hand there at all).
So the first thing this set does is removes the cached host_id and makes
all the users get it from the db::config.
(There's a BUG with config copy of host id -- replace node doesn't
update it. This set also fixes this place)
De-globalizing the cache is the prerequisite for untangling the snitch-
-messaging-gossiper-system_keyspace knot. Currently cache is initialized
too late -- when main calls system_keyspace.start() on all shards -- but
before this time messaging should already have access to it to store
its preferred IP mappings.
tests: unit(dev), dtest.simple_boot_shutdown(dev)
"
* 'br-trade-local-hostid-for-global-cache' of https://github.com/xemul/scylla:
system_keyspace: Make set_local_host_id non-static
system_keyspace: Make load_local_host_id non-static
system_keyspace: Remove global cache instance
system_keyspace: Make it peering service
system_keyspace,snitch: Make load_dc_rack_info non-static
system_keyspace,cdc,storage_service: Make bootstrap manipulations non-static
system_keyspace: Coroutinize set_bootstrap_state
gossiper: Add system keyspace dependency
cdc_generation_service: Add system keyspace dependency
system_keyspace: Remove local host id from local cache
storage_service: Update config.host_id on replace
storage_service: Indentation fix after previous patch
storage_service: Coroutinize prepare_replacement_info()
system_distributed_keyspace: Indentation fix after previous patch
code,system_keyspace: Relax system_keyspace::load_local_host_id() usage
code,system_keyspace: Remove system_keyspace::get_local_host_id()
The gossiper reads peer features from system keyspace. Also the snitch
code needs system keyspace, and since now it gets all its dependencies
from gossiper (will be fixed some day, but not now), it will do the same
for sys.ks.. Thus it's worth having gossiper->system_keyspace explicit
dependency.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The service uses system keyspace to, e.g., manage the generation id,
thus it depends on the system_keyspace instance and deserves the
explicit reference.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
For compaction to be able to purge expired data, like tombstones, a
sstable set snapshot is set in the compaction descriptor.
That's a decision that belongs to task type. For example, all regular
compaction enable GC, whereas scrub for example doesn't for safety
reasons.
The problem is that the decision is being made by every instantiation
of compaction_descriptor in the strategies, which is both unnecessary
and also adds lots of boilerplate to the code, making it hard to
understand and work with.
As sstable set snapshot is an implementation detail, a new method
is being added to compaction_descriptor to make the intention
clearer, making the interface easier to understand.
can_purge_tombstones, used previously by rewrite task only, is being
reused for communicating GC intention into task::compact_sstables().
The boilerplate was a pain when adding a new strategy method for
the ongoing work on cleanup, described by issue #10097.
Another benefit is that we'll now only create a set snapshot when
compaction will really run. Before, it could happen that the snapshot
would be discarded if the compaction attempt had to be postponed,
which is a waste of cpu cycles.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Prior to the change, `USES_RAFT_CLUSTER_MANAGEMENT` feature wasn't
properly advertised upon enabling `SUPPORTS_RAFT_CLUSTER_MANAGEMENT`
raft feature.
This small series consists of 3 parts to fix the handling of supported
features for raft:
1. Move subscription for `SUPPORTS_RAFT_CLUSTER_MANAGEMENT` to the
`raft_group_registry`.
2. Update `system.local#supported_features` directly in the
`feature_service::support()` method.
3. Re-advertise gossiper state for `SUPPORTED_FEATURES` gossiper
value in the support callback within `raft_group_registry`.
* manmanson/track_supported_set_recalculation_v7:
raft: re-advertise gossiper features when raft feature support changes
raft: move tracking `SUPPORTS_RAFT_CLUSTER_MANAGEMENT` feature to raft
gms: feature_service: update `system.local#supported_features` when feature support changes
test: cql_test_env: enable features in a `seastar::thread`
Move the listener from feature service to the `raft_group_registry`.
Enable support for the `USES_RAFT_CLUSTER_MANAGEMENT`
feature when the former is enabled.
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Each feature can have an associated `when_enabled` callback
registered, which is assumed to run in the thread context,
so wrap the `enable()` call in a seastar thread.
Tests: unit(dev)
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
"
The generating reader is a reader which converts a functor returning
mutation fragments to a mutation reader.
We currently have 2 generating reader implementations: one operating
with a v1 functor and one with a v2 one. This patch-set converts the v1
functor based one to a v2 reader, by adapting the v1 functor to a v2
functor and reusing the v2 reader implementation.
Tests are also added to both variants.
Tests: unit(dev)
"
* 'generating-reader-v2/v1' of https://github.com/denesb/scylla:
test/boost: mutation_reader_test: add tests for generating reader
test: export squash_mutations() into lib/mutation_source_test.hh
readers: add next partition adaptor
readers: implement generating_reader from v1 generator via adaptor
readers: upgrade_to_v2(): reimplement in terms of upgrading_consumer
readers: add upgrading_consumer
readers: generating_reader: use noncopyable_function<>
readers: merge generating.hh into generating_v2.hh
readers/generating.hh: return v2 reader from make_generating_reader()
This method used to be a static one in
boost/flat_mutation_reader_test.cc. Turns out it is useful for other
tests based on the mutation source test suite, so move it into the
header of the latter to make it accessible.
The method needs to call merge_schema() that will need system keyspace
instance at hand. The parse_s._t. method is boot-time one, pushing the
main-local instance through it is fine
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The main target here is system_keyspace::update_schema_version() which
is now static, but needs to have system_keyspace at "this". Migration
manager is one of the places that calls that method indirectly.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
For now it's a reference, but all users of the cache will be
eventually switched into using system_keyspace.
In cql-test-env cache starting happens earlier than it was
before, but that's OK, it just initializes empty instances.
In main cache starts at the same time as before patching.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Start happens at exactly the same place. One thing to take care
of is that it happens on all shards.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The db::system_keyspace was made a class some time ago, time to create
a standard sharded<> object out of it. It needs query processor and
database. None of those depensencies is started early enough, so the
object for now starts in two steps -- early instances creation and
late start.
The instances will carry qctx and local_cache on board and all the
services that need those two will depend on system-keyspace. Its start
happens at exactly the same place where system_keyspace::setup happens
thus any service that will use system_keyspace will be on the same
safe side as it is now.
In the further future the system_keyspace will be equpped with its
own query processor backed by local replica database instance, instead
of the whole storage proxy as it is now.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
"
This series hardens raft_group_registry::stop_servers
and uses it to drain_on_shutdown, called before
the database is stopped in cql_test_env.
(Not needed for main).
raft_group_registry deferred_stop is introduced right after
the service is started to make sure it's properly stopped
even if there's an exception at any point while starting.
Test: unit(dev)
"
* tag 'raft_group_registry-drain_on_shutdown-v1' of https://github.com/bhalevy/scylla:
cql_test_env: raft_group_registry::drain_on_shutdown before stopping the database
raft_group_registry: harden stop_servers
raft_group_registry: delete unused _shutdown_gate
The flat_mutation_reader files were conflated and contained multiple
readers, which were not strictly necessary. Splitting optimizes both
iterative compilation times, as touching rarely used readers doesn't
recompile large chunks of codebase. Total compilation times are also
improved, as the size of flat_mutation_reader.hh and
flat_mutation_reader_v2.hh have been reduced and those files are
included by many file in the codebase.
With changes
real 29m14.051s
user 168m39.071s
sys 5m13.443s
Without changes
real 30m36.203s
user 175m43.354s
sys 5m26.376s
Closes#10194
database
We're currently stopping raft_gr before
shutting the database down, but we fail to do that if
anything goes wrong before that, e.g. if
distributed_loader::init_non_system_keyspaces fails.
This change splits drain_on_shutdown out of stop()
to stop the raft groups before the database is stopped
and does the rest in a deferred_stop placed right
after the rafr_gr registry is strated.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
The series overhauls the compaction_manager::task design and implementation
by properly layering the functionality between the compaction_manager
that deals with generic task execution, and the per-task business logic that is defined
in a set of classes derived from the generic task class.
While at it, the series introduces `task::state` and a set of helper functions to manage it
to prevent leaks in the statistics, fixing #9974.
Two more stats counter were exposed: `completed_tasks` and a new `postponed_tasks`.
Test: sstable_compaction_test
Dtest: compaction_test.py compaction_additional_test.py
Fixes#9974Closes#10122
* github.com:scylladb/scylla:
compaction_manager: use coroutine::switch_to
compaction_manager::task: drop _compaction_running
compaction_manager: move per-type logic to derived task
compaction_manager: task: add state enum
compaction_manager: task: add maybe_retry
compaction_manager: reevaluate_postponed_compactions: mark as noexcept
compaction_manager: define derived task types
compaction_manager: register_metrics: expose postponed_compactions
compaction_manager: register_metrics: expose failed_compactions
compaction_manager: register_metrics: expose _stats.completed_tasks
compaction: add documentation for compaction_type to string conversions
compaction: expose to_string(compaction_type)
compaction_manager: task: standardize task description in log messages
compaction_manager: refactor can_proceed
compaction_manager: pass compaction_manager& to task ctor
compaction_manager: use shared_ptr<task> rather than lw_shared_ptr
compaction_manager: rewrite_sstables: acquire _maintenance_ops_sem once
compaction_manager: use compaction_state::lock only to synchronize major and regular compaction
Move the business logic into the task specific classes.
Separating initialization during task construction,
from the compaction_done task, moved into
a do_run() method, and in some cases moving
a lambda function that was called per table (as in
rewrite_sstables) into a private method of the
derived class.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Define task::describe and use it via operator<<
to print the task metadata to the log in a standard way.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
And use it to get the compaction state of the
table to compact.
It will be used in a later patch
to manage the task state from task
methods.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Although we have a log in run_mutation_reader_tests(), it is useful to
know where it was called from, when trying to find the test scenario
that failed.
Also remove the incorrect difference in range tombstone checking
behavior between `produces_range_tombstone()` and `produces(const
range_tombstone&)` by having both turn on checking.
Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
Since `flat_reader_assertions::produces(const range_tombstone&,...)`
records the range tombstone for checking, be sure to explicitly pass
in a clustering range that does not extend beyond the mock-read part
of the mutation.
Also (provisionally) change the assertion method to accept clustering
ranges.
Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
`produces_range_tombstone()` is smart enough to not just try to read
one range tombstone from the input and compare it to the passed
reference, but to read as many range tombstones as the reader is
looking at (including none) using `may_produce_tombstones()` and
record those appropriately.
When `produces(const schema&, const mutation_fragment&)` is passed a
range tombstone as the second argument, it does not do anything
special -- it just reads one fragment, disregards it (!), and applies
its second argument to both "expected" and "encountered" range
tombstone lists. The right thing here is to use the same logic as
`produces_range_tombstone()`; upcoming memtable-related reader
changes (which result in more split range tombstones) cause some unit
tests to fail without fixing this.
Refactor the relevant logic into a private method (`apply_rt()`) and
use that in both places.
Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
Not precise, as bytes_on_disk accounts for all components, but good enough
for testing purposes.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Memtables are a replica-side entity, and so are moved to the
replica module and namespace.
Memtables are also used outside the replica, in two places:
- in some virtual tables; this is also in some way inside the replica,
(virtual readers are installed at the replica level, not the
cooordinator), so I don't consider it a layering violation
- in many sstable unit tests, as a convenient way to create sstables
with known input. This is a layering violation.
We could make memtables their own module, but I think this is wrong.
Memtables are deeply tied into replica memory management, and trying
to make them a low-level primitive (at a lower level than sstables) will
be difficult. Not least because memtables use sstables. Instead, we
should have a memtable-like thing that doesn't support merging and
doesn't have all other funky memtable stuff, and instead replace
the uses of memtables in sstable tests with some kind of
make_flat_mutation_reader_from_unsorted_mutations() that does
the sorting that is the reason for the use of memtables in tests (and
live with the layering violation meanwhile).
Test: unit (dev)
Closes#10120
"
Currently the mutation compactor supports v1 and v2 output and has a v1
output. The next step is to add a v2 output but this would lead to a
full conversion matrix which we want to avoid. So in preparation we drop
the v1 input support. Most inputs were already v2, but there were some
notable exceptions: tests, the compacting reader and the multishard
query code. The former two was a simple mechanical update but the latter
required some further work because it turned out the v2 version of
evictable reader wasn't used yet and thus it managed to hide some bugs
and dropped features. While at it, we migrate all evictable and
multishard reader users to the v2 variant of the respective readers and
drop the v1 variant completely.
With this the road is open to a v2 compactor output and therefore to a
v2 sstable writer.
Tests: unit(dev, release), dtest(paging_additional_test.py)
"
* 'compact-mutation-v2-only-input/v5' of https://github.com/denesb/scylla:
test/lib/test_utils: return OK from check() variants
repair/row_level: use evictable reader v2
db/view/view_updating_consumer: migrate to v2
test/boost/mutation_reader_test: add v2 specific evictable reader tests
test: migrate to evictable reader v2 and multishard combining reader v2
compact_mutation: drop support for v1 input
test: pass v2 input to mutation_compaction
test/boost/mutation_test: simplify test_compaction_data_stream_split test
mutation_partition: do_compact(): do drop row tombstones covered by higher order tombstones
multishard_mutation_query: migrate to v2
mutation_fragment_v2: range_tombstone_change: add memory_usage()
evictable_reader_v2: terminate active range tombstones on reader recreation
evictable_reader_v2: restore handling of non-monotonically increasing positions
evictable_reader_v2: simplify handling of reader recreation
mutation: counter_write_query: use v2 reader
mutation: migrate consume() to v2
mutation_fragment_v2,flat_mutation_reader_v2: mirror v1 concept organization
mutation_reader: compacting_reader: require a v2 input reader
db/view/view_builder: use v2 reader
test/lib/flat_mutation_reader_assertions: adjust has_monotonic_positions() to v2 spec
The various require() and check() methods in test_utils.hh were
introduced to replace BOOST_REQUIRE() and BOOST_CHECK() respectively in
multi-shard concurrent tests, specifically those in
tests/boost/multishard_mutation_query_test.cc.
This was done literally, just replacing BOOST_REQUIRE() with require()
and BOOST_CHECK() with check(). The problem is that check() is missing a
feature BOOST_CHECK() had: while BOOST_CHECK() doesn't cause an
immediate test failure, just logging an error if the condition fails, it
remembers this failure and will fail the test in the end. check() did
not have this feature and this caused potential errors to just be logged
while the test could still pass fine, causing false-positive tests
passes. This patch fixes this by returning a [[nodiscard]] bool from the
check() methods. The caller can & these together over all calls to
check() methods and manually fail the test in the end. We choose this
method over a hidden global (like BOOST_CHECK() does) for simplicity
sake.