load_truncation_times() now works only for
schema tables since the rest is not loaded
until distributed_loader::init_non_system_keyspaces.
An attempt to call cf.set_truncation_time
for non-system table just throws an exception,
which is caught and logged with debug level.
This means that the call cf.get_truncation_time in
paxos_state.cc has never worked as expected.
To fix that we move load_truncation_times()
closer to the point where the tables are loaded.
The function distributed_loader::populate_keyspace is
called for both system and non-system tables. Once
the tables are loaded, we use the 'truncated' table
to initialize _truncated_at field for them.
The truncation_time check for schema tables is also moved
into populate_keyspace since is seems like a more natural
place for it.
It is more efficient to iterate over multiple data directories
in the inner loop rather than the outer loop.
Following patch will make use of the datadir in
table_populator.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Population of keyspaces happens first fo system keyspaces, then for
non-system ones. Both methods iterate over config datadirs to populate
from all configured directories. This patch generalizes this loop into
the populate_keyspace() method.
(indentation is deliberately left broken)
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The method in question tries to find keyspace reference on the database
by the given keyspace name. However, one of the callers aready has the
keyspace reference at hands and can just pass it. The other calls can
find the keyspace on its own.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
We want to switch system.scylla_local table to the
schema commitlog, but load phases hamper here - schema
commitlog is initialized after phase1,
so a table which is using it should be moved to phase2,
but system.scylla_local contains features, and we need
them before schema commitlog initialization for
SCHEMA_COMMITLOG feature.
In this commit we are taking a different approach to
loading system tables. First, we load them all in
one pass in 'readonly' mode. In this mode, the table
cannot be written to and has not yet been assigned
a commit log. To achieve this we've added _readonly bool field
to the table class, it's initialized to true in table's
constructor. In addition, we changed the table constructor
to always assign nullptr to commitlog, and we trigger
an internal error if table.commitlog() property is accessed
while the table is in readonly mode. Then, after
triggering on_system_tables_loaded notifications on
feature_service and sstable_format_selector, we call
system_keyspace::mark_writable and eventually
table::mark_ready_for_writes which selects the
proper commitlog and marks the table as writable.
In sstable_compaction_test we drop several
mark_ready_for_writes calls since they are redundant,
the table has already been made writable in
env.make_table_for_tests call.
The table::commitlog function either returns the current
commitlog or causes an error if the table is readonly. This
didn't work for virtual tables, since they never called
mark_ready_for_writes. In this commit we add this
call to initialize_virtual_tables.
Take references to services which are initialized earlier. The
references to `gossiper`, `storage_service` and `raft_group0_registry`
are no longer needed.
This will allow us to move the `make` step right after starting
`system_keyspace`.
In that level no io_priority_class-es exist. Instead, all the IO happens
in the context of current sched-group. File API no longer accepts prio
class argument (and makes io_intent arg mandatory to impls).
So the change consists of
- removing all usage of io_priority_class
- patching file_impl's inheritants to updated API
- priority manager goes away altogether
- IO bandwidth update is performed on respective sched group
- tune-up scylla-gdb.py io_queues command
The first change is huge and was made semi-autimatically by:
- grep io_priority_class | default_priority_class
- remove all calls, found methods' args and class' fields
Patching file_impl-s is smaller, but also mechanical:
- replace io_priority_class& argument with io_intent* one
- pass intent to lower file (if applicatble)
Dropping the priority manager is:
- git-rm .cc and .hh
- sed out all the #include-s
- fix configure.py and cmakefile
The scylla-gdb.py update is a bit hairry -- it needs to use task queues
list for IO classes names and shares, but to detect it should it checks
for the "commitlog" group is present.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes#13963
It's the directory that owns the components lister and can reason about
the way to pick up dangling bits, be it local directories or entries
from the ownership table.
First thing to do is to move the g.c. code into sstable_directory. While
at it -- convert ssting dir into fs::path dir and switch logger.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Load-and-stream reads the entire content from SSTables, therefore it can
afford to discard the bloom filter that might otherwise consume a significant
amount of memory. Bloom filters are only needed by compaction and other
replica::table operations that might want to check the presence of keys
in the SSTable files, like single-partition reads.
It's not uncommon to see Data:Filter ratio of less than 100:1, meaning
that for ~300G of data, filters will take ~3G.
In addition to saving memory footprint, it also reduces operation time
as load-and-stream no longer have to read, parse and build the filters
from disk into memory.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
We aim (#12642) to use the schema commit log
for raft tables. Now they are loaded at
the first call to init_system_keyspace in
main.cc, but the schema commitlog is only
initialized shortly before the second
call. This is important, since the schema
commitlog initialization
(database::before_schema_keyspace_init)
needs to access schema commitlog feature,
which is loaded from system.scylla_local
and therefore is only available after the
first init_system_keyspace call.
So the idea is to defer the loading of the raft tables
until the second call to init_system_keyspace,
just as it works for schema tables.
For this we need a tool to mark which tables
should be loaded in the first or second phase.
To do this, in this patch we introduce system_table_load_phase
enum. It's set in the schema_static_props for schema tables.
It replaces the system_keyspace::table_selector in the
signature of init_system_keyspace.
The call site for populate_keyspace in init_system_keyspace
was changed, table_selector.contains_keyspace was replaced with
db.local().has_keyspace. This check prevents calling
populate_keyspace(system_schema) on phase1, but allows for
populate_keyspace(system) on phase2 (to init raft tables).
On this second call some tables from system keyspace
(e.g. system.local) may have already been populated on phase1.
This check protects from double-populating them, since every
populated cf is marked as ready_for_writes.
When sstables are loaded from upload/ subdir, the final step is to move
them from this directory into base or staging one. The uploading code
evaluates the target directory, then pushes it down the stack towards
make_sstables_available() method.
This patch replaces the path argument with bool to_staging one. The
goal is to remove the knowlege of exact sstable location (nowadays --
its files' path) from the distributed loader and keep it in sstable
object itself. Next patches will make full use of this change.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
It used to be just metadata by providing the meta for population, now it
does the population by itself, so rename it.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This ownership change also requires the auto& = *this alias and extra
specification where to call reshard() and reshape() from.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Now it gets one from this-> but the method is becoming static one in
distributed_loader which only has it as an argument. That's not big deal
as the current IO class is going to be derived from current sched group,
so this extra arg will go away at all some day.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Now it gets one from this-> but the method is becoming static one in
distributed_loader which only has it as an argument. That's not big deal
as the current IO class is going to be derived from current sched group,
so this extra arg will go away at all some day.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Add a new virtual table `system.raft_state` that shows the currently
operating Raft configuration for each present group. The schema is the
same as `system.raft_snapshot_config` (the latter shows the config from
the last snapshot). In the future we plan to add more columns to this
table, showing more information (like the current leader and term),
hence the generic name.
Adding the table requires some plumbing of
`sharded<raft_group_registry>&` through function parameters to make it
accessible from `register_virtual_tables`, but it's mostly
straightforward.
Also added some APIs to `raft_group_registry` to list all groups and
find a given group (returning `nullptr` if one isn't found, not throwing
an exception).
The sstable_directory::process_sstable_dir() accepts a boolean to
control its behavior when collecting sstables. Turn this boolean into a
structure of flags. The intention is to extend this flags set in the
future (next patch).
This boolean is true all the time, but one place sets it to true in a
"verbose" manner, like this:
bool sort_sstables_according_to_owner = false;
process_sstable_dir(directory, sort_sstables_according_to_owner).get();
the local variable is not used anymore. Using designated initializers
solves the verbosity in a nicer manner.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
We should scan all sstables in the table directory and its
subdirectories to determine the highest sstable version and generation
before using it for creating new sstables (via reshard or reshape).
Fixesscylladb/scylladb#11793
Note: table_population_metadata::start_subdir is called
in a seastar thread to facilitate backporting to old versions
that do not support coroutines yet.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
It's final destination is virtual tabls registration code called from
init_system_keyspace() eventually
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This reverts commit 7396de72b1.
In 7396de7 (and refactorings before it) the set of prioritized keyspaces (and processing thereof)
was removed, due to apparent non-usage (which is true for open-source version).
This functionality is however required for certain features of the enterprise version (ear).
As such is needs to be restored and reenabled. This reverts the actual commit, patch after
ensures we use the prio set.
Define table_id as a distinct utils::tagged_uuid modeled after raft
tagged_id, so it can be differentiated from other uuid-class types,
in particular from table_schema_version.
Fixes#11207
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
"
Making the system-keyspace into a standard sharded instance will
help to fix several dependency knots.
First, the global qctx and local-cache both will be moved onto the
sys-ks, all their users will be patched to depend on system-keyspace.
Now it's not quite so, but we're moving towards this state.
Second, snitch instance now sits in the middle of another dependency
loop. To untie one the preferred ip and dc/rack info should be
moved onto system keyspace altogether (now it's scattered over several
places). The sys-ks thus needs to be a sharded service with some
state.
This set makes system-keyspace sharded instance, equipps it with all
the dependencies it needs and passes it as dependency into storage
service, migration manager and API. This helps eliminating a good
portion of global qctx/cache usage and prepares the ground for snitch
rework.
tests: unit(dev)
v1: unit(debug), dtest.simple_boot_shutdown(dev)
"
* 'br-sharded-system-keyspace-instance-2' of https://github.com/xemul/scylla: (25 commits)
system_keyspace: Make load_host_ids non-static
system_keyspace: Make load_tokens non-static
system_keyspace: Make remove_endpoint and update_tokens non-static
system_keyspace: Coroutinize update_tokens
system_keyspace: Coroutinize remove_endpoint
system_keyspace: Make update_cached_values non-static
system_keyspace: Coroutinuze update_peer_info
system_keyspace: Make update_schema_version non-static
schema_tables: Add sharded<system_keyspace> argument to update_schema_version_and_announce
replica: Push sharded<system_keyspace> down to parse_system_tables
api: Carry sharded<system_keyspace> reference along
storage_service: Keep sharded<system_keyspace> reference
migration_manager: Keep sharded<system_keyspace> reference
system_keyspace: Remove temporary qp variable
system_keyspace: Make get_preferred_ips non-static
system_keyspace: Make cache_truncation_record non-static
system_keyspace: Make check_health non-static
system_keyspace: Make build_bootstrap_info non-static
system_keyspace: Make build_dc_rack_info non-static
system_keyspace: Make setup_version non-static
...
In https://github.com/scylladb/scylla/issues/10218
we see off-strategy compaction happening on a table
during the initial phases of
`distributed_loader::populate_column_family`.
It is caused by triggering offtrategy compaction
too early, when sstables are populated from the staging
directory in a144d30162.
We need to trigger offstrategy compaction only of the base
table directory, never the staging or quarantine dirs.
Fixes#10218
Test: unit(dev)
DTest: materialized_views_test.py::TestInterruptBuildProcess
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20220316152812.3344634-1-bhalevy@scylladb.com>
The method needs to call merge_schema() that will need system keyspace
instance at hand. The parse_s._t. method is boot-time one, pushing the
main-local instance through it is fine
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Instead of lengthy blurbs, switch to single-line, machine-readable
standardized (https://spdx.dev) license identifiers. The Linux kernel
switched long ago, so there is strong precedent.
Three cases are handled: AGPL-only, Apache-only, and dual licensed.
For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0),
reasoning that our changes are extensive enough to apply our license.
The changes we applied mechanically with a script, except to
licenses/README.md.
Closes#9937
SSTables created by repair will potentially not conform to the compaction strategy
layout goal. If node shuts down before off-strategy has a chance to
reshape those files, node will be forced to reshape them on restart. That
causes unexpected downtime. Turns out we can skip reshape of those files
on boot, and allow them to be reshaped after node becomes online, as if
the node never went down. Those files will go through same procedure as
files created by repair-based ops. They will be placed in maintenance set,
and be reshaped iteratively until ready for integration into the main set.
Fixes#9895.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
distributed_loader is replica-side thing, so it belongs in the
replica module ("distributed" refers to its ability to load
sstables in their correct shards). So move it to the replica
module.