Commit Graph

4972 Commits

Author SHA1 Message Date
Kamil Braun
3ab244e6d9 system_keyspace: make get/set_scylla_local_param public
We'll use it outside `system_keyspace` code in later commit.
2023-09-15 13:04:04 +02:00
Kamil Braun
72cd457d53 feature_service: add GROUP0_SCHEMA_VERSIONING feature
This feature, when enabled, will modify how schema versions
are calculated and stored.

- In group 0 mode, schema versions are persisted by the group 0 command
  that performs the schema change, then reused by each node instead of
  being calculated as a digest (hash) by each node independently.
- In RECOVERY mode or before Raft upgrade procedure finishes, when we
  perform a schema change, we revert to the old digest-based way, taking
  into account the possibility of having performed group0-mode schema
  changes (that used persistent versions). As we will see in future
  commits, this will be done by storing additional flags and tombstones
  in system tables.

By "schema versions" we mean both the UUIDs returned from
`schema::version()` and the "global" schema version (the one we gossip
as `application_state::SCHEMA`).

For now, in this commit, the feature is always disabled. Once all
necessary code is setup in following commits, we will enable it together
with Raft.
2023-09-15 13:04:04 +02:00
Kamil Braun
dc4e20d835 schema_tables: refactor scylla_tables(schema_features)
The `scylla_tables` function gives a different schema definition
for the `system_schema.scylla_tables` table, depending on whether
certain schema features are enabled or not.

The way it was implemented, we had to write `θ(2^n)` amount
of code and comments to handle `n` features.

Refactor it so that the amount of code we have to write to handle `n`
features is `θ(n)`.
2023-09-15 13:04:04 +02:00
Kamil Braun
4376854473 schema_tables: remove default value for reload in merge_schema
To avoid bugs like the one fixed in the previous commit.
2023-09-15 13:04:04 +02:00
Kamil Braun
48164e1d09 schema_tables: pass reload flag when calling merge_schema cross-shard
In 0c86abab4d `merge_schema` obtained a new flag, `reload`.

Unfortunately, the flag was assigned a default value, which I think is
almost always a bad idea, and indeed it was in this case. When
`merge_scehma` is called on shard different than 0, it recursively calls
itself on shard 0. That recursive call forgot to pass the `reload` flag.

Fix this.
2023-09-15 13:04:04 +02:00
Kamil Braun
9017b998ca system_keyspace: fix outdated comment 2023-09-15 13:04:04 +02:00
Patryk Jędrzejczak
e375e769b9 raft topology: set CDC generation clean-up candidate
We want to use the clean-up candidates to remove the obsolete CDC
generation data, but first, we need to set suitable generations as
a candidate when there is no candidate. Since CDC generations must
be published before we remove them, a generation that is being
published is a good candidate.
2023-09-15 09:23:59 +02:00
Dawid Medrek
fbbb9f879a db/hints: Remove unused aliases from manager.hh 2023-09-15 04:17:08 +02:00
Dawid Medrek
d46437a87b db/hints: Rename end_point_hints_manager
This commit renames `end_point_hints_manager` to `hint_endpoint_manager`
to be consistent with other names used in the module (they all start
with `hint_`).
2023-09-15 03:46:15 +02:00
Dawid Medrek
6d1eee448b db/hints: Rename sender to hint_sender
We rename the structure to highlight what exactly its purpose is.
2023-09-15 03:46:15 +02:00
Dawid Medrek
4ad0f8907c db/hints: Move the rebalancing logic to hint_storage
This commit continues modularizing manager.hh.
2023-09-15 03:46:15 +02:00
Dawid Medrek
999484466d db/hints: Move the implementation of sender
This commit continues modularizing manager.hh.
After moving the declaration of sender to a dedicated
header file, these changes move its implementation to
a separate source file.
2023-09-15 03:46:15 +02:00
Dawid Medrek
17aabf6b9a db/hints: Move the declaration of sender to hint_sender.hh
This commit is yet another step in modularizing manager.hh.
We move the declaration of sender to a dedicated file.
Its implementation will follow in a future commit.
2023-09-15 03:46:15 +02:00
Dawid Medrek
1a7262ed6e db/hints: Move sender::replay_allowed() to the source file
The premise of these changes is the fact that we cannot have
a cycle of #includes.

Because the declaration of `sender` is going to be moved to
a separate header file in a future commit, and because that
header file is going to be included in the file where
`end_point_hints_manager` is declared, we will need to rely
on `end_point_hints_manager` being an incomplete type there.

A consequence of that is that we cannot access any of
`end_point_hints_manager`'s methods.

This commit prepares the ground for it by moving
the definition of the function to the source file where
`end_point_hints_manager` will be a complete type.
2023-09-15 03:46:15 +02:00
Dawid Medrek
ad2a36bd45 db/hints: Put end_point_hints_manager in internal namespace 2023-09-15 03:46:15 +02:00
Dawid Medrek
507054012d db/hints: Move the implementation of end_point_hints_manager
This commit continues moving end_point_hints_manager to its
dedicated files. After moving the declaration of the class,
these changes move the implementation.
2023-09-15 03:46:15 +02:00
Dawid Medrek
f72c423984 db/hints: Move the declaration of end_point_hints_manager
This commit is yet another step in modularizing manager.hh.
We move the declaration of the class to a dedicated header file.
The implementation will follow in a future commit.
2023-09-15 03:46:15 +02:00
Dawid Medrek
854cc0c939 db/hints: Move definitions of functions using shard hint manager
We move definitions of inline methods of end_point_hints_manager
and sender accessing shard hint manager to the source file,
effectively un-inlining them. We need to do that to prepare for
moving said structures out of manager.hh. This commit is yet
another step in modularizing manager.hh.
2023-09-15 03:45:57 +02:00
Dawid Medrek
db08a85f5d db/hints: Introduce hint_storage.hh
This commit moves types used by shard hint manager
and related to storing hints on disk to another file.
It is yet another step in modularizing manager.hh.
2023-09-15 02:28:10 +02:00
Dawid Medrek
4814b3b19a db/hints: Extract the logger from manager.cc
This commit extracts the logger used in manager.cc
to prepare the ground for modularization of manager.hh
into separate smaller files. We want to preserve
the logging behavior (at least for the time being),
which means new files should use the same logger.
These changes serve that purpose.
2023-09-15 02:24:20 +02:00
Dawid Medrek
efd6d1f57a db/hints: Extract common types from manager.hh
Currently, data structures used in manager.hh
use their own aliases for gms::inet_address.
It is clear they all should use the same type
and having different names for it only reduces
readability of the code. This commit introduces
a common alias -- endpoint_id -- and gets rid
of the other ones.

This commit is also the first step in modularizing
manager.hh by extracting common types to another
file.
2023-09-15 02:23:30 +02:00
Patryk Jędrzejczak
c0fd42ead4 system_keyspace: introduce decode_cdc_generation_id
The decode_cdc_generations_ids function allows us to decode
a vector of CDC generation IDs. After adding cleanup_candidate
to CDC_GENERATIONS_V3, we need a similar function that decodes
a single ID.
2023-09-14 12:09:14 +02:00
Patryk Jędrzejczak
6db325fb69 system_keyspace: add cleanup_candidate to CDC_GENERATIONS_V3
In the following commits, we implement a garbage collection for
CDC_GENERATIONS_V3. The first step is introducing the clean-up
candidate. It will be continually updated by the CDC generation
publisher and used to remove obsolete data.
2023-09-14 12:09:10 +02:00
Petr Gusev
082cd3bc8e system_keyspace: switch CDC_LOCAL to schema commitlog 2023-09-13 23:17:20 +04:00
Petr Gusev
a683cebb02 system_keyspace: scylla_local: use schema commitlog
We remove flush from set_scylla_local_param_as
since it's now redundant. We add it to
save_local_enabled_features as features need to
be available before schema commitlog replay.

We skip the flush if save_local_enabled_features
is called from topology_state_load when the features
are migrated to system.topology and we don't need
strict durability.
2023-09-13 23:17:20 +04:00
Petr Gusev
beb29f094b system_keyspace: drop load phases
We want to switch system.scylla_local table to the
schema commitlog, but load phases hamper here - schema
commitlog is initialized after phase1,
so a table which is using it should be moved to phase2,
but system.scylla_local contains features, and we need
them before  schema commitlog initialization for
SCHEMA_COMMITLOG feature.

In this commit we are taking a different approach to
loading system tables. First, we load them all in
one pass in 'readonly' mode. In this mode, the table
cannot be written to and has not yet been assigned
a commit log. To achieve this we've added _readonly bool field
to the table class, it's initialized to true in table's
constructor. In addition, we changed the table constructor
to always assign nullptr to commitlog, and we trigger
an internal error if table.commitlog() property is accessed
while the table is in readonly mode. Then, after
triggering on_system_tables_loaded notifications on
feature_service and sstable_format_selector, we call
system_keyspace::mark_writable and eventually
table::mark_ready_for_writes which selects the
proper commitlog and marks the table as writable.

In sstable_compaction_test we drop several
mark_ready_for_writes calls since they are redundant,
the table has already been made writable in
env.make_table_for_tests call.

The table::commitlog function either returns the current
commitlog or causes an error if the table is readonly. This
didn't work for virtual tables, since they never called
mark_ready_for_writes. In this commit we add this
call to initialize_virtual_tables.
2023-09-13 23:17:20 +04:00
Petr Gusev
47ffc66c7f database.hh: add_column_family: add readonly parameter
Previously, creating a table or view in
schema_tables.cc/merge_tables_and_views was a two-step process:
first adding a column family (add_column_family function) and
then marking it as ready for writes (mark_table_as_writable).
There is an yield between these stages, this means
someone could see a table or view for which the
mark_table_as_writable method had not yet been called,
and start writing to it.

This problem was demonstrated by materialised view dtests.
A view is created on all nodes. On some nodes it will be created
earlier than on others and the view rebuild process will start
writing data to that view on other nodes, where mark_table_as_writable
has not yet been called.

In this patch we solve this problem by adding a readonly parameter
to the add_column_family method. When loading tables from disk,
this flag is set to true and the mark_table_as_writable
is called only after all sstables have been loaded.
When creating a new table, this flag is set to false,
mark_table_as_writable is called from inside add_column_family
and the new table becomes visible already as writable.
2023-09-13 23:17:20 +04:00
Petr Gusev
7e52014633 schema_tables: merge_tables_and_views: delay events until tables/views are created on all shards
db.get_notifier().create_view triggers view rebuild, this
process writes to the table on all shards and thus can
access partially created table, e.g the one where
mark_table_ready_for_writes was not yet called.
2023-09-13 23:17:20 +04:00
Petr Gusev
0e5f9ae9a4 system_keyspace: switch system.peers to schema commitlog
Also, we remove flushes on writes as durability
is now guaranteed by the commitlog.
2023-09-13 23:17:20 +04:00
Petr Gusev
7881ce1e09 system_keyspace: switch system.local to schema commitlog
Schema commitlog lives only on the zero shard,
so we need to turn on use_null_sharder option.

Also, we remove flushes on writes as durability
is now guaranteed by the commitlog.
2023-09-13 23:17:20 +04:00
Petr Gusev
a0653590b5 sstables_format_selector: extract listener
In the following commits we want to move schema
commitlog replay earlier, but the current sstable
format should be selected before the replay.
The current sstable format is stored in system.scylla_local,
so we can't read it until system tables are loaded.
This problem is similar to the enabled_features.

To solve this we split sstables_format_selector in two
parts. The lower level part, sstables_format_selector,
knows only about database and system_keyspace. It
will be moved before system_keyspace initialization,
and the on_system_tables_loaded method will
be called on it when the system_keyspace has loaded its tables.

The higher level part, sstables_format_listener, is responsible
for subscribing to feature_services and gossipier and is started
later, at the same place as sstables_format_selector before this commit.
2023-09-13 23:04:50 +04:00
Petr Gusev
7104fc8a7e sstables_format_selector: wrap when_enabled with seastar::async
The listener may fire immediately, we must be in a thread
context for this to work.

In the next commits we are going to move
enable_features_on_startup above
sstables_format_selector::start in scylla_main, so we
need to fix this beforehand.
2023-09-13 23:00:16 +04:00
Petr Gusev
2a0b228d17 main.cc: inline and split system_keyspace.setup
Our goal is to switch system.local table to schema
commitlog and stop doing flushes when we write to it.
This means it would be incorrect to read from this
table until schema commitlog is replayed.

On the other hand, we need truncation records
to be loaded before we start replaying schema
commitlog, since commitlog_replayer relies on them.

In this commit we inline the system_keyspace::setup
function and split its content into two parts. In
the first part, before schema commitlog replay,
we load truncation records. It's safe to load
them before schema commitlog replay since we intend
to let the flushes on writes to system.truncated
table. In the second part, after schema commitlog replay,
we do the rest of the job - build_bootstrap_info and
db::schema_tables::save_system_schema.

We decided to inline this function since there is
very low cohesion between the actions it's performing.
It's just simpler to reason about them individually.
2023-09-13 23:00:15 +04:00
Petr Gusev
f0bc9f2d93 system_keyspace: refactor save_system_schema function
This is a refactoring commit without observable changes
in behaviour.

Previously, there were two related functions in db::schema_tables:
save_system_keyspace_schema(qp) and save_system_schema(qp, ks).
The first called the second passing "system_schema" as
the second argument. Outside of schema_tables module we
don't need two functions, we just need a way to say
'persist system schema objects in the appropriate tables/keyspaces'.
In this commit we change the function save_system_schema
to have this meaning. Internally it calls save_system_schema_to_keyspace
twice with "system_schema" and "system", since that's what we need
in the single call site of this function in system_keyspace::setup.
In subsequent commits we are going to move this call out of the
system_keyspace::setup.
2023-09-13 23:00:15 +04:00
Petr Gusev
e395086557 system_keyspace: move initialize_virtual_tables into virtual_tables.hh
This is a readability refactoring commit without observable changes
in behaviour.

initialize_virtual_tables logically belongs to virtual_tables module,
and it allows to make other functions in virtual_tables.cc
(register_virtual_tables, install_virtual_readers)
local to the module, which simplifies the matters a bit.

all_virtual_tables() is not needed anymore, all the references to
registered virtual tables are now local to virtual_tables module
and can just use virtual_tables variable directly.
2023-09-13 23:00:15 +04:00
Petr Gusev
c4787a160b system_keyspace: remove unused parameter 2023-09-13 23:00:15 +04:00
Petr Gusev
b90011294d config.cc: drop db::config::host_id
In this refactoring commit we remove the db::config::host_id
field, as it's hacky and duplicates token_metadata::get_my_id.

Some tests want specific host_id, we add it to cql_test_config
and use in cql_test_env.

We can't pass host_id to sstables_manager by value since it's
initialized in database constructor and host_id is not loaded yet.
We also prefer not to make a dependency on shared_token_metadata
since in this case we would have to create artificial
shared_token_metadata in many tools and tests where sstables_manager
is used. So we pass a function that returns host_id to
sstables_manager constructor.
2023-09-13 23:00:15 +04:00
Petr Gusev
a03fbc3781 system_keyspace: set null sharder when configuring schema commitlog
The schema commitlog lives only on the null shard, it
makes no sense to set use_schema_commitlog
without use_null_sharder.

We also extract the function enable_schema_commitlog which
sets all the needed properties.
2023-09-13 23:00:15 +04:00
Petr Gusev
d32191a353 system_keyspace: rename static variables
'raft_tables' in set_use_schema_commitlog
initialization was misleading. Other variables have
also been renamed for consistency.
2023-09-13 23:00:15 +04:00
Petr Gusev
cda49b06dc system_keyspace: remove redundant wait_for_sync_to_commitlog
Tables with schema commitlog already sync every
write, wait_for_sync_to_commitlog makes sense
only for the regular commitlog.

Technically there are nothing wrong with allowing
both options, but it's confusing. Being strict
and accurate about the meaning of the options
reduces the chance of errors due to misunderstanding.

This is preparation for the next commits, where
we will start generating an error if the combination
of options doesn't make sense.
2023-09-13 23:00:15 +04:00
Avi Kivity
0a5d9532f9 Merge 'Sanitize batchlog manager start/stop' from Pavel Emelyanov
This code is now spread over main and differs in cql_test_env. The PR unifies both places and makes the manager start-stop look standard

refs: #2795

Closes #15375

* github.com:scylladb/scylladb:
  batchlog_manager: Remove start() method
  batchlog_manager: Start replay loop in constructor
  main, cql_test_env: Start-stop batchlog manager in one "block"
  batchlog_manager: Move shard-0 check into batchlog_replay_loop()
  batchlog_manager: Fix drain() reentrability
2023-09-13 18:20:56 +03:00
Kamil Braun
a184b07cbb Merge 'raft topology: make CDC_GENERATIONS_V3 single-partition, timeuuid-sorted' from Patryk Jędrzejczak
We make the `CDC_GENERATIONS_V3` table single-partition and change the
clustering key from `range_end` to `(id, range_end)`. We also change the
type of `id` to `timeuuid` and ensure that a new generation always has
the highest `id`. These changes allow efficient clearing of obsolete CDC
generation data, which we need to prevent Raft-topology snapshots from
endlessly growing as we introduce new generations over time.

All this code is protected by an experimental feature flag. It includes
the definition of `CDC_GENERATIONS_V3`. The table is not created unless
the feature flag is enabled.

Fixes #15163

Closes #15319

* github.com:scylladb/scylladb:
  system_keyspace: rename cdc_generation_id_v2
  system_keyspace: change id to timeuuid in CDC_GENERATIONS_V3
  cdc: generation: remove topology_description_generator
  cdc: do not create uuid in make_new_generation_data
  system_kayspace: make CDC_GENERATIONS_V3 single-partition
  cdc: generation: introduce get_common_cdc_generation_mutations
  cdc: generation: rename get_cdc_generation_mutations
2023-09-13 12:54:49 +02:00
Pavel Emelyanov
d48aff5789 batchlog_manager: Remove start() method
It's now a no-op, can be dropped.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-09-12 16:37:52 +03:00
Pavel Emelyanov
3966a50ed4 batchlog_manager: Start replay loop in constructor
... and sanitize the future used on stop.

The loop in question is now started in .start(), but all callers now
construct the manager late enough, so the loop spawning can be moved.
This also calls for renaming the future member of the class and allows
to make it regular, not shared, future.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-09-12 16:35:53 +03:00
Pavel Emelyanov
9f45778467 batchlog_manager: Move shard-0 check into batchlog_replay_loop()
Currently the only caller of it is the batchlog manager itself. It
checks for the shard-id to be zero, calls the method, then the method
asserts that it's run on shard-0.

Moving the check into the method removes the need for assertion and
makes further patching simpler.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-09-12 16:32:12 +03:00
Pavel Emelyanov
38d0ea0916 batchlog_manager: Fix drain() reentrability
Currently drain() is called twise -- first time from
storage_service::drain() (on shutdown), second via
batchlog_manager::stop(). The routine is unintentinally re-entrable,
because:
- explicit check for not aborting the abort source twise
- breaking semaphore can be done multiple times
- co-await-ing of the _started future works because the future is shared

That's not extremely elegant, better to make the drain() bail out early
if it was already called.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-09-12 16:30:07 +03:00
Patryk Jędrzejczak
92209996b5 system_keyspace: rename cdc_generation_id_v2
Changing the second value of cdc_generation_id_v2 from uuid_type
to timeuuid_type made the name of cdc_generation_id_v2 unsuitable
because it does not match cdc::generation_id_v2 anymore.
2023-09-12 11:43:34 +02:00
Patryk Jędrzejczak
1c58c6336a system_keyspace: change id to timeuuid in CDC_GENERATIONS_V3
We change the type of IDs in CDC_GENERATIONS_V3 to timeuuid to
give them a time-based order. We also change how we initialize
them so that the new CDC generation always has the highest ID.
This is the last step to enabling the efficient clearing of
obsolete CDC generation data.

Additionally, we change the types of current_cdc_generation_uuid,
new_cdc_generation_data_uuid and the second values of the elements
in unpublished_cdc_generations to timeuuid, so that they match id
in CDC_GENERATIONS_V3.
2023-09-12 11:43:34 +02:00
Patryk Jędrzejczak
2cd430ac80 system_kayspace: make CDC_GENERATIONS_V3 single-partition
We make CDC_GENERATIONS_V3 single-partition by adding the key
column and changing the clustering key from range_end to
(id, range_end). This is the first step to enabling the efficient
clearing of obsolete CDC generation data, which we need to prevent
Raft-topology snapshots from endlessly growing as we introduce new
generations over time. The next step is to change the type of the id
column to timeuuid. We do it in the following commits.

After making CDC_GENERATIONS_V3 single-partition, there is no easy
way of preserving the num_ranges column. As it is used only for
sanity checking, we remove it to simplify the implementation.
2023-09-12 09:51:45 +02:00
Patryk Jędrzejczak
ed1c1369d9 cdc: generation: rename get_cdc_generation_mutations
In the following commits, we modify the CDC_GENERATIONS_V3 schema
to enable efficient clearing of obsolete CDC generation data.
These modifications make the current get_cdc_generation_mutations
work only for the CDC_GENERATIONS_V2 schema, and we need a new
function for CDC_GENERATIONS_V3, so we add the "_v2" suffix.
2023-09-11 12:30:21 +02:00