3334 Commits

Author SHA1 Message Date
Petr Gusev
1b2e0d0cc9 system_keyspace: get_truncated_position -> get_truncated_positions
This method can return many replay_positions, so
the plural form is more appropriate.
2023-09-28 12:25:40 +04:00
Pavel Emelyanov
becd960ae8 view_update_generator: Add logging to do_abort()
Just tell the logs that the guy is aborting
refs: #10941

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-09-21 13:34:21 +03:00
Pavel Emelyanov
967ebacaa4 view_update_generator: Move abort kicking to do_abort()
When v.u.g. stops, it first aborts the generation background fiber by
requesting abort on the internal abort source and signalling the fiber
in case it's waiting. Right now v.u.g.::stop() is defer-scheduled last
in main(), so this move doesn't change much -- when stop_signal fires,
it will kick the v.u.g.::do_abort() just a bit earlier, there's nothing
that would happen after it before real ::stop() is called that depends
on it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-09-21 13:32:45 +03:00
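The shutdown path described in 967ebacaa4 and e34220ebb7 can be sketched as follows. This is a minimal model, not the repository's code: `abort_source_sketch` and `generator_sketch` are hypothetical types invented for illustration, and the abort source here is a plain callback list rather than Seastar's.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for an abort source; this is not Seastar's API.
class abort_source_sketch {
    std::vector<std::function<void()>> _subscribers;
    bool _aborted = false;
public:
    void subscribe(std::function<void()> callback) {
        _subscribers.push_back(std::move(callback));
    }
    void request_abort() {
        if (_aborted) {
            return;                      // abort is delivered at most once
        }
        _aborted = true;
        for (auto& callback : _subscribers) {
            callback();
        }
    }
    bool abort_requested() const { return _aborted; }
};

// Hypothetical model of the generator's shutdown path.
struct generator_sketch {
    abort_source_sketch internal_abort;  // aborts the background fiber
    int fiber_wakeups = 0;               // counts signals sent to the fiber

    void do_abort() {
        if (internal_abort.abort_requested()) {
            return;                      // already kicked by the early subscription
        }
        internal_abort.request_abort();
        ++fiber_wakeups;                 // wake the fiber in case it is waiting
    }
    void stop() {
        do_abort();                      // no-op when the stop signal fired first
        // ... a real stop() would then wait for the fiber to finish
    }
};
```

Subscribing `do_abort()` to the stop signal means the abort happens as soon as the signal fires, and the later `stop()` call finds nothing left to abort.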
Pavel Emelyanov
e34220ebb7 view_update_generator: Add early abort subscription
Subscribe v.u.g. to the main's stop_signal. For now a no-op callback.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-09-21 13:32:45 +03:00
Tomasz Grabiec
3d4398d1b2 Merge 'Don't calculate hashes for schema versions in Raft mode' from Kamil Braun
When performing a schema change through group 0, extend the schema mutations with a version that's persisted and then used by the nodes in the cluster in place of the old schema digest, which becomes horribly slow as we perform more and more schema changes (#7620).

If the change is a table create or alter, also extend the mutations with a version for this table to be used for `schema::version()`s instead of having each node calculate a hash which is susceptible to bugs (#13957).

When performing a schema change in Raft RECOVERY mode we also extend schema mutations which forces nodes to revert to the old way of calculating schema versions when necessary.

We can only introduce these extensions if all of the cluster understands them, so protect this code by a new cluster/schema feature, `GROUP0_SCHEMA_VERSIONING`.

Fixes: #7620
Fixes: #13957

Closes scylladb/scylladb#15331

* github.com:scylladb/scylladb:
  test: add test for group 0 schema versioning
  test/pylib: log_browsing: fix type hint
  feature_service: enable `GROUP0_SCHEMA_VERSIONING` in Raft mode
  schema_tables: don't delete `version` cell from `scylla_tables` mutations from group 0
  migration_manager: add `committed_by_group0` flag to `system.scylla_tables` mutations
  schema_tables: use schema version from group 0 if present
  migration_manager: store `group0_schema_version` in `scylla_local` during schema changes
  migration_manager: migration_request handler: assume `canonical_mutation` support
  system_keyspace: make `get/set_scylla_local_param` public
  feature_service: add `GROUP0_SCHEMA_VERSIONING` feature
  schema_tables: refactor `scylla_tables(schema_features)`
  migration_manager: add `std::move` to avoid a copy
  schema_tables: remove default value for `reload` in `merge_schema`
  schema_tables: pass `reload` flag when calling `merge_schema` cross-shard
  system_keyspace: fix outdated comment
2023-09-20 10:43:40 +02:00
Botond Dénes
111cdce2e1 Merge 'db/hints: Modularize manager.hh' from Dawid Mędrek
This PR modularizes `manager.{hh, cc}` by dividing the files into separate smaller units. The changes improve overall readability of code and help reason about it. Each file has a specific purpose now.

This is the first step in refactoring the Hinted Handoff module.

Refs scylladb/scylla#15358

Closes scylladb/scylladb#15378

* github.com:scylladb/scylladb:
  db/hints: Remove unused aliases from manager.hh
  db/hints: Rename end_point_hints_manager
  db/hints: Rename sender to hint_sender
  db/hints: Move the rebalancing logic to hint_storage
  db/hints: Move the implementation of sender
  db/hints: Move the declaration of sender to hint_sender.hh
  db/hints: Move sender::replay_allowed() to the source file
  db/hints: Put end_point_hints_manager in internal namespace
  db/hints: Move the implementation of end_point_hints_manager
  db/hints: Move the declaration of end_point_hints_manager
  db/hints: Move definitions of functions using shard hint manager
  db/hints: Introduce hint_storage.hh
  db/hints: Extract the logger from manager.cc
  db/hints: Extract common types from manager.hh
2023-09-19 10:56:16 +03:00
Michael Huang
62a8a31be7 cdc: use chunked_vector for topology_description entries
Lists can grow very big. Let's use a chunked vector to prevent large contiguous
allocations.
Fixes: #15302.

Closes scylladb/scylladb#15428
2023-09-18 23:17:01 +03:00
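To illustrate why a chunked vector avoids large contiguous allocations (62a8a31be7), here is a deliberately simplified sketch. `simple_chunked_vector` is a hypothetical class written for this note; it is not the `chunked_vector` used in the repository, and the chunk size of 4 is chosen only to keep the example small.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical, simplified chunked vector: elements live in fixed-size
// chunks, so growth never needs one large contiguous allocation.
// T must be default-constructible in this sketch.
template <typename T, std::size_t ChunkSize = 4>
class simple_chunked_vector {
    // Only the (small) array of chunk pointers is contiguous.
    std::vector<std::unique_ptr<T[]>> _chunks;
    std::size_t _size = 0;
public:
    void push_back(const T& value) {
        if (_size % ChunkSize == 0) {
            // No chunk yet, or the last one is full: allocate one more chunk.
            _chunks.push_back(std::make_unique<T[]>(ChunkSize));
        }
        _chunks[_size / ChunkSize][_size % ChunkSize] = value;
        ++_size;
    }
    T& operator[](std::size_t i) {
        assert(i < _size);
        return _chunks[i / ChunkSize][i % ChunkSize];
    }
    std::size_t size() const { return _size; }
    std::size_t chunk_count() const { return _chunks.size(); }
};
```

Existing elements are never moved when the container grows; only a new chunk is allocated.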
Kamil Braun
bc6f7d1b20 Merge 'raft topology: add garbage collection for internal CDC generations table' from Patryk Jędrzejczak
We add garbage collection for the `CDC_GENERATIONS_V3` table to prevent
it from endlessly growing. This mechanism is especially needed because
we send the entire contents of `CDC_GENERATIONS_V3` as a part of the
group 0 snapshot.

The solution is to keep a clean-up candidate, which is one of the
already published CDC generations. The CDC generation publisher
introduced in #15281 continually uses this candidate to remove all
generations with timestamps not exceeding the candidate's and sets a new
candidate when needed.

We also add `test_cdc_generation_clearing.py` that verifies this new
mechanism.

Fixes #15323

Closes scylladb/scylladb#15413

* github.com:scylladb/scylladb:
  test: add test_cdc_generation_clearing
  raft topology: remove obsolete CDC generations
  raft topology: set CDC generation clean-up candidate
  topology_coordinator: refactor publish_oldest_cdc_generation
  system_keyspace: introduce decode_cdc_generation_id
  system_keyspace: add cleanup_candidate to CDC_GENERATIONS_V3
2023-09-18 11:30:10 +02:00
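The clean-up mechanism from bc6f7d1b20 can be modelled roughly as below. This is a sketch under assumptions: `generations_table_sketch`, its members, and the integer timestamps are hypothetical, and the condition that decides when the publisher actually triggers the removal is not modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <map>
#include <optional>
#include <string>

using timestamp_sketch = std::int64_t;   // hypothetical timestamp type

struct generations_table_sketch {
    std::map<timestamp_sketch, std::string> generations;  // timestamp -> data
    std::optional<timestamp_sketch> cleanup_candidate;    // a published generation

    // When a generation is published, adopt it as the candidate
    // only if there is no candidate yet.
    void on_published(timestamp_sketch ts) {
        if (!cleanup_candidate) {
            cleanup_candidate = ts;
        }
    }

    // Remove every generation whose timestamp does not exceed the
    // candidate's, then clear the candidate so a new one can be set.
    std::size_t collect_garbage() {
        if (!cleanup_candidate) {
            return 0;
        }
        auto end = generations.upper_bound(*cleanup_candidate);
        auto removed = static_cast<std::size_t>(
            std::distance(generations.begin(), end));
        generations.erase(generations.begin(), end);
        cleanup_candidate.reset();
        return removed;
    }
};
```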
Kamil Braun
947c419421 schema_tables: don't delete version cell from scylla_tables mutations from group 0
As explained in the previous commit, we use the new
`committed_by_group0` flag attached to each row of a `scylla_tables`
mutation to decide whether the `version` cell needs to be deleted or
not.

The rest of #13957 is solved by pre-existing code -- if the `version`
column is present in the mutation, we don't calculate a hash for
`schema::version()`, but take the value from the column:

```
table_schema_version schema_mutations::digest(db::schema_features sf) const {
    if (_scylla_tables) {
        auto rs = query::result_set(*_scylla_tables);
        if (!rs.empty()) {
            auto&& row = rs.row(0);
            auto val = row.get<utils::UUID>("version");
            if (val) {
                return table_schema_version(*val);
            }
        }
    }

    ...
```

The issue will therefore be fixed once we enable
`GROUP0_SCHEMA_VERSIONING`.
2023-09-15 14:32:52 +02:00
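The decision described in 947c419421 and ce68ee0950 can be sketched as follows. The struct and function names are hypothetical and a row is reduced to the two cells that matter here; the real code operates on mutations.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical, reduced view of one `scylla_tables` row.
struct scylla_tables_row_sketch {
    std::optional<std::string> version;  // the `version` cell, if live
    bool committed_by_group0;            // the new flag
};

// During schema merge: keep `version` only when group 0 committed it.
void prepare_for_merge(scylla_tables_row_sketch& row) {
    if (!row.committed_by_group0) {
        row.version.reset();             // force recalculation later
    }
}

// Afterwards: use the persisted version if present, else the digest.
std::string table_version(const scylla_tables_row_sketch& row,
                          const std::string& computed_digest) {
    return row.version.value_or(computed_digest);
}
```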
Kamil Braun
ce68ee0950 migration_manager: add committed_by_group0 flag to system.scylla_tables mutations
As described in #13957, when creating or altering a table in group 0
mode, we don't want each node to calculate `schema::version()`s
independently using a hash algorithm. Instead, we want all nodes to
use a single version for that table, committed by the group 0 command.

There's even a column ready for this in `system.scylla_tables` --
`version`. This column is currently being set for system tables, but
it's not being used for user tables.

Similarly to what we did with global schema version in earlier commits,
the obvious thing to do would be to include a live cell for the `version`
column in the `system.scylla_tables` mutation when we perform the schema
change in Raft mode, and to include a tombstone when performing it
outside of Raft mode, for the RECOVERY case.

But it's not that simple because as it turns out, we're *already*
sending a `version` live cell (and also a tombstone, with timestamp
decremented by 1) in all `system.scylla_tables` mutations. But then we
delete that cell when doing schema merge (which raises the question of
why we were sending it in the first place, but I digress):
```
        // We must force recalculation of schema version after the merge, since the resulting
        // schema may be a mix of the old and new schemas.
        delete_schema_version(mutation);
```
The above function removes the `version` cell from the mutation.

So we need another way of distinguishing the cases of schema change
originating from group 0 vs outside group 0 (e.g. RECOVERY).

The method I chose is to extend `system.scylla_tables` with a boolean
column, `committed_by_group0`, and extend schema mutations to set
this column.

In the next commit we'll decide whether or not the `version` cell should
be deleted based on the value of this new column.
2023-09-15 14:32:52 +02:00
Kamil Braun
59912ca3b0 schema_tables: use schema version from group 0 if present
As promised in the previous commit, if we persisted a schema version
through a group 0 command, use it after a schema merge instead of
calculating a digest.

Ref: #7620

The above issue will be fixed once we enable the
`GROUP0_SCHEMA_VERSIONING` feature.
2023-09-15 14:32:52 +02:00
Kamil Braun
7ab7588d59 migration_manager: store group0_schema_version in scylla_local during schema changes
We extend schema mutations with an additional mutation to the
`system.scylla_local` table which:
- in Raft mode, stores a UUID under the `group0_schema_version` key.
- outside Raft mode, stores a tombstone under that key.

As we will see in later commits, nodes will use this after applying
schema mutations. If the key is absent or has a tombstone, they'll
calculate the global schema digest on their own -- using the old way. If
the key is present, they'll take the schema version from there.

The Raft-mode schema version is equal to the group 0 state ID of this
schema command.

The tombstone is necessary for the case of performing a schema change in
RECOVERY mode. It will force a revert to the old digest-based way.

Note that extending schema mutations with a `system.scylla_local`
mutation is possible thanks to earlier commits which moved
`system.scylla_local` to schema commitlog, so all mutations in the
schema mutations vector still go to the same commitlog domain.
2023-09-15 14:32:45 +02:00
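Why the tombstone in 7ab7588d59 forces a revert to the digest can be shown with a small last-write-wins model. Everything here is a sketch: `cell_sketch`, `merge`, and `schema_version` are hypothetical names, timestamps are plain integers, and tie-breaking between equal timestamps is simplified.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <optional>
#include <string>

// Hypothetical cell: a timestamp plus either a value or a tombstone.
struct cell_sketch {
    std::int64_t timestamp;
    std::optional<std::string> value;    // nullopt means tombstone
};

// Last write wins. Equal timestamps keep `a`; this tie-breaking is a
// simplification made for the sketch.
cell_sketch merge(const cell_sketch& a, const cell_sketch& b) {
    return b.timestamp > a.timestamp ? b : a;
}

// Absent key or tombstone: compute the digest the old way.
// Live value: take the version persisted by the group 0 command.
std::string schema_version(const std::optional<cell_sketch>& stored,
                           const std::function<std::string()>& compute_digest) {
    if (stored && stored->value) {
        return *stored->value;
    }
    return compute_digest();
}
```

A later RECOVERY-mode change writes a tombstone with a newer timestamp, which shadows the earlier group 0 value after the merge.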
Kamil Braun
3ab244e6d9 system_keyspace: make get/set_scylla_local_param public
We'll use it outside `system_keyspace` code in a later commit.
2023-09-15 13:04:04 +02:00
Kamil Braun
72cd457d53 feature_service: add GROUP0_SCHEMA_VERSIONING feature
This feature, when enabled, will modify how schema versions
are calculated and stored.

- In group 0 mode, schema versions are persisted by the group 0 command
  that performs the schema change, then reused by each node instead of
  being calculated as a digest (hash) by each node independently.
- In RECOVERY mode or before Raft upgrade procedure finishes, when we
  perform a schema change, we revert to the old digest-based way, taking
  into account the possibility of having performed group0-mode schema
  changes (that used persistent versions). As we will see in future
  commits, this will be done by storing additional flags and tombstones
  in system tables.

By "schema versions" we mean both the UUIDs returned from
`schema::version()` and the "global" schema version (the one we gossip
as `application_state::SCHEMA`).

For now, in this commit, the feature is always disabled. Once all
necessary code is set up in the following commits, we will enable it
together with Raft.
2023-09-15 13:04:04 +02:00
Kamil Braun
dc4e20d835 schema_tables: refactor scylla_tables(schema_features)
The `scylla_tables` function gives a different schema definition
for the `system_schema.scylla_tables` table, depending on whether
certain schema features are enabled or not.

The way it was implemented, we had to write `θ(2^n)` amount
of code and comments to handle `n` features.

Refactor it so that the amount of code we have to write to handle `n`
features is `θ(n)`.
2023-09-15 13:04:04 +02:00
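The shape of the refactoring in dc4e20d835 can be illustrated with a sketch that builds a column list incrementally. The feature flags and the `column_a`/`column_b`/`column_c` names are hypothetical placeholders, not the real features or columns.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical feature set; the real schema_features type differs.
struct schema_features_sketch {
    bool feature_a = false;
    bool feature_b = false;
    bool feature_c = false;
};

std::vector<std::string> scylla_tables_columns(const schema_features_sketch& f) {
    std::vector<std::string> columns = {"keyspace_name", "table_name"};
    // One independent step per feature: n features need n branches,
    // instead of one hand-written schema per combination (2^n of them).
    if (f.feature_a) {
        columns.push_back("column_a");
    }
    if (f.feature_b) {
        columns.push_back("column_b");
    }
    if (f.feature_c) {
        columns.push_back("column_c");
    }
    return columns;
}
```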
Kamil Braun
4376854473 schema_tables: remove default value for reload in merge_schema
To avoid bugs like the one fixed in the previous commit.
2023-09-15 13:04:04 +02:00
Kamil Braun
48164e1d09 schema_tables: pass reload flag when calling merge_schema cross-shard
In 0c86abab4d `merge_schema` obtained a new flag, `reload`.

Unfortunately, the flag was assigned a default value, which I think is
almost always a bad idea, and indeed it was in this case. When
`merge_schema` is called on a shard other than 0, it recursively calls
itself on shard 0. That recursive call forgot to pass the `reload` flag.

Fix this.
2023-09-15 13:04:04 +02:00
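The bug pattern fixed in 48164e1d09, and the reason 4376854473 removes the default value, can be reproduced in a few lines. The functions below are hypothetical reductions: the real `merge_schema` is asynchronous and takes more parameters.

```cpp
#include <cassert>

struct merge_result_sketch {
    unsigned executed_on_shard;
    bool reload;
};

// Buggy shape: `reload` has a default, so a call that forgets it compiles.
merge_result_sketch merge_schema_buggy(unsigned shard, bool reload = false) {
    if (shard != 0) {
        return merge_schema_buggy(0);          // bug: `reload` silently becomes false
    }
    return {shard, reload};
}

// Fixed shape: without a default, omitting the flag is a compile error,
// so the recursive call has to forward it.
merge_result_sketch merge_schema_fixed(unsigned shard, bool reload) {
    if (shard != 0) {
        return merge_schema_fixed(0, reload);
    }
    return {shard, reload};
}
```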
Kamil Braun
9017b998ca system_keyspace: fix outdated comment
2023-09-15 13:04:04 +02:00
Patryk Jędrzejczak
e375e769b9 raft topology: set CDC generation clean-up candidate
We want to use the clean-up candidate to remove obsolete CDC
generation data, but first, we need to set a suitable generation as
the candidate when there is none. Since CDC generations must
be published before we remove them, a generation that is being
published is a good candidate.
2023-09-15 09:23:59 +02:00
Dawid Medrek
fbbb9f879a db/hints: Remove unused aliases from manager.hh
2023-09-15 04:17:08 +02:00
Dawid Medrek
d46437a87b db/hints: Rename end_point_hints_manager
This commit renames `end_point_hints_manager` to `hint_endpoint_manager`
to be consistent with other names used in the module (they all start
with `hint_`).
2023-09-15 03:46:15 +02:00
Dawid Medrek
6d1eee448b db/hints: Rename sender to hint_sender
We rename the structure to highlight what exactly its purpose is.
2023-09-15 03:46:15 +02:00
Dawid Medrek
4ad0f8907c db/hints: Move the rebalancing logic to hint_storage
This commit continues modularizing manager.hh.
2023-09-15 03:46:15 +02:00
Dawid Medrek
999484466d db/hints: Move the implementation of sender
This commit continues modularizing manager.hh.
After moving the declaration of sender to a dedicated
header file, these changes move its implementation to
a separate source file.
2023-09-15 03:46:15 +02:00
Dawid Medrek
17aabf6b9a db/hints: Move the declaration of sender to hint_sender.hh
This commit is yet another step in modularizing manager.hh.
We move the declaration of sender to a dedicated file.
Its implementation will follow in a future commit.
2023-09-15 03:46:15 +02:00
Dawid Medrek
1a7262ed6e db/hints: Move sender::replay_allowed() to the source file
The premise of these changes is the fact that we cannot have
a cycle of #includes.

Because the declaration of `sender` is going to be moved to
a separate header file in a future commit, and because that
header file is going to be included in the file where
`end_point_hints_manager` is declared, we will need to rely
on `end_point_hints_manager` being an incomplete type there.

A consequence of that is that we cannot access any of
`end_point_hints_manager`'s methods.

This commit prepares the ground for it by moving
the definition of the function to the source file where
`end_point_hints_manager` will be a complete type.
2023-09-15 03:46:15 +02:00
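The constraint behind 1a7262ed6e can be shown in a single file. The class names below are hypothetical stand-ins for `sender` and `end_point_hints_manager`; the point is only that a member function using an incomplete type has to be defined where the type is complete.

```cpp
#include <cassert>

class endpoint_manager_sketch;           // forward declaration: incomplete here

class sender_sketch {
    endpoint_manager_sketch& _ep;        // references to incomplete types are fine
public:
    explicit sender_sketch(endpoint_manager_sketch& ep) : _ep(ep) {}
    // Declared only. An inline body calling _ep.replay_allowed() would not
    // compile at this point, because the type is still incomplete.
    bool replay_allowed() const;
};

class endpoint_manager_sketch {
    bool _replay_allowed;
public:
    explicit endpoint_manager_sketch(bool allowed) : _replay_allowed(allowed) {}
    bool replay_allowed() const { return _replay_allowed; }
};

// Plays the role of the source file: the type is complete from here on.
bool sender_sketch::replay_allowed() const {
    return _ep.replay_allowed();
}
```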
Dawid Medrek
ad2a36bd45 db/hints: Put end_point_hints_manager in internal namespace
2023-09-15 03:46:15 +02:00
Dawid Medrek
507054012d db/hints: Move the implementation of end_point_hints_manager
This commit continues moving end_point_hints_manager to its
dedicated files. After moving the declaration of the class,
these changes move the implementation.
2023-09-15 03:46:15 +02:00
Dawid Medrek
f72c423984 db/hints: Move the declaration of end_point_hints_manager
This commit is yet another step in modularizing manager.hh.
We move the declaration of the class to a dedicated header file.
The implementation will follow in a future commit.
2023-09-15 03:46:15 +02:00
Dawid Medrek
854cc0c939 db/hints: Move definitions of functions using shard hint manager
We move definitions of inline methods of end_point_hints_manager
and sender accessing shard hint manager to the source file,
effectively un-inlining them. We need to do that to prepare for
moving said structures out of manager.hh. This commit is yet
another step in modularizing manager.hh.
2023-09-15 03:45:57 +02:00
Dawid Medrek
db08a85f5d db/hints: Introduce hint_storage.hh
This commit moves types used by shard hint manager
and related to storing hints on disk to another file.
It is yet another step in modularizing manager.hh.
2023-09-15 02:28:10 +02:00
Dawid Medrek
4814b3b19a db/hints: Extract the logger from manager.cc
This commit extracts the logger used in manager.cc
to prepare the ground for modularization of manager.hh
into separate smaller files. We want to preserve
the logging behavior (at least for the time being),
which means new files should use the same logger.
These changes serve that purpose.
2023-09-15 02:24:20 +02:00
Dawid Medrek
efd6d1f57a db/hints: Extract common types from manager.hh
Currently, data structures used in manager.hh
use their own aliases for gms::inet_address.
It is clear they all should use the same type
and having different names for it only reduces
readability of the code. This commit introduces
a common alias -- endpoint_id -- and gets rid
of the other ones.

This commit is also the first step in modularizing
manager.hh by extracting common types to another
file.
2023-09-15 02:23:30 +02:00
Patryk Jędrzejczak
c0fd42ead4 system_keyspace: introduce decode_cdc_generation_id
The decode_cdc_generations_ids function allows us to decode
a vector of CDC generation IDs. After adding cleanup_candidate
to CDC_GENERATIONS_V3, we need a similar function that decodes
a single ID.
2023-09-14 12:09:14 +02:00
Patryk Jędrzejczak
6db325fb69 system_keyspace: add cleanup_candidate to CDC_GENERATIONS_V3
In the following commits, we implement garbage collection for
CDC_GENERATIONS_V3. The first step is introducing the clean-up
candidate. It will be continually updated by the CDC generation
publisher and used to remove obsolete data.
2023-09-14 12:09:10 +02:00
Petr Gusev
082cd3bc8e system_keyspace: switch CDC_LOCAL to schema commitlog
2023-09-13 23:17:20 +04:00
Petr Gusev
a683cebb02 system_keyspace: scylla_local: use schema commitlog
We remove flush from set_scylla_local_param_as
since it's now redundant. We add it to
save_local_enabled_features as features need to
be available before schema commitlog replay.

We skip the flush if save_local_enabled_features
is called from topology_state_load when the features
are migrated to system.topology and we don't need
strict durability.
2023-09-13 23:17:20 +04:00
Petr Gusev
beb29f094b system_keyspace: drop load phases
We want to switch the system.scylla_local table to the
schema commitlog, but load phases get in the way: the schema
commitlog is initialized after phase1,
so a table that uses it should be moved to phase2,
but system.scylla_local contains features, and we need
them before schema commitlog initialization for the
SCHEMA_COMMITLOG feature.

In this commit we are taking a different approach to
loading system tables. First, we load them all in
one pass in 'readonly' mode. In this mode, the table
cannot be written to and has not yet been assigned
a commit log. To achieve this, we've added a _readonly bool field
to the table class; it's initialized to true in the table's
constructor. In addition, we changed the table constructor
to always assign nullptr to commitlog, and we trigger
an internal error if the table.commitlog() property is accessed
while the table is in readonly mode. Then, after
triggering on_system_tables_loaded notifications on
feature_service and sstable_format_selector, we call
system_keyspace::mark_writable and eventually
table::mark_ready_for_writes which selects the
proper commitlog and marks the table as writable.

In sstable_compaction_test we drop several
mark_ready_for_writes calls since they are redundant:
the table has already been made writable in the
env.make_table_for_tests call.

The table::commitlog function either returns the current
commitlog or causes an error if the table is readonly. This
didn't work for virtual tables, since they never called
mark_ready_for_writes. In this commit we add this
call to initialize_virtual_tables.
2023-09-13 23:17:20 +04:00
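The readonly-then-writable life cycle from beb29f094b can be sketched as below. `table_sketch` and `commitlog_sketch` are hypothetical, and a thrown `std::logic_error` stands in for the internal error the commit describes.

```cpp
#include <cassert>
#include <stdexcept>

class commitlog_sketch {};               // hypothetical placeholder type

class table_sketch {
    bool _readonly = true;               // tables start out readonly
    commitlog_sketch* _commitlog = nullptr;  // never assigned in the constructor
public:
    bool readonly() const { return _readonly; }

    // Accessing the commitlog of a readonly table is a programming error.
    commitlog_sketch* commitlog() const {
        if (_readonly) {
            throw std::logic_error("commitlog accessed while table is readonly");
        }
        return _commitlog;
    }

    // Selects the commitlog and makes the table writable in one step.
    void mark_ready_for_writes(commitlog_sketch* cl) {
        _commitlog = cl;
        _readonly = false;
    }
};
```

The same rule explains the virtual-table fix: a table that never calls `mark_ready_for_writes` stays readonly, so any commitlog access on it fails.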
Petr Gusev
47ffc66c7f database.hh: add_column_family: add readonly parameter
Previously, creating a table or view in
schema_tables.cc/merge_tables_and_views was a two-step process:
first adding a column family (add_column_family function) and
then marking it as ready for writes (mark_table_as_writable).
There is a yield between these stages, which means
someone could see a table or view for which the
mark_table_as_writable method had not yet been called,
and start writing to it.

This problem was demonstrated by materialised view dtests.
A view is created on all nodes. On some nodes it will be created
earlier than on others and the view rebuild process will start
writing data to that view on other nodes, where mark_table_as_writable
has not yet been called.

In this patch we solve this problem by adding a readonly parameter
to the add_column_family method. When loading tables from disk,
this flag is set to true and the mark_table_as_writable
is called only after all sstables have been loaded.
When creating a new table, this flag is set to false,
mark_table_as_writable is called from inside add_column_family
and the new table becomes visible already as writable.
2023-09-13 23:17:20 +04:00
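The effect of the `readonly` parameter in 47ffc66c7f can be sketched with a toy registry. `database_sketch` and its members are hypothetical; the real functions are asynchronous and operate per shard.

```cpp
#include <cassert>
#include <map>
#include <string>

struct table_state_sketch {
    bool writable = false;
};

class database_sketch {
    std::map<std::string, table_state_sketch> _tables;
public:
    // readonly == true: loading from disk, writability is granted later.
    // readonly == false: a new table, writable before anyone can see it.
    void add_column_family(const std::string& name, bool readonly) {
        table_state_sketch t;
        t.writable = !readonly;
        _tables.emplace(name, t);        // becomes visible only now
    }
    void mark_table_as_writable(const std::string& name) {
        _tables.at(name).writable = true;
    }
    bool can_write(const std::string& name) const {
        auto it = _tables.find(name);
        return it != _tables.end() && it->second.writable;
    }
};
```

Because the state is decided before the table is published, there is no window in which a newly created table is visible but not yet writable.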
Petr Gusev
7e52014633 schema_tables: merge_tables_and_views: delay events until tables/views are created on all shards
db.get_notifier().create_view triggers a view rebuild. This
process writes to the table on all shards and thus can
access a partially created table, e.g. one where
mark_table_ready_for_writes was not yet called.
2023-09-13 23:17:20 +04:00
Petr Gusev
0e5f9ae9a4 system_keyspace: switch system.peers to schema commitlog
Also, we remove flushes on writes as durability
is now guaranteed by the commitlog.
2023-09-13 23:17:20 +04:00
Petr Gusev
7881ce1e09 system_keyspace: switch system.local to schema commitlog
The schema commitlog lives only on the zero shard,
so we need to turn on the use_null_sharder option.

Also, we remove flushes on writes as durability
is now guaranteed by the commitlog.
2023-09-13 23:17:20 +04:00
Petr Gusev
a0653590b5 sstables_format_selector: extract listener
In the following commits we want to move schema
commitlog replay earlier, but the current sstable
format should be selected before the replay.
The current sstable format is stored in system.scylla_local,
so we can't read it until system tables are loaded.
This problem is similar to the enabled_features.

To solve this we split sstables_format_selector in two
parts. The lower level part, sstables_format_selector,
knows only about database and system_keyspace. It
will be moved before system_keyspace initialization,
and the on_system_tables_loaded method will
be called on it when the system_keyspace has loaded its tables.

The higher level part, sstables_format_listener, is responsible
for subscribing to feature_services and gossiper and is started
later, at the same place as sstables_format_selector before this commit.
2023-09-13 23:04:50 +04:00
Petr Gusev
7104fc8a7e sstables_format_selector: wrap when_enabled with seastar::async
The listener may fire immediately, so we must be in a thread
context for this to work.

In the next commits we are going to move
enable_features_on_startup above
sstables_format_selector::start in scylla_main, so we
need to fix this beforehand.
2023-09-13 23:00:16 +04:00
Petr Gusev
2a0b228d17 main.cc: inline and split system_keyspace.setup
Our goal is to switch system.local table to schema
commitlog and stop doing flushes when we write to it.
This means it would be incorrect to read from this
table until schema commitlog is replayed.

On the other hand, we need truncation records
to be loaded before we start replaying schema
commitlog, since commitlog_replayer relies on them.

In this commit we inline the system_keyspace::setup
function and split its content into two parts. In
the first part, before schema commitlog replay,
we load truncation records. It's safe to load
them before schema commitlog replay since we intend
to keep the flushes on writes to the system.truncated
table. In the second part, after schema commitlog replay,
we do the rest of the job - build_bootstrap_info and
db::schema_tables::save_system_schema.

We decided to inline this function since there is
very low cohesion between the actions it's performing.
It's just simpler to reason about them individually.
2023-09-13 23:00:15 +04:00
Petr Gusev
f0bc9f2d93 system_keyspace: refactor save_system_schema function
This is a refactoring commit without observable changes
in behaviour.

Previously, there were two related functions in db::schema_tables:
save_system_keyspace_schema(qp) and save_system_schema(qp, ks).
The first called the second passing "system_schema" as
the second argument. Outside of schema_tables module we
don't need two functions, we just need a way to say
'persist system schema objects in the appropriate tables/keyspaces'.
In this commit we change the function save_system_schema
to have this meaning. Internally it calls save_system_schema_to_keyspace
twice with "system_schema" and "system", since that's what we need
in the single call site of this function in system_keyspace::setup.
In subsequent commits we are going to move this call out of the
system_keyspace::setup.
2023-09-13 23:00:15 +04:00
Petr Gusev
e395086557 system_keyspace: move initialize_virtual_tables into virtual_tables.hh
This is a readability refactoring commit without observable changes
in behaviour.

initialize_virtual_tables logically belongs to the virtual_tables module,
and moving it there allows us to make other functions in virtual_tables.cc
(register_virtual_tables, install_virtual_readers)
local to the module, which simplifies matters a bit.

all_virtual_tables() is not needed anymore: all the references to
registered virtual tables are now local to the virtual_tables module
and can just use the virtual_tables variable directly.
2023-09-13 23:00:15 +04:00
Petr Gusev
c4787a160b system_keyspace: remove unused parameter
2023-09-13 23:00:15 +04:00
Petr Gusev
b90011294d config.cc: drop db::config::host_id
In this refactoring commit we remove the db::config::host_id
field, as it's hacky and duplicates token_metadata::get_my_id.

Some tests want a specific host_id, so we add it to cql_test_config
and use it in cql_test_env.

We can't pass host_id to sstables_manager by value since it's
initialized in database constructor and host_id is not loaded yet.
We also prefer not to make a dependency on shared_token_metadata
since in this case we would have to create artificial
shared_token_metadata in many tools and tests where sstables_manager
is used. So we pass a function that returns host_id to
sstables_manager constructor.
2023-09-13 23:00:15 +04:00
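Passing a function instead of a value, as b90011294d does for host_id, can be sketched as follows. `sstables_manager_sketch` and the string-based `host_id_sketch` are hypothetical; only the idea of resolving the ID lazily is taken from the commit message.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

using host_id_sketch = std::string;      // hypothetical stand-in for the ID type

class sstables_manager_sketch {
    std::function<host_id_sketch()> _resolve_host_id;
public:
    // The manager is constructed before the host ID is known, so it
    // stores a way to obtain the ID rather than the ID itself.
    explicit sstables_manager_sketch(std::function<host_id_sketch()> resolver)
        : _resolve_host_id(std::move(resolver)) {}

    host_id_sketch local_host_id() const {
        return _resolve_host_id();       // evaluated at use time
    }
};
```

Tools and tests can pass a lambda returning a fixed ID, which avoids constructing an artificial shared_token_metadata just to satisfy the constructor.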
Petr Gusev
a03fbc3781 system_keyspace: set null sharder when configuring schema commitlog
The schema commitlog lives only on the null shard, so it
makes no sense to set use_schema_commitlog
without use_null_sharder.

We also extract the function enable_schema_commitlog which
sets all the needed properties.
2023-09-13 23:00:15 +04:00