140 Commits

Author SHA1 Message Date
Gleb Natapov
c2ef390a52 service: raft: move group0 write path into a separate file
Writing into the group0 raft group on the client side involves locking
the state machine, choosing a state id and checking for its presence
after the operation completes. The code that does this now resides in
the migration manager, since it is currently the only user of group0. In
the near future we will have more clients for group0 and they will all
need the same logic, so the patch moves it to a separate class,
raft_group0_client, that any future user of group0 can use to write
into it.

Message-Id: <YoYAJwdTdbX+iCUn@scylladb.com>
2022-05-19 17:21:35 +03:00
Avi Kivity
5937b1fa23 treewide: remove empty comments in top-of-files
After fcb8d040 ("treewide: use Software Package Data Exchange
(SPDX) license identifiers"), many dual-licensed files were
left with empty comments on top. Remove them to avoid visual
noise.

Closes #10562
2022-05-13 07:11:58 +02:00
Pavel Emelyanov
42e733bdf7 migration_manager: Keep sharded<system_keyspace> reference
The main target here is system_keyspace::update_schema_version() which
is now static, but needs to have system_keyspace at "this". Migration
manager is one of the places that calls that method indirectly.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-03-16 14:24:40 +03:00
Kamil Braun
4a52b802ac test: unit test for group 0 concurrent change protection and CQL DDL retries
Check that group 0 history grows iff a schema change does not throw
`group0_concurrent_modification`. Check that the CQL DDL statement retry
mechanism works as expected.
2022-01-27 11:26:15 +01:00
Kamil Braun
edd8344706 cql3: statements: schema_altering_statement: automatically retry in presence of concurrent changes
Schema changes on top of Raft do not allow concurrent changes.
If two changes are attempted concurrently, one of them gets a
`group0_concurrent_modification` exception.

Catch the exception in CQL DDL statement execution function and retry.

In addition, the description of CQL DDL statements in group 0 history
table was improved.
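
As a rough illustration of the retry described above (a toy model in Python, not the actual C++ code), the loop could look like the sketch below. `Group0ConcurrentModification`, `execute_with_retry` and `max_attempts` are hypothetical names; the commit message does not say whether the real retry is bounded.

```python
class Group0ConcurrentModification(Exception):
    """Stand-in for the `group0_concurrent_modification` exception."""


def execute_with_retry(attempt_change, max_attempts=10):
    """Run `attempt_change` until it does not hit a concurrent change.

    The bound `max_attempts` is an assumption made for this sketch.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return attempt_change()
        except Group0ConcurrentModification:
            if attempt == max_attempts:
                raise
            # Another change won the race; run the whole operation again
            # so that it observes the state left by the winner.


# Usage: the first two attempts lose the race, the third succeeds.
calls = []


def flaky_change():
    calls.append(1)
    if len(calls) < 3:
        raise Group0ConcurrentModification()
    return "applied"


result = execute_with_retry(flaky_change)
```

The important detail is that the whole operation is re-run on each attempt, so every attempt starts from a fresh view of the state.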
2022-01-27 11:26:14 +01:00
Kamil Braun
b863a63b08 test: unit test for clearing old entries in group0 history
We perform a bunch of schema changes with different values of
`migration_manager::_group0_history_gc_duration` and check if entries
are cleared according to this setting.
2022-01-25 13:13:35 +01:00
Kamil Braun
e9083433a8 service: migration_manager: clear old entries from group 0 history when announcing
When performing a change through group 0 (which right now only covers
schema changes), clear entries from group 0 history table which are older
than one week.

This is done by including an appropriate range tombstone in the group 0
history table mutation.
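
The effect of that range tombstone can be modeled roughly as follows. This is a sketch under stated assumptions: the history is a plain Python list of (time, description) pairs, `append_with_gc` and `gc_cutoff` are hypothetical helpers, and whether an entry that is exactly one week old is kept or dropped is a guess.

```python
from datetime import datetime, timedelta, timezone

GC_DURATION = timedelta(weeks=1)  # "older than one week" in the message above


def gc_cutoff(new_entry_time, gc_duration=GC_DURATION):
    """Entries strictly older than the returned instant are deleted."""
    return new_entry_time - gc_duration


def append_with_gc(history, new_entry_time, description, gc_duration=GC_DURATION):
    """Model one history mutation: add the new entry and drop old ones.

    In the real table the deletion is a single range tombstone carried by
    the same mutation, not a per-row filter as done here.
    """
    cutoff = gc_cutoff(new_entry_time, gc_duration)
    kept = [(t, d) for (t, d) in history if t >= cutoff]
    return kept + [(new_entry_time, description)]


now = datetime(2022, 1, 25, 12, 0, tzinfo=timezone.utc)
history = [
    (now - timedelta(days=10), "old change"),
    (now - timedelta(days=2), "recent change"),
]
history = append_with_gc(history, now, "CQL DDL statement")
```

After the call only the two-day-old entry and the new entry remain.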
2022-01-25 13:11:14 +01:00
Kamil Braun
044e05b0d9 service: migration_manager: announce: take a description parameter
The description parameter is used for the group 0 history mutation.
The default is empty, in which case the mutation will leave
the description column as `null`.
I filled the parameter in some easy places as an example and left the
rest for a follow-up.

This is how it looks now in a fresh cluster with a single statement
performed by the user:

cqlsh> select * from system.group0_history ;

 key     | state_id                             | description
---------+--------------------------------------+------------------------------------------------------
 history | 9ec29cac-7547-11ec-cfd6-77bb9e31c952 |                                    CQL DDL statement
 history | 9beb2526-7547-11ec-7b3e-3b198c757ef2 |                                                 null
 history | 9be937b6-7547-11ec-3b19-97e88bd1ca6f |                                                 null
 history | 9be784ca-7547-11ec-f297-f40f0073038e |                                                 null
 history | 9be52e14-7547-11ec-f7c5-af15a1a2de8c |                                                 null
 history | 9be335dc-7547-11ec-0b6d-f9798d005fb0 |                                                 null
 history | 9be160c2-7547-11ec-e0ea-29f4272345de |                                                 null
 history | 9bdf300e-7547-11ec-3d3f-e577a2e31ffd |                                                 null
 history | 9bdd2ea8-7547-11ec-c25d-8e297b77380e |                                                 null
 history | 9bdb925a-7547-11ec-d754-aa2cc394a22c |                                                 null
 history | 9bd8d830-7547-11ec-1550-5fd155e6cd86 |                                                 null
 history | 9bd36666-7547-11ec-230c-8702bc785cb9 | Add new columns to system_distributed.service_levels
 history | 9bd0a156-7547-11ec-a834-85eac94fd3b8 |        Create system_distributed(_everywhere) tables
 history | 9bcfef18-7547-11ec-76d9-c23dfa1b3e6a |        Create system_distributed_everywhere keyspace
 history | 9bcec89a-7547-11ec-e1b4-34e0010b4183 |                   Create system_distributed keyspace
2022-01-24 15:20:37 +01:00
Kamil Braun
6a00e790c7 service: raft: check and update state IDs during group 0 operations
The group 0 state machine will only modify state during command
application if the provided "previous state ID" is equal to the
last state ID present in the history table. Otherwise, the command will
be a no-op.

To ensure linearizability of group 0 changes, the performer of the
change must first read the last state ID, only then read the state
and send a command for the state machine. If a concurrent change
races with this command and manages to modify the state, we will detect
that the last state ID does not match during `apply`; all calls to
`apply` are serialized, and `apply` adds the new entry to the history
table at the end, after modifying the group 0 state.
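
The check performed during `apply` can be illustrated with a small model. This is only a sketch: `Group0StateMachine` is a hypothetical class, the state is a dict and the history is a list of state IDs, none of which reflects the real data structures.

```python
import uuid


class Group0StateMachine:
    """Toy model: `state` is a dict, `history` is a list of state IDs."""

    def __init__(self):
        self.state = {}
        self.history = []  # oldest first; the last element is the last state ID

    def last_state_id(self):
        return self.history[-1] if self.history else None

    def apply(self, prev_state_id, new_state_id, changes):
        """Apply `changes` only if `prev_state_id` is still the last state ID.

        Returns True if the state was modified, False if the command was a no-op.
        """
        if prev_state_id != self.last_state_id():
            return False  # a concurrent change got in first
        self.state.update(changes)
        # The history entry is added at the end, after the state is modified.
        self.history.append(new_state_id)
        return True


sm = Group0StateMachine()

# Two performers read the same last state ID, then race.
seen_by_a = sm.last_state_id()
seen_by_b = sm.last_state_id()

a_applied = sm.apply(seen_by_a, uuid.uuid4(), {"ks1": "created"})
b_applied = sm.apply(seen_by_b, uuid.uuid4(), {"ks2": "created"})
```

Here the second command turns into a no-op because the last state ID changed between its read and its application.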

The details of this mechanism are abstracted away with `group0_guard`.
To perform a group 0 change, one needs to call `announce`, which
requires a `group0_guard` to be passed in. The only way to obtain a
`group0_guard` is by calling `start_group0_operation`, which underneath
performs a read barrier on group 0, obtains the last state ID from the
history table, and constructs a new state ID that the change will append
to the history table. The read barrier ensures that all previously
completed changes are visible to this operation. The caller can then
perform any necessary validation, construct mutations which modify group
0 state, and finally call `announce`.

The guard also provides a timestamp which is used by the caller
to construct the mutations. The timestamp is obtained from the new state ID.
We ensure that it is greater than the timestamp of the last state ID.
Thus, if the change is successful, the applied mutations will have greater
timestamps than the previously applied mutations.
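
One way to obtain a timestamp that is always greater than the previous one is sketched below. The rule used here (take the clock, or the last timestamp plus one if the clock is behind) is an assumption made for illustration; the message above only states the guarantee, not how it is achieved. `new_state_timestamp` is a hypothetical helper.

```python
def new_state_timestamp(last_timestamp, clock_now):
    """Pick a timestamp for the new state ID (integers, e.g. microseconds).

    Uses the current clock when it is ahead of the last state ID, otherwise
    bumps the last timestamp by one so the sequence stays strictly increasing.
    """
    if last_timestamp is None:
        return clock_now
    return max(clock_now, last_timestamp + 1)
```

With this rule a node whose clock lags behind the last state ID still produces a greater timestamp.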

We also add two locks. The more important one, used to ensure
correctness, is `read_apply_mutex`. It is held when modifying group 0
state (in `apply` and `transfer_snapshot`) and when reading it (it's
taken when obtaining a `group0_guard` and released before a command is
sent in `announce`). Its goal is to ensure that we don't read partial
state, which could happen without it because group 0 state consists of
many parts and `apply` (or `transfer_snapshot`) potentially modifies all
of them. Note: this doesn't give us 100% protection; if we crash in the
middle of `apply` (or `transfer_snapshot`), then after restart we may
read partial state. To remove this possibility we need to ensure that
commands which were being applied before restart but not finished are
re-applied after restart, before anyone can read the state. I left a
TODO in `apply`.

The second lock, `operation_mutex`, is used to improve liveness. It is
taken when obtaining a `group0_guard` and released after a command is
applied (compare to `read_apply_mutex` which is released before a
command is sent). It is not taken inside `apply` or `transfer_snapshot`.
This lock ensures that multiple fibers running on the same node do not
attempt to modify group0 concurrently - this would cause some of them to
fail (due to the concurrent modification protection described above).
This is mostly important during first boot of the first node, when
services start for the first time and try to create their internal
tables. This lock serializes these attempts, ensuring that all of them
succeed.
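
The lifetime of the two locks can be sketched with `asyncio` as a stand-in for Seastar fibers. Everything below is a simplified model: `Group0Client` is a hypothetical class, the guard is a plain tuple, sending the command through Raft is replaced by `asyncio.sleep(0)`, and the order in which the two locks are acquired is an assumption.

```python
import asyncio


class Group0Client:
    def __init__(self):
        self.read_apply_mutex = asyncio.Lock()
        self.operation_mutex = asyncio.Lock()
        self.history = []  # applied state IDs, oldest first
        self.next_id = 0

    async def start_group0_operation(self):
        # Liveness: only one local operation at a time.
        await self.operation_mutex.acquire()
        # Correctness: no `apply` may run while the state is being read.
        await self.read_apply_mutex.acquire()
        prev = self.history[-1] if self.history else None
        self.next_id += 1
        return prev, self.next_id  # plays the role of `group0_guard`

    async def announce(self, guard):
        prev, new = guard
        # Reading is over: release before the command is sent.
        self.read_apply_mutex.release()
        try:
            await asyncio.sleep(0)  # stand-in for sending the command via Raft
            return await self.apply(prev, new)
        finally:
            # Released only after the command has been applied.
            self.operation_mutex.release()

    async def apply(self, prev, new):
        async with self.read_apply_mutex:
            last = self.history[-1] if self.history else None
            if prev != last:
                return False  # concurrent modification: no-op
            self.history.append(new)
            return True


async def create_internal_table(client):
    guard = await client.start_group0_operation()
    return await client.announce(guard)


async def main():
    client = Group0Client()
    return await asyncio.gather(*(create_internal_table(client) for _ in range(5)))


results = asyncio.run(main())
```

Because `operation_mutex` is held until the command is applied, the five concurrent attempts run one after another and all of them succeed. Without it, several of them would read the same last state ID and all but one would become no-ops.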
2022-01-24 15:20:37 +01:00
Kamil Braun
a664ac7ba5 treewide: require group0_guard when performing schema changes
`announce` now takes a `group0_guard` by value. `group0_guard` can only
be obtained through `migration_manager::start_group0_operation` and
moved, it cannot be constructed outside `migration_manager`.

The guard will be a method of ensuring linearizability for group 0
operations.
2022-01-24 15:20:35 +01:00
Kamil Braun
742f036261 service: migration_manager: introduce group0_guard
This object will be used to "guard" group 0 operations. Obtaining it
will be necessary to perform a group 0 change (such as modifying the
schema), which will be enforced by the type system.

The initial implementation is a stub and only provides a timestamp which
will be used by callers to create mutations for group 0 changes. The
next commit will change all call sites to use the guard as intended.

The final implementation, coming later, will ensure linearizability of
group 0 operations.
2022-01-24 15:12:50 +01:00
Kamil Braun
86762a1dd9 service: migration_manager: rename schema_read_barrier to start_group0_operation
1. Generalize the name so it mentions group 0, which schema will be a
   strict subset of.
2. Remove the fact that it performs a "read barrier" from the name. The
   function will be used in general to ensure linearizability of group0
   operations - both reads and writes. "Read barrier" is Raft-specific
   terminology, so it can be thought of as an implementation detail.
2022-01-24 15:12:50 +01:00
Kamil Braun
0f24b907b7 service: migration_manager: announce: split raft and non-raft paths to separate functions
2022-01-24 15:12:50 +01:00
Kamil Braun
283ac7fefe treewide: pass mutation timestamp from call sites into migration_manager::prepare_* functions
The functions which prepare schema change mutations (such as
`prepare_new_column_family_announcement`) would use internally
generated timestamps for these mutations. When schema changes are
managed by group 0 we want to ensure that timestamps of mutations
applied through Raft are monotonic. We will generate these timestamps at
call sites and pass them into the `prepare_` functions. This commit
prepares the APIs.
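
The shape of the API change can be sketched as follows. The function name, the table names and the mutation layout are all invented for illustration; the only point is that the timestamp becomes an argument supplied by the call site instead of being generated inside the function.

```python
def prepare_new_table_announcement(keyspace, table, timestamp):
    """Return the mutations for creating a table; nothing is sent out.

    Hypothetical stand-in for a `prepare_*` function: every mutation it
    builds carries the timestamp chosen by the caller.
    """
    return [
        {"table": "tables", "key": (keyspace, table), "timestamp": timestamp},
        {"table": "columns", "key": (keyspace, table), "timestamp": timestamp},
    ]


caller_timestamp = 1_643_033_570_000_000  # chosen at the call site
mutations = prepare_new_table_announcement("ks", "t", caller_timestamp)
```

Since the caller owns the timestamp, it can guarantee that successive changes use increasing values.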
2022-01-24 15:12:50 +01:00
Kamil Braun
3bab5c564a service: migration_manager: remove some unused and disabled code
`include_keyspace_and_announce` was no longer used.
`do_announce_new_type` only had a declaration, it was not used and there
was no definition.
2022-01-24 15:12:49 +01:00
Avi Kivity
fcb8d040e8 treewide: use Software Package Data Exchange (SPDX) license identifiers
Instead of lengthy blurbs, switch to single-line, machine-readable
standardized (https://spdx.dev) license identifiers. The Linux kernel
switched long ago, so there is strong precedent.

Three cases are handled: AGPL-only, Apache-only, and dual licensed.
For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0),
reasoning that our changes are extensive enough to apply our license.

The changes were applied mechanically with a script, except to
licenses/README.md.

Closes #9937
2022-01-18 12:15:18 +01:00
Gleb Natapov
1ff85020b5 migration_manager: drop unused announce_ functions
2022-01-13 23:10:18 +02:00
Avi Kivity
63d254a8d2 Merge 'gms, service: futurize and coroutinize gossiper-related code' from Pavel Solodovnikov
This series greatly reduces gossipers' dependence on `seastar::async` (yet, not completely).

`i_endpoint_state_change_subscriber` callbacks are converted to return futures (again, to get rid of `seastar::async` dependency), all users are adjusted appropriately (e.g. `storage_service`, `cdc::generation_service`, `streaming::stream_manager`, `view_update_backlog_broker` and `migration_manager`).
This includes futurizing and coroutinizing the whole function call chain up to the `i_endpoint_state_change_subscriber` callback functions.

To aid the conversion process, a non-`seastar::async` dependent variant of `utils::atomic_vector::for_each` is introduced (`for_each_futurized`). A different name is used to clearly distinguish converted and non-converted code, so that the last step (remove `seastar::async()` wrappers around callback-calling code in gossiper) is easier. This is left for a follow-up series, though.

Tests: unit(dev)

Closes #9844

* github.com:scylladb/scylla:
  service: storage_service: coroutinize `set_gossip_tokens`
  service: storage_service: coroutinize `leave_ring`
  service: storage_service: coroutinize `handle_state_left`
  service: storage_service: coroutinize `handle_state_leaving`
  service: storage_service: coroutinize `handle_state_removing`
  service: storage_service: coroutinize `do_drain`
  service: storage_service: coroutinize `shutdown_protocol_servers`
  service: storage_service: coroutinize `excise`
  service: storage_service: coroutinize `remove_endpoint`
  service: storage_service: coroutinize `handle_state_replacing`
  service: storage_service: coroutinize `handle_state_normal`
  service: storage_service: coroutinize `update_peer_info`
  service: storage_service: coroutinize `do_update_system_peers_table`
  service: storage_service: coroutinize `update_table`
  service: storage_service: coroutinize `handle_state_bootstrap`
  service: storage_service: futurize `notify_*` functions
  service: storage_service: coroutinize `handle_state_replacing_update_pending_ranges`
  repair: row_level_repair_gossip_helper: coroutinize `remove_row_level_repair`
  locator: reconnectable_snitch_helper: coroutinize `reconnect`
  gms: i_endpoint_state_change_subscriber: make callbacks to return futures
  utils: atomic_vector: introduce future-returning `for_each` function
  utils: atomic_vector: rename `for_each` to `thread_for_each`
  gms: gossiper: coroutinize `start_gossiping`
  gms: gossiper: coroutinize `force_remove_endpoint`
  gms: gossiper: coroutinize `do_status_check`
  gms: gossiper: coroutinize `remove_endpoint`
2022-01-13 23:09:02 +02:00
Gleb Natapov
2aec9009ef migration_manager: drop no longer used functions
2022-01-12 16:40:06 +02:00
Gleb Natapov
459539e812 migration_manager: do not allow creating keyspace with arbitrary timestamp
This was needed to fix issue #2129, which only manifested itself with
auto_bootstrap set to false. The option is ignored now and we always
wait for the schema to sync during boot.
2022-01-12 16:33:15 +02:00
Pavel Solodovnikov
5dcfb94d5a gms: i_endpoint_state_change_subscriber: make callbacks to return futures
Coroutinize a few simple callbacks in the process.

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
2022-01-11 09:29:12 +03:00
Avi Kivity
bbad8f4677 replica: move ::database, ::keyspace, and ::table to replica namespace
Move replica-oriented classes to the replica namespace. The main
classes moved are ::database, ::keyspace, and ::table, but a few
ancillary classes are also moved. There are certainly classes that
should be moved but aren't (like distributed_loader) but we have
to start somewhere.

References are adjusted treewide. In many cases, it is obvious that
a call site should not access the replica (but the data_dictionary
instead), but that is left for separate work.

scylla-gdb.py is adjusted to look for both the new and old names.
2022-01-07 12:04:38 +02:00
Avi Kivity
a97731a7e5 migration_manager: replace uses of get_storage_proxy and get_local_storage_proxy with constructor-provided reference
A static helper also gained a storage_proxy parameter.
2021-12-16 21:05:47 +02:00
Gleb Natapov
6e5061a12d migration_manager: add is_raft_enabled() to check if raft is enabled on a cluster
2021-12-14 09:01:42 +02:00
Gleb Natapov
955e582fb6 migration_manager: add schema_read_barrier() function
The function is responsible for calling raft's group zero read barrier in
case it is enabled.
2021-12-14 09:01:42 +02:00
Gleb Natapov
e9fafea5c1 migration_manager: pass raft_gr to the migration manager
Migration manager will use raft group zero to distribute schema
changes.
2021-12-11 12:31:07 +02:00
Gleb Natapov
38e1f85959 migration_manager: drop view_ptr array from announce_column_family_update()
No users pass it any longer.
2021-12-11 12:31:07 +02:00
Gleb Natapov
a13ebe13c9 mm: drop unused announce_ methods
2021-12-11 12:31:07 +02:00
Gleb Natapov
07103d915e migration_manager: add prepare_aggregate_drop_announcement() function
The function only generates mutations for the announcement, but does not
send them out. Will be used by the later patches.
2021-12-11 12:31:07 +02:00
Gleb Natapov
25ae8a6376 migration_manager: add prepare_function_drop_announcement() function
The function only generates mutations for the announcement, but does not
send them out. Will be used by the later patches.
2021-12-11 12:31:07 +02:00
Gleb Natapov
7430750674 migration_manager: add prepare_new_aggregate_announcement() function
The function only generates mutations for the announcement, but does not
send them out. Will be used by the later patches.
2021-12-11 12:31:07 +02:00
Gleb Natapov
10c14cd044 migration_manager: add prepare_new_function_announcement() function
The function only generates mutations for the announcement, but does not
send them out. Will be used by the later patches.
2021-12-11 12:31:07 +02:00
Gleb Natapov
25294e4460 migration_manager: add prepare_view_drop_announcement() function
The function only generates mutations for the announcement, but does not
send them out. Will be used by the later patches.
2021-12-11 12:31:07 +02:00
Gleb Natapov
87b52c30e7 migration_manager: add prepare_type_drop_announcement() function
The function only generates mutations for the announcement, but does not
send them out. Will be used by the later patches.
2021-12-11 12:31:07 +02:00
Gleb Natapov
471d48d277 migration_manager: add prepare_column_family_drop_announcement() function
The function only generates mutations for the announcement, but does not
send them out. Will be used by the later patches.
2021-12-11 12:31:07 +02:00
Gleb Natapov
f1cc1fb96e migration_manager: add prepare_keyspace_drop_announcement() function
The function only generates mutations for the announcement, but does not
send them out. Will be used by the later patches.
2021-12-11 12:31:07 +02:00
Gleb Natapov
d79e426fb6 migration_manager: add prepare_keyspace_update_announcement() function
The function only generates mutations for the announcement, but does not
send them out. Will be used by the later patches.
2021-12-11 12:31:07 +02:00
Gleb Natapov
011f38a2f1 migration_manager: add prepare_new_keyspace_announcement() function
The function only generates mutations for the announcement, but does not
send them out. Will be used by the later patches.
2021-12-11 12:31:07 +02:00
Gleb Natapov
a4afc69b87 migration_manager: add prepare_view_update_announcement() function
The function only generates mutations for the announcement, but does not
send them out. Will be used by the later patches.
2021-12-11 12:31:07 +02:00
Gleb Natapov
3f47210374 migration_manager: add prepare_new_view_announcement() function
The function only generates mutations for the announcement, but does not
send them out. Will be used by the later patches.
2021-12-11 12:31:07 +02:00
Gleb Natapov
7cc629980b migration_manager: add prepare_column_family_update_announcement() function
The function only generates mutations for the announcement, but does not
send them out. Will be used by the later patches.
2021-12-11 12:31:07 +02:00
Gleb Natapov
5af2c342a3 migration_manager: add prepare_update_type_announcement() function
The function only generates mutations for the announcement, but does not
send them out. Will be used by the later patches.
2021-12-11 12:31:07 +02:00
Gleb Natapov
5649daf76a migration_manager: add prepare_new_type_announcement() function
The function only generates mutations for the announcement, but does not
send them out. Will be used by the later patches.
2021-12-11 12:31:07 +02:00
Gleb Natapov
20dbd717ff migration_manager: add prepare_new_column_family_announcement() function
The function only generates mutations for the announcement, but does not
send them out. Will be used by the later patches.
2021-12-11 12:31:07 +02:00
Gleb Natapov
2f95a29209 migration_manager: add include_keyspace() function
Currently a keyspace mutation is included in the schema mutation list just
before announcement. Move the inclusion to a separate function. It will
be used later when, instead of announcing the new schema, the mutation
array is returned.
2021-12-11 12:31:07 +02:00
Pavel Emelyanov
e4f35e2139 migration_manager: Eliminate storage service from passive announcing
Currently storage service acts as a glue between database schema value
and the migration manager "passive_announce" call. This interposing is
not required, migration manager can do all the management itself, and
the linkage can be done in main.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-12-02 19:43:30 +02:00
Pavel Emelyanov
eb8e30f696 migration_manager: Rename stop to drain then bring it back
Because today's migration_manager::stop is called at drain time.
Keep the .stop for the next patch, but since it's called when the
whole migration_manager stops, guard it against re-entrance.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-12-02 19:43:30 +02:00
Pavel Emelyanov
798f4b0e3f migration_manager: Sanitize (maybe_)schedule_schema_pull
Both calls are now private. Also the non-maybe one can become void
and handle pull exceptions by itself.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-12-02 19:43:30 +02:00
Pavel Emelyanov
d4d0bd147e migration_manager: Subscribe on gossiper events
This is to start schema pulls upon on_join, on_alive and on_change ones
in the next patch. Migration manager already has gossiper reference.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2021-12-02 19:43:30 +02:00
Gleb Natapov
e56022a8ba migration_manager: co-routinize announce_column_family_update
The patch also removes the usage of map_reduce() because it is no longer needed
after 6191fd7701, which drops futures from the view mutation building path.
The patch preserves the yielding point that map_reduce() provides, though, by
calling coroutine::maybe_yield() explicitly.
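
The idea of keeping an explicit yielding point in a plain loop can be sketched with `asyncio`. This is an analogy rather than the Seastar code: `build_view_updates` is a hypothetical function, and `await asyncio.sleep(0)` yields on every iteration, whereas `coroutine::maybe_yield()` yields only when the task should be preempted.

```python
import asyncio


async def build_view_updates(views):
    """Process `views` in a plain loop, yielding between iterations."""
    updates = []
    for view in views:
        updates.append(f"update {view}")  # synchronous work, no futures involved
        await asyncio.sleep(0)            # explicit yielding point
    return updates


updates = asyncio.run(build_view_updates(["v1", "v2"]))
```

The loop body itself is synchronous; the explicit yield is what keeps a long list of views from monopolizing the scheduler.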

Message-Id: <YZoV3GzJsxR9AZfl@scylladb.com>
2021-11-22 10:48:25 +02:00