Writing into the group0 raft group on the client side involves locking
the state machine, choosing a state ID and checking for its presence
after the operation completes. The code that does this currently resides
in the migration manager, since it is the only user of group0 so far. In
the near future we will have more clients for group0 and they will all
need the same logic, so the patch moves it to a separate class,
raft_group0_client, that any future user of group0 can use to write
into it.
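The write path described above can be modeled as a small toy (all names here are illustrative, not Scylla's actual API): take the state-machine lock, pick a fresh state ID, submit the command, and afterwards check whether that ID landed in the history; if it did not, a concurrent modification won the race.

```cpp
#include <cassert>
#include <mutex>
#include <set>
#include <string>

// Hypothetical stand-in for the logic moved into raft_group0_client.
struct group0_client_model {
    std::mutex lock;                // "locking the state machine"
    std::set<std::string> history;  // state IDs that were actually applied
    int seq = 0;

    // apply_command models the raft round trip; depending on concurrent
    // activity it may or may not apply our command.
    template <typename Apply>
    bool write(Apply apply_command) {
        std::lock_guard<std::mutex> g(lock);
        std::string state_id = "id-" + std::to_string(++seq); // choose a state ID
        apply_command(history, state_id);
        return history.count(state_id) != 0; // check presence after completion
    }
};
```

The point of the extraction is that this lock/choose/check sequence is identical for every future group0 client, so it lives in one place.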
Message-Id: <YoYAJwdTdbX+iCUn@scylladb.com>
After fcb8d040 ("treewide: use Software Package Data Exchange
(SPDX) license identifiers"), many dual-licensed files were
left with empty comments on top. Remove them to avoid visual
noise.
Closes #10562
The main target here is system_keyspace::update_schema_version(), which
is now static but needs to have a system_keyspace as its "this". The
migration manager is one of the places that call that method indirectly.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Check that group 0 history grows iff a schema change does not throw
`group0_concurrent_modification`. Check that the CQL DDL statement retry
mechanism works as expected.
Schema changes on top of Raft do not allow concurrent changes.
If two changes are attempted concurrently, one of them gets a
`group0_concurrent_modification` exception.
Catch the exception in the CQL DDL statement execution function and retry.
In addition, the descriptions of CQL DDL statements in the group 0
history table were improved.
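The retry mechanism can be sketched as a simple loop (a minimal model, not the actual Scylla code; the exception type here is a local stand-in):

```cpp
#include <cassert>
#include <stdexcept>

// Stand-in for Scylla's group0_concurrent_modification exception.
struct group0_concurrent_modification : std::runtime_error {
    group0_concurrent_modification()
        : std::runtime_error("concurrent group 0 modification") {}
};

// Sketch of the DDL execution retry loop: re-run the schema change
// whenever a concurrent group 0 modification preempts it. Each retry
// re-reads the current state, so eventually one attempt wins.
template <typename Func>
auto execute_with_retry(Func do_schema_change) {
    while (true) {
        try {
            return do_schema_change();
        } catch (const group0_concurrent_modification&) {
            // Another node (or fiber) won the race; retry from scratch.
        }
    }
}
```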
We perform a bunch of schema changes with different values of
`migration_manager::_group0_history_gc_duration` and check if entries
are cleared according to this setting.
When performing a change through group 0 (which right now only covers
schema changes), clear entries from group 0 history table which are older
than one week.
This is done by including an appropriate range tombstone in the group 0
history table mutation.
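The effect of that range tombstone can be modeled with an ordinary ordered map (a toy, not Scylla's schema): appending a new history entry also drops everything strictly older than the GC duration, just as the real mutation carries a range tombstone alongside the new row.

```cpp
#include <cassert>
#include <chrono>
#include <map>

using timestamp = std::chrono::seconds;

// Toy model of the history GC: insert the new entry, then prune entries
// strictly older than gc_duration (the role played by the range tombstone
// in the real group 0 history table mutation).
void append_with_gc(std::map<timestamp, int>& history, timestamp now,
                    int entry, timestamp gc_duration) {
    history[now] = entry;
    // Erase all keys strictly less than now - gc_duration.
    history.erase(history.begin(), history.lower_bound(now - gc_duration));
}
```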
The description parameter is used for the group 0 history mutation.
The default is empty, in which case the mutation will leave
the description column as `null`.
I filled the parameter in some easy places as an example and left the
rest for a follow-up.
This is how it looks now in a fresh cluster with a single statement
performed by the user:
cqlsh> select * from system.group0_history ;
key | state_id | description
---------+--------------------------------------+------------------------------------------------------
history | 9ec29cac-7547-11ec-cfd6-77bb9e31c952 | CQL DDL statement
history | 9beb2526-7547-11ec-7b3e-3b198c757ef2 | null
history | 9be937b6-7547-11ec-3b19-97e88bd1ca6f | null
history | 9be784ca-7547-11ec-f297-f40f0073038e | null
history | 9be52e14-7547-11ec-f7c5-af15a1a2de8c | null
history | 9be335dc-7547-11ec-0b6d-f9798d005fb0 | null
history | 9be160c2-7547-11ec-e0ea-29f4272345de | null
history | 9bdf300e-7547-11ec-3d3f-e577a2e31ffd | null
history | 9bdd2ea8-7547-11ec-c25d-8e297b77380e | null
history | 9bdb925a-7547-11ec-d754-aa2cc394a22c | null
history | 9bd8d830-7547-11ec-1550-5fd155e6cd86 | null
history | 9bd36666-7547-11ec-230c-8702bc785cb9 | Add new columns to system_distributed.service_levels
history | 9bd0a156-7547-11ec-a834-85eac94fd3b8 | Create system_distributed(_everywhere) tables
history | 9bcfef18-7547-11ec-76d9-c23dfa1b3e6a | Create system_distributed_everywhere keyspace
history | 9bcec89a-7547-11ec-e1b4-34e0010b4183 | Create system_distributed keyspace
The group 0 state machine will only modify state during command
application if the provided "previous state ID" is equal to the
last state ID present in the history table. Otherwise, the command will
be a no-op.
To ensure linearizability of group 0 changes, the performer of the
change must first read the last state ID, only then read the state
and send a command for the state machine. If a concurrent change
races with this command and manages to modify the state, we will detect
that the last state ID does not match during `apply`; all calls to
`apply` are serialized, and `apply` adds the new entry to the history
table at the end, after modifying the group 0 state.
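The conditional-apply rule above can be captured in a few lines (names are illustrative, not Scylla's actual types): a command carries the state ID its issuer observed, and `apply` is a no-op unless that ID is still the newest entry in the history.

```cpp
#include <cassert>
#include <string>
#include <vector>

// The command payload: the "previous state ID" read by the issuer, the new
// state ID to append on success, and a toy state change.
struct command {
    std::string prev_state_id;
    std::string new_state_id;
    int delta;
};

// Minimal model of the group 0 state machine's conditional apply.
struct group0_state_machine_model {
    std::vector<std::string> history; // newest entry at the back
    int state = 0;

    // Returns true if the command took effect.
    bool apply(const command& cmd) {
        if (history.empty() ? !cmd.prev_state_id.empty()
                            : history.back() != cmd.prev_state_id) {
            return false; // a concurrent change slipped in: no-op
        }
        state += cmd.delta;                  // modify group 0 state first...
        history.push_back(cmd.new_state_id); // ...then append to the history
        return true;
    }
};
```

Because all `apply` calls are serialized and the history append happens last, a racing command that observed a stale state ID is rejected deterministically.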
The details of this mechanism are abstracted away with `group0_guard`.
To perform a group 0 change, one needs to call `announce`, which
requires a `group0_guard` to be passed in. The only way to obtain a
`group0_guard` is by calling `start_group0_operation`, which underneath
performs a read barrier on group 0, obtains the last state ID from the
history table, and constructs a new state ID that the change will append
to the history table. The read barrier ensures that all previously
completed changes are visible to this operation. The caller can then
perform any necessary validation, construct mutations which modify group
0 state, and finally call `announce`.
The guard also provides a timestamp which is used by the caller
to construct the mutations. The timestamp is obtained from the new state ID.
We ensure that it is greater than the timestamp of the last state ID.
Thus, if the change is successful, the applied mutations will have greater
timestamps than the previously applied mutations.
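The timestamp rule amounts to a one-line monotonicity clamp (a hypothetical helper, sketched here for illustration): the guard's timestamp is derived from the new state ID, bumped if necessary so it strictly exceeds the previous state ID's timestamp.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Ensure the guard's timestamp strictly exceeds the last applied one,
// even if the clock behind the new state ID went backwards.
int64_t guard_timestamp(int64_t new_state_id_ts, int64_t last_state_id_ts) {
    return std::max(new_state_id_ts, last_state_id_ts + 1);
}
```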
We also add two locks. The more important one, used to ensure
correctness, is `read_apply_mutex`. It is held when modifying group 0
state (in `apply` and `transfer_snapshot`) and when reading it (it's
taken when obtaining a `group0_guard` and released before a command is
sent in `announce`). Its goal is to ensure that we don't read partial
state, which could happen without it because group 0 state consists of
many parts and `apply` (or `transfer_snapshot`) potentially modifies all
of them. Note: this doesn't give us 100% protection; if we crash in the
middle of `apply` (or `transfer_snapshot`), then after restart we may
read partial state. To remove this possibility we need to ensure that
commands which were being applied before restart but not finished are
re-applied after restart, before anyone can read the state. I left a
TODO in `apply`.
The second lock, `operation_mutex`, is used to improve liveness. It is
taken when obtaining a `group0_guard` and released after a command is
applied (compare to `read_apply_mutex` which is released before a
command is sent). It is not taken inside `apply` or `transfer_snapshot`.
This lock ensures that multiple fibers running on the same node do not
attempt to modify group0 concurrently - this would cause some of them to
fail (due to the concurrent modification protection described above).
This is mostly important during first boot of the first node, when
services start for the first time and try to create their internal
tables. This lock serializes these attempts, ensuring that all of them
succeed.
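The acquire/release ordering of the two locks can be traced in a single-threaded sketch (the real code uses seastar fiber-level locks on one shard; `std::mutex` here only illustrates the ordering, and the trace strings are made up):

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <vector>

// Toy trace of one group 0 operation under the two-lock protocol.
std::vector<std::string> run_group0_operation() {
    std::vector<std::string> trace;
    std::mutex operation_mutex, read_apply_mutex;

    std::unique_lock<std::mutex> op(operation_mutex); // taken when obtaining the guard
    trace.push_back("lock operation_mutex");
    std::unique_lock<std::mutex> ra(read_apply_mutex); // also taken for the guard
    trace.push_back("lock read_apply_mutex");
    trace.push_back("read last state ID and group 0 state");
    ra.unlock(); // released before the command is sent in announce
    trace.push_back("unlock read_apply_mutex");
    trace.push_back("send command; apply() retakes read_apply_mutex");
    op.unlock(); // released only after the command is applied
    trace.push_back("unlock operation_mutex");
    return trace;
}
```

Holding `operation_mutex` across the whole operation is what serializes same-node fibers, while `read_apply_mutex` only brackets the state read so `apply` can take it independently.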
`announce` now takes a `group0_guard` by value. A `group0_guard` can only
be obtained through `migration_manager::start_group0_operation` and
moved; it cannot be constructed outside `migration_manager`.
The guard will be a method of ensuring linearizability for group 0
operations.
This object will be used to "guard" group 0 operations. Obtaining it
will be necessary to perform a group 0 change (such as modifying the
schema), which will be enforced by the type system.
The initial implementation is a stub and only provides a timestamp which
will be used by callers to create mutations for group 0 changes. The
next commit will change all call sites to use the guard as intended.
The final implementation, coming later, will ensure linearizability of
group 0 operations.
1. Generalize the name so it mentions group 0, which schema will be a
strict subset of.
2. Remove the fact that it performs a "read barrier" from the name. The
function will be used in general to ensure linearizability of group0
operations - both reads and writes. "Read barrier" is Raft-specific
terminology, so it can be thought of as an implementation detail.
The functions which prepare schema change mutations (such as
`prepare_new_column_family_announcement`) would use internally
generated timestamps for these mutations. When schema changes are
managed by group 0 we want to ensure that timestamps of mutations
applied through Raft are monotonic. We will generate these timestamps at
call sites and pass them into the `prepare_` functions. This commit
prepares the APIs.
Instead of lengthy blurbs, switch to single-line, machine-readable
standardized (https://spdx.dev) license identifiers. The Linux kernel
switched long ago, so there is strong precedent.
Three cases are handled: AGPL-only, Apache-only, and dual licensed.
For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0),
reasoning that our changes are extensive enough to apply our license.
The changes were applied mechanically with a script, except for
licenses/README.md.
Closes #9937
This series greatly reduces the gossiper's dependence on `seastar::async` (though not yet completely).
`i_endpoint_state_change_subscriber` callbacks are converted to return futures (again, to get rid of `seastar::async` dependency), all users are adjusted appropriately (e.g. `storage_service`, `cdc::generation_service`, `streaming::stream_manager`, `view_update_backlog_broker` and `migration_manager`).
This includes futurizing and coroutinizing the whole function call chain up to the `i_endpoint_state_change_subscriber` callback functions.
To aid the conversion process, a non-`seastar::async` dependent variant of `utils::atomic_vector::for_each` is introduced (`for_each_futurized`). A different name is used to clearly distinguish converted and non-converted code, so that the last step (remove `seastar::async()` wrappers around callback-calling code in gossiper) is easier. This is left for a follow-up series, though.
Tests: unit(dev)
Closes #9844
* github.com:scylladb/scylla:
service: storage_service: coroutinize `set_gossip_tokens`
service: storage_service: coroutinize `leave_ring`
service: storage_service: coroutinize `handle_state_left`
service: storage_service: coroutinize `handle_state_leaving`
service: storage_service: coroutinize `handle_state_removing`
service: storage_service: coroutinize `do_drain`
service: storage_service: coroutinize `shutdown_protocol_servers`
service: storage_service: coroutinize `excise`
service: storage_service: coroutinize `remove_endpoint`
service: storage_service: coroutinize `handle_state_replacing`
service: storage_service: coroutinize `handle_state_normal`
service: storage_service: coroutinize `update_peer_info`
service: storage_service: coroutinize `do_update_system_peers_table`
service: storage_service: coroutinize `update_table`
service: storage_service: coroutinize `handle_state_bootstrap`
service: storage_service: futurize `notify_*` functions
service: storage_service: coroutinize `handle_state_replacing_update_pending_ranges`
repair: row_level_repair_gossip_helper: coroutinize `remove_row_level_repair`
locator: reconnectable_snitch_helper: coroutinize `reconnect`
gms: i_endpoint_state_change_subscriber: make callbacks to return futures
utils: atomic_vector: introduce future-returning `for_each` function
utils: atomic_vector: rename `for_each` to `thread_for_each`
gms: gossiper: coroutinize `start_gossiping`
gms: gossiper: coroutinize `force_remove_endpoint`
gms: gossiper: coroutinize `do_status_check`
gms: gossiper: coroutinize `remove_endpoint`
This was needed to fix issue #2129, which only manifested itself with
auto_bootstrap set to false. The option is now ignored and we always
wait for schema to sync during boot.
Move replica-oriented classes to the replica namespace. The main
classes moved are ::database, ::keyspace, and ::table, but a few
ancillary classes are also moved. There are certainly classes that
should be moved but aren't (like distributed_loader) but we have
to start somewhere.
References are adjusted treewide. In many cases, it is obvious that
a call site should not access the replica (but the data_dictionary
instead), but that is left for separate work.
scylla-gdb.py is adjusted to look for both the new and old names.
Currently a keyspace mutation is included in the schema mutation list
just before announcement. Move the inclusion to a separate function. It
will be used later, when instead of announcing the new schema the
mutation array will be returned.
Currently the storage service acts as glue between the database schema
value and the migration manager's "passive_announce" call. This
interposition is not required: the migration manager can do all the
management itself, and the linkage can be done in main.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Because today migration_manager::stop is called at drain time.
Keep the .stop for the next patch, but since it's called when the
whole migration_manager stops, guard it against re-entrance.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Both calls are now private. Also, the non-maybe one can become void
and handle pull exceptions by itself.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This is in preparation for starting schema pulls upon on_join, on_alive
and on_change notifications in the next patch. The migration manager
already has a gossiper reference.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The patch also removes the usage of map_reduce(), because it is no longer
needed after 6191fd7701, which drops futures from the view mutation
building path. The patch preserves the yielding point that map_reduce()
provided, though, by calling coroutine::maybe_yield() explicitly.
Message-Id: <YZoV3GzJsxR9AZfl@scylladb.com>