scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-02 21:17:01 +00:00

Author	SHA1	Message	Date
Gleb Natapov	f80fff3484	gossip: remove unused STATUS_LEAVING gossiper status The status is no longer used. The function that referenced it was removed by `5a96751534` and it was unused back then for a while already. Message-Id: <ZS92mcGE9Ke5DfXB@scylladb.com>	2023-10-18 11:13:14 +02:00
Avi Kivity	35849fc901	Revert "Merge 'Don't calculate hashes for schema versions in Raft mode' from Kamil Braun" This reverts commit `3d4398d1b2`, reversing changes made to `45dfce6632`. The commit causes some schema changes to be lost due to incorrect timestamps in some mutations. More information is available in [1]. Reopens: scylladb/scylladb#7620 Reopens: scylladb/scylladb#13957 Fixes scylladb/scylladb#15530. [1] https://github.com/scylladb/scylladb/pull/15687	2023-10-11 00:32:05 +03:00
Benny Halevy	e8f720315d	gossiper: run: hold background_gate when sending gossip in background So it would be waited on in shutdown(). Although gossiper::run holds the `_callback_running` semaphore which is acquired in `do_stop_gossiping`, the gossip messages it initiates in the background are never waited on. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes scylladb/scylladb#15493	2023-09-21 08:54:35 +03:00
Benny Halevy	72a5ac9ce7	gossiper: get_or_create_endpoint_state: create empty endpoint_state Currently, the endpoint address is set as the new endpoint_state RPC_ADDRESS. This is wrong since it should be assigned with the `broadcast_rpc_address` rather than `broadcast_address`. This was introduced in `b82c77ed9c`. Instead just create an empty endpoint_state. The RPC_ADDRESS (as well as HOST_ID) application states are set later. Fixes scylladb/scylladb#15458 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes scylladb/scylladb#15475	2023-09-20 13:20:44 +02:00
Kamil Braun	c2beee348a	feature_service: enable `GROUP0_SCHEMA_VERSIONING` in Raft mode As promised in earlier commits: Fixes: #7620 Fixes: #13957 Also modify two test cases in `schema_change_test` which depend on the digest calculation method in their checks. Details are explained in the comments.	2023-09-15 17:54:36 +02:00
Kamil Braun	72cd457d53	feature_service: add `GROUP0_SCHEMA_VERSIONING` feature This feature, when enabled, will modify how schema versions are calculated and stored. - In group 0 mode, schema versions are persisted by the group 0 command that performs the schema change, then reused by each node instead of being calculated as a digest (hash) by each node independently. - In RECOVERY mode or before Raft upgrade procedure finishes, when we perform a schema change, we revert to the old digest-based way, taking into account the possibility of having performed group0-mode schema changes (that used persistent versions). As we will see in future commits, this will be done by storing additional flags and tombstones in system tables. By "schema versions" we mean both the UUIDs returned from `schema::version()` and the "global" schema version (the one we gossip as `application_state::SCHEMA`). For now, in this commit, the feature is always disabled. Once all necessary code is set up in following commits, we will enable it together with Raft.	2023-09-15 13:04:04 +02:00
Petr Gusev	a683cebb02	system_keyspace: scylla_local: use schema commitlog We remove flush from set_scylla_local_param_as since it's now redundant. We add it to save_local_enabled_features as features need to be available before schema commitlog replay. We skip the flush if save_local_enabled_features is called from topology_state_load when the features are migrated to system.topology and we don't need strict durability.	2023-09-13 23:17:20 +04:00
Petr Gusev	cbfc512667	main.cc: move schema commitlog replay earlier We want to switch system.local table to schema commitlog, but this table is used in host_id initialization (initialize_local_info), so we need to replay schema commitlog before. In this commit we gather all the actions related to early system_keyspace initialization in one place, before initialize_local_info_thread. The calls to save_system_schema and recalculate_schema_version are tied to legacy_schema_migrator::migrate and initialize_virtual_tables calls, so they are done separately after legacy_schema_migrator::migrate.	2023-09-13 23:17:11 +04:00
Piotr Dulikowski	66206207f9	gossiper: properly acquire lock_endpoint_update_semaphore in reset_endpoint_state_map The `gossiper::reset_endpoint_state_map` function is supposed to acquire a lock in order to serialize with `replicate_live_endpoints_on_change`. The `lock_endpoint_update_semaphore` is called, but its result is a future - and it is not co_awaited. Therefore, the lock has no effect. This commit fixes the issue by adding missing co_await. Fixes: #15361 Closes #15362	2023-09-13 10:03:47 +02:00
Avi Kivity	2c810e221a	Merge 'Gossiper: replace seastar threads with coroutines' from Benny Halevy Many of the gossiper internal functions currently use seastar threads for historical reasons, but since they are short living, the cost of spawning a seastar thread for them is excessive and they can be simplified and made more efficient using coroutines. Closes #15364 * github.com:scylladb/scylladb: gossiper: reindent do_stop_gossiping gossiper: coroutinize do_stop_gossiping gossiper: reindent assassinate_endpoint gossiper: coroutinize assassinate_endpoint gossiper: coroutinize handle_ack2_msg gossiper: handle_ack_msg: always log warning on exception gossiper: reindent handle_ack_msg gossiper: coroutinize handle_ack_msg gossiper: reindent handle_syn_msg gossiper: coroutinize handle_syn_msg gossiper: message handlers: no need to capture shared_from_this gossiper: add_local_application_state: throw internal error if endpoint state is not found gossiper: coroutinize add_local_application_state	2023-09-12 21:50:52 +03:00
Benny Halevy	47dc287efd	gossiper: reindent do_stop_gossiping Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-09-12 19:33:09 +03:00
Benny Halevy	8fa65ed016	gossiper: coroutinize do_stop_gossiping Simplify the function. It does not need to spawn a seastar thread. While at it, declare it as private since it's called only internally by the gossiper (and on shard 0). Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-09-12 19:33:09 +03:00
Benny Halevy	a792babbda	gossiper: reindent assassinate_endpoint Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-09-12 19:33:09 +03:00
Benny Halevy	5dbc168c03	gossiper: coroutinize assassinate_endpoint It has no need to spawn a seastar thread. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-09-12 19:33:09 +03:00
Benny Halevy	29b9596050	gossiper: coroutinize handle_ack2_msg Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-09-12 19:33:09 +03:00
Benny Halevy	cc030a5040	gossiper: handle_ack_msg: always log warning on exception Unlike handle_syn_msg, the warning is currently printed only `if (_ack_handlers.contains(from.addr))`. Unclear why. It is interesting in any case. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-09-12 19:32:40 +03:00
Benny Halevy	990ac23d19	gossiper: reindent handle_ack_msg Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-09-12 19:27:08 +03:00
Benny Halevy	2ca2118130	gossiper: coroutinize handle_ack_msg Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-09-12 19:26:03 +03:00
Benny Halevy	8c065bf023	gossiper: reindent handle_syn_msg Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-09-12 19:24:14 +03:00
Benny Halevy	264f4daded	gossiper: coroutinize handle_syn_msg Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-09-12 19:23:09 +03:00
Benny Halevy	63ab5f1ab3	gossiper: message handlers: no need to capture shared_from_this The handlers future is waited on under `background_msg` which is closed in gossiper::stop so the instance is already guaranteed to be kept valid. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-09-12 19:21:07 +03:00
Benny Halevy	8bfec81985	gossiper: add_local_application_state: throw internal error if endpoint state is not found If the function is called too early, the first get_endpoint_state_ptr would throw an exception that is later caught and degraded into a warning. But that endpoint_state should never disappear after yielding, so call on_internal_error in that case. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-09-12 19:21:07 +03:00
Benny Halevy	d1c67300d4	gossiper: coroutinize add_local_application_state There is no need for it to spawn a seastar thread. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-09-12 19:20:41 +03:00
Tomasz Grabiec	6e83e54b0d	Merge 'gossiper: get rid of uses_host_id' from Benny Halevy This function practically returned true from inception. In `d38deef499` it started using messaging_service().knows_version(endpoint) that also returns `true` unconditionally, to this day. So there's no point calling it since we can assume that `uses_host_id` is true for all versions. Closes #15343 * github.com:scylladb/scylladb: storage_service: fixup indentation after last patch gossiper: get rid of uses_host_id	2023-09-12 12:44:56 +02:00
Benny Halevy	08f8fd30ea	gossiper: get rid of comment about advertise_removing It was deleted in `66ff072540`. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20230911140349.1809014-1-bhalevy@scylladb.com>	2023-09-11 16:14:26 +02:00
Benny Halevy	f855479c9d	gossiper: get rid of uses_host_id This function practically returned true from inception. In `d38deef499` it started using messaging_service().knows_version(endpoint) that also returns `true` unconditionally, to this day. So there's no point calling it since we can assume that `uses_host_id` is true for all versions. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-09-11 16:48:07 +03:00
Benny Halevy	c5e4dace8e	gossiper: real_mark_alive: do not erase from unreachable_endpoints without holding lock This code was supposed to be moved into `mutate_live_and_unreachable_endpoints` in `2c27297dbd` but it looks like the original statements were left in place outside the mutate function. This patch just removes the stale code since the required logic is already done inside `mutate_live_and_unreachable_endpoints`. Fixes scylladb/scylladb#15296 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes #15304	2023-09-07 10:02:49 +02:00
Benny Halevy	04ba560b8d	gossiper: get_current* methods: mark as const We need to const_cast `this` since the const container() has no const invoke_on override. Trying to fix this in seastar sharded.hh breaks many other call sites in scylla. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-09-04 16:18:04 +03:00
Benny Halevy	43d883c5aa	gossiper: get_generation_for_nodes: mark as const Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-09-04 16:17:38 +03:00
Benny Halevy	cfe0ec2203	gossiper: examine_gossiper: mark as const Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-09-04 16:17:25 +03:00
Benny Halevy	ce05bbe32f	gossiper: request_all, send_all: mark as const Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-09-04 16:16:19 +03:00
Benny Halevy	cc1d5771e5	gossiper: do_on_*notifications: mark as const Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-09-04 16:16:10 +03:00
Benny Halevy	963d6fb009	gossiper: compare_endpoint_startup: mark as const Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-09-04 16:14:22 +03:00
Benny Halevy	2899e07572	gossiper: get_state_for_version_bigger_than: mark as const Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-09-04 16:13:02 +03:00
Benny Halevy	87ac1a26f2	gossiper: make_random_gossip_digest: delete dead legacy code Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-09-04 16:12:51 +03:00
Benny Halevy	33f004587e	gossiper: make_random_gossip_digest: mark as const Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-09-04 16:12:43 +03:00
Benny Halevy	02e8fdc4b8	gossiper: do_sort: mark as const Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-09-04 16:11:56 +03:00
Benny Halevy	482963b2c4	gossiper: is* methods: mark as const Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-09-04 16:11:00 +03:00
Benny Halevy	f7eddf0322	gossiper: wait_for_gossip and friends: mark as const Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-09-04 16:09:15 +03:00
Benny Halevy	044a696aca	gossiper: drop unused dump_endpoint_state_map Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-09-04 16:09:04 +03:00
Benny Halevy	083506d479	gossiper: remove unused shadow version members Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-09-04 16:08:25 +03:00
Tomasz Grabiec	7b65d4d947	Merge 'Gossiper: provide strong exception safety for endpoint state changes' from Benny Halevy This series ensures that endpoint state changes (for each single endpoint) are applied to the gossiper endpoint_state_map as a whole and on all shards. Any failure in the process will keep the existing endpoint state intact. Note that verbs that modify the endpoint states of multiple endpoints may still succeed to modify some of them before hitting an error and those changes are committed to the endpoint_state_map, so we don't ensure atomicity when updating multiple endpoints' states. Fixes scylladb/scylladb#14794 Fixes scylladb/scylladb#14799 Closes #15073 * github.com:scylladb/scylladb: gossiper: move endpoint_state by value to apply it gossiper: replicate: make exception safe gms: pass endpoint_state_ptr to endpoint_state change subscribers gossiper: modify endpoint state only via replicate gossiper: keep and serve shared endpoint_state_ptr in map gossiper: get_max_endpoint_state_version: get state by reference api/failure_detector: get_all_endpoint_states: reduce allocations cdc/generation: get_generation_id_for: get endpoint_state& gossiper: add for_each_endpoint_state helpers gossiper: add num_endpoints gossiper: add my_endpoint_state	2023-09-01 12:23:19 +02:00
Kamil Braun	117dedab19	Merge 'Cluster features on raft: topology coordinator + check on boot followups' from Piotr Dulikowski This PR collects followups described in #14972: - The `system.topology` table is now flushed every time feature-related columns are modified. This is done because of the feature check that happens before the schema commitlog is replayed. - The implementation now guarantees that, if all nodes support some feature as described by the `supported_features` column, then support for that feature will not be revoked by any node. Previously, in an edge case where a node is the last one to add support for some feature `X` in `supported_features` column, crashes before applying/persisting it and then restarts without supporting `X`, it would be allowed to boot anyway and would revoke support for the `X` in `system.topology`. The existing behavior, although counterintuitive, was safe - the topology coordinator is responsible for explicitly marking features as enabled, and in order to enable a feature it needs to perform a special kind of a global barrier (`barrier_after_feature_update`) which only succeeds after the node has updated its features column - so there is no risk of enabling an unsupported feature. In order to make the behavior less confusing, the node now will perform a second check when it tries to update its `supported_features` column in `system.topology`. - The `barrier_after_feature_update` is removed and the regular global `barrier` topology command is used instead. The `barrier` handler now performs a feature check if the node did not have a chance to verify and update its cluster features for the second time. JOIN_NODE rpc will be sent separately as it is a big item on its own. 
Fixes: #14972 Closes #15168 * github.com:scylladb/scylladb: test: topology{_experimental_raft}: don't stop gracefully in feature tests storage_service: remove _topology_updated_with_local_metadata topology_coordinator: remove barrier_after_feature_update topology_coordinator: perform feature check during barrier storage_service: repeat the feature check after read barrier feature_service: introduce unsupported_feature_exception feature_service: move startup feature check to a separate function topology_coordinator: account for features to enable in should_preempt_balancing group0_state_machine: flush system.topology when updating features columns	2023-09-01 11:52:26 +02:00
Piotr Dulikowski	aa5401383f	feature_service: introduce unsupported_feature_exception The new `unsupported_feature_exception` is introduced so that the exception thrown by `check_features` can be caught in a type-safe way.	2023-08-31 16:46:10 +02:00
Piotr Dulikowski	8286a2c369	feature_service: move startup feature check to a separate function The logic responsible for checking supported features against the currently enabled features (and features that are unsafe to disable) is moved to a separate function, `check_features`. Currently, it is only used from `enable_features_on_startup`, but more checks against features in raft will be added in the commits that follow.	2023-08-31 16:45:40 +02:00
Benny Halevy	98fd9fcc11	gossiper: move endpoint_state by value to apply it Save a copy of the applied endpoint state by moving the value towards replicate. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-08-31 09:35:15 +03:00
Benny Halevy	38c2347a3c	gossiper: replicate: make exception safe First replicate the new endpoint_state on all shards before applying the replicated endpoint_state objects to _endpoint_state_map. Fixes scylladb/scylladb#14794 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-08-31 09:35:15 +03:00
Benny Halevy	c16ec870da	gms: pass endpoint_state_ptr to endpoint_state change subscribers Now that the endpoint_state isn't changed in place we do not need to copy it to each subscriber. We can rather just pass the lw_shared_ptr holding a snapshot of it. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-08-31 09:35:15 +03:00
Benny Halevy	1d04242a90	gossiper: modify endpoint state only via replicate And restrict the accessor methods to return const pointers or references. With that, the endpoint_state_ptr:s held in the _endpoint_state_map point to immutable endpoint_state objects - with one exception: the endpoint_state update_timestamp may be updated in place, but the endpoint_state_map is immutable. replicate() replaces the endpoint_state_ptr in the map with a new one to maintain immutability. A later change will also make this exception safe so replicate will guarantee strong exception safety so that all shards are updated or none of them. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-08-31 09:35:15 +03:00
Benny Halevy	d00e49a1bb	gossiper: keep and serve shared endpoint_state_ptr in map This commit changes the interface to using endpoint_state_ptr = lw_shared_ptr<const endpoint_state> so that users can get a snapshot of the endpoint_state that they must not modify in-place anyhow. While internally, gossiper still has the legacy helpers to manage the endpoint_state. Fixes scylladb/scylladb#14799 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-08-31 09:34:36 +03:00
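
Note on commits `d00e49a1bb`, `1d04242a90` and `38c2347a3c`: the messages describe holding immutable `endpoint_state` snapshots behind a shared pointer, and replicating a new state to all shards before applying it to the map. The following is only a minimal sketch of that prepare-then-publish idea, not the gossiper's actual code. It assumes `std::shared_ptr` in place of seastar's `lw_shared_ptr` and a plain vector of maps in place of real shards; `toy_gossiper`, `fail_on_shard` and the two-field `endpoint_state` are hypothetical names introduced for illustration.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the real endpoint_state, which has more fields.
struct endpoint_state {
    int generation = 0;
    std::string status;
};

// Same shape as the alias named in d00e49a1bb, with std::shared_ptr
// substituted for lw_shared_ptr (an assumption of this sketch).
using endpoint_state_ptr = std::shared_ptr<const endpoint_state>;
using endpoint_state_map = std::map<std::string, endpoint_state_ptr>;

// Hypothetical model: one map per "shard", all owned by a single thread.
struct toy_gossiper {
    std::vector<endpoint_state_map> shards;
    int fail_on_shard = -1;  // test hook: simulate a failure while preparing this shard

    explicit toy_gossiper(std::size_t n) : shards(n) {}

    void replicate(const std::string& ep, const endpoint_state& new_state) {
        // Phase 1: prepare one immutable copy per shard.
        // This may throw, but nothing has been published yet.
        std::vector<endpoint_state_ptr> prepared;
        prepared.reserve(shards.size());
        for (std::size_t i = 0; i < shards.size(); ++i) {
            if (static_cast<int>(i) == fail_on_shard) {
                throw std::runtime_error("simulated failure while preparing shard copy");
            }
            prepared.push_back(std::make_shared<endpoint_state>(new_state));
        }
        // Phase 2: publish by swapping pointers into each map.
        // Simplification: inserting a new key can still allocate a map node;
        // a real implementation would have to keep this phase non-throwing.
        for (std::size_t i = 0; i < shards.size(); ++i) {
            shards[i][ep] = std::move(prepared[i]);
        }
    }
};
```

If phase 1 throws, every shard still holds its previous pointer, and any reader that took an `endpoint_state_ptr` earlier keeps a valid, unchanged snapshot. That is the "all shards are updated or none of them" property for a single endpoint; as the merge message for `7b65d4d947` notes, updates that touch several endpoints are not atomic as a group.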


1002 Commits