Before this patch, if we booted a node just after removing
a different node, the booting node may still see the removed node
as NORMAL and wait for it to be UP, which would time out and fail
the bootstrap.
This issue caused scylladb/scylladb#17526.
Fix it by recalculating the set of nodes to wait for on every
iteration of the `wait_alive` loop.
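The recalculation can be sketched as follows (a language-agnostic sketch with hypothetical names, not the actual Scylla code):

```python
import time

def wait_alive(get_members, is_alive, timeout_s, poll_s=0.05):
    """Wait until every *current* cluster member is alive.

    The member set is recalculated on every iteration, so a node that
    was removed from the topology while we were waiting simply drops
    out of the wait set instead of blocking the bootstrap until timeout.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        pending = [n for n in get_members() if not is_alive(n)]
        if not pending:
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"nodes never became alive: {pending}")
        time.sleep(poll_s)
```

With the old behavior the wait set was computed once, so a node removed mid-boot stayed in it forever.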
(cherry picked from commit 017134fd38)
Currently a new node is marked as alive too late, after it is already
reported as a pending node. The patch series changes the replace
procedure to be the same as what node_ops do: first stop reporting the
IP of the node that is being replaced as a natural replica for writes,
then mark the IP as alive, and only after that report the IP as a
pending endpoint.
Fixes: scylladb/scylladb#17421
* 'gleb/17421-fix-v2' of github.com:scylladb/scylla-dev:
test_replace_reuse_ip: add data plane load
sync_raft_topology_nodes: make replace procedure similar to nodeops one
storage_service: topology_coordinator: fix indentation after previous patch
storage_service: topology coordinator: drop ring check in node_state::replacing state
In replace-with-same-ip a new node calls gossiper.start_gossiping
from join_token_ring with the 'advertise' parameter set to false.
This means that this node will fail echo RPC-s from other nodes,
making it appear as not alive to them. The node changes this only
in storage_service::join_node_response_handler, when the topology
coordinator notifies it that it's actually allowed to join the
cluster. The node calls _gossiper.advertise_to_nodes({}), and
only from this moment other nodes can see it as alive.
The problem is that the topology coordinator sends this notification
in topology::transition_state::join_group0 state. In this state
nodes of the cluster already see the new node as pending,
they react by calling tmpr->add_replacing_endpoint and
update_topology_change_info when they process the corresponding
raft notification in sync_raft_topology_nodes. When the new
token_metadata is published, assure_sufficient_live_nodes
sees the new node in pending_endpoints. All of this happens
before the new node has handled the successful join notification,
so it's not alive yet. Suppose we had a cluster with three
nodes and we're replacing one of them with a fourth node.
For CL=QUORUM, assure_sufficient_live_nodes throws if
live < need + pending, which in our case becomes 2 < 2 + 1.
The end effect is that during replace-with-same-ip
data plane requests can fail with unavailable_exception,
breaking availability.
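The arithmetic above can be illustrated with a minimal sketch of the check (a hypothetical simplification of assure_sufficient_live_nodes, not the real implementation):

```python
class UnavailableException(Exception):
    pass

def assure_sufficient_live_nodes(live, need, pending):
    # Pending replicas raise the bar: a pending node that is not yet
    # seen as alive can push a previously satisfiable QUORUM over it.
    if live < need + pending:
        raise UnavailableException(
            f"cannot satisfy CL: live={live} need={need} pending={pending}")
```

In the scenario above: three nodes, one being replaced with the same IP, so live=2, QUORUM need=2, and the not-yet-alive replacement counts as pending=1, which makes the check throw.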
The patch makes the boot procedure more similar to the node-ops one.
It splits marking a node as "being replaced" and adding it to the
pending set into two different steps, and marks it as alive in between.
So when the node is in the topology::transition_state::join_group0 state
it is marked as "being replaced", which means it will no longer be used
for reads and writes. Then, in the next state, the new node is marked
as alive and is added to the pending list.
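The reordered flow might be sketched like this (a toy model with invented names, only to show the ordering):

```python
class Node:
    """Toy node (hypothetical model, not the Scylla classes)."""
    def __init__(self, ip):
        self.ip = ip
        self.alive = False

    def advertise(self):
        # In Scylla this corresponds to answering echo RPCs after
        # _gossiper.advertise_to_nodes({}).
        self.alive = True

class Cluster:
    def __init__(self):
        self.log = []

    def mark_being_replaced(self, ip):
        # Step 1: the IP stops being a natural replica for reads/writes,
        # but is NOT yet counted as pending.
        self.log.append(("being_replaced", ip))

    def add_pending(self, ip):
        # Step 3: only now the IP counts toward pending_endpoints.
        self.log.append(("pending", ip))

def replace_with_same_ip(cluster, new_node):
    cluster.mark_being_replaced(new_node.ip)
    new_node.advertise()              # step 2: node becomes visibly alive
    cluster.add_pending(new_node.ip)  # alive before it counts as pending
```

By the time the IP appears in the pending set, echo RPCs already succeed, so assure_sufficient_live_nodes stays satisfied.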
Fixes: scylladb/scylladb#17421
Gossiper automatically removes endpoints that do not have tokens in
normal state and either do not send gossiper updates or are dead for a
long time. We do not need this in topology-coordinator mode, since in
this mode the coordinator is responsible for managing the set of nodes
in the cluster. In addition, the patch disables quarantined-endpoint
maintenance in the gossiper in raft mode and uses the left-nodes list
from the topology coordinator to ignore updates for nodes that are no
longer part of the topology.
When loading this node's endpoint state, if it has
tokens in token_metadata, its status can already be set
to NORMAL.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
When loading endpoint_state from system.peers,
pass the loaded node's dc/rack info from
storage_service::join_token_ring to gossiper::add_saved_endpoint.
Load the endpoint DC/RACK information into the endpoint_state,
if available, so it can propagate to bootstrapping nodes
via gossip, even if those nodes are DOWN after a full cluster restart.
Note that this change makes the host_id presence
mandatory following https://github.com/scylladb/scylladb/pull/16376.
The reason to do so is that the other states: tokens, dc, and rack
are useless without the host_id.
This change is backward compatible since the HOST_ID application state
has been written to system.peers since Scylla's inception,
and it would be missing only due to a potential exception
in an older version that failed to write it.
In this case, manual intervention is needed and
the correct HOST_ID needs to be manually updated in system.peers.
Refs #15787
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Pack the topology-related data loaded from system.peers
in `gms::load_endpoint_state`, to be used in a following
patch for `add_saved_endpoint`.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
In a longevity test reported in scylladb/scylladb#16668 we observed that
NORMAL state is not being properly handled for a node that replaced
another node. Either handle_state_normal is not being called, or it is
but gets stuck in the middle. Which of the two is the case could not
be determined from the logs, and attempts at creating a local
reproducer failed.
One hypothesis is that `gossiper` is stuck on `lock_endpoint`. We dealt
with gossiper deadlocks in the past (e.g. scylladb/scylladb#7127).
Modify the code so it reports an error if `lock_endpoint` waits for the
lock for more than a minute. When the issue reproduces again in
longevity, we will see if `lock_endpoint` got stuck.
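The diagnostic might look roughly like this (a Python sketch with assumed names; the real code uses the gossiper's own locking and logging):

```python
import threading
import time

def lock_endpoint(lock, warn_after_s=60.0, log=print):
    """Acquire `lock`, reporting an error if it takes suspiciously long.

    We never give up on the lock; we just leave a trace in the log each
    time the threshold elapses, so a stuck `lock_endpoint` becomes
    visible in longevity runs.
    """
    start = time.monotonic()
    while not lock.acquire(timeout=warn_after_s):
        log(f"lock_endpoint: waited {time.monotonic() - start:.0f}s "
            "for endpoint lock, possible deadlock")
```

The caller still blocks until the lock is held; the only change is the periodic error report.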
The original code extracted only the function_name from the
source_location for logging. We'll use more information from the
source_location in later commits.
Currently, `add_saved_endpoint` is called from two paths: One, is when
loading states from system.peers in the join path (join_cluster,
join_token_ring), when `_raft_topology_change_enabled` is false, and the
other is from `storage_service::topology_state_load` when raft topology
changes are enabled.
In the latter path, from `topology_state_load`, `add_saved_endpoint` is
called only if the endpoint_state does not exist yet. However, this is
checked without acquiring the endpoint_lock and so it races with the
gossiper, and once `add_saved_endpoint` acquires the lock, the endpoint
state may already be populated.
Since `add_saved_endpoint` applies local information about the endpoint
state (e.g. tokens, dc, rack), it uses the local heart_beat_version,
with generation=0, to update the endpoint states, and that is
incompatible with changes applied via gossip, which carry the
endpoint's generation and version, determining the state's update order.
This change makes sure that the endpoint state is never updated in
`add_saved_endpoint` if it has a non-zero generation. An internal-error
exception is thrown if a non-zero generation is found, and in the only
call site that might reach that state, in
`storage_service::topology_state_load`, the caller acquires the
endpoint_lock to check for the existence of the endpoint_state,
calling `add_saved_endpoint` under the lock only if the endpoint_state
does not exist.
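The invariant can be sketched as follows (a simplified data model with invented shapes, not the real endpoint_state):

```python
import threading

def add_saved_endpoint(state_map, ep, saved_state, lock):
    """Apply locally-saved info (tokens/dc/rack) for endpoint `ep`.

    Saved info is applied with generation 0; if the state was already
    populated via gossip (non-zero generation), clobbering it would
    break the generation/version ordering, so that case is treated as
    an internal error. The caller checks existence under the same lock.
    """
    with lock:
        cur = state_map.get(ep)
        if cur is not None and cur["generation"] != 0:
            raise AssertionError(
                f"add_saved_endpoint: {ep} already has live state {cur}")
        state_map[ep] = dict(saved_state, generation=0)
```

Holding the lock across both the existence check and the update is what closes the race described above.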
Fixes #16429
Closes scylladb/scylladb#16432
* github.com:scylladb/scylladb:
gossiper: add_saved_endpoint: keep heart_beat_state if ep_state is found
storage_service: topology_state_load: lock endpoint for add_saved_endpoint
raft_group_registry: move on_alive error injection to gossiper
Change the mutate_live_and_unreachable_endpoints procedure
so that the called `func` would mutate a cloned
`live_and_unreachable_endpoints` object in place.
Those are replicated to temporary copies on all shards
using `foreign<unique_ptr<>>` so that they would be
automatically freed on exception.
Only after all copies are made, they are applied
on all gossiper shards in a noexcept loop
and finally, an `on_success` function is called
to apply further side effects if everything else
was replicated successfully.
The latter is still susceptible to exceptions,
but we can live with those as long as `_live_endpoints`
and `_unreachable_endpoints` are synchronized on all shards.
With that, the read-only methods:
`get_live_members_synchronized` and
`get_unreachable_members_synchronized`
become trivial and they just return the required data
from shard 0.
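The two-phase shape of the procedure can be modeled like this (a Python sketch of the C++ `foreign<unique_ptr<>>` scheme; shards are modeled as list slots):

```python
import copy

def mutate_live_and_unreachable(shards, func, on_success=None):
    """Two-phase replicated update.

    Phase 1 (may throw): run `func` on a clone, then prepare one copy
    per shard; an exception here leaves every shard untouched.
    Phase 2 (must not throw): swap the prepared copies in on each
    shard, then run `on_success` for further side effects.
    """
    cloned = copy.deepcopy(shards[0])
    func(cloned)                                  # may throw: no harm done
    copies = [copy.deepcopy(cloned) for _ in shards]
    for i in range(len(shards)):                  # "noexcept" swap loop
        shards[i] = copies[i]
    if on_success is not None:
        on_success()
```

Because all allocation and copying happens before the swap loop, the shards either all observe the new state or all keep the old one.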
Fixes #15089
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closes scylladb/scylladb#16597
Rather than calling on_change for each particular
application_state, pass an endpoint_state::map_type
with all changed states, to be processed as a batch.
In particular, this allows storage_service::on_change
to call update_peer_info once for all changed states.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
None of the subscribers does anything in before_change.
This is done before changing `on_change` in the following patch.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Have a central definition for the map held
in the endpoint_state (before changing it to
std::unordered_map).
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
`topology_state_load` currently calls `add_saved_endpoint`
only if it finds no endpoint_state_ptr for the endpoint.
However, this is done before locking the endpoint
and the endpoint state could be inserted concurrently.
To prevent that, a permit_id parameter was added to
`add_saved_endpoint` allowing the caller to call it
while the endpoint is locked. With that, `topology_state_load`
locks the endpoint and checks the existence of the endpoint state
under the lock, before calling `add_saved_endpoint`.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Returns this node's endpoint_state_ptr.
With this entry point, the caller doesn't need to
get_broadcast_address.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Fixes some typos as found by codespell run on the code.
In this commit, I was hoping to fix only comments, not user-visible alerts, output, etc.
Follow-up commits will take care of them.
Refs: https://github.com/scylladb/scylladb/issues/16255
Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>
Remove `fall_back_to_syn_msg` which is not necessary in newer Scylla versions.
Fix the calculation of `nodes_down` which could count a single node multiple times.
Make shadow round mandatory during bootstrap and replace -- these operations are unsafe to do without checking features first, which are obtained during the shadow round (outside raft-topology mode).
Finally, during node restart, allow the shadow round to be skipped when getting `timeout_error`s from contact points, not only when getting `closed_error`s (during restart it's best-effort anyway, and in general it's impossible to distinguish between a dead node and a partitioned node).
More details in commit messages.
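The fixed `nodes_down` calculation can be sketched like this (assumed data shapes, not the actual code):

```python
def count_nodes_down(contact_results):
    """Count distinct unreachable contact points.

    A node that fails several gossip rounds is counted only once (via
    the set), and a timeout counts as down just like a closed
    connection, matching the two fixes described above.
    """
    down = set()
    for node, outcome in contact_results:
        if outcome in ("closed_error", "timeout_error"):
            down.add(node)
    return len(down)
```

Counting per-failure instead of per-node is what previously let a single flaky node be counted multiple times.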
Ref: https://github.com/scylladb/scylladb/issues/15675
Closes scylladb/scylladb#15941
* github.com:scylladb/scylladb:
gossiper: do_shadow_round: increment `nodes_down` in case of timeout
gossiper: do_shadow_round: fix `nodes_down` calculation
storage_service: make shadow round mandatory during bootstrap/replace
gossiper: do_shadow_round: remove default value for nodes param
gossiper: do_shadow_round: remove `fall_back_to_syn_msg`
It is unsafe to bootstrap or perform replace without performing the
shadow round, which is used to obtain features from the existing cluster
and verify that we support all enabled features.
Before this patch, I could easily produce the following scenario:
1. bootstrap first node in the cluster
2. shut it down
3. start bootstrapping second node, pointing to the first as seed
4. the second node skips shadow round because it gets
`rpc::closed_error` when trying to connect to first node.
5. the node then passes the feature check (!) and proceeds to the next
step, where it waits for nodes to show up in gossiper
6. we now restart the first node, and the second node finishes bootstrap
The shadow round must be mandatory during bootstrap/replace, which is
what this patch does.
On restart it can remain optional as it was until now. In fact it should
be completely unnecessary during restart, but since we did it until now
(as best-effort), we can keep doing it.
The status has not been used since 2ec1f719de,
which is included in scylla-4.6.0. We cannot have a mixed cluster with
a version that old, so the new version should not carry the
compatibility burden.
Many of the gossiper internal functions currently use seastar threads for historical reasons,
but since they are short-lived, the cost of spawning a seastar thread for them is excessive,
and they can be simplified and made more efficient using coroutines.
Closes #15364
* github.com:scylladb/scylladb:
gossiper: reindent do_stop_gossiping
gossiper: coroutinize do_stop_gossiping
gossiper: reindent assassinate_endpoint
gossiper: coroutinize assassinate_endpoint
gossiper: coroutinize handle_ack2_msg
gossiper: handle_ack_msg: always log warning on exception
gossiper: reindent handle_ack_msg
gossiper: coroutinize handle_ack_msg
gossiper: reindent handle_syn_msg
gossiper: coroutinize handle_syn_msg
gossiper: message handlers: no need to capture shared_from_this
gossiper: add_local_application_state: throw internal error if endpoint state is not found
gossiper: coroutinize add_local_application_state
Simplify the function. It does not need to spawn
a seastar thread.
While at it, declare it as private since it's called
only internally by the gossiper (and on shard 0).
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
This function has practically returned true since inception.
In d38deef499
it started using messaging_service().knows_version(endpoint),
which also returns `true` unconditionally, to this day.
So there's no point in calling it, since we can assume
that `uses_host_id` is true for all versions.
Closes #15343
* github.com:scylladb/scylladb:
storage_service: fixup indentation after last patch
gossiper: get rid of uses_host_id
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
We need to const_cast `this` since the const
container() has no const invoke_on override.
Trying to fix this in seastar sharded.hh breaks
many other call sites in scylla.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Now that the endpoint_state isn't changed in place,
we do not need to copy it for each subscriber.
We can instead just pass the lw_shared_ptr holding
a snapshot of it.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
And restrict the accessor methods to return const pointers
or references.
With that, the endpoint_state_ptr:s held in the _endpoint_state_map
point to immutable endpoint_state objects - with one exception:
the endpoint_state update_timestamp may be updated in place,
but the endpoint_state_map is immutable.
replicate() replaces the endpoint_state_ptr in the map
with a new one to maintain immutability.
A later change will also make this exception safe, so that
replicate() will guarantee strong exception safety: either all shards
are updated or none of them.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
This commit changes the interface to
using endpoint_state_ptr = lw_shared_ptr<const endpoint_state>
so that users can get a snapshot of the endpoint_state
that they must not modify in place anyway.
While internally, gossiper still has the legacy helpers
to manage the endpoint_state.
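The snapshot semantics can be modeled in a few lines (a Python analogue of the `lw_shared_ptr<const endpoint_state>` scheme, with an invented state shape):

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class EndpointState:          # stand-in for the immutable C++ state
    generation: int
    version: int

def apply_update(state_map, ep, **changes):
    """Swap in a new immutable snapshot instead of mutating in place.

    An update builds a new object and replaces the reference in the
    map, so readers holding the old snapshot keep a consistent view
    and never observe a half-modified state.
    """
    state_map[ep] = replace(state_map[ep], **changes)
```

This is why subscribers can hold a snapshot across yields without copying the whole state.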
Fixes scylladb/scylladb#14799
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Before changing _endpoint_state_map to hold a
lw_shared_ptr<endpoint_state>, provide synchronous helpers
for users to traverse all endpoint_states with no need
to copy them (as long as the called func does not yield).
With that, gossiper::get_endpoint_states() can be made private.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>