scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-12 19:02:12 +00:00

Author	SHA1	Message	Date
Pavel Emelyanov	54edb44b20	code: Stop using seastar::compat::source_location And switch to std::source_location. An upcoming seastar update will deprecate its compatibility layer. The patch is `for f in $(git grep -l 'seastar::compat::source_location'); do sed -e 's/seastar::compat::source_location/std::source_location/g' -i $f; done` plus the removal of a few header includes. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#27309	2025-11-27 19:10:11 +02:00
Patryk Jędrzejczak	ba5b5c7d2f	gossip: add recovery_leader to gossip_digest_syn In the new Raft-based recovery procedure, live nodes join the new group 0 one by one during a rolling restart. There is a time window when some of them are in the old group 0, while others are in the new group 0. This causes a group 0 mismatch in `gossiper::handle_syn_msg`. The current solution for this problem is to ignore group 0 mismatches if `recovery_leader` is set on the local node and to ask the administrator to perform the rolling restart in the following way: - set `recovery_leader` in `scylla.yaml` on all live nodes, - send the `SIGHUP` signal to all Scylla processes to reload the config, - proceed with the rolling restart. This commit makes `gossiper::handle_syn_msg` ignore group 0 mismatches when exactly one of the two gossiping nodes has `recovery_leader` set. We achieve this by adding `recovery_leader` to `gossip_digest_syn`. This change makes setting `recovery_leader` earlier on all nodes and reloading the config unnecessary. From now on, the administrator can simply restart each node with `recovery_leader` set. However, note that nodes that join group 0 must have `recovery_leader` set until all nodes join the new group 0. For example, assume that we are in the middle of the rolling restart and one of the nodes in the new group 0 crashes. It must be restarted with `recovery_leader` set, or else it would reject `gossip_digest_syn` messages from nodes in the old group 0. To avoid problems in such cases, we will continue to recommend setting `recovery_leader` in `scylla.yaml` instead of passing it as a command line argument.	2025-07-23 15:36:57 +02:00
Patryk Jędrzejczak	445a15ff45	db/config, gms/gossiper: change recovery_leader to UUID We change the type of the `recovery_leader` config parameter and `gossip_config::recovery_leader` from sstring to UUID. `recovery_leader` is supposed to store host ID, so UUID is a natural choice. After changing the type to UUID, if the user provides an incorrect UUID, parsing `recovery_leader` will fail early, but the start-up will continue. Outside the recovery procedure, `recovery_leader` will then be ignored. In the recovery procedure, the start-up will fail on: ``` throw std::runtime_error( "Cannot start - Raft-based topology has been enabled but persistent group 0 ID is not present. " "If you are trying to run the Raft-based recovery procedure, you must set recovery_leader."); ```	2025-07-23 15:36:56 +02:00
Benny Halevy	fa1c3e86a9	gossiper: add send_echo helper Call send_gossip_echo using a centralized helper. A following patch will make it abortable. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2025-04-30 11:45:51 +03:00
Benny Halevy	cecfb6dfd7	gms: gossiper: use named gate Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2025-04-12 11:28:48 +03:00
Gleb Natapov	8d534ee68e	gossiper: change make_random_gossip_digest to return value instead of modifying passed parameter	2025-04-06 18:39:24 +03:00
Gleb Natapov	6f53611337	gossiper: move force_remove_endpoint to work on host id Since the gossiper works on host ids now, it is incorrect to leave this function working on ip. It makes it impossible to delete an outdated entry, since the "gossiper.get_host_id(endpoint) != id" check will always be false for such entries (get_host_id() always returns the most up-to-date mapping).	2025-04-06 18:39:24 +03:00
Gleb Natapov	3abe5de8bf	gossiper: make examine_gossiper private	2025-03-31 16:50:50 +03:00
Gleb Natapov	afdfde8300	gossiper: rename get_nodes_with_host_id to get_node_ip Also change it to return std::optional instead of std::set, since now there can be only one ip mapped to an id.	2025-03-31 16:50:50 +03:00
Gleb Natapov	28fb84117d	treewide: drop id parameter from gossiper::for_each_endpoint_state We have it in endpoint_state anyway, so no need to pass both.	2025-03-31 16:50:50 +03:00
Gleb Natapov	4609bbbbb2	treewide: move gossiper to index nodes by host id This patch changes gossiper to index nodes by host ids instead of ips. The main data structure that changes is _endpoint_state_map, but this results in a lot of changes since everything that uses the map directly or indirectly has to be changed. The biggest victim of this outside the gossiper itself is the topology-over-gossiper code. It works on IPs and assumes the gossiper does the same, so both need to be changed together. Changes to other subsystems are much smaller since they already mostly work on host ids anyway.	2025-03-31 16:50:50 +03:00
Gleb Natapov	19ac05b0ba	gossiper: drop ip from replicate function parameters We have it in endpoint_state now, so no need to pass both.	2025-03-31 16:50:50 +03:00
Gleb Natapov	c5b8429bec	gossiper: drop ip from apply_new_states parameters We have it in endpoint_state now, so no need to pass both.	2025-03-31 16:50:50 +03:00
Gleb Natapov	6da5f541a2	gossiper: drop address from handle_major_state_change parameter list We have it in endpoint_state now, so no need to pass both.	2025-03-31 16:50:50 +03:00
Gleb Natapov	704580b197	gossiper: add try_get_host_id function The function returns a disengaged std::optional if the id is not found, instead of throwing like get_host_id does.	2025-03-31 16:50:45 +03:00
Gleb Natapov	e5cc3b75f8	gossiper: drop template from wait_alive_helper function Move ip to id translation to the caller.	2025-03-31 15:42:07 +03:00
Gleb Natapov	0dd86b4f1d	gossiper: move get_supported_features and its users to host id	2025-03-31 15:42:07 +03:00
Gleb Natapov	a581a99dbf	gossiper: move _pending_mark_alive_endpoints to host id Index _pending_mark_alive_endpoints map by host id instead of ip	2025-03-31 15:25:39 +03:00
Patryk Jędrzejczak	9970c1fcc3	gossip: allow group 0 ID mismatch in the Raft-based recovery procedure This patch ensures that members of the new group 0 can gossip with members of the old group 0 during rolling restart in the Raft-based recovery procedure. Without this change, restarted nodes (members of the new group 0) wouldn't be marked as UP by other nodes (members of the old group 0), which would decrease availability.	2025-03-14 13:53:05 +01:00
Gleb Natapov	cca228265e	gossiper: move _expire_time_endpoint_map to host_id Index _expire_time_endpoint_map map by host id instead of ip	2025-03-11 12:09:22 +02:00
Gleb Natapov	c45b50bbe6	gossiper: move _just_removed_endpoints to host id Index _just_removed_endpoints map by host id instead of ip	2025-03-11 12:09:22 +02:00
Gleb Natapov	22739bb39a	gossiper: drop unused get_msg_addr function	2025-03-11 12:09:22 +02:00
Gleb Natapov	499eb4d17f	treewide: pass host id to endpoint state change subscribers	2025-03-11 12:09:22 +02:00
Gleb Natapov	eb59205caf	gossiper: drop deprecated unsafe_assassinate_endpoint operation It was always deprecated.	2025-03-11 12:09:21 +02:00
Gleb Natapov	7dcffda6bd	gossiper: drop ip address from handle_echo_msg and simplify code since host_id is now mandatory	2025-03-11 12:09:21 +02:00
Gleb Natapov	8425c26462	gossiper: start using host ids to send messages earlier Send digest ack and ack2 by host ids as well now, since the id->ip mapping is available after receiving digest syn. This allows converting more code to host ids here.	2025-03-11 12:09:21 +02:00
Gleb Natapov	0e3dcb7954	treewide: move everyone to use host id based gossiper::is_alive and drop ip based one	2025-03-11 12:09:21 +02:00
Gleb Natapov	e47f251178	gossiper: move _live_endpoints and _unreachable_endpoints endpoint to host_id Index live and dead endpoints by host id. It also allows to simplify some code that does a translation.	2025-03-11 12:09:21 +02:00
Gleb Natapov	f1a82c1d01	gossiper: drop unused get_endpoint_states function	2025-03-11 12:09:20 +02:00
Gleb Natapov	c4a0fbae16	gossiper: check id match inside force_remove_endpoint Before calling force_remove_endpoint (which works on ip), the code checks that the ip maps to the correct id (so as not to remove, by mistake, a new node that inherited this ip). Move the check to the function itself.	2025-03-11 12:09:20 +02:00
Gleb Natapov	4420ddaf86	gossiper: move is_gossip_only_member and its users to work on host id	2025-03-11 12:09:20 +02:00
Gleb Natapov	6952f62869	gossiper: drop unused field from loaded_endpoint_state	2025-03-11 12:09:20 +02:00
Gleb Natapov	0ec9f7de64	gossiper: drop get_unreachable_token_owners functions It is used by truncate code only, and even there it only checks whether the returned set is non-empty. Check for dead token owners in the truncation code directly.	2025-01-16 16:37:07 +02:00
Gleb Natapov	36ccc897e8	gossiper: change get_live_members and all its users to work on host ids	2025-01-16 16:37:06 +02:00
Gleb Natapov	b3f8b579c0	gossiper: add get_endpoint_state_ptr() function that works on host id Will be used later to simplify code.	2025-01-15 16:30:29 +02:00
Kamil Braun	91cddcc17f	Merge 'Do not reset quarantine list in non raft mode' from Gleb Natapov The series contains small fixes to the gossiper, one of which fixes #21930. I noticed the others while debugging the issue. Fixes: scylladb/scylladb#21930 Closes scylladb/scylladb#21956 * github.com:scylladb/scylladb: gossiper: do not reset _just_removed_endpoints in non raft mode gossiper: do not send echo message to yourself gossiper: do not call apply for the node's old state	2024-12-19 11:03:35 +01:00
Avi Kivity	f3eade2f62	treewide: relicense to ScyllaDB-Source-Available-1.0 Drop the AGPL license in favor of a source-available license. See the blog post [1] for details. [1] https://www.scylladb.com/2024/12/18/why-were-moving-to-a-source-available-license/	2024-12-18 17:45:13 +02:00
Gleb Natapov	e318dfb83a	gossiper: do not reset _just_removed_endpoints in non raft mode By the time the function is called during start it may already be populated. Fixes: scylladb/scylladb#21930	2024-12-17 16:57:13 +02:00
Gleb Natapov	c2e3d875ab	gossiper: change get_unreachable_nodes to host ids	2024-12-15 11:31:11 +02:00
Gleb Natapov	03c8ffa45c	storage_service: move node_ops code to use host ids instead of host ips	2024-12-15 11:31:11 +02:00
Gleb Natapov	92815684df	gossiper: add get_unreachable_host_ids() function Will be needed later.	2024-12-15 11:31:10 +02:00
Gleb Natapov	18a9de51e7	gossiper: add get_application_state_ptr that searches by host_id	2024-12-02 10:31:13 +02:00
Gleb Natapov	7d751709e3	gossiper: change get_live_token_owners to return host ids Also amend the only user and drop the ip to id translation.	2024-12-02 10:31:13 +02:00
Gleb Natapov	a64b079b5c	gossiper: drop advertise_myself parameter to gossiper The parameter was needed when nodes were addressed by IP: during a replace with the same IP, a new node had to "hide" itself from the cluster so that it would not be accidentally confused with the old node. Now that nodes are addressed by host id, this situation is impossible.	2024-12-02 10:31:11 +02:00
Gleb Natapov	e7f869591d	gossiper: add address map getters	2024-12-02 10:31:11 +02:00
Gleb Natapov	15145c16d1	gossiper: provide wait_alive that works on host ids We have a wait_alive function that takes an array of ip addresses and waits for all of them to be alive. Provide a similar one that works on host ids.	2024-12-02 10:31:10 +02:00
Gleb Natapov	84c7aa8f48	gossiper: send up notifications by host ids	2024-12-02 10:31:10 +02:00
Gleb Natapov	aa87fecce2	gossiper: add is_alive that works on host_id The function checks if a node with the provided id is alive. If it fails to map the id to an ip, or no state is found for the ip, the node is considered dead.	2024-12-01 12:12:30 +02:00
Gleb Natapov	0e264ccba9	gossiper: populate gossip_address_map Add a non-expiring entry into the address map for each host in the gossiper state, and change it to an expiring one when the state is deleted.	2024-12-01 12:12:30 +02:00
Gleb Natapov	ca2544e57e	gossiper: introduce gossip address map Introduce a new address map that will be populated by the gossiper. Create it during initialization and pass it to the gossiper.	2024-12-01 12:12:29 +02:00
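
The acceptance rule described in commit ba5b5c7d2f (ignore a group 0 mismatch in `gossiper::handle_syn_msg` only when exactly one of the two gossiping nodes has `recovery_leader` set) can be illustrated with a small, self-contained C++ sketch. It is not the Scylla implementation: `syn_info`, `accept_syn`, and the string-based ID types are hypothetical stand-ins for `gossip_digest_syn` and the real UUID types.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical stand-ins; the real code uses UUID-based host and group 0 IDs.
using host_id = std::string;
using group0_id = std::string;

// The subset of a gossip SYN relevant to the rule (assumed shape).
struct syn_info {
    group0_id group0;
    std::optional<host_id> recovery_leader;  // engaged if the sender has recovery_leader set
};

// Returns true if the local node should accept a SYN from the remote node.
bool accept_syn(const syn_info& local, const syn_info& remote) {
    if (local.group0 == remote.group0) {
        return true;  // no mismatch at all
    }
    // Tolerate the mismatch only when exactly one side has recovery_leader set.
    return local.recovery_leader.has_value() != remote.recovery_leader.has_value();
}
```

Under this rule, a node of the new group 0 that is restarted without `recovery_leader` rejects SYNs from old-group-0 nodes, which matches the commit's reason for recommending that `recovery_leader` be set in `scylla.yaml`.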

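Commit 445a15ff45 describes parsing `recovery_leader` as a UUID: a malformed value fails to parse early without aborting start-up, and only the recovery procedure then refuses to start. The sketch below models that flow under stated assumptions; `looks_like_uuid`, `parse_recovery_leader`, and `check_startup` are hypothetical helpers, and the exact condition for the error is inferred from the quoted message rather than taken from the source.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical check for the canonical 8-4-4-4-12 hexadecimal UUID text form.
bool looks_like_uuid(const std::string& s) {
    if (s.size() != 36) {
        return false;
    }
    for (std::size_t i = 0; i < s.size(); ++i) {
        const bool dash_expected = (i == 8 || i == 13 || i == 18 || i == 23);
        if (dash_expected) {
            if (s[i] != '-') {
                return false;
            }
        } else if (!std::isxdigit(static_cast<unsigned char>(s[i]))) {
            return false;
        }
    }
    return true;
}

// A malformed value yields nullopt ("fails early") instead of aborting start-up.
std::optional<std::string> parse_recovery_leader(const std::string& raw) {
    if (!looks_like_uuid(raw)) {
        return std::nullopt;
    }
    return raw;
}

// Assumed start-up check: refuse to start only when Raft-based topology is enabled,
// the persistent group 0 ID is missing, and no valid recovery_leader was parsed.
void check_startup(bool raft_topology_enabled, bool has_persistent_group0_id,
                   const std::optional<std::string>& recovery_leader) {
    if (raft_topology_enabled && !has_persistent_group0_id && !recovery_leader) {
        throw std::runtime_error(
            "Cannot start - Raft-based topology has been enabled but persistent group 0 ID is not present. "
            "If you are trying to run the Raft-based recovery procedure, you must set recovery_leader.");
    }
}
```

Outside the recovery procedure (persistent group 0 ID present), a bad value is simply ignored in this sketch.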

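Several commits above (704580b197, aa87fecce2, 4609bbbbb2) move lookups to host ids and add a non-throwing `try_get_host_id` beside the throwing `get_host_id`, with unknown nodes treated as dead by `is_alive`. A toy model of that pattern follows; `toy_gossiper` and its members are hypothetical and far simpler than the real gossiper, which is asynchronous and keeps full endpoint state.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins for the real address and host id types.
using inet_address = std::string;
using host_id = std::string;

class toy_gossiper {
    std::map<host_id, bool> _alive;             // liveness, indexed by host id
    std::map<inet_address, host_id> _ip_to_id;  // current ip -> id mapping
public:
    void add_node(const inet_address& ip, const host_id& id, bool alive) {
        _ip_to_id[ip] = id;
        _alive[id] = alive;
    }

    // Non-throwing lookup: an empty optional means "unknown ip".
    std::optional<host_id> try_get_host_id(const inet_address& ip) const {
        auto it = _ip_to_id.find(ip);
        if (it == _ip_to_id.end()) {
            return std::nullopt;
        }
        return it->second;
    }

    // Throwing lookup, built on top of the non-throwing one.
    host_id get_host_id(const inet_address& ip) const {
        if (auto id = try_get_host_id(ip)) {
            return *id;
        }
        throw std::runtime_error("host id not found for " + ip);
    }

    // A node with no known state is treated as dead.
    bool is_alive(const host_id& id) const {
        auto it = _alive.find(id);
        return it != _alive.end() && it->second;
    }
};
```

Building the throwing variant on the optional-returning one keeps a single lookup path, so callers can choose between failing loudly and handling absence locally.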
435 Commits
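
Commits ca2544e57e and 0e264ccba9 describe a gossip address map in which hosts known to the gossiper get non-expiring entries that become expiring once their state is deleted. A minimal sketch of that lifecycle, assuming a hypothetical `toy_address_map` that takes an explicit `now` parameter for testability (the real map's interface is not shown in this log):

```cpp
#include <cassert>
#include <chrono>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-ins for the real host id, address, and clock types.
using host_id = std::string;
using inet_address = std::string;
using clock_type = std::chrono::steady_clock;

class toy_address_map {
    struct entry {
        inet_address addr;
        std::optional<clock_type::time_point> expires_at;  // nullopt: never expires
    };
    std::unordered_map<host_id, entry> _map;
public:
    // Host is present in gossiper state: keep a non-expiring entry.
    void add_or_update_permanent(const host_id& id, inet_address addr) {
        _map[id] = entry{std::move(addr), std::nullopt};
    }

    // Host state was deleted: keep the address only until now + ttl.
    void set_expiring(const host_id& id, clock_type::duration ttl, clock_type::time_point now) {
        if (auto it = _map.find(id); it != _map.end()) {
            it->second.expires_at = now + ttl;
        }
    }

    // Looks up an address, treating expired entries as absent.
    std::optional<inet_address> find(const host_id& id, clock_type::time_point now) const {
        auto it = _map.find(id);
        if (it == _map.end()) {
            return std::nullopt;
        }
        if (it->second.expires_at && *it->second.expires_at <= now) {
            return std::nullopt;
        }
        return it->second.addr;
    }
};
```

This sketch only hides expired entries and never erases them; purging is left out for brevity.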