scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-25 11:00:35 +00:00

Author	SHA1	Message	Date
Gleb Natapov	9d6bf7f351	raft: introduce leader stepdown procedure Section 3.10 of the Raft PhD dissertation describes two cases for which the leadership transfer extension can be helpful: 1. Sometimes the leader must step down. For example, it may need to reboot for maintenance, or it may be removed from the cluster. When it steps down, the cluster will be idle for an election timeout until another server times out and wins an election. This brief unavailability can be avoided by having the leader transfer its leadership to another server before it steps down. 2. In some cases, one or more servers may be more suitable to lead the cluster than others. For example, a server with high load would not make a good leader, or in a WAN deployment, servers in a primary datacenter may be preferred in order to minimize the latency between clients and the leader. Other consensus algorithms may be able to accommodate these preferences during leader election, but Raft needs a server with a sufficiently up-to-date log to become leader, which might not be the most preferred one. Instead, a leader in Raft can periodically check to see whether one of its available followers would be more suitable, and if so, transfer its leadership to that server. (If only human leaders were so graceful.) The patch here implements the extension and employs it automatically when a leader removes itself from a cluster.	2021-03-22 10:28:43 +02:00
Konstantin Osipov	cb3314d756	raft: set follower's next_idx when switching to SNAPSHOT mode Set follower's next_idx to snapshot index + 1 when switching it to snapshot mode. If the snapshot transfer succeeds, that's the best match for the follower's next replication index. If it fails, the leader will send a new probe to find out the follower's position again and retry sending a possibly newer snapshot. The change helps reduce protocol state managed outside the FSM.	2021-03-18 16:35:11 +03:00
Gleb Natapov	dd6ba3d507	raft: add non-voting member support This patch adds support for non-voting members. A non-voting member is a member whose vote is not counted for leader election or commit index calculation, and which cannot become a leader. Otherwise it is a normal raft node. The state is needed to let new nodes catch up their logs without disturbing the cluster. All kinds of transitions are allowed. A node may be added as a voting member directly, or it may be added as non-voting and then changed to a voting one through an additional configuration change. A node can be demoted from voting to non-voting member through a configuration change as well. Message-Id: <20210304101158.1237480-2-gleb@scylladb.com>	2021-03-09 13:47:48 +01:00
Konstantin Osipov	e49d5f89a5	raft: do not account for the same vote twice While a conforming Raft implementation never sends a duplicate vote from the same server, Raft's assumptions about the network permit duplicate messages. So, in theory, it is possible that a vote message is delivered multiple times. The current voting implementation does reject votes from non-members, but doesn't check for duplicate votes. Keep track of who has already voted, and reject duplicate votes. A unit test follows.	2021-02-18 16:04:44 +03:00
Konstantin Osipov	7ea064ac04	raft: remove fsm::set_configuration() Set either tracker or votes configuration explicitly. This saves a few lines and simplifies unit tests.	2021-02-18 16:04:44 +03:00
Konstantin Osipov	c4552ffb9a	raft: add ostream serialization for enum vote_result	2021-02-18 16:04:44 +03:00
Konstantin Osipov	04b4d97d6a	raft: rename progress.hh to tracker.hh class tracker is the main class of this module.	2021-02-18 16:04:43 +03:00
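
The vote-counting rules described in commits dd6ba3d507 and e49d5f89a5 can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the code from those patches: `vote_tally`, `add_member`, `register_vote`, `tally`, and `check_example` are hypothetical names, and only the enum name `vote_result` appears in the log above (its enumerators here are assumed). The sketch ignores votes from non-members and non-voting members, counts each voter's reply at most once, and computes the majority over voting members only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <set>

// Hypothetical names throughout; this is not the ScyllaDB raft API.
using server_id = std::uint64_t;

// Enumerators are an assumption; the log only names the enum itself.
enum class vote_result { unknown, won, lost };

class vote_tally {
    std::map<server_id, bool> _members;  // id -> can_vote
    std::set<server_id> _responded;      // voting members whose reply was counted
    std::size_t _voters = 0;             // number of voting members
    std::size_t _granted = 0;            // number of distinct granted votes
public:
    void add_member(server_id id, bool can_vote) {
        if (_members.emplace(id, can_vote).second && can_vote) {
            ++_voters;
        }
    }

    // Returns true only if the reply was actually counted.
    bool register_vote(server_id from, bool granted) {
        auto it = _members.find(from);
        if (it == _members.end() || !it->second) {
            return false;  // non-member or non-voting member: ignore
        }
        if (!_responded.insert(from).second) {
            return false;  // duplicate delivery of the same reply: ignore
        }
        if (granted) {
            ++_granted;
        }
        return true;
    }

    vote_result tally() const {
        const std::size_t quorum = _voters / 2 + 1;  // majority of voters only
        if (_granted >= quorum) {
            return vote_result::won;
        }
        const std::size_t outstanding = _voters - _responded.size();
        if (_granted + outstanding < quorum) {
            return vote_result::lost;  // majority is no longer reachable
        }
        return vote_result::unknown;
    }
};

// Small self-check: three voters plus one non-voting member.
inline void check_example() {
    vote_tally t;
    t.add_member(1, true);
    t.add_member(2, true);
    t.add_member(3, true);
    t.add_member(4, false);              // non-voting member
    assert(t.register_vote(1, true));
    assert(!t.register_vote(1, true));   // duplicate is rejected
    assert(!t.register_vote(4, true));   // non-voter is not counted
    assert(t.tally() == vote_result::unknown);
    assert(t.register_vote(2, true));
    assert(t.tally() == vote_result::won);
}
```

Keeping the set of responders separate from the granted count means a redelivered message cannot inflate the tally, and the quorum stays a function of voting members alone, so adding a non-voting member to catch up its log does not change the election outcome.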

7 Commits