scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-21 17:10:35 +00:00

Author	SHA1	Message	Date
Kamil Braun	a83e04279e	raft: fsm: remove constructor used only in tests This constructor does not provide persisted commit index. It was only used in tests, so move it there, to the helper `fsm_debug` which inherits from `fsm`. Test cases which used `fsm` directly instead of `fsm_debug` were modified to use `fsm_debug` so they can access the constructor. `fsm_debug` doesn't change the behavior of `fsm`, only adds some helper members. This will be useful in following commits too.	2024-01-18 18:07:17 +01:00
Kefu Chai	fa3129fa29	treewide: use unsigned variable to compare with unsigned Sometimes we initialize a loop variable like auto i = 0; or int i = 0; but since the type of `0` is `int`, what we get is a variable of `int` type, which we later compare with an unsigned number. If we compile the source code with the `-Werror=sign-compare` option, the compiler warns about this. In general, this is a false alarm, as we are not likely to get a wrong comparison result here. But to prevent issues due to integer promotion in comparisons elsewhere, and to prepare for enabling `-Werror=sign-compare`, let's use unsigned to silence this warning. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-07-18 10:27:18 +08:00
Kamil Braun	daf9c53bb8	raft: split `can_vote` field from `server_address` to separate struct Whether a server can vote in a Raft configuration is not part of the address. `server_address` was used in many contexts where `can_vote` is irrelevant. Split the struct: `server_address` now contains only `id` and `server_info` as it did before `can_vote` was introduced. Instead we have a `config_member` struct that contains a `server_address` and the `can_vote` field. Also remove an "unsafe" constructor from `server_address` where `id` was provided but `server_info` was not. The constructor was used for tests where `server_info` is irrelevant, but it's important not to forget about the info in production code. The constructor was used for two purposes: - Invoking set operations such as `contains`. To solve this we use C++20 transparent hash and comparator functions, which allow invoking `contains` and similar functions by providing a different key type (in this case `raft::server_id` in set of addresses, for example). - constructing addresses without `info`s in tests. For this we provide helper functions in the test helpers module and use them.	2022-07-18 18:22:10 +02:00
Gleb Natapov	108e7fcc4e	raft: enter candidate state immediately when starting a singleton cluster When a node starts it does not immediately become a candidate, since it waits to learn about an already existing leader and randomizes the time at which it becomes a candidate to prevent dueling candidates if several nodes are started simultaneously. If a cluster consists of only one node there is no point in waiting before becoming a candidate, though, because the two cases above cannot happen. This patch checks that the node belongs to a singleton cluster where the node itself is the only voting member and becomes a candidate immediately. This reduces the starting time of a single-node cluster, which is often used in testing. Message-Id: <YiCbQXx8LPlRQssC@scylladb.com>	2022-03-04 20:30:52 +01:00
Avi Kivity	fcb8d040e8	treewide: use Software Package Data Exchange (SPDX) license identifiers Instead of lengthy blurbs, switch to single-line, machine-readable standardized (https://spdx.dev) license identifiers. The Linux kernel switched long ago, so there is strong precedent. Three cases are handled: AGPL-only, Apache-only, and dual licensed. For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0), reasoning that our changes are extensive enough to apply our license. The changes were applied mechanically with a script, except to licenses/README.md. Closes #9937	2022-01-18 12:15:18 +01:00
Avi Kivity	11cc772388	test: raft: avoid ignored variable errors Avoid instantiating unused variables, and in one case ignore it, to avoid a gcc warning.	2021-10-10 18:17:53 +03:00
Avi Kivity	9907303bf5	test: adjust signed/unsigned comparisons in loops and boost tests gcc complains about comparing a signed loop induction variable with an unsigned limit, or comparing an expected value and measured value. Fix by using unsigned throughout, except in one case where the signed value was needed for the data_value constructor.	2021-10-10 18:16:50 +03:00
Kamil Braun	bf823e34a4	raft: disable sticky leadership rule The Raft PhD presents the following scenario. When we remove a server from the cluster configuration, it does not receive the configuration entry which removes it (because the leader appending this entry uses that entry's configuration to decide to which servers to send the entry to, and the entry does not contain the removed server). Therefore the server keeps believing it is a member but does not receive heartbeats from leaders in the new configuration. Therefore it will keep becoming a candidate, causing existing leaders to step down, harming availability. With many such candidates the cluster may even stop being able to proceed at all. We call such servers "disruptive". More concretely, consider the following example, adapted from the PhD for joint configuration changes (the original PhD considered a different algorithm which can only add/remove one server at once): Let C_old = {A, B, C, D}, C_new = {B, C, D}, and C_joint be the joint configuration (C_old, C_new). D is the leader. D managed to append C_joint to every server and commit it. D appends C_new. At this point, D stops sending heartbeats to A because C_new does not contain A, but A's last entry is still C_joint, so it still has the ability to become a candidate. A can now become a candidate and cause D, or any other leader in C_new, to step down. Even if D manages to commit C_new, A can keep disrupting the cluster until it is shut down. Prevoting changes the situation, which the authors admit. The "even if" above no longer applies: if D manages to commit C_new, or just append it to a majority of C_new, then A won't be able to succeed in the prevote phase because a majority of servers in C_new has a longer log than A (and A must obtain a prevote from a majority of servers in C_new because A is in C_joint which contains C_new). 
But the authors continue to argue that disruptions can still occur during the small period where C_new is only appended on D but not yet on a majority of C_new. As they say: "we also did not want to assume that a leader will reliably replicate entries fast enough to move past the scenario (...) quickly; that might have worked in practice, but it depends on stronger assumptions that we prefer to avoid about the performance (...) of replicating log entries". One could probably try debunking this by saying that if entries take longer to replicate than the election timeout we're in much bigger trouble, but never mind. In any case, the authors propose a solution which we call "sticky leadership". A server will not grant a vote to a candidate if it has recently received a heartbeat from the currently known leader, even if the candidate's term is higher. In the above example, servers in C_new would not grant votes to A as long as D keeps sending them heartbeats, thus A is no longer disruptive. In our case the situation is a bit different: in original Raft, "heartbeats" have a very specific meaning - they are append_entries requests (possibly empty) sent by leaders. Thus if a node stops being a leader it stops sending heartbeats; similarly, if a node leaves the configuration, it stops receiving heartbeats from others still in the configuration. We instead use a "shared failure detector" interface, where nodes may still consider other nodes alive regardless of their configuration/leadership situation, as part of the general "MultiRaft" framework. This pretty much invalidates the original argument, as seen on the above example: A will still consider D alive, thus it won't become a candidate. Shared failure detector combined with sticky leadership actually makes the situation worse - it may cause cluster unavailability in certain scenarios (fortunately not a permanent one, it can be solved with server restarts, for example). 
Randomized nemesis testing with reconfigurations found the following scenario: Let C1 = {A, B, C}, C2 = {A}, C3 = {B, C}. We start from configuration C1, B is the leader. B commits joint (C1, C2), then new C2 configuration. Note that C does not learn about the last entry (since it's not part of C2) but it keeps believing that B is alive, so it keeps believing that B is the leader. We then partition {A} from {B, C}. A appends (C2, C3) joint configuration to its log. It's not able to append it to B or C due to the partition. The partition holds long enough for A to revert to candidate state (or we may restart A at this point). Eventually the partition resolves. The only node which can become a candidate now is A: C does not become a candidate because it keeps believing that B is the leader, and B does not become a candidate because it saw the C2 non-joint entry being committed. However, A won't become a leader because C won't grant it a vote due to the sticky leadership rule. The cluster will remain unavailable until e.g. C is restarted. Note that this scenario requires allowing configuration changes which remove and then re-add the same servers to the configuration. One may wonder if such reconfigurations should be allowed, but there doesn't seem to be any example of them breaking safety of Raft (and the PhD doesn't seem to mention them at all; perhaps it implicitly accepts them). It is unknown whether a similar scenario may be produced without such reconfigurations. In any case, disabling sticky leadership resolves the problem, and it is the last currently known availability problem found in randomized nemesis testing. There is no reason to keep this extension, both because the original Raft authors' argument does not apply for shared failure detector, and because one may even argue with the authors in vanilla Raft given that prevoting is enabled (see end of third paragraph of this commit message). 
Message-Id: <20210921153741.65084-1-kbraun@scylladb.com>	2021-09-26 11:09:01 +03:00
Gleb Natapov	ce40b01b07	raft: rename snapshot into snapshot_descriptor The snapshot structure does not contain the snapshot itself but only refers to it through its id. Rename it to snapshot_descriptor for clarity.	2021-08-29 12:53:03 +03:00
Gleb Natapov	5e1d589872	raft: do not wait for entry to become stable before replicating it Since io_fiber persists entries before sending out messages, even non-stable entries will become stable before being observed by other nodes. This patch also moves generation of append messages into the get_output() call because without the change we would lose batching, since each advance of last_idx would generate a new append message.	2021-08-29 12:48:15 +03:00
Alejo Sanchez	a6cd35c512	raft: testing: refactor helper Move definitions to helper object file. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-08-23 17:50:16 +02:00
Gleb Natapov	4764028cb3	raft: Remove leader_id from append_request The field is not used anywhere. Message-Id: <YP0khmjK2JSp77AG@scylladb.com>	2021-07-28 20:30:07 +02:00
Gleb Natapov	09528b8671	raft: test: test leadership transfer timeout Test that if leadership transfer cannot be done in configured time frame fsm cancels the leadership transfer process. Also check that timeout_now message is resent on each tick while leadership transfer is in progress.	2021-06-22 14:42:50 +03:00
Pavel Solodovnikov	e9258f43cd	raft: etcd_test: test_transfer_non_member Test that a node outside configuration, that receives `timeout_now` message, doesn't disrupt operation of the rest of the cluster. That is, `timeout_now` has no effect and the outsider stays in the follower state without promoting to the candidate. Tests: unit(dev, debug) Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2021-06-15 19:44:21 +03:00
Pavel Solodovnikov	2b6d73de98	raft: etcd_test: test_leader_transfer_ignore_proposal Test that a leader which has entered leader stepdown mode rejects new append requests. Tests: unit(dev, debug) Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2021-06-15 19:44:21 +03:00
Konstantin Osipov	d42d5aee8c	raft: (internal) simplify construction of tagged_id Make it easy to construct tagged_id from UUID.	2021-06-08 14:52:32 +03:00
Pavel Solodovnikov	0389001496	raft: avoid 'using' statements in raft testing helpers header It is generally considered a bad practice to use the `using` directives at global scope in header files. Also, many parts of `test/raft/helpers.hh` were already using `raft::` prefixes explicitly, so definitely not much to lose there. Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2021-05-17 13:36:09 +03:00
Gleb Natapov	745f63991f	raft: test: fix c&p error in a test Message-Id: <YJKBOwBX8hqHLxsB@scylladb.com>	2021-05-05 17:18:49 +02:00
Alejo Sanchez	ace0ee514f	raft: etcd unit tests: test proposal handling scenarios TestProposal For multiple scenarios, check proposal handling. Note, instead of expecting an explicit result for each specified case, the test automatically checks for expected behavior when quorum is reached or not. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-03-25 15:04:29 -04:00
Alejo Sanchez	77163ea76a	raft: etcd unit tests: test old messages ignored TestOldMessages Checks an append request from a leader from a previous term is ignored. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-03-25 15:04:29 -04:00
Alejo Sanchez	bf65b19803	raft: etcd unit tests: test single node precandidate TestSingleNodePreCandidate Checks a single node configuration with precandidate on works to automatically elect the node. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-03-25 15:04:29 -04:00
Alejo Sanchez	de7051467b	raft: etcd unit tests: test dueling precandidates TestDuelingPreCandidates In a configuration of 3 nodes, two nodes don't see each other and they compete for leadership. Loser (3) should revert to follower when prevote is rejected and revert to term 1. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-03-25 15:04:29 -04:00
Alejo Sanchez	aa7d23f86b	raft: etcd unit tests: test dueling candidates TestDuelingCandidates In a configuration of 3 nodes, two nodes don't see each other and they compete for leadership. Once reconnected, loser should not disrupt. But note it will remain candidate with current algorithm without prevoting and other fsms will not bump term. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-03-25 15:04:29 -04:00
Alejo Sanchez	1eac94e7d6	raft: etcd unit tests: test cannot commit without new term TestCannotCommitWithoutNewTermEntry tests the entries cannot be committed when leader changes, no new proposal comes in and ChangeTerm proposal is filtered. NOTE: this doesn't check committed but it's implicit for next round; this could also use communicate() providing committed output map Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-03-25 15:04:29 -04:00
Alejo Sanchez	b421fe3605	raft: etcd unit tests: test single node commit Port etcd TestSingleNodeCommit In a single node configuration elect the node, add 2 entries and check number of committed entries. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-03-25 15:04:29 -04:00
Alejo Sanchez	9b4538476b	raft: etcd unit tests: update test_leader_election_overwrite_newer_logs Make test_leader_election_overwrite_newer_logs use newer communicate() and other new helpers. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-03-25 15:04:29 -04:00
Alejo Sanchez	368eec1190	raft: etcd unit tests: fix test_progress_leader Make implementation follow closer to original test. Use newer boost test helpers. NOTE: in etcd it seems a leader's self progress is in PIPELINE state. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-03-25 15:04:28 -04:00
Konstantin Osipov	b18599c630	raft: (testing) introduce testing utilities Add a discrete_failure_detector, to be able to mark a single server dead.	2021-03-24 14:04:18 +03:00
Pavel Solodovnikov	93c565a1bf	raft: allow raft server to start with initial term 0 Prior to the fix there was an assert to check in `raft::server_impl::start` that the initial term is not 0. This restriction is completely artificial and can be lifted without any problems, which will be described below. The only place that is dependent on this corner case is in `server_impl::io_fiber`. Whenever term or vote has changed, they will be both set in `fsm::get_output`. `io_fiber` checks whether it needs to persist term and vote by validating that the term field is set (by actually executing a `term != 0` condition). This particular check is based on an unobvious fact that the term will never be 0 in case `fsm::get_output` saves term and vote values, indicating that they need to be persisted. Vote and term can change independently of each other, so that checking only for term obscures what is happening and why even more. In either case term will never be 0, because: 1. If the term has changed, then it's naturally greater than 0, since it's a monotonically increasing value. 2. If the vote has changed, it means that we received a vote request message. In such case we have already updated our term to the requester's term. Switch to using an explicit optional in `fsm_output` so that a reader doesn't have to think about the motivation behind this `if` and can just check that the `term_and_vote` optional is engaged. Given the motivation described above, the corresponding assert(_fsm->get_current_term() != term_t(0)); in `server_impl::start` is removed. Tests: unit(dev) Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>	2021-03-17 09:59:21 +02:00
Alejo Sanchez	88063b6e3e	raft: tests: move common helpers to header Move common test helper functions and data structures to a common helpers.hh header. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-03-15 06:16:58 -04:00
Alejo Sanchez	6139ad6337	raft: tests: move boost tests to tests/raft Move raft boost tests to test/raft directory. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-03-15 06:16:58 -04:00

31 Commits
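
Note on commit bf823e34a4: the sticky leadership rule described in that message can be sketched as a single extra condition in the vote decision. The code below is a simplified illustration under stated assumptions, not the `raft::fsm` implementation: `vote_request`, `voter_state`, `grant_vote` and the `leader_seen_alive` flag (standing in for the shared failure detector's answer) are hypothetical names, and bookkeeping such as the already-cast vote is omitted.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified types; not the actual raft::fsm interface.
struct vote_request {
    std::uint64_t term;           // candidate's term
    std::uint64_t last_log_idx;   // index of candidate's last log entry
    std::uint64_t last_log_term;  // term of candidate's last log entry
};

struct voter_state {
    std::uint64_t current_term;
    std::uint64_t last_log_idx;
    std::uint64_t last_log_term;
    bool leader_seen_alive;       // assumed: failure detector reports known leader alive
};

// Standard Raft check: the candidate's log is at least as up to date as ours.
bool candidate_log_up_to_date(const voter_state& v, const vote_request& r) {
    if (r.last_log_term != v.last_log_term) {
        return r.last_log_term > v.last_log_term;
    }
    return r.last_log_idx >= v.last_log_idx;
}

// Vote decision with the sticky leadership rule as an optional extra condition.
// Tracking of an already-cast vote is left out to keep the sketch short.
bool grant_vote(const voter_state& v, const vote_request& r, bool sticky_leadership) {
    if (r.term < v.current_term) {
        return false;  // stale candidate
    }
    if (sticky_leadership && v.leader_seen_alive) {
        return false;  // refuse even a higher-term candidate while the leader looks alive
    }
    return candidate_log_up_to_date(v, r);
}
```

In the scenario from the commit message, C plays the voter that still believes B is alive and A is the candidate with a higher term and a longer log. With `sticky_leadership` set, `grant_vote` returns false and A cannot be elected; with the rule disabled, the same request is granted.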