scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-01 04:26:48 +00:00

Author	SHA1	Message	Date
Gleb Natapov	3a1bff26dd	raft: test: add test of a leadership change during ongoing snapshot transfer	2021-05-06 11:34:31 +03:00
Gleb Natapov	612e0f08c4	raft: test: retry submitting an entry if it was dropped	2021-05-06 11:34:31 +03:00
Gleb Natapov	0b2c9c549a	raft: test: wait for the log to be fully replicated on new leader only When forcing a new leader, it should be enough to wait for the log to be fully replicated to that particular leader.	2021-05-06 11:34:31 +03:00
Gleb Natapov	6abe2772dc	raft: make snapshot transfer abortable A snapshot transfer may take a lot of time, and meanwhile the leader performing it may lose leadership. If that happens, the ongoing snapshot transfer becomes obsolete, since the receiving node will reject the snapshot as coming from an old leader. Make snapshot transfers abortable and abort them when the leader changes.	2021-05-06 11:34:31 +03:00
Gleb Natapov	d0ebd79deb	raft: test: return error from rpc module if nodes are disconnected Returning an error when nodes are disconnected more closely resembles what will happen in real networking.	2021-05-06 11:34:31 +03:00
Gleb Natapov	745f63991f	raft: test: fix c&p error in a test Message-Id: <YJKBOwBX8hqHLxsB@scylladb.com>	2021-05-05 17:18:49 +02:00
Alejo Sanchez	27ad2a0f28	raft: replication test: remove obsolete helper As we are now serially adding commands with consecutive integers there is no need to build vectors of commands. Remove helper. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-05-04 11:01:07 -04:00
Alejo Sanchez	0a54fd848b	raft: replication test: add_entry with retries The current leader might have stepped down. Try again and learn if there's a new leader. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-05-04 11:00:46 -04:00
Alejo Sanchez	56e977ae69	raft: replication test: support config change Add support for configuration change on leader. Keep track of servers in config in test. Add a dummy entry to confirm configuration changed. If the add fails, because the old leader was not in the new config and stepped down, the config is considered changed, too. Add a test with some configuration changes. Add a test cycling every scenario for 1 of 4 nodes removed. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-05-03 07:53:35 -04:00
Alejo Sanchez	8d8af92cbb	raft: replication test: add dummy command support Use a special value as dummy entry to be ignored when seen in state machine input. Ignore dummy entries for count. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-05-03 07:53:35 -04:00
Alejo Sanchez	4aa52be7e5	raft: replication test: test both with and without prevote Before this change the default was prevote enabled. With this change each test is run with and without prevote. This doubles the number of test cases. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-05-03 07:53:35 -04:00
Alejo Sanchez	e759e492c7	raft: replication test: make initial leader just default The test suite requires an initial leader and at the moment it's always just 0. Make it default and simplify code. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-05-03 07:53:35 -04:00
Alejo Sanchez	eb5bbcdec7	raft: replication test: create command helper Factor out repeated code and make it available for other uses. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-05-03 07:53:35 -04:00
Alejo Sanchez	eb94dd26dc	raft: replication test: free elections as helper Add a helper to run free elections and use it in partitioning. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-05-03 07:53:35 -04:00
Alejo Sanchez	cb297a57df	raft: replication test: fix election connectivity If a leader was already disconnected, the election of a new leader could reconnect it. Save the original connectivity and restore it when done electing the new leader. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-05-03 07:53:35 -04:00
Alejo Sanchez	0a5c605713	raft: replication test: fix custom election Use the new specific connectivity to manage old leader disconnection more specifically. This fixes elections where the vote of the old leader is required for quorum. For example, with {A,B} we want to switch leader. For B to become candidate it has to see A as down. Then A has to see B's vote request and vote for B. So, in the general case, the old leader first needs to be disconnected from all nodes, then the desired node is made candidate, and then the old leader is connected only to the desired candidate (otherwise, other nodes would see the new candidate as disrupting a live leader). Also, there might be stray messages from the former leader. These could revert the candidate to follower. To handle this, this patch retries the process until the desired node becomes leader. The helper function elect_me_leader() is split and renamed to wait_until_candidate() and wait_election_done(). The former ticks until the node is a candidate and the latter waits until a candidate either becomes a leader or reverts to follower. The existing etcd test workaround of incrementing from n=2 to n=3 nodes is corrected back to the original n=2. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-05-03 07:53:35 -04:00
Alejo Sanchez	9909983e38	raft: replication test: add helpers for threshold and election Add 2 helper functions for making nodes reach timeout threshold and to elect a specific node. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-05-03 07:53:35 -04:00
Alejo Sanchez	38526d7a2f	raft: replication test: connectivity improvement Replace simple full disconnect of a node with specific from -> to disconnection tracking. This will help elect new leaders. Say there are {A,B,C} with A leader and we want to elect B. Before this patch, we would disconnect A, run an election with just {B,C}, and then re-connect A. If we have {A,B} and want to elect B, this won't work as B needs 2/2+1 votes and A is disconnected, even if we made A step down. This patch corrects this shortcoming. (@gleb-cloudius) With this patch, we can specify other followers (not the previous or next leader) to not see the old leader, but the new and old leaders see each other just fine. In the example {A,B,C} above we can cut A<->B specifically. Also, this is closer to etcd testing and should help porting cases. NOTE: in the current test implementation failure_detector reports node.is_alive(other_node) if there is a connection both ways. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-05-03 07:53:35 -04:00
Alejo Sanchez	f53dea432c	raft: replication test: helper for server_address A helper function to convert from local 0-based id to raft 1-based server_address. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-05-03 07:53:35 -04:00
Alejo Sanchez	294e16cf8b	raft: replication test: use wait_log() Use wait_log() helper in leftover election code. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-05-03 07:53:35 -04:00
Alejo Sanchez	355c8a052f	raft: replication test: cycle leader more For ported etcd test cycle leader, cycle some more. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-05-03 07:53:35 -04:00
Alejo Sanchez	5b2c9a6c94	raft: replication test: fix a test description Fix replace_log_leaders_log_empty description comment. Reported by @kbraun Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-05-03 07:53:35 -04:00
Alejo Sanchez	bbb56e2265	raft: replication test: remove multiple state machines Checksum was removed so undo support for multiple versions added in: test: add support for different state machines `43dc5e7dc2` NOTE: as there is a test with custom total_values, expected value cannot be static const anymore. (line 630) Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-05-03 07:53:35 -04:00
Alejo Sanchez	e77af8573b	raft: replication test: remove checksum Previously, entries were added in parallel and we needed to check if order was broken. Using a simple checksum was better than a hash as you could easily find the position where it broke (we add consecutive numbers). Now the order of entries is enforced, so it's not useful. This patch removes it. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-05-03 07:53:35 -04:00
Alejo Sanchez	9335941b49	raft: replication test: remove unused class param persisted_snapshots is not used in state_machine class. Remove it. Reported by @kbraun Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-05-03 07:53:35 -04:00
Kamil Braun	4c95277619	raft: fsm: fix assertion failure on stray rejects When probes are sent over a slow network, the leader would send multiple probes to a lagging follower before it would get a reject response to the first probe back. After getting a reject, the leader will be able to correctly position `next_idx` for that follower and switch to pipeline mode. Then, an out of order reject to a now irrelevant probe could crash the leader, since it would effectively request it to "rewind" its `match_idx` for that follower, and the code asserts this never happens. We fix the problem by strengthening `is_stray_reject`. The check that was previously only made in `PIPELINE` case (`rejected.non_matching_idx <= match_idx`) is now always performed and we add a new check: `rejected.last_idx < match_idx`. We also strengthen the assert. The commit improves the documentation by explaining that `is_stray_reject` may return false negatives. We also precisely state the preconditions and postconditions of `is_stray_reject`, give a more precise definition of `progress.match_idx`, argue how the postconditions of `is_stray_reject` follow from its preconditions and Raft invariants, and argue why the (strengthened) assert must always pass. Message-Id: <20210423173117.32939-1-kbraun@scylladb.com>	2021-04-27 01:07:22 +02:00
Gleb Natapov	b9175edea4	raft: test: check that a server with id zero can be neither created nor added to a config Message-Id: <20210407134853.1964226-2-gleb@scylladb.com>	2021-04-08 17:07:18 +02:00
Gleb Natapov	68d73bd4c8	raft: add test for check quorum on a leader	2021-04-07 10:15:33 +03:00
Gleb Natapov	bdb59307d3	raft: test: add test case for stepdown process Add the test for the case where C_new entry is not the last one in a leader that is being removed from a cluster. In this case a leader will continue replication even after committing C_new and will start stepdown process later, when at least one follower is fully synchronized.	2021-04-07 10:15:33 +03:00
Gleb Natapov	10781037f5	raft: test: add test that leader behaves as expected when it gets unexpected messages	2021-04-04 11:33:35 +03:00
Alejo Sanchez	ace0ee514f	raft: etcd unit tests: test proposal handling scenarios TestProposal For multiple scenarios, check proposal handling. Note, instead of expecting an explicit result for each specified case, the test automatically checks for expected behavior when quorum is reached or not. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-03-25 15:04:29 -04:00
Alejo Sanchez	77163ea76a	raft: etcd unit tests: test old messages ignored TestOldMessages Checks an append request from a leader from a previous term is ignored. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-03-25 15:04:29 -04:00
Alejo Sanchez	bf65b19803	raft: etcd unit tests: test single node precandidate TestSingleNodePreCandidate Checks that a single node configuration with precandidate enabled automatically elects the node. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-03-25 15:04:29 -04:00
Alejo Sanchez	de7051467b	raft: etcd unit tests: test dueling precandidates TestDuelingPreCandidates In a configuration of 3 nodes, two nodes don't see each other and they compete for leadership. Loser (3) should revert to follower when prevote is rejected and revert to term 1. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-03-25 15:04:29 -04:00
Alejo Sanchez	aa7d23f86b	raft: etcd unit tests: test dueling candidates TestDuelingCandidates In a configuration of 3 nodes, two nodes don't see each other and they compete for leadership. Once reconnected, loser should not disrupt. But note it will remain candidate with current algorithm without prevoting and other fsms will not bump term. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-03-25 15:04:29 -04:00
Alejo Sanchez	1eac94e7d6	raft: etcd unit tests: test cannot commit without new term TestCannotCommitWithoutNewTermEntry tests that entries cannot be committed when the leader changes, no new proposal comes in and ChangeTerm proposal is filtered. NOTE: this doesn't check committed but it's implicit for next round; this could also use communicate() providing committed output map Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-03-25 15:04:29 -04:00
Alejo Sanchez	b421fe3605	raft: etcd unit tests: test single node commit Port etcd TestSingleNodeCommit In a single node configuration elect the node, add 2 entries and check number of committed entries. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-03-25 15:04:29 -04:00
Alejo Sanchez	9b4538476b	raft: etcd unit tests: update test_leader_election_overwrite_newer_logs Make test_leader_election_overwrite_newer_logs use newer communicate() and other new helpers. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-03-25 15:04:29 -04:00
Alejo Sanchez	368eec1190	raft: etcd unit tests: fix test_progress_leader Make implementation follow closer to original test. Use newer boost test helpers. NOTE: in etcd it seems a leader's self progress is in PIPELINE state. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-03-25 15:04:28 -04:00
Alejo Sanchez	ba29970e29	raft: testing: log comparison helper functions Two helper functions to compare logs. For now only index, term, and data type are used. Data content comparison does not seem to be necessary for now. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-03-25 15:04:28 -04:00
Alejo Sanchez	aeab4cf4a9	raft: testing: helper to make fsm candidate Current election_timeout() helper might bump the term twice. It's convenient and less error prone to have a more fine grained helper that stops right when candidate state is reached. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-03-25 15:04:19 -04:00
Alejo Sanchez	7a6616f1cb	raft: testing: expose log for test verification Let derived classes access the log to verify its contents. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-03-25 15:03:46 -04:00
Alejo Sanchez	05b1f57e67	raft: testing: use server_address_set Use server_address_set in local namespace for brevity. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-03-25 15:01:12 -04:00
Alejo Sanchez	9d0a7d8ccf	raft: testing: add prevote configuration Provide a generic prevote configuration for tests. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-03-25 15:00:28 -04:00
Alejo Sanchez	7e6807e8fc	raft: testing: make become_follower() available for tests Some etcd tests need to force a follower with a specific leader. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2021-03-24 19:11:09 -04:00
Konstantin Osipov	1a1d7ab662	raft: (testing) stray replies from removed followers	2021-03-24 14:05:55 +03:00
Konstantin Osipov	0295163f6f	raft: always return a non-zero configuration index from the log Return snapshot index for last configuration index if there is no configuration in the log.	2021-03-24 14:05:55 +03:00
Konstantin Osipov	cec59e53ef	raft: (testing) leader change during configuration change	2021-03-24 14:05:36 +03:00
Konstantin Osipov	a203c8833f	raft: (testing) test confchange {ABCDE} -> {ABCDEFG}	2021-03-24 14:04:18 +03:00
Konstantin Osipov	40e117d36e	raft: (testing) test confchange {ABCDEF} -> {ABCGH}	2021-03-24 14:04:18 +03:00
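
The strengthened stray-reject check described in commit 4c95277619 can be sketched in C++ as below. This is a minimal illustration under stated assumptions, not the scylladb code. `append_reject`, `follower_progress` and `handle_reject()` are hypothetical simplified stand-ins. The repositioning rule `next_idx = min(non_matching_idx, last_idx + 1)` is also an assumption, chosen to show why both checks are needed before the assert can be trusted.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical simplified types; only the field names come from the commit.
using index_t = std::uint64_t;

struct append_reject {
    index_t non_matching_idx; // index where the follower's log did not match
    index_t last_idx;         // last index in the follower's log when it replied
};

struct follower_progress {
    index_t match_idx; // highest index known to be replicated on the follower
    index_t next_idx;  // next index the leader will send to the follower
};

// True if the reject is certainly stale and must be ignored. False
// negatives are allowed: returning false does not prove the reject is fresh.
bool is_stray_reject(const follower_progress& p, const append_reject& r) {
    // The follower is already known to match at the index it rejected.
    if (r.non_matching_idx <= p.match_idx) {
        return true;
    }
    // The follower's log was shorter than what it is known to hold, so the
    // reply was sent before those entries were stored.
    if (r.last_idx < p.match_idx) {
        return true;
    }
    return false;
}

// Assumed repositioning rule (an illustration, not scylladb's): retry from
// the earlier of the mismatch point and the end of the follower's log.
void handle_reject(follower_progress& p, const append_reject& r) {
    if (is_stray_reject(p, r)) {
        return;
    }
    p.next_idx = std::min(r.non_matching_idx, r.last_idx + 1);
    // Both checks above guarantee we never rewind to or below match_idx.
    assert(p.next_idx > p.match_idx);
}
```

Consider `match_idx = 10` and a late reject with `non_matching_idx = 12` and `last_idx = 9`. Such a reject passes the old `non_matching_idx <= match_idx` test alone. Under the assumed rule, it would then set `next_idx` to 10 and trip the assert. The added `last_idx < match_idx` test discards it instead.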
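
The from -> to disconnection tracking from commit 38526d7a2f can be sketched as a set of cut (from, to) pairs. Its NOTE that the test failure detector reports a node alive only when there is a connection both ways is modeled too. The `connectivity` class and its methods are hypothetical, and the real test harness may track this differently.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>

// Hypothetical directional connectivity tracker; node ids are plain 0-based
// integers in this sketch.
class connectivity {
    std::set<std::pair<std::size_t, std::size_t>> _cut; // (from, to) links that are down
public:
    // Drop messages sent by `from` to `to`; the reverse direction is untouched.
    void cut(std::size_t from, std::size_t to) {
        assert(from != to); // cutting a node from itself is meaningless here
        _cut.emplace(from, to);
    }
    void cut_both(std::size_t a, std::size_t b) { cut(a, b); cut(b, a); }
    void restore(std::size_t from, std::size_t to) { _cut.erase(std::make_pair(from, to)); }
    void restore_all() { _cut.clear(); }

    // Can a message sent by `from` reach `to`?
    bool can_send(std::size_t from, std::size_t to) const {
        return _cut.count(std::make_pair(from, to)) == 0;
    }
    // Failure detector view: `other` is alive to `node` only if both directions work.
    bool is_alive(std::size_t node, std::size_t other) const {
        return can_send(node, other) && can_send(other, node);
    }
};
```

Cutting a single direction is enough to make both nodes look dead to each other's failure detector, while messages still flow in the other direction. The third node keeps seeing both of them as alive.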

128 Commits