This patch allows leaving snapshot_trailing entries in the raft log
when the state machine is snapshotted and the log entries covered by
the snapshot are dropped. Those entries can be used to catch up slow
followers without requiring a snapshot transfer. The value is part of
the configuration and can be changed.
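A minimal sketch of such a knob, assuming a simplified configuration
struct (names here are illustrative, not the actual scylla API):

    #include <cstddef>

    // Hypothetical configuration: how many log entries to keep after
    // a snapshot, so slow followers can catch up from the log.
    struct server_config {
        size_t snapshot_trailing = 1024; // placeholder default
    };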
The patch implements periodic taking of a snapshot and trimming of
the raft log.
In raft, the only way the log of already committed entries can be
shortened is by taking a snapshot of the state machine and dropping
the log entries included in the snapshot from the raft log. To keep
the log from growing too large, the patch takes a snapshot
periodically after applying N entries, where N is configured by
setting the snapshot_threshold value in raft's configuration.
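A rough sketch of the apply-side check, with assumed helper names
(take_snapshot and truncate_log_prefix are placeholders, not the real
functions):

    #include <cstdint>

    void take_snapshot(uint64_t idx);              // assumed helper
    void truncate_log_prefix(uint64_t first_kept); // assumed helper

    struct config { uint64_t snapshot_threshold; uint64_t snapshot_trailing; };

    // Snapshot once the number of entries applied since the last
    // snapshot reaches the threshold, keeping snapshot_trailing
    // entries behind the new snapshot.
    void maybe_take_snapshot(uint64_t applied_idx, uint64_t snapshot_idx,
                             const config& cfg) {
        if (applied_idx - snapshot_idx >= cfg.snapshot_threshold) {
            take_snapshot(applied_idx);
            uint64_t keep_from = applied_idx > cfg.snapshot_trailing
                               ? applied_idx - cfg.snapshot_trailing : 0;
            truncate_log_prefix(keep_from);
        }
    }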
This patch adds logic that detects when a follower is missing data
contained in a snapshot and initiates a snapshot transfer in that
case. Upon receiving the snapshot, the follower stores it locally and
applies it to its state machine. The code assumes that the snapshot
already exists on the leader.
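The detection itself can be sketched as follows, assuming a simplified
follower-progress type (not the actual code):

    #include <cstdint>

    struct follower_progress { uint64_t next_idx; };

    // If the next index a follower needs precedes the first entry still
    // present in the leader's log, the data was trimmed away and can
    // only be delivered through a snapshot transfer.
    bool needs_snapshot(const follower_progress& p, uint64_t log_start_idx) {
        return p.next_idx < log_start_idx;
    }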
To make writing Raft tests convenient, use declarative structures.
Servers are set up and initialized, and then updates are processed.
For now, updates are adding entries on the leader and changing the
leader. Updates and leader changes can be specified to run after the
initial test setup.
An example test for 3 nodes: node 0 starts as leader with two entries
0 and 1 for term 1 and a current term of 2; the test then adds 12
entries, changes the leader to node 1, and adds 12 more entries. The
test will automatically add more entries on the last leader until the
test limit of total_values (default 100) is reached.
{.name = "test_name", .nodes = 3, .initial_term = 2,
 .initial_states = {{.le = {{1,0},{1,1}}}},
 .updates = {entries{12},new_leader{1},entries{12}}},
The leader is isolated before a leader change via is_leader()
returning false. The initial leader (server 0 by default) is also set
with this method.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
Send more than one entry in a single append_entry message, but limit
the size of a single packet according to the append_request_threshold
parameter.
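A hedged sketch of the batching, with assumed names (build_batch is
hypothetical; the threshold is passed in as a plain size):

    #include <cstddef>
    #include <vector>

    struct log_entry { std::vector<char> data; };

    // Collect entries into one batch until the accumulated payload size
    // reaches the threshold; at least one entry is always included.
    std::vector<const log_entry*> build_batch(const std::vector<log_entry>& log,
                                              size_t first, size_t threshold) {
        std::vector<const log_entry*> batch;
        size_t size = 0;
        for (size_t i = first; i < log.size(); ++i) {
            batch.push_back(&log[i]);
            size += log[i].data.size();
            if (size >= threshold) {
                break;
            }
        }
        return batch;
    }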
Message-Id: <20201007142602.GA2496906@scylladb.com>
Server address UUID 0 is not a valid server id, since there is code
that assumes the value is not set when server_id is 0 (e.g.
_voted_for). Prevent users from manually setting this invalid value.
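A sketch of the guard, assuming server_id wraps a 128-bit UUID (field
and function names are placeholders):

    #include <stdexcept>

    struct server_id { unsigned long long msb = 0, lsb = 0; };

    // Reject the all-zero UUID: internal code (e.g. _voted_for) treats
    // id == 0 as 'not set'.
    void validate(server_id id) {
        if (id.msb == 0 && id.lsb == 0) {
            throw std::invalid_argument("server id UUID 0 is reserved for 'not set'");
        }
    }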
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
When the term of an AppendEntries/InstallSnapshot message is the same
as the current server's, and the current server's leader is not set,
we should assign it, to avoid starting an election if the current
leader becomes idle. Restructure the code accordingly: change
candidate state to Follower upon InstallSnapshot.
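The rule can be sketched like this, with simplified types (a plain
integer stands in for the real server id):

    #include <cstdint>

    struct follower_state {
        uint64_t current_term = 0;
        uint64_t current_leader = 0; // 0 means 'not set'

        // On an AppendEntries/InstallSnapshot message whose term equals
        // ours, adopt the sender as leader if we do not have one yet, so
        // an idle leader does not provoke a needless election.
        void on_append_or_snapshot(uint64_t msg_term, uint64_t from) {
            if (msg_term == current_term && current_leader == 0) {
                current_leader = from;
            }
        }
    };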
It was erroneously replaced by time-based logic, which caused us to
send one probe per tick; this was never the intention. There can be
one outstanding probe message, but the moment it gets a reply, the
next one should be sent without waiting for a tick.
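A sketch of the intended behavior, assuming a minimal
follower_progress type:

    struct follower_progress {
        bool probe_in_flight = false;
    };

    // Allow a single outstanding probe.
    void send_probe(follower_progress& p) {
        if (!p.probe_in_flight) {
            p.probe_in_flight = true;
            // ... build and send an append_entries probe here ...
        }
    }

    // Send the next probe as soon as a reply arrives, not on the next tick.
    void on_probe_reply(follower_progress& p) {
        p.probe_in_flight = false;
        send_probe(p);
    }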
This patch introduces a partial RAFT implementation. It has only log
replication and leader election support. Snapshotting and
configuration changes, along with other, smaller features, are not
yet implemented.
The approach taken by this implementation is to have a deterministic
state machine coded in raft::fsm. What makes the FSM deterministic is
that it does not do any IO by itself. It only takes an input (which
may be a networking message, a time tick, or a new append message),
changes its state, and produces an output. The output contains the
state that has to be persisted, the messages that need to be sent,
and the entries that may be applied (in that order). The input and
output of the FSM are handled by the raft::server class, which uses
the raft::rpc interface to send and receive messages and the
raft::storage interface to implement persistence.
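A minimal sketch of that contract (simplified, not the actual
raft::fsm interface):

    #include <vector>

    struct message {};   // networking input/output
    struct log_entry {}; // entry to persist or apply

    struct fsm_output {
        std::vector<log_entry> to_persist; // persist first
        std::vector<message>   to_send;    // then send
        std::vector<log_entry> to_apply;   // then apply
    };

    // The FSM consumes inputs and returns an output describing the IO
    // the caller must perform; it performs no IO itself.
    class fsm {
    public:
        fsm_output tick();                 // time input
        fsm_output step(const message& m); // network input
        fsm_output add_entry(log_entry e); // client append input
    };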
This commit introduces the public raft interfaces. raft::server
represents a single raft server instance. raft::state_machine
represents a user-defined state machine. raft::rpc, raft::rpc_client
and raft::storage are used to allow implementing custom networking
and storage layers. A shared failure detector interface defines
keep-alive semantics, required for an efficient implementation of
thousands of raft groups.
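A hedged sketch of the layering, with signatures simplified from the
actual headers:

    #include <cstdint>
    #include <vector>

    struct log_entry {};

    struct state_machine { // user-defined state machine
        virtual void apply(std::vector<log_entry> entries) = 0;
        virtual ~state_machine() = default;
    };

    struct storage { // custom persistence layer
        virtual void store_log_entries(const std::vector<log_entry>& entries) = 0;
        virtual ~storage() = default;
    };

    struct rpc { // custom networking layer
        virtual void send_message(uint64_t dst, const log_entry& payload) = 0;
        virtual ~rpc() = default;
    };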