Useful for debugging.
Had to make `configuration` constructor explicit. Otherwise the
`operator<<` implementation for `configuration` would implicitly convert
the `server_address` to `configuration` when trying to output it, causing
infinite recursion.
Removed implicit uses of the constructor.
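The trap can be reproduced with a minimal sketch (these types are illustrative stand-ins, not Scylla's actual definitions):

```cpp
#include <ostream>
#include <sstream>
#include <string>
#include <utility>

// Illustrative stand-ins for the real types.
struct server_address {
    std::string host;
};

struct configuration {
    server_address addr;
    // Without `explicit`, the operator<< below would also accept a bare
    // server_address through an implicit conversion; streaming the address
    // inside the operator would then re-enter it, recursing forever.
    explicit configuration(server_address a) : addr(std::move(a)) {}
};

std::ostream& operator<<(std::ostream& os, const configuration& c) {
    // Streams the address contents directly; since no implicit conversion
    // back to configuration exists, this cannot re-enter itself.
    return os << c.addr.host;
}

std::string to_string(const configuration& c) {
    std::ostringstream ss;
    ss << c;
    return ss.str();
}
```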
Instead of lengthy blurbs, switch to single-line, machine-readable
standardized (https://spdx.dev) license identifiers. The Linux kernel
switched long ago, so there is strong precedent.
Three cases are handled: AGPL-only, Apache-only, and dual licensed.
For the latter case, I chose (AGPL-3.0-or-later AND Apache-2.0),
reasoning that our changes are extensive enough to apply our license.
The changes were applied mechanically with a script, except to
licenses/README.md.
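For example, a dual-licensed source file now carries a single header line instead of a license blurb (comment syntax varies by file type):

```
// SPDX-License-Identifier: (AGPL-3.0-or-later AND Apache-2.0)
```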
Closes #9937
The construct

    struct q {
        a a;
    };

changes the meaning of `a` from a type to a data member. gcc rejects
this, and I agree. Fully qualify the type name to avoid the error.
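A minimal reproduction of the fix (illustrative names):

```cpp
struct a {
    int value = 7;
};

struct q {
    // Writing `a a;` here would make the name `a` mean the data member for
    // the rest of the class, which gcc rejects ("changes meaning of 'a'").
    // Qualifying the type name keeps it unambiguous:
    ::a a;
};
```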
This patch also fixes rare hangs in debug mode for drops_04 without
prevote.
Branch URL: https://github.com/alecco/scylla/tree/raft-fixes-05-v2-dueling
Tests: unit ({dev}), unit ({debug}), unit ({release})
Changes in v2:
- Fixed commit message @kostja
Without prevote, a node disconnected for long enough becomes a candidate.
While disconnected, (A) keeps increasing its term.
When it rejoins, it disrupts the current leader (C), which steps down
due to the higher term in (A)'s append_entries_reply, and (C) also
increases its term.
Meanwhile, followers (B) and (D) don't know (C) stepped down but see it
as alive according to the current failure detector implementation, and
(A) also has a shorter log than theirs.
So they reject (A)'s vote requests (Raft 4.2.3, Disruptive servers).
Then (C) rejects voting for (A) because (A) has a shorter log.
(C) then becomes a candidate, but even though (A) votes for (C), the
previous followers (B) and (D) ignore a vote request while leader (C)
is still alive and the election timeout has not passed.
(A) and (C) alone are only 2 out of 4 nodes and can't reach quorum, so
elections never succeed.
This patch addresses the problem by making followers not ignore vote
requests from the node they believe is the current leader, even though
the election timeout has not been reached.
As @kostja noted, if the failure detector considered a leader alive
only as long as it sends heartbeats (append requests), this patch would
no longer be needed.
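The follower-side rule change can be sketched as follows (illustrative names and structure, not the actual Scylla code):

```cpp
// Whether a follower should process a vote request instead of ignoring it.
struct follower_state {
    int current_leader = -1;             // node id believed to be leader
    bool election_timeout_elapsed = false;
};

bool consider_vote_request(const follower_state& f, int from) {
    if (f.election_timeout_elapsed) {
        return true;                     // standard Raft rule
    }
    // New rule: a vote request from the presumed leader itself means it
    // has stepped down, so it is not ignored even before the timeout.
    return from == f.current_leader;
}
```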
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
Message-Id: <20210611172734.254757-1-alejo.sanchez@scylladb.com>
Recently, the logic of elect_new_leader was changed to allow the old
leader to vote for the new candidate. But the implementation is wrong:
it reconnects the old leader in all cases, regardless of whether the
nodes were already disconnected.
Check first that both the old leader and the requested new leader are
connected, and only then let the old leader participate in the
election.
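The check can be sketched like this (names are made up for illustration, not the test's actual code):

```cpp
#include <set>
#include <utility>

using id = int;

// The old leader participates in the election only if the link between
// it and the requested new leader is already up; we must not silently
// reconnect nodes the test disconnected on purpose.
struct connected {
    std::set<std::pair<id, id>> cut;   // severed links, stored as sorted pairs

    bool is_connected(id a, id b) const {
        if (a > b) std::swap(a, b);
        return cut.count({a, b}) == 0;
    }
};

bool old_leader_can_vote(const connected& net, id old_leader, id candidate) {
    return net.is_connected(old_leader, candidate);
}
```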
There were occasional hangs in the loop of elect_new_leader because
nodes other than the candidate were ticked. This patch fixes the loop
by removing the ticks inside it.
The loop is still needed to handle prevote corner cases (e.g. 2 nodes).
While there, also wait for the log on all followers, to prevent a
previously dropped leader from becoming a dueling candidate.
And update _leader only if it actually changed.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
Message-Id: <20210609193945.910592-3-alejo.sanchez@scylladb.com>
Introduce a syntax helper tagged_id::create_random_id(),
used to create a new Raft server or group id.
Provide a default ordering for tagged ids, for use
in Raft leader discovery, which selects the smallest
id for leader.
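A sketch of the helper's shape (assumed for illustration; the real tagged_id wraps a UUID, this uses a 64-bit value for brevity):

```cpp
#include <cstdint>
#include <random>

struct tagged_id {
    std::uint64_t id;

    // Syntax helper: create a fresh random id for a new server or group.
    static tagged_id create_random_id() {
        static std::mt19937_64 gen{std::random_device{}()};
        return tagged_id{gen()};
    }

    // Default ordering: leader discovery selects the smallest id.
    bool operator<(const tagged_id& o) const { return id < o.id; }
};
```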
Feature requests, fixes, and OOP refactor of replication_test.
Note: all known bugs and hangs are now fixed.
A new helper class "raft_cluster" is created.
Each move of a helper function to the class has its own commit.
New helpers are provided.
To simplify the code, for now only a single apply function can be set
per raft_cluster. No tests were using it any other way. In the future,
custom apply functions could be assigned per server dynamically, if
this becomes needed.
* alejo/raft-tests-replication-02-v3-30: (66 commits)
raft: replication test: wait for log for both index and term
raft: replication test: reset network at construction
raft: replication test: use lambda visitor for updates
raft: replication test: move structs into class
raft: replication test: move data structures to cluster class
raft: replication test: remove shared pointers
raft: replication test: move get_states() to raft_cluster
raft: replication test: test_server inside raft_cluster
raft: replication test: rpc declarative tests
raft: replication test: add wait_log
raft: replication test: add stop and reset server
raft: replication test: disconnect 2 support
raft: replication test: explicit node_id naming
raft: replication test: move definitions up
raft: replication test: no append entries support
raft: replication test: fix helper parameter
raft: replication test: stop servers out of config
raft: replication test: wait log when removing leader from configuration
raft: replication test: only manipulate servers in configuration
raft: replication test: only cancel rearm ticker for removed server
...
Most Raft packets are sent very rarely, during special phases of the
protocol (like election or leader stepdown). The protocol itself does
not care if a packet is sent or dropped, so returning futures from
their send functions serves no purpose. Change raft's rpc interface to
return void for all packet types except append_request. We still want
a future from sending append_request for backpressure purposes: the
replication protocol is more efficient when there is no packet loss,
so it is better to pause a sender than to drop packets inside the rpc.
The rpc is still allowed to drop append_requests if overloaded.
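The resulting interface shape looks roughly like this (a simplified sketch: std::future stands in for seastar::future, and the message types and signatures are placeholders, not the real ones):

```cpp
#include <future>

struct vote_request {};
struct vote_reply {};
struct append_request {};

// Rare packets are fire-and-forget; only append_request returns a future
// so the sender can be paused for backpressure instead of dropping packets.
struct rpc {
    virtual void send_vote_request(int to, vote_request) = 0;
    virtual void send_vote_reply(int to, vote_reply) = 0;
    virtual std::future<void> send_append_request(int to, append_request) = 0;
    virtual ~rpc() = default;
};

// Trivial in-memory implementation, just to make the sketch runnable.
struct null_rpc : rpc {
    void send_vote_request(int, vote_request) override {}
    void send_vote_reply(int, vote_reply) override {}
    std::future<void> send_append_request(int, append_request) override {
        std::promise<void> p;
        p.set_value();   // resolve immediately: no backpressure needed here
        return p.get_future();
    }
};
```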
Waiting on the index alone does not guarantee correct propagation of
the leader's log. This patch also checks the term of the leader's last
log entry.
This was exposed by occasional problems with packet drops.
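The strengthened check can be sketched as follows (illustrative, not the test's actual code):

```cpp
#include <cstdint>

struct log_tail {
    std::uint64_t index;   // index of the last log entry
    std::uint64_t term;    // term of the last log entry
};

// The index alone can match while the follower still holds stale entries
// from an older term (possible after packet drops), so also require the
// term of the last entry to match the leader's.
bool caught_up(const log_tail& follower, const log_tail& leader) {
    return follower.index >= leader.index && follower.term == leader.term;
}
```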
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
Move auxiliary classes connection and hash_connection out of
raft_cluster and into connected class.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
Move state_machine, persistence, connection, hash_connection, connected,
failure_detector, and rpc inside raft_cluster.
This commit moves the declaration of class raft_cluster up
(to minimize changed lines).
It moves the apply_fn definition from state_machine to raft_cluster,
and fixes namespaces in declarations.
Static rpc::net is kept outside for now to keep this commit simple.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
Following gleb, tomek, and kamil's suggestion, remove unnecessary use of
lw_shared_ptr.
This also solves the problem of constructing a lw_shared_ptr from a
forward declaration (connected) in a subsequent patch.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
Since there are no more external users of test_server, move it into
raft_cluster and remove the member access operator.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
Convert rpc replication tests to declarative form.
This will enable moving remaining parts inside raft_cluster.
For test stability, add support for checking that a node's rpc config
eventually changes to the expected configuration.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
As requested by @gleb-cloudious, stop servers taken out of
configuration.
Adjust other parts of the code that rely on all servers being active.
Remove temporary stop on rpc server.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
If the leader is removed from the configuration, wait for the log
first.
Remove wait_log_all for every case, as it was too broad a fix.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
Only start/stop, init/start/rearm tickers, wait for the log,
elapse_election, run free elections, check for a leader, and verify
servers in the current configuration.
This is necessary so that servers out of the configuration can be
absent or stopped.
Temporarily stop a server in the rpc test until servers out of the
configuration are truly stopped in the next commit.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
When changing configuration, don't pause and restart all tickers.
Only do it for the specific server(s) being removed.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
Only pause and restart tickers for servers in the configuration.
Currently, when a server is taken out, it's reset and a new one is set
up, but out of the configuration. @gleb-cloudious requested fully
stopped servers when out of the configuration, until they are
re-added.
This change is needed to allow that; otherwise, restart would arm
tickers on servers no longer present.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
Do verifications in raft_cluster::verify().
This will enable having persisted snapshots inside the class and
de-clutter caller code.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
Keep snapshots inside raft_cluster, removing the need for them
outside. If this is needed later, a const getter can be implemented.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
Since create_server() is in raft_cluster, there's no need for
change_configuration() to pass total values anymore. Remove it.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
Always do wait_log() for the next leader in elect_new_leader, but only
if it's connected to the old leader.
Pause and restart tickers when creating a candidate, to prevent
another node from becoming a dueling candidate.
Remove the pause and restart of tickers around calls to
elect_new_leader.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
Remove the global create_raft_server() and replace it with a
create_server() helper in replication_test().
This removes the need for users of raft_cluster to create special
objects.
Note this no longer does move(apply), as apply is kept in
raft_cluster.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
Add a helper to reset a server in raft_cluster.
Besides simplifying code and preventing errors, this will help move
create_raft_server logic to raft_cluster.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
Move tickers to the raft_cluster helper class. Ticker initialization
and pause are done automatically at start_all() and stop_all().
Add temporary helpers to manage specific tickers. These might be removed
later once proper node abort and reset are implemented.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>