Commit Graph

186 Commits

Author SHA1 Message Date
Alex Dathskovsky
5e89a78c8f raft: refactor can_vote logic and type
This PR refactors the can_vote function in the Raft algorithms for improved clarity and maintainability by providing safer strong boolean types to the raft algorithm.

Fixes: #21937

Backport: No backport required

Closes scylladb/scylladb#25787
2025-09-24 13:55:05 +02:00
Abhinav Jha
a0ee5e4b85 raft: replication test: change rpc_propose_conf_change test to SEASTAR_THREAD_TEST_CASE
RAFT_TEST_CASE macro creates 2 test cases, one with random 20% packet
loss named name_drops. The framework makes hard coded assumptions about
leader which doesn't hold well in case of packet losses.

This short term fix disables the packet drop variant of the specified test.
It should be safe to re-enable it once the whole framework is re-worked to
remove these hard coded assumptions.

This PR fixes a bug. Hence we need to backport it.

Fixes: scylladb/scylladb#23816

Closes scylladb/scylladb#25489
2025-08-14 13:15:16 +02:00
Kefu Chai
7215d4bfe9 utils: do not include unused headers
these unused includes were identifier by clang-include-cleaner. after
auditing these source files, all of the reports have been confirmed.

please note, because quite a few source files relied on
`utils/to_string.hh` to pull in the specialization of
`fmt::formatter<std::optional<T>>`, after removing
`#include <fmt/std.h>` from `utils/to_string.hh`, we have to
include `fmt/std.h` directly.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2025-01-14 07:56:39 -05:00
Avi Kivity
f3eade2f62 treewide: relicense to ScyllaDB-Source-Available-1.0
Drop the AGPL license in favor of a source-available license.
See the blog post [1] for details.

[1] https://www.scylladb.com/2024/12/18/why-were-moving-to-a-source-available-license/
2024-12-18 17:45:13 +02:00
Kefu Chai
372a4d1b79 treewide: do not define FMT_DEPRECATED_OSTREAM
since we do not rely on FMT_DEPRECATED_OSTREAM to define the
fmt::formatter for us anymore, let's stop defining `FMT_DEPRECATED_OSTREAM`.

in this change,

* utils: drop the range formatters in to_string.hh and to_string.c, as
  we don't use them anymore. and the tests for them in
  test/boost/string_format_test.cc are removed accordingly.
* utils: use fmt to print chunk_vector and small_vector. as
  we are not able to print the elements using operator<< anymore
  after switching to {fmt} formatters.
* test/boost: specialize fmt::details::is_std_string_like<bytes>
  due to a bug in {fmt} v9, {fmt} fails to format a range whose
  element type is `basic_sstring<uint8_t>`, as it considers it
  as a string-like type, but `basic_sstring<uint8_t>`'s char type
  is signed char, not char. this issue does not exist in {fmt} v10,
  so, in this change, we add a workaround to explicitly specialize
  the type trait to assure that {fmt} format this type using its
  `fmt::formatter` specialization instead of trying to format it
  as a string. also, {fmt}'s generic ranges formatter calls the
  pair formatter's `set_brackets()` and `set_separator()` methods
  when printing the range, but operator<< based formatter does not
  provide these method, we have to include this change in the change
  switching to {fmt}, otherwise the change specializing
  `fmt::details::is_std_string_like<bytes>` won't compile.
* test/boost: in tests, we use `BOOST_REQUIRE_EQUAL()` and its friends
  for comparing values. but without the operator<< based formatters,
  Boost.Test would not be able to print them. after removing
  the homebrew formatters, we need to use the generic
  `boost_test_print_type()` helper to do this job. so we are
  including `test_utils.hh` in tests so that we can print
  the formattable types.
* treewide: add "#include "utils/to_string.hh" where
  `fmt::formatter<optional<>>` is used.
* configure.py: do not define FMT_DEPRECATED_OSTREAM
* cmake: do not define FMT_DEPRECATED_OSTREAM

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-04-19 22:57:36 +08:00
Kefu Chai
db9e314965 treewide: apply codespell to the comments in source code
for less spelling errors in comment.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16408
2023-12-20 10:25:03 +02:00
Yaniv Kaul
c658bdb150 Typos: fix typos in comments
Fixes some typos as found by codespell run on the code.
In this commit, I was hoping to fix only comments, not user-visible alerts, output, etc.
Follow-up commits will take care of them.

Refs: https://github.com/scylladb/scylladb/issues/16255
Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>
2023-12-02 22:37:22 +02:00
Konstantin Osipov
990c7a209f raft: change the API of conf change notifications
Pass a change diff into the notification callback,
rather than add or remove servers one by one, so that
if we need to persist the state, we can do it once per
configuration change, not for every added or removed server.

For now still pass added and removed entries in two separate calls
per a single configuration change. This is done mainly to fulfill the
library contract that it never sends messages to servers
outside the current configuration. The group0 RPC
implementation doesn't need the two calls, since it simply
marks the removed servers as expired: they are not removed immediately
anyway, and messages can still be delivered to them.
However, there may be test/mock implementations of RPC which
could benefit from this contract, so we decided to keep it.
2022-11-17 12:07:31 +03:00
Petr Gusev
bc50b7407f raft replication_test, make backpressure test to do actual backpressure
Before this patch this test didn't actually experience any backpressure since all the commands were executed sequentially.
2022-09-27 12:04:14 +04:00
Petr Gusev
27e60ecbf4 raft server, log size limit in bytes
Before this patch we could get an OOM if we
received several big commands. The number of
commands was small, but their total size
in bytes was large.

snapshot_trailing_size is needed to guarantee
progress. Without this limit the fsm could
get stuck if the size of the next item is
greater than max_log_size - (size of trailing entries).
2022-09-26 13:10:10 +04:00
Kamil Braun
daf9c53bb8 raft: split can_vote field from server_address to separate struct
Whether a server can vote in a Raft configuration is not part of the
address. `server_address` was used in many context where `can_vote` is
irrelevant.

Split the struct: `server_address` now contains only `id` and
`server_info` as it did before `can_vote` was introduced. Instead we
have a `config_member` struct that contains a `server_address` and the
`can_vote` field.

Also remove an "unsafe" constructor from `server_address` where `id` was
provided but `server_info` was not. The constructor was used for tests
where `server_info` is irrelevant, but it's important not to forget
about the info in production code. The constructor was used for two
purposes:
- Invoking set operations such as `contains`. To solve this we use C++20
  transparent hash and comparator functions, which allow invoking
  `contains` and similar functions by providing a different key type (in
  this case `raft::server_id` in set of addresses, for example).
- constructing addresses without `info`s in tests. For this we provide
  helper functions in the test helpers module and use them.
2022-07-18 18:22:10 +02:00
Kamil Braun
44a1a8a8b0 raft: operator<<(ostream&, ...) implementation for server_address and configuration
Useful for debugging.

Had to make `configuration` constructor explicit. Otherwise the
`operator<<` implementation for `configuration` would implicitly convert
the `server_address` to `configuration` when trying to output it, causing
infinite recursion.

Removed implicit uses of the constructor.
2022-01-26 16:09:41 +01:00
Avi Kivity
fcb8d040e8 treewide: use Software Package Data Exchange (SPDX) license identifiers
Instead of lengthy blurbs, switch to single-line, machine-readable
standardized (https://spdx.dev) license identifiers. The Linux kernel
switched long ago, so there is strong precedent.

Three cases are handled: AGPL-only, Apache-only, and dual licensed.
For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0),
reasoning that our changes are extensive enough to apply our license.

The changes we applied mechanically with a script, except to
licenses/README.md.

Closes #9937
2022-01-18 12:15:18 +01:00
Konstantin Osipov
65e549946f raft: add a test case for adding entries on follower 2021-11-25 11:50:38 +03:00
Gleb Natapov
3ff6f76cef raft: test: add read_barrier test to replication_test 2021-08-25 08:57:13 +03:00
Alejo Sanchez
497af3167f raft: replication test: connectivity configuration
Pass packet drops within connectivity configuration struct.
Default to no packet drops.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-08-23 17:50:16 +02:00
Alejo Sanchez
192ac5be4c raft: replication test: use minimum granularity
seastar lowres_clock minimum granularity is 10ms, not 1ms.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-08-23 17:50:16 +02:00
Alejo Sanchez
2db3192ac3 raft: replication test: template clock type
Templetize clock type.

Use a struct for run_test to work around
https://bugs.llvm.org/show_bug.cgi?id=50345

With help from @kbr-

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-08-23 17:50:16 +02:00
Alejo Sanchez
cb35588fb1 raft: replication test: tick delta inside raft_cluster
Store tick delta inside raft_cluster.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-08-23 17:50:16 +02:00
Alejo Sanchez
6e2ab657b3 raft: replication test: move common code out
Common replication test code moved to header.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-08-23 17:50:16 +02:00
Avi Kivity
332b5c395f raft: avoid changing meaning of a symbol inside a class
The construct

struct q {
    a a;
};

Changes the meaning of `a` from a type to a data member. gcc dislikes
it and I agree. Fully qualify the type name to avoid an error.
2021-07-11 18:16:21 +03:00
Alejo Sanchez
9a22a30554 raft: replication test: split elect_new_leader for prevote
Branch URL: https://github.com/alecco/scylla/tree/raft-fixes-02-v3-01

Tests: unit ({dev}), unit ({debug}), unit ({release})

This fixes current election hangs in next.

Message-Id: <20210610143558.131685-1-alejo.sanchez@scylladb.com>
2021-06-15 11:53:24 +02:00
Alejo Sanchez
5c8092cf42 raft: fix election with disruptive candidate
This patch also fixes rare hangs in debug mode for drops_04 without
prevote.

Branch URL: https://github.com/alecco/scylla/tree/raft-fixes-05-v2-dueling

Tests: unit ({dev}), unit ({debug}), unit ({release})

Changes in v2:
    - Fixed commit message                               @kostja

Whithout prevote, a node disconnected for long enough becomes candidate.
While disconnected (A) it keeps increasing its term.
When it rejoins it disrupts the current leader (C) which steps down due
to the higher term in (A)'s append_entries_reply and (C) also increases
its term.

Meanwhile followers (B) and (D) don't know (C) stepped down but see it
alive according to the current failure detecture implementation, and
also (A) has shorter log than them.
So they reject (A)'s vote requests (Raft 4.2.3 Disruptive servers).

Then (C) rejects voting for (A) because it has shorter log.
And (C) becomes candidate but even though (A) votes for (C), the
previous followers (B) and (D) ignore a vote request while leader (C) is
still alive and election timeout has not passed.

(A) and (C) alone can't reach quorum 2/4. So elections never succeed.

This patch addresses this problem by making followers not ignore vote
requests from who they think is the current leader even though
election timout was not reached.

As @kostja noted, if failure detector would consider a leader alive only
as long as it sends heartbeats (append requests) this patch is no longer
needed.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
Message-Id: <20210611172734.254757-1-alejo.sanchez@scylladb.com>
2021-06-14 11:07:38 +02:00
Alejo Sanchez
ff34a6515d raft: replication test: fix elect_new_leader
Recently, the logic of elect_new_leader was changed to allow the old
leader to vote for the new candidate. But the implementation is wrong as
it re-connects the old leader in all cases disregarding if the nodes
were already disconnected.

Check if both old leader and the requested new leader are connected
first and only if it is the case then the old leader can participate in
the election.

There were occasional hangs in the loop of elect_new_leader because
other nodes besides the candidate were ticked.  This patch fixes the
loop by removing ticks inside of it.

The loop is needed to handle prevote corner cases (e.g. 2 nodes).

While there, also wait log on all followers to avoid a previously
dropped leader to be a dueling candidate.

And update _leader only if it was changed.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
Message-Id: <20210609193945.910592-3-alejo.sanchez@scylladb.com>
2021-06-10 12:36:25 +02:00
Konstantin Osipov
c9a23e9b8a raft: (internal) tagged_id minor improvements
Introduce a syntax helper tagged_id::create_random_id(),
used to create a new Raft server or group id.

Provide a default ordering for tagged ids, for use
in Raft leader discovery, which selects the smallest
id for leader.
2021-06-08 14:52:32 +03:00
Tomasz Grabiec
50d64646cd Merge "raft: replication test fixes and OOP refactor" from Alejo
Feature requests, fixes, and OOP refactor of replication_test.

Note: all known bugs and hangs are now fixed.

A new helper class "raft_cluster" is created.
Each move of a helper function to the class has its own commit.
New helpers are provided

To simplify code, for now only a single apply function can be set per
raft_cluster. No tests were using in any other way. In the future,
there could be custom apply functions per server dynamically assigned,
if this becomes needed.

* alejo/raft-tests-replication-02-v3-30: (66 commits)
  raft: replication test: wait for log for both index and term
  raft: replication test: reset network at construction
  raft: replication test: use lambda visitor for updates
  raft: replication test: move structs into class
  raft: replication test: move data structures to cluster class
  raft: replication test: remove shared pointers
  raft: replication test: move get_states() to raft_cluster
  raft: replication test: test_server inside raft_cluster
  raft: replication test: rpc declarative tests
  raft: replication test: add wait_log
  raft: replication test: add stop and reset server
  raft: replication test: disconnect 2 support
  raft: replication test: explicit node_id naming
  raft: replication test: move definitions up
  raft: replication test: no append entries support
  raft: replication test: fix helper parameter
  raft: replication test: stop servers out of config
  raft: replication test: wait log when removing leader from configuration
  raft: replication test: only manipulate servers in configuration
  raft: replication test: only cancel rearm ticker for removed server
  ...
2021-06-06 19:18:49 +03:00
Avi Kivity
a55b434a2b treewide: extent copyright statements to present day 2021-06-06 19:18:49 +03:00
Gleb Natapov
bb822c92ab raft: change raft::rpc api to return void for most sending functions
Most RAFT packets are sent very rarely during special phases of the
protocol (like election or leader stepdown). The protocol itself does
not care if a packet is sent or dropped, so returning futures from their
send function does not serve any purpose. Change the raft's rpc interface
to return void for all packet types but append_request. We still want to
get a future from sending append_request for backpressure purposes since
replication protocol is more efficient if there is no packet loss, so
it is better to pause a sender than dropping packets inside the rpc. Rpc
is still allowed to drop append_requests if overloaded.
2021-06-06 19:18:49 +03:00
Alejo Sanchez
3e91a8ca0d raft: replication test: wait for log for both index and term
Waiting on index alone does not guarantee leader correct leader log
propagation. This patch add checking also the term of the leader's last
log entry.

This was exposed with occasional problems with packet drops.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-06-04 08:38:19 -04:00
Alejo Sanchez
545893145e raft: replication test: reset network at construction
Reset network in constructor, not in unrelated function.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-06-04 08:18:32 -04:00
Alejo Sanchez
294dcfb204 raft: replication test: use lambda visitor for updates
Process updates with a lambda visitor.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-06-04 08:18:31 -04:00
Alejo Sanchez
a3fc974de9 raft: replication test: move structs into class
Move auxiliary classes connection and hash_connection out of
raft_cluster and into connected class.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-06-01 23:47:03 -04:00
Alejo Sanchez
5b688d42d7 raft: replication test: move data structures to cluster class
Move state_machine, persistence, connection, hash_connection, connected,
failure_detector, and rpc inside raft_cluster.

This commit moves declaration of class raft_cluster up.
(Minimize changed lines)
Moves apply_fn definition from state_machine to raft_cluster.
Fixes namespace in declarations
Keeps static rpc::net outside for now to keep this commit simple.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-06-01 23:47:03 -04:00
Alejo Sanchez
1250d910ee raft: replication test: remove shared pointers
Following gleb, tomek, and kamil's suggestion, remove unnecessary use of
lw_shared_ptr.

This also solves the problem of constructing a lw_shared_ptr from a
forward declaration (connected) in a subsequent patch.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-06-01 23:47:03 -04:00
Alejo Sanchez
aa1200ee50 raft: replication test: move get_states() to raft_cluster
Move get_states() helper inside raft cluster.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-06-01 23:47:03 -04:00
Alejo Sanchez
740545cdc5 raft: replication test: test_server inside raft_cluster
Since there are no more external users of test_server, move it to
raft_cluster and remove member access operator.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-06-01 23:47:03 -04:00
Alejo Sanchez
1ee4408869 raft: replication test: rpc declarative tests
Convert rpc replication tests to declarative form.

This will enable moving remaining parts inside raft_cluster.

For test stability, add support for checking rpc config of a node
eventually changes to the expected configuration.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-06-01 23:47:03 -04:00
Alejo Sanchez
f11ae18158 raft: replication test: add wait_log
Allow test cases to specify waiting for log for one or more servers.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-06-01 23:47:03 -04:00
Alejo Sanchez
fa84b15909 raft: replication test: add stop and reset server
Add stop an reset server support.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-06-01 23:47:03 -04:00
Alejo Sanchez
19d28e7e0f raft: replication test: disconnect 2 support
Support custom disconnection of 2 servers.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-06-01 23:47:03 -04:00
Alejo Sanchez
e2612e5327 raft: replication test: explicit node_id naming
Use node_id{x} for more expressive naming in tests.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-06-01 23:47:03 -04:00
Alejo Sanchez
bdfdd2da0b raft: replication test: move definitions up
Move definitions up for next patch.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-06-01 23:47:03 -04:00
Alejo Sanchez
14bd29f974 raft: replication test: no append entries support
Handle test cases not appending entries.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-06-01 23:47:03 -04:00
Alejo Sanchez
8468059d0e raft: replication test: stop servers out of config
As requested by @gleb-cloudious, stop servers taken out of
configuration.

Adjust other parts of code relying on all servers being active.

Remove temporary stop on rpc server.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-06-01 23:47:03 -04:00
Alejo Sanchez
51343d4de7 raft: replication test: wait log when removing leader from configuration
If leader is removed from configuration wait log first.

Remove wait_log_all for every case as it was too broad fix.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-06-01 23:47:03 -04:00
Alejo Sanchez
e032d8446f raft: replication test: only manipulate servers in configuration
Only start/stop, init/start/reamr tickers, wait log, elapse_election,
run free election, check for leader, and verify servers in current
configuration.

This is necessary for having servers out of configuration not
present/stopped.

Temporarily stop a server in rpc test until we truly stop servers out of
configuration in next commit.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-06-01 23:47:03 -04:00
Alejo Sanchez
ec078ca55f raft: replication test: only cancel rearm ticker for removed server
When changing configuration, don't pause and restart all tickers.
Only do it for the specific server(s) being removed.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-06-01 23:47:03 -04:00
Alejo Sanchez
802f68317e raft: replication test: only pause restart tickers in config
Only pause and restart tickers for servers in configuration.

Currently when a server is taken out it's reset and a new one is set up,
but out of configuration. @gleb-cloudious requested to have fully
stopped servers when out of configuration, until they are re-added.
This change is needed to allow that or else restart would arm tickers on
servers no longer present.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-06-01 23:47:03 -04:00
Alejo Sanchez
85f299e39b raft: replication test: simplify calls to helpers
Pass test update directly to helpers.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-06-01 23:47:03 -04:00
Alejo Sanchez
27f50b3589 raft: replication test: persisted snapshots in raft_cluster
Move persisted snapshots inside raft_cluster, de-cluttering code.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2021-06-01 23:47:03 -04:00