Commit Graph

181 Commits

Author SHA1 Message Date
Sergey Zolotukhin
612a141660 raft: Fix race condition on override_snapshot_thresholds.
When the server_impl::applier_fiber is paused by a co_await at line raft/server.cc:1375:
```
co_await override_snapshot_thresholds();
```
a new snapshot may be applied, which updates the actual values of the log's last applied
and snapshot indexes. As a result, the new snapshot index could become higher than the
old value stored in _applied_idx at line raft/server.cc:1365, leading to an assertion
failure in log::last_conf_for().
Since error injection is disabled in release builds, this issue does not affect production releases.

This issue was introduced in the following commit
9dfa041fe1,
when error injection was added to override the log snapshot configuration parameters.

How to reproduce:

1. Build debug version of randomized_nemesis_test
```
ninja-build build/debug/test/raft/randomized_nemesis_test
```
2. Run
```
parallel --halt now,fail=1 -j20 'build/debug/test/raft/randomized_nemesis_test \
--run_test=test_frequent_snapshotting  -- -c2 -m2G --overprovisioned --unsafe-bypass-fsync 1 \
--kernel-page-cache 1 --blocked-reactor-notify-ms 2000000  --default-log-level \
trace > tmp/logs/eraseme_{}.log  2>&1 && rm tmp/logs/eraseme_{}.log' ::: {1..1000}
```

Fixes scylladb/scylladb#20363

Closes scylladb/scylladb#20555
2024-09-12 16:19:27 +02:00
Kefu Chai
3e84d43f93 treewide: use seastar::format() or fmt::format() explicitly
before this change, we rely on `using namespace seastar` to use
`seastar::format()` without qualifying the `format()` with its
namespace. this works fine until we changed the parameter type
of format string `seastar::format()` from `const char*` to
`fmt::format_string<...>`. this change practically invited
`seastar::format()` to the club of `std::format()` and `fmt::format()`,
where all members accept a templated parameter as its `fmt`
parameter. and `seastar::format()` is not the best candidate anymore.
despite that argument-dependent lookup (ADT for short) favors the
function which is in the same namespace as its parameter, but
`using namespace` makes `seastar::format()` more competitive,
so both `std::format()` and `seastar::format()` are considered
as the condidates.

that is what is happening scylladb in quite a few caller sites of
`format()`, hence ADT is not able to tell which function the winner
in the name lookup:

```
/__w/scylladb/scylladb/mutation/mutation_fragment_stream_validator.cc:265:12: error: call to 'format' is ambiguous
  265 |     return format("{} ({}.{} {})", _name_view, s.ks_name(), s.cf_name(), s.id());
      |            ^~~~~~
/usr/bin/../lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/format:4290:5: note: candidate function [with _Args = <const std::basic_string_view<char> &, const seastar::basic_sstring<char, unsigned int, 15> &, const seastar::basic_sstring<char, unsigned int, 15> &, const utils::tagged_uuid<table_id_tag> &>]
 4290 |     format(format_string<_Args...> __fmt, _Args&&... __args)
      |     ^
/__w/scylladb/scylladb/seastar/include/seastar/core/print.hh:143:1: note: candidate function [with A = <const std::basic_string_view<char> &, const seastar::basic_sstring<char, unsigned int, 15> &, const seastar::basic_sstring<char, unsigned int, 15> &, const utils::tagged_uuid<table_id_tag> &>]
  143 | format(fmt::format_string<A...> fmt, A&&... a) {
      | ^
```

in this change, we

change all `format()` to either `fmt::format()` or `seastar::format()`
with following rules:
- if the caller expects an `sstring` or `std::string_view`, change to
  `seastar::format()`
- if the caller expects an `std::string`, change to `fmt::format()`.
  because, `sstring::operator std::basic_string` would incur a deep
  copy.

we will need another change to enable scylladb to compile with the
latest seastar. namely, to pass the format string as a templated
parameter down to helper functions which format their parameters.
to miminize the scope of this change, let's include that change when
bumping up the seastar submodule. as that change will depend on
the seastar change.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-09-11 23:21:40 +03:00
Abhi
9b09439065 raft: Add descriptions for requested abort errors
Fixes: scylladb/scylladb#18902

Closes scylladb/scylladb#20291
2024-09-10 17:56:29 +02:00
Sergey Zolotukhin
13b3d3a795 raft: Ensure const correctness in applier_fiber.
Add 'const' to non mutable varibales in server_impl::applier_fiber() function.
2024-08-20 15:24:00 +02:00
Sergey Zolotukhin
c3e52ab942 raft: Invoke store_snapshot_descriptor with actually preserved items.
- raft_sys_table_storage::store_snapshot_descriptor now receives a number of
preserved items in the log, rather than _config.snapshot_trailing value;
- Incorrect check for truncated number of items in store_snapshot_descriptor
was removed.

Fixes scylladb/scylladb#16817
Fixes scylladb/scylladb#20080
2024-08-20 15:22:49 +02:00
Sergey Zolotukhin
922e035629 raft: Use raft_server_set_snapshot_thresholds in tests.
Replace raft_server_snapshot_reduce_threshold with raft_server_set_snapshot_thresholds in tests
as raft_server_set_snapshot_thresholds fully covers the functionality of raft_server_snapshot_reduce_threshold.
2024-08-20 15:08:49 +02:00
Sergey Zolotukhin
00a1d3e305 raft: Fix indentation in server.cc 2024-08-20 15:08:45 +02:00
Sergey Zolotukhin
9dfa041fe1 raft: Add raft_server_set_snapshot_thresholds injection.
Use error injection to allow overriding following snapshot threshold settings:
- snapshot_threshold
- snapshot_threshold_log_size
- snapshot_trailing
- snapshot_trailing_size
2024-08-20 14:15:50 +02:00
Laszlo Ersek
4f1f207be1 raft/server: clean up index_t usage
With implicit conversion of tagged integers to untagged ones going away,
explicitly tag (or untag, as necessary) the operands of the following
operations, in "raft/server.cc":

- addition of tagged and untagged (both should be tagged)

- subscripting an array by tagged (should be untagged)

- comparing a size-like threshold against tagged (should be untagged)

- exposing tagged via gauges (should be untagged)

Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
2024-08-14 13:35:08 +02:00
Avi Kivity
aa1270a00c treewide: change assert() to SCYLLA_ASSERT()
assert() is traditionally disabled in release builds, but not in
scylladb. This hasn't caused problems so far, but the latest abseil
release includes a commit [1] that causes a 1000 insn/op regression when
NDEBUG is not defined.

Clearly, we must move towards a build system where NDEBUG is defined in
release builds. But we can't just define it blindly without vetting
all the assert() calls, as some were written with the expectation that
they are enabled in release mode.

To solve the conundrum, change all assert() calls to a new SCYLLA_ASSERT()
macro in utils/assert.hh. This macro is always defined and is not conditional
on NDEBUG, so we can later (after vetting Seastar) enable NDEBUG in release
mode.

[1] 66ef711d68

Closes scylladb/scylladb#20006
2024-08-05 08:23:35 +03:00
Gleb Natapov
9ebdb23002 raft: add more raft metrics to make debug easier 2024-07-01 10:55:22 +02:00
Yaniv Michael Kaul
82875095e9 Raft: improve descriptions of metrics
1. Fixed a single typo (send -> sent)
2. Rephrase 'How many' to 'Number of' and use less passive tense.
3. Be more specific in the description of the different metrics insteda of the more generic descriptions.

Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>

Closes scylladb/scylladb#19067
2024-06-06 15:18:47 +03:00
Kefu Chai
e97ae6b0de raft: server: print pointee of server_impl::_fsm
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, instead of printing the `unique_ptr` instance, we
print the pointee of it. since `server_impl` uses pimpl paradigm,
`_fsm` is always valid after `server_impl::start()`, we can always
deference it without checking for null.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17953
2024-03-25 11:20:34 +02:00
Kamil Braun
0eda7a2619 raft: server: add trigger_snapshot API
This allows the user of `raft::server` to ask it to create a snapshot
and truncate the Raft log. In a later commit we'll add a REST endpoint
to Scylla to trigger group 0 snapshots.

One use case for this API is to create group 0 snapshots in Scylla
deployments which upgraded to Raft in version 5.2 and started with an
empty Raft log with no snapshot at the beginning. This causes problems,
e.g. when a new node bootstraps to the cluster, it will not receive a
snapshot that would contain both schema and group 0 history, which would
then lead to inconsistent schema state and trigger assertion failures as
observed in scylladb/scylladb#16683.

In 5.4 the logic of initial group 0 setup was changed to start the Raft
log with a snapshot at index 1 (ff386e7a44)
but a problem remains with these existing deployments coming from 5.2,
we need a way to trigger a snapshot in them (other than performing 1000
arbitrary schema changes).

Another potential use case in the future would be to trigger snapshots
based on external memory pressure in tablet Raft groups (for strongly
consistent tables).
2024-01-23 16:48:28 +01:00
Kamil Braun
3268be3860 raft: server: track last persisted snapshot descriptor index
Also introduce a condition variable notified whenever this index is
updated.

Will be user in following commits.
2024-01-22 16:48:08 +01:00
Kamil Braun
1e786d9d64 raft: server: framework for handling server requests
Add data structures and modify `io_fiber` code to prepare it for
handling requests generated by the `server`, not just `fsm`.
Used in later commits.
2024-01-22 16:47:34 +01:00
Kamil Braun
8d9b0a6538 raft: server: inline poll_fsm_output 2024-01-18 18:09:13 +01:00
Kamil Braun
754a7b54e4 raft: server: fix indentation 2024-01-18 18:09:11 +01:00
Kamil Braun
527780987b raft: server: move io_fiber's processing of batch to a separate function 2024-01-18 18:09:02 +01:00
Kamil Braun
3e6b4910a6 raft: move poll_output() from fsm to server
`server` was the only user of this function and it can now be
implemented using `fsm`'s public interface.

In later commits we'll extend the logic of `io_fiber` to also subscribe
to other events, triggered by `server` API calls, not only to outputs
from `fsm`.
2024-01-18 18:07:52 +01:00
Kamil Braun
95b6a60428 raft: move _sm_events from fsm to server
In later commits we will use it to wake up `io_fiber` directly from
`raft::server` based on events generated by `raft::server` itself -- not
only from events generated by `raft::fsm`.

`raft::fsm` still obtains a reference to the condition variable so it
can keep signaling it.
2024-01-18 18:07:44 +01:00
Kamil Braun
dccfd09d83 raft: pass max_trailing_entries through fsm_output to store_snapshot_descriptor
This parameter says how many entries at most should be left trailing
before the snapshot index. There are multiple places where this
decision is made:
- in `applier_fiber` when the server locally decides to take a snapshot
  due to log size pressure; this applies to the in-memory log
- in `fsm::step` when the server received an `install_snapshot` message
  from the leader; this also applies to the in-memory log
- and in `io_fiber` when calling `store_snapshot_descriptor`; this
  applies to the on-disk log.

The logic of how many entries should be left trailing is calculated
twice:
- first, in `applier_fiber` or in `fsm::step` when truncating the
  in-memory log
- and then again as the snapshot descriptor is being persisted.

The logic is to take `_config.snapshot_trailing` for locally generated
snapshots (coming from `applier_fiber`) and `0` for remote snapshots
(from `fsm::step`).

But there is already an error injection that changes the behavior of
`applier_fiber` to leave `0` trailing entries. However, this doesn't
affect the following `store_snapshot_descriptor` call which still uses
`_config.snapshot_trailing`. So if the server got restarted, the entries
which were truncated in-memory would get "revived" from disk.
Fortunately, this is test-only code.

However in future commits we'd like to change the logic of
`applier_fiber` even further. So instead of having a separate
calculation of trailing entries inside `io_fiber`, it's better for it to
use the number that was already calculated once. This number is passed to
`fsm::apply_snapshot` (by `applier_fiber` or `fsm::step`) and can then
be received by `io_fiber` from `fsm_output` to use it inside
`store_snapshot_descriptor`.
2024-01-18 18:05:45 +01:00
Kamil Braun
40cd91cff7 raft: server: pass *_aborted to set_exception call
This looks like a minor oversight, in `server_impl::abort` there are
multiple calls to `set_exception` on the different promises, only one of
them would not receive `*_aborted`.
2024-01-18 18:05:18 +01:00
Patryk Jędrzejczak
df2034ebd7 server, raft_group0_client: remove the default nullptr values
The previous commit has fixed 5 bugs of the same type - incorrectly
passing the default nullptr to one of the changed functions. At
least some of these bugs wouldn't appear if there was no default
value. It's much harder to make this kind of a bug if you have to
write "nullptr". It's also much easier to detect it in review.

Moreover, these default values are rarely used outside tests.
Keeping them is just not worth the time spent on debugging.
2024-01-05 18:45:50 +01:00
Yaniv Kaul
ae2ab6000a Typos: fix typos in code
Fixes some more typos as found by codespell run on the code.
In this commit, there are more user-visible errors.

Refs: https://github.com/scylladb/scylladb/issues/16255
2023-12-05 15:18:11 +02:00
Yaniv Kaul
c658bdb150 Typos: fix typos in comments
Fixes some typos as found by codespell run on the code.
In this commit, I was hoping to fix only comments, not user-visible alerts, output, etc.
Follow-up commits will take care of them.

Refs: https://github.com/scylladb/scylladb/issues/16255
Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>
2023-12-02 22:37:22 +02:00
Piotr Dulikowski
c58ff554d8 raft: rpc: introduce destination_not_alive_error
Add a new destination_not_alive_error, thrown from two-way RPCs in case
when the RPC is not issued because the destination is not reported as
alive by the failure detector.

In snapshot transfer code, lower the verbosity of the message printed in
case it fails on the new error. This is done to prevent flakiness in the
CI - in case of slow runs, nodes might get spuriously marked as dead if
they are busy, and a message with the "error" verbosity can cause some
tests to fail.
2023-11-23 11:14:28 +01:00
Piotr Dulikowski
a1ebfcf006 raft: add server::is_alive
Add a method which reports whether given raft server is running.

In following commits, the information about whether the local raft
group 0 is running or not will be included in the response to the
failure detector ping, and the is_alive method will be used there.
2023-11-23 00:34:22 +01:00
Gleb Natapov
9f6e93c144 raft: make sure that all operation forwarded to a leader are completed before destroying raft server
Hold a gate around all operations that are forwarded to a leader to be
able to wait for them during server::abort() otherwise the abort() may
complete while those operations are still running which may cause use
after free.
2023-10-25 13:29:36 +03:00
Piotr Dulikowski
64668e325e raft: expose current_leader in raft::server
The handler for join_node_request will need to know which node is
considered the group 0 leader right now by the local node.

If the topology coordinator crashes and a new node immediately wants to
replace it with the same IP, the node that handles join_node_request
will attempt to perform a read barrier. If this happens quickly enough,
due to the IP reuse the RPC will be sent to the new node instead of the
(now crashed) topology coordinator; the RPC will get an error and will
fail the barrier.

If we detect that the new node wants to replace the current topology
coordinator, the upcoming join_node_request_handler will wait until
there is a leader change.
2023-09-26 15:56:52 +02:00
Gleb Natapov
55f047f33f raft: drop assert in server_impl::apply_snapshot for a condition that may happen
server_impl::apply_snapshot() assumes that it cannot receive a snapshots
from the same host until the previous one is handled and usually this is
true since a leader will not send another snapshot until it gets
response to a previous one. But it may happens that snapshot sending
RPC fails after the snapshot was sent, but before reply is received
because of connection disconnect. In this case the leader may send
another snapshot and there is no guaranty that the previous one was
already handled, so the assumption may break.

Drop the assert that verifies the assumption and return an error in this
case instead.

Fixes: #15222

Message-ID: <ZO9JoEiHg+nIdavS@scylladb.com>
2023-09-01 07:17:49 +03:00
Mikołaj Grzebieluch
dc6017b71b raft topology: make mutation_size_threshold depends on max_command_size
`get_cdc_generation_mutations` splits data to mutations of maximal size
`mutation_size_treshold`. Before this commit it was hardcoded to 2 MB.

Calculate `mutation_size_threshold` to leave space for cdc generation
data and not exceed `max_command_size`.
2023-07-07 13:11:52 +02:00
Kamil Braun
5504da3745 raft: server: throw fewer commit_status_unknowns from wait_for_entry
There are some cases where we can deduce that the entry was committed,
but we were throwing `commit_status_unknown`. Handle one more such case.
The added comment explains it in detail.

Also add a FIXME for another case where we throw `commit_status_unknown`
but we could do better.

Fixes: #14029
2023-06-07 14:17:23 +02:00
Kefu Chai
f80f638bb9 raft: disambiguate promise name in raft::awaited_conf_changes
otherwise GCC 13 complains that

```
/home/kefu/dev/scylladb/raft/server.cc:42:15: error: declaration of ‘seastar::promise<void> raft::awaited_index::promise’ changes meaning of ‘promise’ [-Wchanges-meaning]
   42 |     promise<> promise;
      |               ^~~~~~~
/home/kefu/dev/scylladb/raft/server.cc:42:5: note: used here to mean ‘class seastar::promise<void>’
   42 |     promise<> promise;
      |     ^~~~~~~~~
```
see also cd4af0c722

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-04-29 17:02:25 +08:00
Tomasz Grabiec
c1fdbe79b7 raft: Introduce 'raft_server_force_snapshot' error injection
Will be used by tests to force followers to catch up from the snapshot.
2023-04-24 10:49:37 +02:00
Kefu Chai
3425184b2a raft: include boost header using <path/to/header> not "path/to/header"
for more consistency with the rest of the source tree.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-26 14:07:50 +08:00
Kefu Chai
0421d6d12f raft: include used header
at least, we need to access the declarations of exceptions, like
`not_a_leader` and `dropped_entry`, so, instead of relying on
other header to do this job for us, we should include the header
which include the declaration. so, in this chance "raft.h" is
include explicitly.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-26 14:07:50 +08:00
Botond Dénes
19560419d2 Merge 'treewide: improve compatibility with gcc 13' from Avi Kivity
An assortment of patches that reduce our incompatibilities with the upcoming gcc 13.

Closes #13243

* github.com:scylladb/scylladb:
  transport: correctly format unknown opcode
  treewide: catch by reference
  test: raft: avoid confusing string compare
  utils, types, test: extract lexicographical compare utilities
  test: raft: fsm_test: disambiguate raft::configuration construction
  test: reader_concurrency_semaphore_test: handle all enum values
  repair: fix signed/unsigned compare
  repair: fix incorrect signed/unsigned compare
  treewide: avoid unused variables in if statements
  keys: disambiguate construction from initializer_list<bytes>
  cql3: expr: fix serialize_listlike() reference-to-temporary with gcc
  compaction: error on invalid scrub type
  treewide: prevent redefining names
  api: task_manager: fix signed/unsigned compare
  alternator: streams: fix signed/unsigned comparison
  test: fix some mismatched signed/unsigned comparisons
2023-03-24 15:16:05 +02:00
Avi Kivity
a806024e1d treewide: avoid unused variables in if statements
gcc warns about unused variables declared in if statements. Just
drop them.
2023-03-21 13:42:49 +02:00
Botond Dénes
bf8b746bca Merge 'utils: UUID: specialize fmt::formatter for UUID and tagged_uuid<>' from Kefu Chai
this is a part of a series migrating from `operator<<(ostream&, ..)` based formatting to fmtlib based formatting. the goal here is to enable fmtlib to print UUID without using ostream<<. also, this change re-implements some formatting helpers using fmtlib for better performance and less dependencies on operator<<(), but we cannot drop it at this moment, as quite a few caller sites are still using operator<<(ostream&, const UUID&) and operator<<(ostream&, tagged_uuid<T>&). we will address them separately.

* add `fmt::formatter<UUID>`
* add `fmt::formatter<tagged_uuid<T>>`
* implement `UUID::to_string()` using `fmt::to_string()`
* implement `operator<<(std::ostream&, const UUID&)` with `fmt::print()`, this should help to improve the performance when printing uuid, as `fmt::print()` does not materialize a string when printing the uuid.
* treewide: use fmtlib when printing UUID

Refs #13245

Closes #13246

* github.com:scylladb/scylladb:
  treewide: use fmtlib when printing UUID
  utils: UUID: specialize fmt::formatter for UUID and tagged_uuid<>
2023-03-20 14:26:11 +02:00
Gleb Natapov
2fc8e13dd8 raft: add server::wait_for_state_change() function
Add a function that allows waiting for a state change of a raft server.
It is useful for a user that wants to know when a node becomes/stops
being a leader.

Message-Id: <20230316112801.1004602-4-gleb@scylladb.com>
2023-03-20 11:31:55 +01:00
Kefu Chai
94c6df0a08 treewide: use fmtlib when printing UUID
this change tries to reduce the number of callers using operator<<()
for printing UUID. they are found by compiling the tree after commenting
out `operator<<(std::ostream& out, const UUID& uuid)`. but this change
alone is not enough to drop all callers, as some callers are using
`operator<<(ostream&, const unordered_map&)` and other overloads to
print ranges whose elements contain UUID. so in order to limit the
 scope of the change, we are not changing them here.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-20 15:38:45 +08:00
Gleb Natapov
9bdef9158e raft: abort applier fiber when a state machine aborts
After 5badf20c7a applier fiber does not
stop after it gets abort error from a state machine which may trigger an
assertion because previous batch is not applied. Fix it.

Fixes #12863
2023-02-15 15:54:19 +02:00
Gleb Natapov
dfcd56736b raft: fix race in add_entry_on_leader that may cause incorrect log length accounting
In add_entry_on_leader after wait_for_memory_permit() resolves but before
the fiber continue to run the node may stop becoming the leader and then
become a leader again which will cause currently hold units outdated.
Detect this case by checking the term after the preemption.
2023-02-15 15:51:59 +02:00
Alejo Sanchez
346d02b477 raft conf error injection for snapshot
To trigger snapshot limit behavior provide an error injection to set
with one-shot.

Note this effectively changes it and there is no revert.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2023-02-03 22:33:33 +01:00
Gleb Natapov
229cef136d raft: add trace logging to raft::server::start
Allows to see initial state of the server during start.

Message-Id: <20221228144944.3299711-15-gleb@scylladb.com>
2023-01-02 11:57:53 +02:00
Gleb Natapov
5182543df2 raft: fix typo in read_barrier logging
The log logs applied index not append one.

Message-Id: <20221228144944.3299711-3-gleb@scylladb.com>
2023-01-02 11:38:47 +02:00
Gleb Natapov
022a825b33 raft: introduce not_a_member error and return it when non member tries to do add/modify_config
Currently if a node that is outside of the config tries to add an entry
or modify config transient error is returned and this causes the node
to retry. But the error is not transient. If a node tries to do one of
the operations above it means it was part of the cluster at some point,
but since a node with the same id should not be added back to a cluster
if it is not in the cluster now it will never be.

Return a new error not_a_member to a caller instead.

Message-Id: <Y42mTOx8bNNrHqpd@scylladb.com>
2022-12-05 17:11:04 +01:00
Konstantin Osipov
990c7a209f raft: change the API of conf change notifications
Pass a change diff into the notification callback,
rather than add or remove servers one by one, so that
if we need to persist the state, we can do it once per
configuration change, not for every added or removed server.

For now still pass added and removed entries in two separate calls
per a single configuration change. This is done mainly to fulfill the
library contract that it never sends messages to servers
outside the current configuration. The group0 RPC
implementation doesn't need the two calls, since it simply
marks the removed servers as expired: they are not removed immediately
anyway, and messages can still be delivered to them.
However, there may be test/mock implementations of RPC which
could benefit from this contract, so we decided to keep it.
2022-11-17 12:07:31 +03:00
Kamil Braun
0c9cb5c5bf Merge 'raft: wait for the next tick before retrying' from Gusev Petr
When `modify_config` or `add_entry` is forwarded to the leader, it may
reach the node at "inappropriate" time and result in an exception. There
are two reasons for it - the leader is changing and, in case of
`modify_config`, other `modify_config` is currently in progress. In both
cases the command is retried, but before this patch there was no delay
before retrying, which could led to a tight loop.

The patch adds a new exception type `transient_error`. When the client
receives it, it is obliged to retry the request after some delay.
Previously leader-side exceptions were converted to `not_a_leader`,
which is strange, especially for `conf_change_in_progress`.

Fixes: #11564

Closes #11769

* github.com:scylladb/scylladb:
  raft: rafactor: remove duplicate code on retries delays
  raft: use wait_for_next_tick in read_barrier
  raft: wait for the next tick before retrying
2022-11-16 18:20:54 +01:00