Commit Graph

33406 Commits

Author SHA1 Message Date
Tomasz Grabiec
ee2398960c Merge 'service/raft: simplify raft_address_map' from Kamil Braun
The `raft_address_map` code was "clever": it used two intrusive data structures and did a lot of manual lifetime management; raw pointer manipulation, manual deletion of objects... It wasn't clear who owns which object, who is responsible for deleting what. And there was a lot of code.

In this PR we replace one of the intrusive data structures with a good old `std::unordered_map` and make ownership clear by replacing the raw pointers with `std::unique_ptr`. Furthermore, some invariants which were not clear and enforced in runtime are now encoded in the type system.

The code also became shorter: we reduced its length from ~360 LOC to ~260 LOC.

Closes #11763

* github.com:scylladb/scylladb:
  service/raft: raft_address_map: get rid of `is_linked` checks
  service/raft: raft_address_map: get rid of `to_list_iterator`
  service/raft: raft_address_map: simplify ownership of `expiring_entry_ptr`
  service/raft: raft_address_map: move _last_accessed field from timestamped_entry to expiring_entry_ptr
  service/raft: raft_address_map: don't use intrusive set for timestamped entries
  service/raft: raft_address_map: store reference to `timestamped_entry` in `expiring_entry_ptr`
2022-10-13 18:08:49 +02:00
Kamil Braun
954849799d test/topology: disable flaky test_decommission_add_column
Flaky due to #11780, causes next promotion failures.
We can reenable it after the issue is fixed or a workaround is found.
2022-10-13 17:45:46 +02:00
Kamil Braun
5a9371bcb0 service/raft: raft_address_map: get rid of is_linked checks
Being linked is an invariant of `expiring_entry_ptr`. Make it explicit
by moving the `_expiring_list.push_front` call into the constructor.
2022-10-13 15:17:07 +02:00
Kamil Braun
cdf3367c05 service/raft: raft_address_map: get rid of to_list_iterator
Unnecessary.
2022-10-13 15:17:06 +02:00
Kamil Braun
0e29495c38 service/raft: raft_address_map: simplify ownership of expiring_entry_ptr
The owner of `expiring_entry_ptr` was almost uniquely its corresponding
`timestamp_entry`; it would delete the expiring entry when it itself got
destroyed. There was one call to explicit `unlink_and_dispose`, which
made the picture unclear.

Make the picture clear: `timestamped_entry` now contains a `unique_ptr`
to its `expiring_entry_ptr`. The `unlink_and_dispose` was replaced with
`_lru_entry = nullptr`.

We can also get rid of the back-reference from `expiring_entry_ptr` to
`timestamped_entry`.

The code becomes shorter and simpler.
2022-10-13 15:16:40 +02:00
Petr Gusev
c76cf5956d removenode: don't stream data from the leaving node
If a removenode is run for a recently stopped node,
the gossiper may not yet know that the node is down,
and the removenode will fail with a Stream failed error
trying to stream data from that node.

In this patch we explicitly reject removenode operation
if the gossiper considers the leaving node up.

Closes #11704
2022-10-13 15:11:32 +02:00
Takuya ASADA
49d5e51d76 reloc: add support stripped binary installation for relocatable package
This add support stripped binary installation for relocatable package.
After this change, scylla and unified packages only contain stripped binary,
and introduce "scylla-debuginfo" package for debug symbol.
On scylla-debuginfo package, install.sh script will extract debug symbol
at /opt/scylladb/<dir>/.debug.

Note that we need to keep unstripped version of relocatable package for rpm/deb,
otherwise rpmbuild/debuild fails to create debug symbol package.
This version is renamed to scylla-unstripped-$version-$release.$arch.tar.gz.

See #8918

Signed-off-by: Takuya ASADA <syuu@scylladb.com>

Closes #9005
2022-10-13 15:11:32 +02:00
Asias He
6134fe4d1f storage_service: Prevent removed node to rejoin in handle_state_normal
- Start n1, n2, n3 (127.0.0.3)
- Stop n3
- Change ip address of n3 to 127.0.0.33 and restart n3
- Decommission n3
- Start new node n4

The node n4 will learn from the gossip entry for 127.0.0.3 that node
127.0.0.3 is in shutdown status which means 127.0.0.3 is still part of
the ring.

This patch prevents this by checking the status for the host id on all
the entries. If any of the entries shows the node with the host id is in
LEFT status, reject to put the node in NORMAL status.

Fixes #11355

Closes #11361
2022-10-13 15:11:32 +02:00
Jan Ciolek
52bbc1065c cql3: allow lists of IN elements to be NULL
Requests like `col IN NULL` used to cause
an error - Invalid null value for colum col.

We would like to allow NULLs everywhere.
When a NULL occurs on either side
of a binary operator, the whole operation
should just evaluate to NULL.

Signed-off-by: Jan Ciolek <jan.ciolek@scylladb.com>

Closes #11775
2022-10-13 15:11:32 +02:00
Avi Kivity
19e62d4704 commitlog: delete unused "num_deleted" variable
Since d478896d46 we update the variable, but never read it.
Clang 15 notices and complains. Remove the variable to make it
happy.

Closes #11765
2022-10-13 15:11:32 +02:00
Raphael S. Carvalho
ec79ac46c9 db/view: Add visibility to view updating of Staging SSTables
Today, we're completely blind about the progress of view updating
on Staging files. We don't know how long it will take, nor how much
progress we've made.

This patch adds visibility with a new metric that will inform
the number of bytes to be processed from Staging files.
Before any work is done, the metric tell us the total size to be
processed. As view updating progresses, the metric value is
expected to decrease, unless work is being produced faster than
we can consume them.

We're piggybacking on sstables::read_monitor, which allows the
progress metric to be updated whenever the SSTable reader makes
progress.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #11751
2022-10-12 16:57:37 +03:00
Avi Kivity
2e79bb431c tools: change source_location location
std::experimental::source_location is provided by <experimental/source_location>,
not <source_location>. libstdc++ 12 insists, so change the header.

Closes #11766
2022-10-12 15:29:14 +03:00
Takuya ASADA
6b246dc119 locator::ec2_snitch: Retry HTTP request to EC2 instance metadata service
EC2 instance metadata service can be busy, ret's retry to connect with
interval, just like we do in scylla-machine-image.

Fixes #10250

Signed-off-by: Takuya ASADA <syuu@scylladb.com>

Closes #11688
2022-10-12 13:59:06 +03:00
Kamil Braun
92dd1f7307 service/raft: raft_address_map: move _last_accessed field from timestamped_entry to expiring_entry_ptr
`timestamped_entry` had two fields:

```
optional<clock_time_point> _last_accessed
expiring_entry_ptr* _lru_entry
```

The `raft_address_map` data structure maintained an invariant:
`_last_accessed` is set if and only if `_lru_entry` is not null.
This invariant could be broken for a while when constructing an expiring
`timestamped_entry`: the constructor was given an `expiring = true`
flag, which set the `_last_accessed` field; this was redundant, because
immediately after a corresponding `expiring_entry_ptr` was constructed
which again reset the `_last_accessed` field and set `_lru_entry`.

The code becomes simpler and shorter when we move `_last_accessed` field
into `expiring_entry_ptr`. The invariant is now guaranteed by the type
system: `_last_accessed` is no longer `optional`.
2022-10-12 12:22:57 +02:00
Kamil Braun
262b9473d5 service/raft: raft_address_map: don't use intrusive set for timestamped entries
Intrusive data structures are harder to reason about. In
`raft_address_map` there's a good reason to use an intrusive list for
storing `expiring_entry_ptr`s: we move the entries around in the list
(when their expiration times change) but we want for the objects to stay
in place because `timestamped_entry`s may point to them (although we
could simply update the pointers using the existing back-reference...)

However, there's not much reason to store `timestamped_entry` in an
intrusive set. It was basically used in one place: when dropping expired
entries, we iterate over the list of `expiring_entry_ptr`s and we want
to drop the corresponding `timestamped_entry` as well, which is easy
when we have a pointer to the entry and it's a member of an intrusive
container. But we can deal with it when using non-intrusive containers:
just `find` the element in the container to erase it.

The code becomes shorter with this change.

I also use a map instead of a set because we need to modify the
`timestamped_entry` which wouldn't be possible if it was used as an
`unordered_set` key. In fact using map here makes more sense: we were
using the intrusive set similarly to a map anyway because all lookups
were performed using the `_id` field of `timestamped_entry` (now the
field was moved outside the struct, it's used as the map's key).
2022-10-12 12:22:50 +02:00
Kamil Braun
3e84b1f69c Merge 'test.py: topology fix ssl var and improve pylint score' from Alecco
When code was moved to the new directory, a bug was reintroduced with `ssl` local hiding `ssl` module. Fix again.

Closes #11755

* github.com:scylladb/scylladb:
  test.py: improve pylint score for conftest
  test.py: fix variable name collision with ssl
2022-10-12 11:41:11 +02:00
Avi Kivity
f673d0abbe build: support fmt 9 ostream formatter deprecation
fmt 9 deprecates automatic fallback to std::ostream formatting.
We should migrate, but in order to do so incrementally, first enable
the deprecated fallback so the code continues to compile.

Closes #11768
2022-10-12 09:27:36 +03:00
Avi Kivity
0952cecfc9 build: mark abseil as a system header
Abseil is not under our control, so if a header generates a
warning, we can do nothing about it. So far this wasn't a problem,
but under clang 15 it spews a harmless deprecation warning. Silence
the warning by treating the header as a system header (which it is,
for us).

Closes #11767
2022-10-12 09:27:36 +03:00
Kamil Braun
0c13c85752 service/raft: raft_address_map: store reference to timestamped_entry in expiring_entry_ptr
The class was storing a pointer which couldn't be null.
A reference is a better fit in this case.
2022-10-11 17:21:01 +02:00
Asias He
810b424a8c storage_service: Reject to bootstrap new node when node has unknown gossip status
- Start a cluster with n1, n2, n3
- Full cluster shutdown n1, n2, n3
- Start n1, n2 and keep n3 as shutdown
- Add n4

Node n4 will learn the ip and uuid of n3 but it does not know the gossip
status of n3 since gossip status is published only by the node itself.
After full cluster shutdown, gossip status of n3 will not be present
until n3 is restarted again. So n4 will not think n3 is part of the
ring.

In this case, it is better to reject the bootstrap.

With this patch, one would see the following when adding n4:

```
ERROR 2022-09-01 13:53:14,480 [shard 0] init - Startup failed:
std::runtime_error (Node 127.0.0.3 has gossip status=UNKNOWN. Try fixing it
before adding new node to the cluster.)
```

The user needs to perform either of the following before adding a new node:

1) Run nodetool removenode to remove n3
2) Restart n3 to get it back to the cluster

Fixes #6088

Closes #11425
2022-10-11 15:47:34 +03:00
Botond Dénes
378c6aeebd Merge 'More Raft upgrade tests' from Kamil Braun
Refactor the existing upgrade tests, extracting some common functionality to
helper functions.

Add more tests. They are checking the upgrade procedure and recovery from
failure in scenarios like when a node fails causing the procedure to get stuck
or when we lose a majority in a fully upgraded cluster.

Add some new functionalities to `ScyllaRESTAPIClient` like injecting errors and
obtaining gossip generation numbers.

Extend the removenode function to allow ignoring dead nodes.

Improve checking for CQL availability when starting nodes to speed up testing.

Closes #11725

* github.com:scylladb/scylladb:
  test/topology_raft_disabled: more Raft upgrade tests
  test/topology_raft_disabled: refactor `test_raft_upgrade`
  test/pylib: scylla_cluster: pass a list of ignored nodes to removenode
  test/pylib: rest_client: propagate errors from put_json
  test/pylib: fix some type hints
  test/pylib: scylla_cluster: don't create and drop keyspaces to check if cql is up
2022-10-11 15:30:00 +03:00
Kamil Braun
08e654abf5 Merge 'raft: (service) cleanups on the path for dynamic IP address support' from Konstantin Osipov
In preparation for supporting IP address changes of Raft Group 0:
1) Always use start_server_for_group0() to start a server for group 0.
   This will provide a single extension point when it's necessary to
   prompt raft_address_map with gossip data.
2) Don't use raft::server_address in discovery, since going forward
   discovery won't store raft::server_address. On the same token stop
   using discovery::peer_set anywhere outside discovery (for persistence),
   use a peer_list instead, which is easier to marshal.

Closes #11676

* github.com:scylladb/scylladb:
  raft: (discovery) do not use raft::server_address to carry IP data
  raft: (group0) API refactoring to avoid raft::server_address
  raft: rename group0_upgrade.hh to group0_fwd.hh
  raft: (group0) move the code around
  raft: (discovery) persist a list of discovered peers, not a set
  raft: (group0) always start group0 using start_server_for_group0()
2022-10-11 13:43:41 +02:00
Asias He
58c65954b8 storage_service: Reject decommission if nodes are down
- Start n1, n2, n3

- Apply network nemesis as below:
  + Block gossip traffic going from nodes 1 and 2 to node 3.
  + All the other rpc traffic flows normally, including gossip traffic
    from node 3 to nodes 1 and 2 and responses to node_ops commands from
    nodes 1 and 2 to node 3.

- Decommission n3

Currently, the decommission will be successful because all the network
traffic is ok. But n3 could not advertise status STATUS_LEFT to the rest
of the cluster due to the network nemesis applied. As a result, n1 and
n3 could not move the n3 from STATUS_LEAVING to STATUS_LEFT, so n3 will
stay in DL forever.

I know why the node stays DL forever. The problem is that with
node_ops_cmd based node operation, we still rely on the gossip status of
STATUS_LEFT from the node being decommissioned to notify other nodes
this node has finished decommission and can be moved from STATUS_LEAVING
to STATUS_LEFT.

This patch fixes by checking gossip liveness before running
decommission. Reject if required peer nodes are down.

With the fix, the decommission of n3 will fail like this:

$ nodetool decommission -p 7300
nodetool: Scylla API server HTTP POST to URL
'/storage_service/decommission' failed: std::runtime_error
(decommission[adb3950e-a937-4424-9bc9-6a75d880f23d]: Rejected
decommission operation, removing node=127.0.0.3, sync_nodes=[127.0.0.2,
127.0.0.3, 127.0.0.1], ignore_nodes=[], nodes_down={127.0.0.1})

Fixes #11302

Closes #11362
2022-10-11 14:09:28 +03:00
Botond Dénes
917fdb9e53 Merge "Cut database-system_keyspace circular dependency" from Pavel Emelyanov
"
There's one via the database's compaction manager and large data handler
sub-services. Both need system keyspace to put their info into, but the
latter needs database naturally via query_processor->storage_proxy link.

The solution is to make c.m. | l.d.h. -> sys.ks. dependency be weak with
the help of shared_from_this(), described in details in patch #2 commit
message.

As a (not-that-)side effect this set removes a bunch of global qctx
calls.

refs: #11684 (this set seem to increase the chance of stepping on it)
"

* 'br-sysks-async-users' of https://github.com/xemul/scylla:
  large_data_handler: Use local system_keyspace to update entries
  system_keyspace: De-static compaction history update
  compaction_manager: Relax history paths
  database: Plug/unplug system_keyspace
  system_keyspace: Add .shutdown() method
2022-10-11 08:52:04 +03:00
Nadav Har'El
ef0da14d6f test/cql-pytest: add simple tests for USE statement
This patch adds a couple of simple tests for the USE statement: that
without USE one cannot create a table without explicitly specifying
a keyspace name, and with USE, it is possible.

Beyond testing these specific feature, this patch also serves as an
example of how to write more tests that need to control the effective USE
setting. Specifically, it adds a "new_cql" function that can be used to
create a new connection with a fresh USE setting. This is necessary
in such tests, because if multiple tests use the same cql fixture
and its single connection, they will share their USE setting and there
is no way to undo or reset it after being set.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #11741
2022-10-11 08:20:19 +03:00
Kamil Braun
df2fb21972 test/topology: reenable test_remove_node_add_column
After #11691 was merged the test should no longer be flaky. Reenable it.

Closes #11754
2022-10-11 08:18:20 +03:00
Konstantin Osipov
3e46c32d7b raft: (discovery) do not use raft::server_address to carry IP data
We plan to remove IP information from Raft addresses.
raft::server_address is used in Raft configuration and
also in discovery, which is a separate algorithm, as a handy data
structure, to avoid having new entities in RPC.

Since we plan to remove IP addresses from Raft configuration,
using raft::server_address in discovery and still storing
IPs in it would create ambiguity: in some uses raft::server_address
would store an IP, and in others - would not.

So switch to an own data structure for the purposes of discovery,
discovery_peer, which contains a pair ip, raft server id.

Note to reviewers: ideally we should switch to URIs
in discovery_peer right away. Otherwise we may have to
deal with incompatible changes in discovery when adding URI
support to Scylla.
2022-10-10 16:24:33 +03:00
Pavel Emelyanov
b1f4273f0d large_data_handler: Use local system_keyspace to update entries
The l._d._h.'s way to update system keyspace is not like in other code.
Instead of a dedicated helper on the system_keyspace's side it executes
the insertion query directly with the help of qctx.

Now when the l._d._h. has the weak system keyspace reference it can
execute queries on _it_ rather than on the qctx.

Just like in previous patch, it needs to keep the sys._k.s. weak
reference alive until the query's future resolves.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-10-10 16:20:59 +03:00
Pavel Emelyanov
907fd2d355 system_keyspace: De-static compaction history update
Compaction manager now has the weak reference on the system keyspace
object and can use it to update its stats. It only needs to take care
and keep the shared pointer until the respective future resolves.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-10-10 16:20:59 +03:00
Pavel Emelyanov
3e0b61d707 compaction_manager: Relax history paths
There's a virtual method on table_state to update the entry in system
keyspace. It's an overkill to facilitate tests that don't want this.
With new system_keyspace weak referencing it can be made simpled by
moving the updating call to the compaction_manager itself.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-10-10 16:20:59 +03:00
Pavel Emelyanov
f9b57df471 database: Plug/unplug system_keyspace
There's a circular dependency between system_keyspace and database. The
former needs the latter because it needs to execula local requests via
query_processor. The latter needs the former via compaction manager and
large data handler, database depends on both and these too need to
insert their entries into system keyspace.

To cut this loop the compaction manager and large data handler both get
a weak reference on the system keysace. Once system keyspace starts is
activcates this reference via the database call. When system keyspace is
shutdown-ed on stop, it deactivates the reference.

Technically the weak reference is implemented by marking the system_k.s.
object as async_sharded_service, and the "reference" in question is the
shared_from_this() pointer. When compaction manager or large data
handler need to update a system keyspace's table, they both hold an
extra reference on the system keyspace until the entry is committed,
thus making sure that sys._k.s. doesn't stop from under their feet. At
the same time, unplugging the reference on shutdown makes sure that no
new entries update will appear and the system_k.s. will eventually be
released.

It's not a C++ classical reference, because system_keyspace starts after
and stops before database.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-10-10 16:20:59 +03:00
Konstantin Osipov
8857e017c7 raft: (group0) API refactoring to avoid raft::server_address
Replace raft::server_address in a few raft_group0 API
calls with raft::server_id.

These API calls do not need raft::server_address, i.e. the
address part, anyway, and since going forward raft::server_address
will not contain the IP address, stop using it in these calls.

This is a beginning of a multi-patch series to reduce
raft::server_address usage to core raft only.
2022-10-10 15:58:48 +03:00
Konstantin Osipov
224dd9ce1e raft: rename group0_upgrade.hh to group0_fwd.hh
The plan is to add other group-0-related forward declarations
to this file, not just the ones for upgrade.
2022-10-10 15:58:48 +03:00
Konstantin Osipov
e226624daf raft: (group0) move the code around
Move load/store functions for discovered peers up,
since going forward they'll be used to in start_server_for_group0(),
to extend the address map prior to start (and thus speed up
bootstrap).
2022-10-10 15:58:48 +03:00
Konstantin Osipov
199b6d6705 raft: (discovery) persist a list of discovered peers, not a set
We plan to reuse the discovery table to store the peers
after discovery is over, so load/store API must be generalized
to use outside discovery. This includes sending
the list of persisted peers over to a new member of the cluster.
2022-10-10 15:58:48 +03:00
Konstantin Osipov
746322b740 raft: (group0) always start group0 using start_server_for_group0()
When IP addresses are removed from raft::configuration, it's key
to initialize raft_address_map with IP addresses before we start group
0. Best place to put this initialization is start_server_for_group0(),
so make sure all paths which create group 0 use
start_server_for_group0().
2022-10-10 15:58:48 +03:00
Kamil Braun
4974a31510 test/topology_raft_disabled: more Raft upgrade tests
The tests are checking the upgrade procedure and recovery from failure
in scenarios like when a node fails causing the procedure to get stuck
or when we lose a majority in a fully upgraded cluster.

Added some new functionalities to `ScyllaRESTAPIClient` like injecting
errors and obtaining gossip generation numbers.
2022-10-10 14:32:10 +02:00
Pavel Emelyanov
caed12c8f2 system_keyspace: Add .shutdown() method
Many services out there have one (sometimes called .drain()) that's
called early on stop and that's responsible for prearing the service for
stop -- aborting pending/in-flight fibers and alike.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-10-10 15:29:33 +03:00
Kamil Braun
4460b4e63c test/topology_raft_disabled: refactor test_raft_upgrade
Take reusable parts out of the test to helper functions.
2022-10-10 12:59:12 +02:00
Kamil Braun
fa8dcb0d54 test/pylib: scylla_cluster: pass a list of ignored nodes to removenode
The `removenode` operation normally requires the removing node to
contact every node in the cluster except the one that is being removed.
But if more than 1 node is down it's possible to specify a list of nodes
to ignore for the operation; the `/storage_service/remove_node` endpoint
accepts an `ignore_nodes` param which is a comma-separated list of IPs.

Extend `ScyllaRESTAPIClient`, `ScyllaClusterManager` and `ManagerClient`
so it's possible to pass the list of ignored nodes.

We also modify the `/cluster/remove-node` Manager endpoint to use
`put_json` instead of `get_text` and pass all parameters except the
initiator IP (the IP of the node who coordinates the `removenode`
operation) through JSON. This simplifies the URL greatly (it was already
messy with 3 parameters) and more closely resembles Scylla's endpoint.
2022-10-10 12:59:12 +02:00
Kamil Braun
130ab1d312 test/pylib: rest_client: propagate errors from put_json 2022-10-10 12:59:12 +02:00
Kamil Braun
63892326d5 test/pylib: fix some type hints 2022-10-10 12:59:12 +02:00
Kamil Braun
6e3fe13fcf test/pylib: scylla_cluster: don't create and drop keyspaces to check if cql is up
Do a simple `SELECT` instead.

This speeds up tests - creating and dropping keyspaces is relatively
expensive, and we did this on every server restart.
2022-10-10 12:59:12 +02:00
Alejo Sanchez
7e2a3f2040 test.py: improve pylint score for conftest
Remove unused imports, fix long lines, add ignore flags.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2022-10-10 12:07:41 +02:00
Alejo Sanchez
aa1f4a321c test.py: fix variable name collision with ssl
Change variable name to avoid collision with module ssl.

This bug was reintroduced when moving code.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2022-10-10 11:59:13 +02:00
Tomasz Grabiec
fcf0628bc5 dbuild: Use .gdbinit from the host
Useful when starting gdb inside the dbuild container.
Message-Id: <20221007154230.1936584-1-tgrabiec@scylladb.com>
2022-10-09 11:14:33 +03:00
Petr Gusev
0923cb435f raft: mark removed servers as expiring instead of dropping them
There is a flaw in how the raft rpc endpoints are
currently managed. The io_fiber in raft::server
is supposed to first add new servers to rpc, then
send all the messages and then remove the servers
which have been excluded from the configuration.
The problem is that the send_messages function
isn't synchronous, it schedules send_append_entries
to run after all the current requests to the
target server, which can happen
after we have already removed the server from address_map.

In this patch the remove_server function is changed to mark
the server_id as expiring rather than synchronously dropping it.
This means all currently scheduled requests to
that server will still be able to resolve
the ip address for that server_id.

Fixes: #11228

Closes #11748
2022-10-07 19:08:34 +02:00
Kamil Braun
06b87869ba Merge 'Raft transport error' from Gusev Petr
The `add_entry` and `modify_config` methods sometimes do an rpc to
execute the request on the current leader. If the tcp connection
was broken, a `seastar::rpc::closed_error` would be thrown to the client.
This exception was not documented in the method comments and the
client could have missed handling it. For example, this exception
was not handled when calling `modify_config` in `raft_group0`,
which sometimes broke the `removenode` command.

An `intermittent_connection_error` exception was added earlier to
solve a similar problem with the `read_barrier` method. In this patch it
is renamed to `transport_error`, as it seems to better describe the
situation, and an explicit specification for this exception
was added - the rpc implementation can throw it if it is not known
whether the call reached the destination and whether any mutations were made.

In case of `read_barrier` it does not matter and we just retry, in case
of `add_entry` and `modify_config` we cannot retry because of possible mutations,
so we convert this exception to `commit_status_unknown`, which
the client has to handle.

Explicit comments have also been added to `raft::server` methods
describing all possible exceptions.

Closes #11691

* github.com:scylladb/scylladb:
  raft_group0: retry modify_config on commit_status_unknown
  raft: convert raft::transport_error to raft::commit_status_unknown
2022-10-07 15:53:22 +02:00
Petr Gusev
12bb8b7c8d raft_group0: retry modify_config on commit_status_unknown
modify_config can throw commit_status_unknown in case
of a leader change or when the leader is unavailable,
but the information about it has not yet reached the
current node. In this patch modify_config is run
again after some time in this case.
2022-10-07 13:34:23 +04:00
Petr Gusev
d79fbab682 raft: convert raft::transport_error to raft::commit_status_unknown
The add_entry and modify_config methods sometimes do an rpc to
execute the request on the current leader. If the tcp connection
was broken, a seastar::rpc::closed_error would be thrown to the client.
This exception was not documented in the method comments and the
client could have missed handling it. For example, this exception
was not handled when calling modify_config in raft_group0,
which sometimes broke the removenode command.

An intermittent_connection_error exception was added earlier to
solve a similar problem with the read_barrier method. In this patch it
is renamed to transport_error, as it seems to better describe the
situation, and an explicit specification for this exception
was added - the rpc implementation can throw it if it is not known
whether the call reached the target node and whether any
actions were performed on it.

In case of read_barrier it does not matter and we just retry. In case
of add_entry and modify_config we cannot retry
because the rpc calls are not idempotent, so we convert this
exception to commit_status_unknown, which the client has to handle.

Explicit comments have also been added to raft::server methods
describing all possible exceptions.
2022-10-07 13:34:16 +04:00