We used the GOSSIP_ECHO verb to perform failure detection. Now we use
a dedicated verb, DIRECT_FD_PING, introduced for this purpose.
There are multiple reasons for this change.
One minor reason: we want to use the same connection as other Raft
verbs: if we can't deliver Raft append_entries or vote messages
somewhere, that endpoint should be marked dead; if we can, the
endpoint should be marked alive. So putting pings on the same
connection as the other Raft verbs is important when dealing with
weird situations where some connections are available but others are
not. Observe that in `do_get_rpc_client_idx`, we put the new verb in
the right place.
Another minor reason: we remove the awkward gossiper `echo_pinger`
abstraction which required storing and updating gossiper generation
numbers. This also removes one dependency of the Raft service code on
the gossiper.
Major reason 1: the gossip echo handler has a weird mechanism where a
replacing node returns errors to some of the nodes during the replace
operation. In Raft, however, we want to mark servers as alive when they
are alive, including a server running on a node that's replacing
another node.
Major reason 2, related to the previous one: when server B is
replacing server A with the same IP, the failure detector will try to
ping both servers. Both servers are mapped to the same IP by the
address map, so pings to both servers will reach server B. We want
server B to respond to the pings destined for server B, but not to
pings destined for server A, so the sender can mark B alive but keep A
marked dead.
To do this, we include the destination's Raft ID in our RPCs. The
destination compares the received ID with its own. If it's different,
it returns a `wrong_destination` response, and the failure detector
knows that the ping did not reach the destination (it reached someone
else).
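To illustrate the idea, a minimal sketch in plain C++ (not the actual ScyllaDB RPC code; the handler name and result type are made up):
```
#include <cstdint>

// Simplified model: a Raft server ID is a UUID; reduced here to one integer.
struct server_id { uint64_t uuid; };

enum class ping_result { alive, wrong_destination };

// The receiver compares the destination ID carried in the ping with its own.
// If they differ (e.g. B received a ping addressed to the server A it is
// replacing), it answers wrong_destination, so the sender can mark B alive
// while keeping A marked dead.
ping_result handle_direct_fd_ping(server_id my_id, server_id dst_id) {
    if (my_id.uuid != dst_id.uuid) {
        return ping_result::wrong_destination;
    }
    return ping_result::alive;
}
```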
Yet another reason: this removes the "Not ready to respond gossip echo
message" log spam during replace.
The Raft ID was loaded or created late in the boot procedure, in
`storage_service::join_token_ring`.
Create it earlier, as soon as it's possible (when `system_keyspace`
is started), pass it to `raft_group_registry::start` and store it inside
`raft_group_registry`.
We will use this Raft ID stored in the group registry in the following patches.
This also reduces the number of disk accesses for this node's Raft ID.
It's now loaded from disk once, stored in `raft_group_registry`, then
obtained from there when needed.
This moves `raft_group_registry::start` a bit later in the startup
procedure - after `system_keyspace` is started - but it doesn't make
a difference.
This is the core of dynamic IP address support in Raft, moving IP address
sourcing out of the Raft Group 0 configuration and into gossip. At Raft
start, the raft id <-> IP address translation map is tied into the
gossiper notifications and learns the IP addresses of Raft hosts from them.
The series intentionally doesn't contain the part which speeds up the
initial cluster assembly by persisting the translation cache and using
more sources besides gossip (discovery, RPC) to show correctness of the
approach.
Closes#12035
* github.com:scylladb/scylladb:
raft: (rpc) do not throw in case of a missing IP address in RPC
raft: (address map) actively maintain ip <-> raft server id map
Better logging, less code, a minor fix.
Closes#12135
* github.com:scylladb/scylladb:
service/raft: raft_group0: less repetitive logging calls
service/raft: raft_group0: fix sleep_with_exponential_backoff
Remove raft_address_map::get_inet_address()
While at it, coroutinize some rpc methods.
To propagate the missing-IP-address event up, use coroutine::exception()
with a proper type (raft::transport_error) and a proper error message.
This is a building block for removing
raft_address_map::get_inet_address(), which is too generic, and for shifting
the responsibility of handling missing addresses to the address map
clients. E.g. a one-way RPC shouldn't throw if an address is missing, but
should just drop the message (see the sketch below).
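For illustration, a simplified sketch of the intended split (plain C++ with hypothetical helper names; the real code is asynchronous and reports the error via coroutine::exception() with raft::transport_error):
```
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins for the address map lookup and the RPC send.
std::optional<std::string> lookup_address(const std::string& /*server_id*/) { return std::nullopt; }
void do_send(const std::string& /*addr*/, const std::string& /*msg*/) {}

// Two-way call: the caller waits for an answer, so a missing address must be
// reported as an error.
std::string call_twoway(const std::string& dst, const std::string& req) {
    auto addr = lookup_address(dst);
    if (!addr) {
        throw std::runtime_error("no address for raft server " + dst);
    }
    do_send(*addr, req);
    return "reply";
}

// One-way message: nobody is waiting for a reply, so just drop it when the
// address is unknown; Raft has its own retry machinery.
void send_oneway(const std::string& dst, const std::string& msg) {
    if (auto addr = lookup_address(dst)) {
        do_send(*addr, msg);
    }
}
```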
PS: An attempt to use a single template function turned out to be too
complex:
- some functions require a gate, some don't
- some return void, some future<> and some future<raft::data_type>
1) make address map API flexible
Before this patch:
- having a mapping without an actual IP address was an
internal error
- not having a mapping for an IP address was an internal
error
- re-mapping to a new IP address wasn't allowed
After this patch:
- the address map may contain a mapping
without an actual IP address, and the caller must be prepared for it:
find() will return a nullopt. This happens when we first add an entry
to Raft configuration and only later learn its IP address, e.g. via
gossip.
- it is allowed to re-map an existing entry to a new address (see the sketch below).
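A schematic of the resulting API (plain C++, heavily simplified; the real class is sharded and has expiring entries):
```
#include <optional>
#include <string>
#include <unordered_map>

using server_id = std::string;   // stand-in for raft::server_id
using ip_address = std::string;  // stand-in for gms::inet_address

class flexible_address_map {
    // An entry may exist before we know its IP address, hence the optional.
    std::unordered_map<server_id, std::optional<ip_address>> _entries;
public:
    // Add the server to the map, possibly without an address yet.
    void add_entry(const server_id& id) { _entries.try_emplace(id, std::nullopt); }

    // Learning a (new) address re-maps the entry; re-mapping is allowed.
    void set_address(const server_id& id, ip_address addr) { _entries[id] = std::move(addr); }

    // Callers must be prepared for nullopt: either no entry, or no IP yet.
    std::optional<ip_address> find(const server_id& id) const {
        auto it = _entries.find(id);
        return it == _entries.end() ? std::nullopt : it->second;
    }
};
```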
2) subscribe to gossip notifications
Learning IP addresses from gossip allows us to adjust
the address map whenever a node IP address changes.
The gossiper is also the only valid source of re-mapping; other sources
(RPC) should not re-map, since otherwise a packet from a removed
server can remap the id to a wrong address and impact liveness of a Raft
cluster.
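Schematically (simplified model, made-up names), the re-mapping policy looks like this:
```
#include <string>
#include <unordered_map>

using server_id = std::string;
using ip_address = std::string;

struct addr_map {
    std::unordered_map<server_id, ip_address> entries;

    // Gossip is the authoritative source: it may overwrite an existing mapping.
    void update_from_gossip(const server_id& id, ip_address addr) {
        entries[id] = std::move(addr);
    }

    // Other sources (e.g. the sender address of an incoming RPC) must not
    // re-map: a packet from an already-removed server could otherwise point
    // the id at a stale address and hurt liveness.
    void maybe_add_from_rpc(const server_id& id, ip_address addr) {
        entries.try_emplace(id, std::move(addr));
    }
};
```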
3) prime address map state with app state
Initialize the raft address map with initial
gossip application state, specifically IPs of members
of the cluster. With this, we no longer need to store
these IPs in Raft configuration (and update them when they change).
The obvious drawback of this approach is that a node
may join Raft config before it propagates its IP address
to the cluster via gossip - so the boot process has to
wait until it happens.
Gossip also doesn't tell us which IPs are members of Raft configuration,
so we subscribe to Group0 configuration changes to mark the
members of Raft config "non-expiring" in the address translation
map.
Thanks to the changes above, Raft configuration no longer
stores IP addresses.
We still keep the 'server_info' column in the raft_config system table,
in case we change our mind or decide to store something else in there.
Some log messages in retry loops in the Raft upgrade procedure included
a sentence like "sleeping before retrying..."; but not all of them.
With the recently added `sleep_with_exponential_backoff` abstraction we
can put this "sleeping..." message in a single place, and it's also easy
to say how long we're going to sleep.
I also enjoy using this `source_location` thing.
Always use raft address map to obtain the IP addresses
of upgrade peers. Right now the map is populated
from Raft configuration, so it's an equivalent transformation,
but in the future the raft address map will be populated from other sources:
discovery and gossip, so the upgrade logic will change as well.
Do not proceed with the upgrade if an address is
missing from the map, since it means we failed to contact a raft member.
We plan to use gossip data to educate Raft RPC about IP addresses
of raft peers. Add raft server ids to the application state, so
that when we get a notification about a gossip peer we can
identify which raft server id the notification is for;
specifically, we can find which IP address stands for this server
id and, whenever the IP address changes, update the Raft
address map with the new address.
By the same token, at boot time, we now have to start Gossip
before Raft, since Raft won't be able to send any messages
without gossip data about IP addresses.
Pass a change diff into the notification callback,
rather than add or remove servers one by one, so that
if we need to persist the state, we can do it once per
configuration change, not for every added or removed server.
For now still pass added and removed entries in two separate calls
per single configuration change. This is done mainly to fulfill the
library contract that it never sends messages to servers
outside the current configuration. The group0 RPC
implementation doesn't need the two calls, since it simply
marks the removed servers as expired: they are not removed immediately
anyway, and messages can still be delivered to them.
However, there may be test/mock implementations of RPC which
could benefit from this contract, so we decided to keep it.
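A sketch of the shape of this interface (plain C++, illustrative names only):
```
#include <functional>
#include <string>
#include <unordered_set>

using server_id = std::string;  // stand-in for raft::server_id
using id_set = std::unordered_set<server_id>;

// The callback receives the whole diff of one configuration change, so an
// implementation that persists its state can do one write per change rather
// than one per added/removed server.
using on_configuration_change = std::function<void(const id_set& added, const id_set& removed)>;

// Caller side, sketched: still two calls per change, so messages are never
// sent to servers outside the current configuration -- extend the set first,
// send the pending messages, then shrink it.
void apply_config_change(const on_configuration_change& notify,
                         id_set added, id_set removed) {
    notify(added, {});       // add new servers first
    // ... send queued messages ...
    notify({}, removed);     // drop removed servers last
}
```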
The direct failure detector operates on abstract `endpoint_id`s for
pinging. The `pinger` interface is responsible for translating these IDs
to 'real' addresses.
Earlier we used two types of addresses: IP addresses in 'production'
code (`gms::gossiper::direct_fd_pinger`) and `raft::server_id`s in test
code (in `randomized_nemesis_test`). For each of these use cases we
would maintain mappings between `endpoint_id`s and the address type.
In recent commits we switched the 'production' code to also operate on
Raft server IDs, which are UUIDs underneath.
In this commit we switch `endpoint_id`s from the `unsigned` type to
`utils::UUID`. Because each use case operates on Raft server IDs, we can
perform a simple translation: `raft_id.uuid()` to get an `endpoint_id`
from a Raft ID, `raft::server_id{ep_id}` to obtain a Raft ID from
an `endpoint_id`. We no longer have to maintain complex sharded data
structures to store the mappings.
Whenever a Raft configuration change is performed, `raft::server` calls
`raft_rpc::add_server`/`raft_rpc::remove_server`. Our `raft_rpc`
implementation has a function, `_on_server_update`, passed in the
constructor, which it calls in `add_server`/`remove_server`;
that function would update the set of endpoints detected by the
direct failure detector. `_on_server_update` was passed an IP address
and that address was added to / removed from the failure detector set
(there's another translation layer between the IP addresses and internal
failure detector 'endpoint ID's; but we can ignore it for the purposes
of this commit).
Therefore: the failure detector was pinging a certain set of IP
addresses. These IP addresses were updated during Raft configuration
changes.
To implement the `is_alive(raft::server_id)` function (required by the
`raft::failure_detector` interface), we would translate the ID to an IP
address using the Raft address map (which is currently also updated during
configuration changes) and check if that IP address is alive according to
the direct failure detector (which maintained an `_alive_set` of type
`unordered_set<gms::inet_address>`).
This all works well but it assumes that servers can be identified using
IP addresses - it doesn't play well with the fact that servers may
change their IP addresses. The only immutable identifier we have for a
server is `raft::server_id`. In the future, Raft configurations will not
associate IP addresses with Raft servers; instead we will assume that IP
addresses can change at any time, and there will be a different
mechanism that eventually updates the Raft address map with the latest
IP address for each `raft::server_id`.
To prepare us for that future, in this commit we no longer operate in
terms of IP addresses in the failure detector, but in terms of
`raft::server_id`s. Most of the commit is boilerplate, changing
`gms::inet_address` to `raft::server_id` and function/variable names.
The interesting changes are:
- in `is_alive`, we no longer need to translate the `raft::server_id` to
an IP address, because now the stored `_alive_set` already contains
`raft::server_id`s instead of `gms::inet_address`es.
- the `ping` function now takes a `raft::server_id` instead of
`gms::inet_address`. To send the ping message, we need to translate
this to an IP address; we do it via the `raft_address_map` pointer
introduced in an earlier commit.
Thus, there is still a point where we have to translate between
`raft::server_id` and `gms::inet_address`; but observe we now do it at
the last possible moment - just before sending the message. If we
have no translation, we consider the `ping` to have failed - it's
equivalent to a network failure where no route to a given address was
found.
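A minimal sketch of the new `ping` flow (plain C++; `resolve` stands in for the `raft_address_map` lookup and is a made-up name):
```
#include <optional>
#include <string>

using server_id = std::string;   // stand-in for raft::server_id
using ip_address = std::string;  // stand-in for gms::inet_address

// Hypothetical stand-ins for the raft_address_map lookup and the network send.
std::optional<ip_address> resolve(const server_id&) { return std::nullopt; }
bool send_ping_to(const ip_address&) { return true; }

// Translate only at the last possible moment, right before sending.
// No translation available == the ping failed, just like "no route to host".
bool ping(const server_id& dst) {
    auto addr = resolve(dst);
    if (!addr) {
        return false;
    }
    return send_ping_to(*addr);
}
```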
In a later commit, `direct_fd_pinger` will operate in terms of
`raft::server_id`s. Decouple it from `gossiper` since we don't want to
entangle `gossiper` with Raft-specific stuff.
Replicating `raft_address_map` entries is needed for the following use
cases:
- the direct failure detector - currently it assumes a static mapping of
`raft::server_id`s to `gms::inet_address`es, which is obtained on Raft
group 0 configuration changes. To handle dynamic mappings we need to
modify the failure detector so it pings `raft::server_id`s and obtains
the `gms::inet_address` before sending the message from
`raft_address_map`. The failure detector is sharded, so we need the
mappings to be available on all shards.
- in the future we'll have multiple Raft groups running on different
shards. To send messages they'll need `raft_address_map`.
Initially I tried to replicate all entries - expiring and non-expiring.
The implementation turned out to be very complex - we need to handle
dropping expired entries and refreshing expiring entries' timestamps
across shards, and doing this correctly while accounting for possible
races is quite problematic.
Eventually I arrived at the conclusion that replicating only
non-expiring entries, and furthermore allowing non-expiring entries to
be added only on shard 0, is good enough for our use cases:
- The direct failure detector is pinging group 0 members only; group
0 members correspond exactly to the non-expiring entries.
- Group 0 configuration changes are handled on shard 0, so non-expiring
entries are added/removed on shard 0.
- When we have multiple Raft groups, we can reuse a single Raft server
ID for all Raft servers running on a single node belonging to
different groups; they are 'namespaced' by the group IDs. Furthermore,
every node has a server that belongs to group 0. Thus every Raft
server in every group has a corresponding server in group 0 with
the same ID, which has a non-expiring entry in `raft_address_map`,
which is replicated to all shards; so every group will be able to
deliver its messages.
With these assumptions the implementation is short and simple.
We can always complicate it in the future if we find that the
assumptions are too strong.
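A simplified model of the chosen design (plain C++; the real code is a seastar sharded service, here modeled as a vector of per-shard maps):
```
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

using server_id = std::string;
using ip_address = std::string;

// Only non-expiring entries (group 0 members) are replicated, and they may
// only be added on shard 0, which then pushes the update to every shard's
// local copy.
struct sharded_address_map {
    std::vector<std::unordered_map<server_id, ip_address>> per_shard;

    void add_non_expiring(unsigned this_shard, const server_id& id, const ip_address& addr) {
        assert(this_shard == 0);              // group 0 config changes run on shard 0
        for (auto& shard_map : per_shard) {   // replicate to all shards
            shard_map[id] = addr;
        }
    }
};
```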
The `raft_address_map` code was "clever": it used two intrusive data structures and did a lot of manual lifetime management; raw pointer manipulation, manual deletion of objects... It wasn't clear who owns which object, who is responsible for deleting what. And there was a lot of code.
In this PR we replace one of the intrusive data structures with a good old `std::unordered_map` and make ownership clear by replacing the raw pointers with `std::unique_ptr`. Furthermore, some invariants that were previously unclear and only enforced at runtime are now encoded in the type system.
The code also became shorter: we reduced its length from ~360 LOC to ~260 LOC.
Closes#11763
* github.com:scylladb/scylladb:
service/raft: raft_address_map: get rid of `is_linked` checks
service/raft: raft_address_map: get rid of `to_list_iterator`
service/raft: raft_address_map: simplify ownership of `expiring_entry_ptr`
service/raft: raft_address_map: move _last_accessed field from timestamped_entry to expiring_entry_ptr
service/raft: raft_address_map: don't use intrusive set for timestamped entries
service/raft: raft_address_map: store reference to `timestamped_entry` in `expiring_entry_ptr`
The owner of `expiring_entry_ptr` was almost exclusively its corresponding
`timestamped_entry`; it would delete the expiring entry when it itself got
destroyed. There was one explicit call to `unlink_and_dispose`, which
made the picture unclear.
Make the picture clear: `timestamped_entry` now contains a `unique_ptr`
to its `expiring_entry_ptr`. The `unlink_and_dispose` was replaced with
`_lru_entry = nullptr`.
We can also get rid of the back-reference from `expiring_entry_ptr` to
`timestamped_entry`.
The code becomes shorter and simpler.
`timestamped_entry` had two fields:
```
optional<clock_time_point> _last_accessed
expiring_entry_ptr* _lru_entry
```
The `raft_address_map` data structure maintained an invariant:
`_last_accessed` is set if and only if `_lru_entry` is not null.
This invariant could be broken for a while when constructing an expiring
`timestamped_entry`: the constructor was given an `expiring = true`
flag, which set the `_last_accessed` field; this was redundant, because
immediately afterwards a corresponding `expiring_entry_ptr` was constructed,
which reset the `_last_accessed` field again and set `_lru_entry`.
The code becomes simpler and shorter when we move `_last_accessed` field
into `expiring_entry_ptr`. The invariant is now guaranteed by the type
system: `_last_accessed` is no longer `optional`.
Intrusive data structures are harder to reason about. In
`raft_address_map` there's a good reason to use an intrusive list for
storing `expiring_entry_ptr`s: we move the entries around in the list
(when their expiration times change) but we want the objects to stay
in place because `timestamped_entry`s may point to them (although we
could simply update the pointers using the existing back-reference...)
However, there's not much reason to store `timestamped_entry` in an
intrusive set. It was basically used in one place: when dropping expired
entries, we iterate over the list of `expiring_entry_ptr`s and we want
to drop the corresponding `timestamped_entry` as well, which is easy
when we have a pointer to the entry and it's a member of an intrusive
container. But we can deal with it when using non-intrusive containers:
just `find` the element in the container to erase it.
The code becomes shorter with this change.
I also use a map instead of a set, because we need to modify the
`timestamped_entry`, which wouldn't be possible if it were used as an
`unordered_set` key. In fact, using a map here makes more sense: we were
using the intrusive set like a map anyway, because all lookups
were performed using the `_id` field of `timestamped_entry` (that
field has now been moved out of the struct and is used as the map's key).
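A schematic of the resulting layout (heavily simplified; type and field names follow the commit messages, but the real class has more machinery, e.g. the intrusive LRU list):
```
#include <chrono>
#include <memory>
#include <optional>
#include <string>
#include <unordered_map>

using server_id = std::string;
using ip_address = std::string;
using clock_time_point = std::chrono::steady_clock::time_point;

// Lives in the (intrusive, in the real code) LRU list; owns the timestamp.
struct expiring_entry_ptr {
    clock_time_point _last_accessed;   // not optional: if the entry exists, it's set
};

// Stored as the unordered_map's value; the id itself is now the map key.
struct timestamped_entry {
    std::optional<ip_address> _addr;
    // Null for non-expiring entries; ownership is explicit via unique_ptr.
    std::unique_ptr<expiring_entry_ptr> _lru_entry;
};

using address_map_storage = std::unordered_map<server_id, timestamped_entry>;
```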
Refactor the existing upgrade tests, extracting some common functionality to
helper functions.
Add more tests. They check the upgrade procedure and recovery from
failure in scenarios such as a node failing and causing the procedure to get
stuck, or losing a majority in a fully upgraded cluster.
Add some new functionality to `ScyllaRESTAPIClient`, like injecting errors and
obtaining gossip generation numbers.
Extend the removenode function to allow ignoring dead nodes.
Improve checking for CQL availability when starting nodes to speed up testing.
Closes#11725
* github.com:scylladb/scylladb:
test/topology_raft_disabled: more Raft upgrade tests
test/topology_raft_disabled: refactor `test_raft_upgrade`
test/pylib: scylla_cluster: pass a list of ignored nodes to removenode
test/pylib: rest_client: propagate errors from put_json
test/pylib: fix some type hints
test/pylib: scylla_cluster: don't create and drop keyspaces to check if cql is up
We plan to remove IP information from Raft addresses.
raft::server_address is used in Raft configuration and
also in discovery, which is a separate algorithm, as a handy data
structure, to avoid having new entities in RPC.
Since we plan to remove IP addresses from Raft configuration,
using raft::server_address in discovery and still storing
IPs in it would create ambiguity: in some uses raft::server_address
would store an IP, and in others it would not.
So switch to a dedicated data structure for the purposes of discovery,
discovery_peer, which contains an (IP address, raft server id) pair; see the
sketch below.
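A sketch of the dedicated type (the real definition lives in the discovery code; member names here are assumptions):
```
#include <string>

// Used only during discovery: pairs a Raft server id with the IP address
// the peer was discovered at.
struct discovery_peer {
    std::string id;       // raft server id (a UUID in the real code)
    std::string ip_addr;  // the peer's IP address
};
```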
Note to reviewers: ideally we should switch to URIs
in discovery_peer right away. Otherwise we may have to
deal with incompatible changes in discovery when adding URI
support to Scylla.
Replace raft::server_address in a few raft_group0 API
calls with raft::server_id.
These API calls do not need raft::server_address, i.e. the
address part, anyway, and since going forward raft::server_address
will not contain the IP address, stop using it in these calls.
This is the beginning of a multi-patch series that reduces
raft::server_address usage to core raft only.
Move the load/store functions for discovered peers up,
since going forward they'll be used in start_server_for_group0()
to extend the address map prior to start (and thus speed up
bootstrap).
We plan to reuse the discovery table to store the peers
after discovery is over, so the load/store API must be generalized
for use outside discovery. This includes sending
the list of persisted peers over to a new member of the cluster.
When IP addresses are removed from raft::configuration, it's key
to initialize raft_address_map with IP addresses before we start group
0. The best place to put this initialization is start_server_for_group0(),
so make sure all paths that create group 0 use
start_server_for_group0().
The tests check the upgrade procedure and recovery from failure
in scenarios such as a node failing and causing the procedure to get stuck,
or losing a majority in a fully upgraded cluster.
Added some new functionality to `ScyllaRESTAPIClient`, like injecting
errors and obtaining gossip generation numbers.
There is a flaw in how the raft rpc endpoints are
currently managed. The io_fiber in raft::server
is supposed to first add new servers to rpc, then
send all the messages and then remove the servers
which have been excluded from the configuration.
The problem is that the send_messages function
isn't synchronous: it schedules send_append_entries
to run after all the current requests to the
target server, which can happen
after we have already removed the server from the address_map.
In this patch the remove_server function is changed to mark
the server_id as expiring rather than synchronously dropping it.
This means all currently scheduled requests to
that server will still be able to resolve
the ip address for that server_id.
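A simplified model of the change (plain C++; the real address map entry and clock types differ):
```
#include <chrono>
#include <optional>
#include <string>
#include <unordered_map>

using server_id = std::string;
using ip_address = std::string;
using clock_type = std::chrono::steady_clock;

struct entry {
    ip_address addr;
    std::optional<clock_type::time_point> expires_at; // nullopt == permanent
};

// Before the fix, remove_server erased the mapping immediately, so an
// already-scheduled send_append_entries could no longer resolve the address.
// After the fix (sketched), the entry is only marked as expiring, so in-flight
// requests keep resolving until the entry actually times out.
void remove_server(std::unordered_map<server_id, entry>& map, const server_id& id,
                   clock_type::duration expiry_period) {
    if (auto it = map.find(id); it != map.end()) {
        it->second.expires_at = clock_type::now() + expiry_period;
    }
}
```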
Fixes: #11228
Closes #11748
The `add_entry` and `modify_config` methods sometimes do an rpc to
execute the request on the current leader. If the tcp connection
was broken, a `seastar::rpc::closed_error` would be thrown to the client.
This exception was not documented in the method comments and the
client could have missed handling it. For example, this exception
was not handled when calling `modify_config` in `raft_group0`,
which sometimes broke the `removenode` command.
An `intermittent_connection_error` exception was added earlier to
solve a similar problem with the `read_barrier` method. In this patch it
is renamed to `transport_error`, as it seems to better describe the
situation, and an explicit specification for this exception
was added - the rpc implementation can throw it if it is not known
whether the call reached the destination and whether any mutations were made.
In case of `read_barrier` it does not matter and we just retry, in case
of `add_entry` and `modify_config` we cannot retry because of possible mutations,
so we convert this exception to `commit_status_unknown`, which
the client has to handle.
Explicit comments have also been added to `raft::server` methods
describing all possible exceptions.
Closes#11691
* github.com:scylladb/scylladb:
raft_group0: retry modify_config on commit_status_unknown
raft: convert raft::transport_error to raft::commit_status_unknown
modify_config can throw commit_status_unknown in case
of a leader change or when the leader is unavailable,
but the information about it has not yet reached the
current node. In this patch, modify_config is retried after some time
in this case.
The add_entry and modify_config methods sometimes do an rpc to
execute the request on the current leader. If the tcp connection
was broken, a seastar::rpc::closed_error would be thrown to the client.
This exception was not documented in the method comments and the
client could have missed handling it. For example, this exception
was not handled when calling modify_config in raft_group0,
which sometimes broke the removenode command.
An intermittent_connection_error exception was added earlier to
solve a similar problem with the read_barrier method. In this patch it
is renamed to transport_error, as it seems to better describe the
situation, and an explicit specification for this exception
was added - the rpc implementation can throw it if it is not known
whether the call reached the target node and whether any
actions were performed on it.
In case of read_barrier it does not matter and we just retry. In case
of add_entry and modify_config we cannot retry
because the rpc calls are not idempotent, so we convert this
exception to commit_status_unknown, which the client has to handle.
Explicit comments have also been added to raft::server methods
describing all possible exceptions.
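Schematically, the conversion looks like this (plain C++ with stand-in exception types; the real code is asynchronous):
```
#include <stdexcept>
#include <string>

// Stand-ins for the real exception types.
struct transport_error : std::runtime_error { using std::runtime_error::runtime_error; };
struct commit_status_unknown : std::runtime_error { using std::runtime_error::runtime_error; };

// Hypothetical forwarding step; may fail with transport_error when it is not
// known whether the request reached the leader.
void forward_to_leader(const std::string& /*request*/) {}

// add_entry / modify_config are not idempotent, so a transport_error cannot
// simply be retried; it is converted to commit_status_unknown and left for
// the caller to handle (read_barrier, being idempotent, just retries).
void execute_on_leader(const std::string& request) {
    try {
        forward_to_leader(request);
    } catch (const transport_error& e) {
        throw commit_status_unknown(e.what());
    }
}
```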
It now grabs one from the gossiper, which is weird. A bit later it will be
possible to remove the gossiper->system_keyspace dependency.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
`set_group0_upgrade_state` writes the on-disk state first, then
in-memory state second, both under a write lock.
`get_group0_upgrade_state` would only take the lock if the in-memory
state was `use_pre_raft_procedures`.
If there's an external observer who watches the on-disk state to decide
whether Raft upgrade finished yet, the following could happen:
1. The node wrote `use_post_raft_procedures` to disk but didn't update
the in-memory state yet, which is still `synchronize`.
2. The external client reads the table and sees that the state is
`use_post_raft_procedures`, and deduces that upgrade has finished.
3. The external client immediately tries to perform a schema change. The
schema change code calls `get_group0_upgrade_state` which does not
take the read lock and returns `synchronize`. The schema change gets
denied because schema changes are not allowed in `synchronize`.
Make sure that `get_group0_upgrade_state` cannot execute in-between
writing to disk and updating the in-memory state by always taking the
read lock before reading the in-memory state. As it was before, it will
immediately drop the lock if the state is not `use_pre_raft_procedures`.
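A simplified model of the fix (using std::shared_mutex for brevity; the real code uses an asynchronous rwlock):
```
#include <shared_mutex>

enum class group0_upgrade_state { use_pre_raft_procedures, synchronize, use_post_raft_procedures };

// The writer holds the lock exclusively across "write to disk, then update
// the in-memory state", so a reader taking the shared lock can never observe
// the window between those two steps.
struct upgrade_state_holder {
    mutable std::shared_mutex _lock;
    group0_upgrade_state _in_memory = group0_upgrade_state::use_pre_raft_procedures;

    group0_upgrade_state get() const {
        std::shared_lock guard(_lock);
        return _in_memory;
        // Here the lock is always dropped on return; the real code keeps
        // holding it only when the state is use_pre_raft_procedures.
    }
};
```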
This is useful for upgrade tests, which read the on-disk state to decide
whether upgrade has finished and often try to perform a schema change
immediately afterwards.
Closes#11672
It was passed to `raft_group_registry::direct_fd_proxy` by value. That
is a bug: we want to pass a reference to the instance that lives
inside `gossiper`.
Fortunately this bug didn't cause problems, because the pinger is only
used for one function, `get_address`, which looks up an address in a map
and if it doesn't find it, accesses the map that lives inside
`gossiper` on shard 0 (and then caches it in the local copy).
Explicitly delete the copy constructor of `direct_fd_pinger` so this
doesn't happen again.
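Sketch of the guard (the real class has more members; only the deleted copy operations matter here):
```
// With the copy constructor deleted, accidentally passing the pinger by value
// (as happened here) becomes a compile-time error instead of a silent copy of
// the gossiper's instance.
class direct_fd_pinger {
public:
    direct_fd_pinger() = default;
    direct_fd_pinger(const direct_fd_pinger&) = delete;
    direct_fd_pinger& operator=(const direct_fd_pinger&) = delete;
    // ... pinging interface elided ...
};
```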
Closes#11661
Before this patch we could get an OOM if we
received several big commands. The number of
commands was small, but their total size
in bytes was large.
snapshot_trailing_size is needed to guarantee
progress. Without this limit the fsm could
get stuck if the size of the next item is
greater than max_log_size - (size of trailing entries).
Halted background fibers render the raft server effectively unusable, so
report this explicitly to the clients.
Fix: #11352
Closes #11370
* github.com:scylladb/scylladb:
raft server, status metric
raft server, abort group0 server on background errors
raft server, provide a callback to handle background errors
raft server, check aborted state on public server public api's
The intermediate language added a new layer of abstraction between a CQL
statement and querying mutations, so this commit adds a new layer of
abstraction between mutations and returning the query result.
The result can't be directly returned from `group0_state_machine::apply`, so
we decided to hold query results in a map inside `raft_group0_client`. The
result can be safely read after `add_entry_unguarded`, because this method
waits for the raft command to be applied. After the result is translated to
`result_message`, or in case of an exception, the map entry is erased.
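A simplified model of the described flow (plain C++; names are stand-ins for the real types in `raft_group0_client`):
```
#include <optional>
#include <string>
#include <unordered_map>

using command_id = std::string;    // stand-in for the real command/query id
using query_result = std::string;  // stand-in for the raw query result

// apply() stores the result under the command's id; the caller reads it after
// add_entry_unguarded has completed (i.e. the command was applied), translates
// it to a result_message and erases the entry -- also on the exception path.
struct group0_result_store {
    std::unordered_map<command_id, query_result> results;

    void store(const command_id& id, query_result r) {
        results[id] = std::move(r);
    }

    std::optional<query_result> take(const command_id& id) {
        auto it = results.find(id);
        if (it == results.end()) { return std::nullopt; }
        auto r = std::move(it->second);
        results.erase(it);
        return r;
    }
};
```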