After calling filter_for_query() the extra_replica to speculate to may
be left default-initialized, which is the :0 IPv6 address. Later this
address is used as-is to check whether it belongs to the same DC, which
is wrong, as :0 is not the address of any existing endpoint.
The recent move of dc/rack data onto the topology made this place
reveal itself by emitting an internal error, since :0 is not present in
the topology's collection of endpoints. Prior to this move the dc filter
would count :0 as belonging to the "default_dc" datacenter, which may or
may not match the dc of the local node.
The fix is to explicitly distinguish a set extra_replica from an unset one.
fixes: #11825
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes #11833
The storage_service::stop() calls repair_service::abort_repair_node_ops(), but at that time the sharded<repair_service> is already stopped, and calling .local() on it just crashes.
The suggested fix is to remove the explicit storage_service -> repair_service kick. Instead, the repair_infos generated for the sake of node ops subscribe to the node_ops_meta_data's abort source and abort themselves automatically.
fixes: #10284
Closes #11797
* github.com:scylladb/scylladb:
repair: Remove ops_uuid
repair: Remove abort_repair_node_ops() altogether
repair: Subscribe on node_ops_info::as abortion
repair: Keep abort source on node_ops_info
repair: Pass node_ops_info arg to do_sync_data_using_repair()
repair: Mark repair_info::abort() noexcept
node_ops: Remove _aborted bit
node_ops: Simplify construction of node_ops_metadata
main: Fix message about repair service starting
The gossiper makes sure the local snitch name is the same as that of
other nodes in the ring. It currently uses the global snitch to get the
name; this patch passes the name as an argument instead, because the
caller (storage_service) holds a local reference to the snitch instance
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Storage service uses snitch in several places:
- boot
- snitch-reconfigured subscription
- preferred IP reconnection
At this point it's worth adding an explicit storage_service -> snitch
dependency and patching the above to use a local reference
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Until now, authentication in alternator served only two purposes:
- refusing clients without proper credentials
- printing user information with logs
After this series, this user information is passed to the lower layers, which also means that users can attach service levels to roles, and this service level configuration takes effect for alternator requests.
tests: manually, by adding more debug logs and inspecting that the per-service-level timeout value was properly applied for an authenticated alternator user
Fixes #11379
Closes #11380
* github.com:scylladb/scylladb:
alternator: propagate authenticated user in client state
client_state: add internal constructor with auth_service
alternator: pass auth_service and sl_controller to server
There's an ongoing effort to move the endpoint -> {dc/rack} mappings
from snitch onto the topology object, and this set finalizes it. After it
the snitch service stops depending on the gossiper and system keyspace
and is ready for de-globalization. As a nice side effect, the system
keyspace no longer needs to maintain the dc/rack info cache, and its
startup code gets simplified.
refs: #2737
refs: #2795
* 'br-snitch-dont-mess-with-topology-data-2' of https://github.com/xemul/scylla: (23 commits)
system_keyspace: Dont maintain dc/rack cache
system_keyspace: Indentation fix after previous patch
system_keyspace: Coroutinize build_dc_rack_info()
topology: Move all post-configuration to topology::config
snitch: Start early
gossiper: Do not export system keyspace
snitch: Remove gossiper reference
snitch: Mark get_datacenter/_rack methods const
snitch: Drop some dead dependency knots
snitch, code: Make get_datacenter() report local dc only
snitch, code: Make get_rack() report local rack only
storage_service: Populate pending endpoint in on_alive()
code: Populate pending locations
topology: Put local dc/rack on topology early
topology: Add pending locations collection
topology: Make get_location() errors more verbose
token_metadata: Add config, spread everywhere
token_metadata: Hide token_metadata_impl copy constructor
gossiper: Remove messaging service getter
snitch: Get local address to gossip via config
...
When node_ops_meta_data aborts, it also kicks repair to find and abort
all relevant repair_infos. This can now be simplified by subscribing
repair_meta to the abort source and aborting it without an explicit kick
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The next patches will need to subscribe to the node_ops_meta_data's abort
source inside the repair code, so keep a pointer to it on node_ops_info
too. At the same time, node_ops_info::abort becomes obsolete, because the
same check can be performed via abort_source->abort_requested()
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
A short cleanup "while at it": node_ops_meta_data doesn't need to
carry a dedicated _aborted boolean, since the abort source that sets it
is always available
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Following up on 69aea59d97, which added fencing support
for simple reads and writes, this series does the same for the
complex ops:
- partition scan
- counter mutation
- paxos
With this done, the coordinator knows about all in-flight requests and
can delay topology changes until they are retired.
Closes #11296
* github.com:scylladb/scylladb:
storage_proxy: hold effective_replication_map for the duration of a paxos transaction
storage_proxy: move paxos_response_handler class to .cc file
storage_proxy: deinline paxos_response_handler constructor/destructor
storage_proxy: use consistent effective_replication_map for counter coordinator
storage_proxy: improve consistency in query_partition_key_range{,_concurrent}
storage_proxy: query_partition_key_range_concurrent: reduce smart pointer use
storage_proxy: query_partition_key_range_concurrent: improve token_metadata consistency
storage_proxy: query_singular: use fewer smart pointers
storage_proxy: query_singular: simplify lambda captures
locator: effective_replication_map: provide non-smart-pointer accessor to token_metadata
storage_proxy: use consistent token_metadata with rest of singular read
The `raft_address_map` code was "clever": it used two intrusive data structures and did a lot of manual lifetime management; raw pointer manipulation, manual deletion of objects... It wasn't clear who owns which object, who is responsible for deleting what. And there was a lot of code.
In this PR we replace one of the intrusive data structures with a good old `std::unordered_map` and make ownership clear by replacing the raw pointers with `std::unique_ptr`. Furthermore, some invariants which were not clear and enforced in runtime are now encoded in the type system.
The code also became shorter: we reduced its length from ~360 LOC to ~260 LOC.
Closes#11763
* github.com:scylladb/scylladb:
service/raft: raft_address_map: get rid of `is_linked` checks
service/raft: raft_address_map: get rid of `to_list_iterator`
service/raft: raft_address_map: simplify ownership of `expiring_entry_ptr`
service/raft: raft_address_map: move _last_accessed field from timestamped_entry to expiring_entry_ptr
service/raft: raft_address_map: don't use intrusive set for timestamped entries
service/raft: raft_address_map: store reference to `timestamped_entry` in `expiring_entry_ptr`
The owner of `expiring_entry_ptr` was almost uniquely its corresponding
`timestamped_entry`; it would delete the expiring entry when it itself got
destroyed. There was one explicit call to `unlink_and_dispose`, which
made the picture unclear.
Make the picture clear: `timestamped_entry` now contains a `unique_ptr`
to its `expiring_entry_ptr`. The `unlink_and_dispose` was replaced with
`_lru_entry = nullptr`.
We can also get rid of the back-reference from `expiring_entry_ptr` to
`timestamped_entry`.
The code becomes shorter and simpler.
If a removenode is run for a recently stopped node,
the gossiper may not yet know that the node is down,
and the removenode will fail with a Stream failed error
trying to stream data from that node.
In this patch we explicitly reject removenode operation
if the gossiper considers the leaving node up.
Closes #11704
- Start n1, n2, n3 (n3 = 127.0.0.3)
- Stop n3
- Change ip address of n3 to 127.0.0.33 and restart n3
- Decommission n3
- Start new node n4
The node n4 will learn from the gossip entry for 127.0.0.3 that node
127.0.0.3 is in shutdown status which means 127.0.0.3 is still part of
the ring.
This patch prevents this by checking the status for the host id on all
the entries. If any of the entries shows the node with this host id in
LEFT status, refuse to put the node in NORMAL status.
Fixes #11355
Closes #11361
Luckily, all topology calculations are done in get_paxos_participants(),
so all we have to do is hold the effective_replication_map for the
duration of the transaction and pass it to get_paxos_participants().
This ensures that the coordinator knows about all in-flight requests
and can fence them from topology changes.
Hold the effective_replication_map while talking to the counter leader,
to allow for fencing in the future. The code is somewhat awkward because
the API allows for multiple keyspaces to be in use.
The error code generation, already broken as it doesn't use the correct
table, continues to be broken in that it doesn't use the correct
effective_replication_map, for the same reason.
query_partition_key_range captures a token_metadata_ptr and uses
it consistently in sequential calls to query_partition_key_range_concurrent
(via tail recursion), but each invocation of
query_partition_key_range_concurrent captures its own
effective_replication_map_ptr. Since these are captured at different times,
they can be inconsistent after the first iteration.
Fix by capturing it once in the caller and propagating it everywhere.
Derive the token_metadata from the effective_replication_map rather than
getting it independently. Not a real bug since these were in the same
continuation, but safer this way.
Capture token_metadata by reference since we're protecting it with
the mighty effective_replication_map_ptr. This saves a few instructions
to manage smart pointers.
The lambdas in query_singular do not outlive the enclosing coroutine,
so they can capture everything by reference. This simplifies life
for a future update of the lambda, since there's one less thing to
worry about.
query_singular() uses get_token_metadata_ptr() and later, in
get_read_executor(), captures the effective_replication_map(). This
isn't a bug, since the two are captured in the same continuation and
are therefore consistent, but a way to ensure it stays so is to capture
the effective_replication_map earlier and derive the token_metadata from
it.
`timestamped_entry` had two fields:
```
optional<clock_time_point> _last_accessed
expiring_entry_ptr* _lru_entry
```
The `raft_address_map` data structure maintained an invariant:
`_last_accessed` is set if and only if `_lru_entry` is not null.
This invariant could be broken for a while when constructing an expiring
`timestamped_entry`: the constructor was given an `expiring = true`
flag, which set the `_last_accessed` field; this was redundant, because
immediately after a corresponding `expiring_entry_ptr` was constructed
which again reset the `_last_accessed` field and set `_lru_entry`.
The code becomes simpler and shorter when we move `_last_accessed` field
into `expiring_entry_ptr`. The invariant is now guaranteed by the type
system: `_last_accessed` is no longer `optional`.
Intrusive data structures are harder to reason about. In
`raft_address_map` there's a good reason to use an intrusive list for
storing `expiring_entry_ptr`s: we move the entries around in the list
(when their expiration times change) but we want the objects to stay
in place because `timestamped_entry`s may point to them (although we
could simply update the pointers using the existing back-reference...)
However, there's not much reason to store `timestamped_entry` in an
intrusive set. It was basically used in one place: when dropping expired
entries, we iterate over the list of `expiring_entry_ptr`s and we want
to drop the corresponding `timestamped_entry` as well, which is easy
when we have a pointer to the entry and it's a member of an intrusive
container. But we can deal with it when using non-intrusive containers:
just `find` the element in the container to erase it.
The code becomes shorter with this change.
I also use a map instead of a set because we need to modify the
`timestamped_entry` which wouldn't be possible if it was used as an
`unordered_set` key. In fact, using a map here makes more sense: we were
using the intrusive set similarly to a map anyway, because all lookups
were performed using the `_id` field of `timestamped_entry` (the field
has now been moved outside the struct and is used as the map's key).
- Start a cluster with n1, n2, n3
- Full cluster shutdown n1, n2, n3
- Start n1, n2 and keep n3 shut down
- Add n4
Node n4 will learn the ip and uuid of n3 but it does not know the gossip
status of n3 since gossip status is published only by the node itself.
After full cluster shutdown, gossip status of n3 will not be present
until n3 is restarted again. So n4 will not think n3 is part of the
ring.
In this case, it is better to reject the bootstrap.
With this patch, one would see the following when adding n4:
```
ERROR 2022-09-01 13:53:14,480 [shard 0] init - Startup failed:
std::runtime_error (Node 127.0.0.3 has gossip status=UNKNOWN. Try fixing it
before adding new node to the cluster.)
```
The user needs to perform either of the following before adding a new node:
1) Run nodetool removenode to remove n3
2) Restart n3 to get it back to the cluster
Fixes #6088
Closes #11425
Refactor the existing upgrade tests, extracting some common functionality to
helper functions.
Add more tests. They are checking the upgrade procedure and recovery from
failure in scenarios like when a node fails causing the procedure to get stuck
or when we lose a majority in a fully upgraded cluster.
Add some new functionality to `ScyllaRESTAPIClient`, like injecting errors
and obtaining gossip generation numbers.
Extend the removenode function to allow ignoring dead nodes.
Improve checking for CQL availability when starting nodes to speed up testing.
Closes #11725
* github.com:scylladb/scylladb:
test/topology_raft_disabled: more Raft upgrade tests
test/topology_raft_disabled: refactor `test_raft_upgrade`
test/pylib: scylla_cluster: pass a list of ignored nodes to removenode
test/pylib: rest_client: propagate errors from put_json
test/pylib: fix some type hints
test/pylib: scylla_cluster: don't create and drop keyspaces to check if cql is up
In preparation for supporting IP address changes of Raft Group 0:
1) Always use start_server_for_group0() to start a server for group 0.
This will provide a single extension point when it's necessary to
prompt raft_address_map with gossip data.
2) Don't use raft::server_address in discovery, since going forward
discovery won't store raft::server_address. By the same token, stop
using discovery::peer_set anywhere outside discovery (for persistence);
use a peer_list instead, which is easier to marshal.
Closes #11676
* github.com:scylladb/scylladb:
raft: (discovery) do not use raft::server_address to carry IP data
raft: (group0) API refactoring to avoid raft::server_address
raft: rename group0_upgrade.hh to group0_fwd.hh
raft: (group0) move the code around
raft: (discovery) persist a list of discovered peers, not a set
raft: (group0) always start group0 using start_server_for_group0()
- Start n1, n2, n3
- Apply network nemesis as below:
+ Block gossip traffic going from nodes 1 and 2 to node 3.
+ All the other rpc traffic flows normally, including gossip traffic
from node 3 to nodes 1 and 2 and responses to node_ops commands from
nodes 1 and 2 to node 3.
- Decommission n3
Currently, the decommission will be successful because all the network
traffic is ok. But n3 cannot advertise status STATUS_LEFT to the rest
of the cluster due to the network nemesis applied. As a result, n1 and
n2 cannot move n3 from STATUS_LEAVING to STATUS_LEFT, so n3 will
stay in DL (Down/Leaving) forever.
The reason the node stays DL forever is that with node_ops_cmd based
node operations, we still rely on the gossip status of STATUS_LEFT from
the node being decommissioned to notify other nodes that this node has
finished decommissioning and can be moved from STATUS_LEAVING to
STATUS_LEFT.
This patch fixes this by checking gossip liveness before running the
decommission, and rejecting it if required peer nodes are down.
With the fix, the decommission of n3 will fail like this:
$ nodetool decommission -p 7300
nodetool: Scylla API server HTTP POST to URL
'/storage_service/decommission' failed: std::runtime_error
(decommission[adb3950e-a937-4424-9bc9-6a75d880f23d]: Rejected
decommission operation, removing node=127.0.0.3, sync_nodes=[127.0.0.2,
127.0.0.3, 127.0.0.1], ignore_nodes=[], nodes_down={127.0.0.1})
Fixes #11302
Closes #11362
Some good news finally. The saved dc/rack info about the ring is now
only loaded once on start. So the whole cache is not needed and the
loading code in storage_service can be greatly simplified
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The continuation of the previous patch -- all the code uses
topology::get_datacenter(endpoint) to get peers' dc string. The topology
still uses snitch for that, but it already contains the needed data.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
All the code out there now calls snitch::get_rack() to get the rack for
the local node. For other nodes, topology::get_rack(endpoint) is used.
Since the topology is now properly populated with endpoints, it can
finally be patched to stop using the snitch and get the rack from its
internal collections
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
A special-purpose add-on to the previous patch.
When the messaging service accepts a new connection, it sometimes may
want to drop it early based on whether the client is from the same
dc/rack or not. However, at this stage the information may not yet have
had a chance to propagate via the storage service's pending-tokens
update paths, so here's one more place: the on_alive() callback
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Previous patches added the concept of pending endpoints in the topology;
this patch populates endpoints in this state.
Also, set_pending_ranges() is patched to make sure that the tokens
added for the endpoint(s) are added for something that's known by the
topology. The same check exists in update_normal_tokens()
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
No functional changes, just keep some conditions from if()s as local
variables. This is churn-reducing preparation for one of the
next patches
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
We plan to remove IP information from Raft addresses.
raft::server_address is used in the Raft configuration and
also in discovery (which is a separate algorithm) as a handy data
structure, to avoid introducing new entities in RPC.
Since we plan to remove IP addresses from Raft configuration,
using raft::server_address in discovery and still storing
IPs in it would create ambiguity: in some uses raft::server_address
would store an IP, and in others it would not.
So switch to a dedicated data structure for the purposes of discovery,
discovery_peer, which contains an (ip, raft server id) pair.
Note to reviewers: ideally we should switch to URIs
in discovery_peer right away. Otherwise we may have to
deal with incompatible changes in discovery when adding URI
support to Scylla.
Replace raft::server_address in a few raft_group0 API
calls with raft::server_id.
These API calls do not need raft::server_address, i.e. the
address part, anyway, and since going forward raft::server_address
will not contain the IP address, stop using it in these calls.
This is a beginning of a multi-patch series to reduce
raft::server_address usage to core raft only.
Move the load/store functions for discovered peers up,
since going forward they'll be used in start_server_for_group0()
to extend the address map prior to start (and thus speed up
bootstrap).
We plan to reuse the discovery table to store the peers
after discovery is over, so the load/store API must be generalized
for use outside discovery. This includes sending
the list of persisted peers over to a new member of the cluster.
When IP addresses are removed from raft::configuration, it's key
to initialize raft_address_map with IP addresses before we start group
0. Best place to put this initialization is start_server_for_group0(),
so make sure all paths which create group 0 use
start_server_for_group0().
The tests are checking the upgrade procedure and recovery from failure
in scenarios like when a node fails causing the procedure to get stuck
or when we lose a majority in a fully upgraded cluster.
Added some new functionality to `ScyllaRESTAPIClient`, like injecting
errors and obtaining gossip generation numbers.
Many services out there have a method (sometimes called .drain()) that's
called early on stop and is responsible for preparing the service for
stopping: aborting pending/in-flight fibers and the like.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
There is a flaw in how the raft rpc endpoints are
currently managed. The io_fiber in raft::server
is supposed to first add new servers to rpc, then
send all the messages and then remove the servers
which have been excluded from the configuration.
The problem is that the send_messages function
isn't synchronous: it schedules send_append_entries
to run after all the current requests to the
target server, which can happen
after we have already removed the server from the address_map.
In this patch the remove_server function is changed to mark
the server_id as expiring rather than synchronously dropping it.
This means all currently scheduled requests to
that server will still be able to resolve
the ip address for that server_id.
Fixes: #11228
Closes #11748