scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-28 20:27:03 +00:00

Author	SHA1	Message	Date
Benny Halevy	c61083852c	storage_service: handle_state_normal: calculate candidates_for_removal when replacing tokens We currently try to detect a replaced node so as to insert it into endpoints_to_remove when it has no owned tokens left. However, for each token we first generate a multimap using get_endpoint_to_token_map_for_reading(). There are 2 problems with that: 1. unless the replaced node owns a single token, this map will not be empty after erasing one token out of it, since the token metadata has not changed yet (this is done later with update_normal_tokens(owned_tokens, endpoint)). 2. generating this map for each token is inefficient, turning this algorithm's complexity to quadratic in the number of tokens... This change copies the current token_to_endpoint map to a temporary map and erases replaced tokens from it, while maintaining a set of candidates_for_removal. After traversing all replaced tokens, we check the `token_to_endpoint_map` again, erasing from `candidates_for_removal` any endpoint that still owns tokens. The leftover candidates are endpoints that own no tokens and so they are added to `hosts_to_remove`. Fixes #12082 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes #12141	2022-12-05 16:17:18 +01:00
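
The copy-then-erase approach described in the commit above can be sketched as follows. This is a minimal illustration, not the actual `storage_service` code: the `token` and `endpoint` aliases and the function name `endpoints_left_without_tokens` are assumptions made for the example.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-ins for the real token and endpoint types.
using token = long;
using endpoint = std::string;

// Returns the endpoints that own no tokens once `replaced_tokens` are taken
// away from them. The map is copied once, so the cost is roughly
// O((N + R) log N) instead of rebuilding a map for every replaced token.
std::set<endpoint> endpoints_left_without_tokens(
        const std::map<token, endpoint>& token_to_endpoint,
        const std::vector<token>& replaced_tokens) {
    std::map<token, endpoint> remaining = token_to_endpoint;
    std::set<endpoint> candidates_for_removal;

    // Erase every replaced token and remember its previous owner.
    for (token t : replaced_tokens) {
        auto it = remaining.find(t);
        if (it != remaining.end()) {
            candidates_for_removal.insert(it->second);
            remaining.erase(it);
        }
    }

    // Any endpoint that still owns a token is not a candidate after all.
    for (const auto& entry : remaining) {
        candidates_for_removal.erase(entry.second);
    }
    return candidates_for_removal;
}
```

With the map `{1: a, 2: a, 3: b}`, replacing tokens 1 and 2 leaves `a` without tokens, while replacing only token 1 leaves no candidates.
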
Kamil Braun	cbdcc944b5	service/raft: specialized verb for failure detector pinger We used GOSSIP_ECHO verb to perform failure detection. Now we use a special verb DIRECT_FD_PING introduced for this purpose. There are multiple reasons to do so. One minor reason: we want to use the same connection as other Raft verbs: if we can't deliver Raft append_entries or vote messages somewhere, that endpoint should be marked dead; if we can, the endpoint should be marked alive. So putting pings on the same connection as the other Raft verbs is important when dealing with weird situations where some connections are available but others are not. Observe that in `do_get_rpc_client_idx`, we put the new verb in the right place. Another minor reason: we remove the awkward gossiper `echo_pinger` abstraction which required storing and updating gossiper generation numbers. This also removes one dependency from Raft service code to gossiper. Major reason 1: the gossip echo handler has a weird mechanism where a replacing node returns errors during the replace operation to some of the nodes. In Raft however, we want to mark servers as alive when they are alive, including a server running on a node that's replacing another node. Major reason 2, related to the previous one: when server B is replacing server A with the same IP, the failure detector will try to ping both servers. Both servers are mapped to the same IP by the address map, so pings to both servers will reach server B. We want server B to respond to the pings destined for server B, but not to pings destined for server A, so the sender can mark B alive but keep A marked dead. To do this, we include the destination's Raft ID in our RPCs. The destination compares the received ID with its own. If it's different, it returns a `wrong_destination` response, and the failure detector knows that the ping did not reach the destination (it reached someone else). 
Yet another reason: removes "Not ready to respond gossip echo message" log spam during replace.	2022-12-01 20:54:18 +01:00
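
The destination check described in "Major reason 2" could look roughly like the sketch below. The `server_id` struct, the `ping_response` enum and the function names are illustrative assumptions; only the `wrong_destination` response name comes from the commit message.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a Raft server ID.
struct server_id {
    std::uint64_t value;
};

inline bool operator==(const server_id& a, const server_id& b) {
    return a.value == b.value;
}

enum class ping_response { ok, wrong_destination };

// Receiver side: answer `ok` only if the ping was meant for this server.
ping_response handle_ping(server_id my_id, server_id destination_id) {
    return destination_id == my_id ? ping_response::ok
                                   : ping_response::wrong_destination;
}

// Sender side: only an `ok` response counts as evidence that the
// destination is alive; `wrong_destination` means someone else answered.
bool ping_marks_alive(ping_response r) {
    return r == ping_response::ok;
}
```

If server B (ID 2) replaces server A (ID 1) on the same IP, a ping addressed to ID 1 reaches B and yields `wrong_destination`, so A stays marked dead while pings addressed to ID 2 mark B alive.
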
Kamil Braun	02c64becdc	db: system_keyspace: de-staticize `{get,set}_raft_server_id` Part of the anti-globals war.	2022-12-01 20:54:18 +01:00
Kamil Braun	99fe580068	service/raft: make this node's Raft ID available early in group registry Raft ID was loaded or created late in the boot procedure, in `storage_service::join_token_ring`. Create it earlier, as soon as it's possible (when `system_keyspace` is started), pass it to `raft_group_registry::start` and store it inside `raft_group_registry`. We will use this Raft ID stored in group registry in following patches. Also this reduces the number of disk accesses for this node's Raft ID. It's now loaded from disk once, stored in `raft_group_registry`, then obtained from there when needed. This moves `raft_group_registry::start` a bit later in the startup procedure - after `system_keyspace` is started - but it doesn't make a difference.	2022-12-01 20:54:18 +01:00
Kamil Braun	0f9d0dd86e	Merge 'raft: support IP address change' from Konstantin Osipov This is the core of dynamic IP address support in Raft, moving out the IP address sourcing from Raft Group 0 configuration to gossip. At start of Raft, the raft id <> IP address translation map is tuned into the gossiper notifications and learns IP addresses of Raft hosts from them. The series intentionally doesn't contain the part which speeds up the initial cluster assembly by persisting the translation cache and using more sources besides gossip (discovery, RPC) to show correctness of the approach. Closes #12035 * github.com:scylladb/scylladb: raft: (rpc) do not throw in case of a missing IP address in RPC raft: (address map) actively maintain ip <-> raft server id map	2022-11-30 15:40:18 +01:00
Michał Jadwiszczak	8e64e18b80	forward_service: add debug logs Adds a few debug logs to see what is happening in https://github.com/scylladb/scylladb/issues/11684 Wrapped `forward_result::printer` into `seastar::value_of` to lazy evaluate the printer Closes #12113	2022-11-30 12:15:26 +02:00
Botond Dénes	50aea9884b	Merge 'Improve the Raft upgrade procedure' from Kamil Braun Better logging, less code, a minor fix. Closes #12135 * github.com:scylladb/scylladb: service/raft: raft_group0: less repetitive logging calls service/raft: raft_group0: fix sleep_with_exponential_backoff	2022-11-30 11:24:20 +02:00
Avi Kivity	6a5d9ff261	treewide: use non-experimental std::source_location Now that we use libstdc++ 12, we can use the standardized source_location. Closes #12137	2022-11-30 11:06:43 +02:00
Konstantin Osipov	fbe7886cc0	raft: (rpc) do not throw in case of a missing IP address in RPC Remove raft_address_map::get_inet_address() While at it, coroutinize some rpc methods. To propagate up the event of missing IP address, use coroutine::exception() with a proper type (raft::transport_error) and a proper error message. This is a building block for removing raft_address_map::get_inet_address() which is too generic, and shifting the responsibility of handling missing addresses to the address map clients. E.g. one-way RPC shouldn't throw if an address is missing, but just drop the message. PS An attempt to use a single template function turned out to be too complex: - some functions require a gate, some don't - some return void, some future<> and some future<raft::data_type>	2022-11-29 19:55:48 +03:00
Konstantin Osipov	73e5298273	raft: (address map) actively maintain ip <-> raft server id map 1) make address map API flexible Before this patch: - having a mapping without an actual IP address was an internal error - not having a mapping for an IP address was an internal error - re-mapping to a new IP address wasn't allowed After this patch: - the address map may contain a mapping without an actual IP address, and the caller must be prepared for it: find() will return a nullopt. This happens when we first add an entry to Raft configuration and only later learn its IP address, e.g. via gossip. - it is allowed to re-map an existing entry to a new address; 2) subscribe to gossip notifications Learning IP addresses from gossip allows us to adjust the address map whenever a node IP address changes. Gossiper is also the only valid source of re-mapping, other sources (RPC) should not re-map, since otherwise a packet from a removed server can remap the id to a wrong address and impact liveness of a Raft cluster. 3) prompt address map state with app state Initialize the raft address map with initial gossip application state, specifically IPs of members of the cluster. With this, we no longer need to store these IPs in Raft configuration (and update them when they change). The obvious drawback of this approach is that a node may join Raft config before it propagates its IP address to the cluster via gossip - so the boot process has to wait until it happens. Gossip also doesn't tell us which IPs are members of Raft configuration, so we subscribe to Group0 configuration changes to mark the members of Raft config "non-expiring" in the address translation map. Thanks to the changes above, Raft configuration no longer stores IP addresses. We still keep the 'server_info' column in the raft_config system table, in case we change our mind or decide to store something else in there.	2022-11-29 19:55:43 +03:00
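
A minimal sketch of the more flexible address map contract described above, assuming a simplified single-shard map with no expiration. The class name `address_map_sketch`, its method names other than `find()`, and the use of `std::string` for addresses are assumptions for illustration, not the real `raft_address_map` API.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

using server_id = std::uint64_t;  // hypothetical stand-in for raft::server_id

class address_map_sketch {
    // An entry may exist before its IP address is known.
    std::unordered_map<server_id, std::optional<std::string>> _entries;
public:
    // Called when a server joins the Raft configuration; the address may
    // only be learned later, e.g. from gossip.
    void add_entry(server_id id) {
        _entries.try_emplace(id);
    }

    // Called from the (single) trusted source of addresses. Re-mapping an
    // existing entry to a new address is allowed.
    void set_address(server_id id, std::string ip) {
        _entries[id] = std::move(ip);
    }

    // Callers must be prepared for a missing address.
    std::optional<std::string> find(server_id id) const {
        auto it = _entries.find(id);
        if (it == _entries.end()) {
            return std::nullopt;
        }
        return it->second;
    }
};
```

A caller would check the result of `find()` and, for a one-way message, simply drop the message when no address is known yet.
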
Kamil Braun	3dbcff435f	service/raft: raft_group0: less repetitive logging calls Some log messages in retry loops in the Raft upgrade procedure included a sentence like "sleeping before retrying..."; but not all of them. With the recently added `sleep_with_exponential_backoff` abstraction we can put this "sleeping..." message in a single place, and it's also easy to say how long we're going to sleep. I also enjoy using this `source_location` thing.	2022-11-29 17:42:43 +01:00
Kamil Braun	580bdec875	service/raft: raft_group0: fix sleep_with_exponential_backoff It was immediately jumping to _max_retry_period.	2022-11-29 16:27:59 +01:00
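
For illustration, an exponential backoff that grows step by step rather than jumping straight to the maximum can be sketched as below. The class `backoff_period` and its members are hypothetical; the real `sleep_with_exponential_backoff` is not shown in this log and may be structured differently.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>

// Tracks the retry period: each call to next() returns the period to sleep
// for now and doubles the stored period, capped at the maximum.
class backoff_period {
    std::chrono::milliseconds _current;
    std::chrono::milliseconds _max;
public:
    backoff_period(std::chrono::milliseconds initial, std::chrono::milliseconds max)
        : _current(std::min(initial, max)), _max(max) {}

    std::chrono::milliseconds next() {
        const auto ret = _current;
        _current = std::min(_current * 2, _max);
        return ret;
    }
};
```

Starting from 5 ms with a 30 ms cap, successive calls return 5, 10, 20, 30, 30 ms.
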
Avi Kivity	0da66371a5	storage_proxy: coroutinize inner continuation of create_hint_sync_point() It is part of a coroutine::parallel_for_each(), which is safe for lambda coroutines. Closes #12057	2022-11-28 11:30:00 +02:00
Avi Kivity	70bfa708f5	storage_proxy: coroutinize change_hints_host_filter() Trivial straight-line code, no performance implications. Closes #12056	2022-11-23 15:34:24 +02:00
Benny Halevy	731a74c71f	storage_proxy: pass topology& to sort_endpoints_by_proximity It mustn't use the latest topology that may differ from the one used by the query as it may be missing nodes (e.g. after concurrent decommission). Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-22 15:02:40 +02:00
Benny Halevy	ab3fc1e069	storage_proxy: pass topology& to is_worth_merging_for_range_query It mustn't use the latest topology that may differ from the one used by the query as it may be missing nodes (e.g. after concurrent decommission). Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-22 15:01:58 +02:00
Avi Kivity	994603171b	Merge 'Add validator to the mutation compactor' from Botond Dénes Fragment reordering and fragment dropping bugs have been plaguing us since forever. To fight them we added a validator to the sstable write path to prevent really messed up sstables from being written. This series adds validation to the mutation compactor. This will cover reads and compaction among others, hopefully ridding us of such bugs on the read path too. This series fixes some benign looking issues found by unit tests after the validator was added -- although how benign a producer emitting two partition-ends depends entirely on how the consumer reacts to it, so no such bug is actually benign. Fixes: https://github.com/scylladb/scylladb/issues/11174 Closes #11532 * github.com:scylladb/scylladb: mutation_compactor: add validator mutation_fragment_stream_validator: add a 'none' validation level test/boost/mutation_query_test: test_partition_limit: sort input data querier: consume_page(): use partition_start as the sentinel value treewide: use ::for_partition_end() instead of ::end_of_partition_tag_t{} treewide: use ::for_partition_start() instead of ::partition_start_tag_t{} position_in_partition: add for_partition_{start,end}()	2022-11-20 20:33:26 +02:00
Kamil Braun	d7649a86c4	Merge 'Build up to support of dynamic IP address changes in Raft' from Konstantin Osipov We plan to stop storing IP addresses in Raft configuration, and instead use the information disseminated through gossip to locate Raft peers. Implement patches that are building up to that: * improve Raft API of configuration change notifications * disseminate raft host id in Gossip * avoid using Raft addresses from Raft configuration, and instead consistently use the translation layer between raft server id <-> IP address Closes #11953 * github.com:scylladb/scylladb: raft: persist the initial raft address map raft: (upgrade) do not use IP addresses from Raft config raft: (and gossip) begin gossiping raft server ids raft: change the API of conf change notifications	2022-11-18 11:38:19 +01:00
Asias He	4571fcf9e7	token_metadata: Rename is_member to is_normal_token_owner The name is_normal_token_owner is more clear than is_member. The is_normal_token_owner reflects what it really checks.	2022-11-18 09:29:20 +08:00
Pavel Emelyanov	a396c27efc	Merge 'message: messaging_service: fix topology_ignored for pending endpoints in get_rpc_client' from Kamil Braun `get_rpc_client` calculates a `topology_ignored` field when creating a client which says whether the client's endpoint had topology information when this client was created. This is later used to check if that client needs to be dropped and replaced with a new client which uses the correct topology information. The `topology_ignored` field was incorrectly calculated as `true` for pending endpoints even though we had topology information for them. This would lead to unnecessary drops of RPC clients later. Fix this. Remove the default parameter for `with_pending` from `topology::has_endpoint` to avoid similar bugs in the future. Apparently this fixes #11780. The verbs used by decommission operation use RPC client index 1 (see `do_get_rpc_client_idx` in message/messaging_service.cc). From local testing with additional logging I found that by the time this client is created (i.e. the first verb in this group is used), we already know the topology. The node is pending at that point - hence the bug would cause us to assume we don't know the topology, leading us to dropping the RPC client later, possibly in the middle of a decommission operation. Fixes: #11780 Closes #11942 * github.com:scylladb/scylladb: message: messaging_service: check for known topology before calling is_same_dc/rack test: reenable test_topology::test_decommission_node_add_column test/pylib: util: configurable period in wait_for message: messaging_service: fix topology_ignored for pending endpoints in get_rpc_client message: messaging_service: topology independent connection settings for GOSSIP verbs	2022-11-17 20:14:32 +03:00
Konstantin Osipov	262566216b	raft: persist the initial raft address map	2022-11-17 14:26:36 +03:00
Konstantin Osipov	b35af73fdf	raft: (upgrade) do not use IP addresses from Raft config Always use raft address map to obtain the IP addresses of upgrade peers. Right now the map is populated from Raft configuration, so it's an equivalent transformation, but in the future raft address map will be populated from other sources: discovery and gossip, hence the logic of upgrade will change as well. Do not proceed with the upgrade if an address is missing from the map, since it means we failed to contact a raft member.	2022-11-17 14:26:31 +03:00
Konstantin Osipov	051dceeaff	raft: (and gossip) begin gossiping raft server ids We plan to use gossip data to educate Raft RPC about IP addresses of raft peers. Add raft server ids to application state, so that when we get a notification about a gossip peer we can identify which raft server id this notification is for, specifically, we can find what IP address stands for this server id, and, whenever the IP address changes, we can update Raft address map with the new address. By the same token, at boot time, we now have to start Gossip before Raft, since Raft won't be able to send any messages without gossip data about IP addresses.	2022-11-17 12:07:31 +03:00
Konstantin Osipov	990c7a209f	raft: change the API of conf change notifications Pass a change diff into the notification callback, rather than add or remove servers one by one, so that if we need to persist the state, we can do it once per configuration change, not for every added or removed server. For now still pass added and removed entries in two separate calls per a single configuration change. This is done mainly to fulfill the library contract that it never sends messages to servers outside the current configuration. The group0 RPC implementation doesn't need the two calls, since it simply marks the removed servers as expired: they are not removed immediately anyway, and messages can still be delivered to them. However, there may be test/mock implementations of RPC which could benefit from this contract, so we decided to keep it.	2022-11-17 12:07:31 +03:00
Kamil Braun	9b2449d3ea	test: reenable test_topology::test_decommission_node_add_column Also improve the test to increase the probability of reproducing #11780 by injecting sleeps in appropriate places. Without the fix for #11780 from the earlier commit, the test reproduces the issue in roughly half of all runs in dev build on my laptop.	2022-11-16 14:01:50 +01:00
Botond Dénes	cbf9be9715	Merge 'Avoid 0.0.0.0 (and :0) as preferred IP' from Pavel Emelyanov Although the docs discourage using INADDR_ANY as the listen address, this is not disabled in code. Worse -- some snitch drivers may gossip it around as the INTERNAL_IP state. This set prevents this from happening and also adds a sanity check not to use this value if it somehow sneaks in. Closes #11846 * github.com:scylladb/scylladb: messaging_service: Deny putting INADD_ANY as preferred ip messaging_service: Toss preferred ip cache management gossiping_property_file_snitch: Dont gossip INADDR_ANY preferred IP gossiping_property_file_snitch: Make _listen_address optional	2022-11-16 08:30:42 +02:00
Pavel Emelyanov	bd48fdaad5	Merge 'handle_state_normal: do not update topology of removed endpoint' from Benny Halevy Currently, when replacing a node ip, keeping the old host, we might end up with the old endpoint in system.peers if it is inserted back into the topology by `handle_state_normal` when on_join is called with the old endpoint. Then, later on, on_change sees that: ``` if (get_token_metadata().is_member(endpoint)) { co_await do_update_system_peers_table(endpoint, state, value); ``` As described in #11925. Fixes #11925 Closes #11930 * github.com:scylladb/scylladb: storage_service, system_keyspace: add debugging around system.peers update storage_service: handle_state_normal: update topology and notify_joined endpoint only if not removed	2022-11-14 13:58:28 +03:00
Botond Dénes	f1a039fc2b	treewide: use ::for_partition_start() instead of ::partition_start_tag_t{} We just added a convenience static factory method for partition start, change the present users of the clunky constructor+tag to use it instead.	2022-11-11 09:58:18 +02:00
Benny Halevy	38d8777d42	storage_service, system_keyspace: add debugging around system.peers update Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-09 14:45:47 +02:00
Benny Halevy	5401b6055c	storage_service: handle_state_normal: update topology and notify_joined endpoint only if not removed Currently, when replacing a node ip, keeping the old host, we might end up with the old endpoint in system.peers if it is inserted back into the topology by `handle_state_normal` when on_join is called with the old endpoint. Then, later on, on_change sees that: ``` if (get_token_metadata().is_member(endpoint)) { co_await do_update_system_peers_table(endpoint, state, value); ``` As described in #11925. Fixes #11925 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-09 14:45:22 +02:00
Gleb Natapov' via ScyllaDB development	2100a8f4ca	service: raft: demote configuration change error to warning since it is retried anyway Message-Id: <Y2ohbFtljmd5MNw0@scylladb.com>	2022-11-09 00:09:39 +01:00
Kamil Braun	e086521c1a	direct_failure_detector: get rid of complex `endpoint_id` translations The direct failure detector operates on abstract `endpoint_id`s for pinging. The `pinger` interface is responsible for translating these IDs to 'real' addresses. Earlier we used two types of addresses: IP addresses in 'production' code (`gms::gossiper::direct_fd_pinger`) and `raft::server_id`s in test code (in `randomized_nemesis_test`). For each of these use cases we would maintain mappings between `endpoint_id`s and the address type. In recent commits we switched the 'production' code to also operate on Raft server IDs, which are UUIDs underneath. In this commit we switch `endpoint_id`s from `unsigned` type to `utils::UUID`. Because each use case operates in Raft server IDs, we can perform a simple translation: `raft_id.uuid()` to get an `endpoint_id` from a Raft ID, `raft::server_id{ep_id}` to obtain a Raft ID from an `endpoint_id`. We no longer have to maintain complex sharded data structures to store the mappings.	2022-11-04 09:38:08 +01:00
Kamil Braun	bdeef77f20	service/raft: ping `raft::server_id`s, not `gms::inet_address`es Whenever a Raft configuration change is performed, `raft::server` calls `raft_rpc::add_server`/`raft_rpc::remove_server`. Our `raft_rpc` implementation has a function, `_on_server_update`, passed in the constructor, which it called in `add_server`/`remove_server`; that function would update the set of endpoints detected by the direct failure detector. `_on_server_update` was passed an IP address and that address was added to / removed from the failure detector set (there's another translation layer between the IP addresses and internal failure detector 'endpoint ID's; but we can ignore it for the purposes of this commit). Therefore: the failure detector was pinging a certain set of IP addresses. These IP addresses were updated during Raft configuration changes. To implement the `is_alive(raft::server_id)` function (required by `raft::failure_detector` interface), we would translate the ID using the Raft address map, which is currently also updated during configuration changes, to an IP address, and check if that IP address is alive according to the direct failure detector (which maintained an `_alive_set` of type `unordered_set<gms::inet_address>`). This all works well but it assumes that servers can be identified using IP addresses - it doesn't play well with the fact that servers may change their IP addresses. The only immutable identifier we have for a server is `raft::server_id`. In the future, Raft configurations will not associate IP addresses with Raft servers; instead we will assume that IP addresses can change at any time, and there will be a different mechanism that eventually updates the Raft address map with the latest IP address for each `raft::server_id`. To prepare us for that future, in this commit we no longer operate in terms of IP addresses in the failure detector, but in terms of `raft::server_id`s. 
Most of the commit is boilerplate, changing `gms::inet_address` to `raft::server_id` and function/variable names. The interesting changes are: - in `is_alive`, we no longer need to translate the `raft::server_id` to an IP address, because now the stored `_alive_set` already contains `raft::server_id`s instead of `gms::inet_address`es. - the `ping` function now takes a `raft::server_id` instead of `gms::inet_address`. To send the ping message, we need to translate this to IP address; we do it by the `raft_address_map` pointer introduced in an earlier commit. Thus, there is still a point where we have to translate between `raft::server_id` and `gms::inet_address`; but observe we now do it at the last possible moment - just before sending the message. If we have no translation, we consider the `ping` to have failed - it's equivalent to a network failure where no route to a given address was found.	2022-11-04 09:38:08 +01:00
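
The "translate at the last possible moment" rule for `ping` can be sketched as follows. The callback types and names are assumptions made for this example; the real pinger uses `raft_address_map` and the messaging service rather than `std::function` callbacks.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <optional>
#include <string>

using server_id = std::uint64_t;  // hypothetical stand-in for raft::server_id

// Hypothetical callbacks: one looks up the current address of a server,
// the other actually sends a ping and reports whether it succeeded.
using address_lookup = std::function<std::optional<std::string>(server_id)>;
using ping_sender = std::function<bool(const std::string&)>;

// Pings a server identified by its ID. The ID is translated to an address
// just before sending; a missing translation is treated like a network
// failure, so the ping is reported as failed and nothing is sent.
bool ping(server_id dst, const address_lookup& find_address, const ping_sender& send_ping) {
    const std::optional<std::string> addr = find_address(dst);
    if (!addr) {
        return false;
    }
    return send_ping(*addr);
}
```

Because the failure detector only ever sees server IDs, an address change affects nothing but the result of the lookup on the next ping.
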
Kamil Braun	ac70a05c7e	service/raft: store `raft_address_map` reference in `direct_fd_pinger` The pinger will use the map to translate `raft::server_id`s to `gms::inet_address`es when pinging.	2022-11-04 09:38:08 +01:00
Kamil Braun	2c20f2ab9d	gms: gossiper: move `direct_fd_pinger` out to a separate service In later commit `direct_fd_pinger` will operate in terms of `raft::server_id`s. Decouple it from `gossiper` since we don't want to entangle `gossiper` with Raft-specific stuff.	2022-11-04 09:38:08 +01:00
Pavel Emelyanov	efbfcdb97e	Merge 'Replicate `raft_address_map` non-expiring entries to other shards' from Kamil Braun Replicating `raft_address_map` entries is needed for the following use cases: - the direct failure detector - currently it assumes a static mapping of `raft::server_id`s to `gms::inet_address`es, which is obtained on Raft group 0 configuration changes. To handle dynamic mappings we need to modify the failure detector so it pings `raft::server_id`s and obtains the `gms::inet_address` before sending the message from `raft_address_map`. The failure detector is sharded, so we need the mappings to be available on all shards. - in the future we'll have multiple Raft groups running on different shards. To send messages they'll need `raft_address_map`. Initially I tried to replicate all entries - expiring and non-expiring. The implementation turned out to be very complex - we need to handle dropping expired entries and refreshing expiring entries' timestamps across shards, and doing this correctly while accounting for possible races is quite problematic. Eventually I arrived at the conclusion that replicating only non-expiring entries, and furthermore allowing non-expiring entries to be added only on shard 0, is good enough for our use cases: - The direct failure detector is pinging group 0 members only; group 0 members correspond exactly to the non-expiring entries. - Group 0 configuration changes are handled on shard 0, so non-expiring entries are added/removed on shard 0. - When we have multiple Raft groups, we can reuse a single Raft server ID for all Raft servers running on a single node belonging to different groups; they are 'namespaced' by the group IDs. Furthermore, every node has a server that belongs to group 0. 
Thus for every Raft server in every group, it has a corresponding server in group 0 with the same ID, which has a non-expiring entry in `raft_address_map`, which is replicated to all shards; so every group will be able to deliver its messages. With these assumptions the implementation is short and simple. We can always complicate it in the future if we find that the assumptions are too strong. Closes #11791 * github.com:scylladb/scylladb: test/raft: raft_address_map_test: add replication test service/raft: raft_address_map: replicate non-expiring entries to other shards service/raft: raft_address_map: assert when entry is missing in drop_expired_entries service/raft: turn raft_address_map into a service	2022-11-03 18:34:42 +03:00
Kamil Braun	7d84007fd5	service/raft: raft_address_map: replicate non-expiring entries to other shards Replicating `raft_address_map` entries is needed for the following use cases: - the direct failure detector - currently it assumes a static mapping of `raft::server_id`s to `gms::inet_address`es, which is obtained on Raft group 0 configuration changes. To handle dynamic mappings we need to modify the failure detector so it pings `raft::server_id`s and obtains the `gms::inet_address` before sending the message from `raft_address_map`. The failure detector is sharded, so we need the mappings to be available on all shards. - in the future we'll have multiple Raft groups running on different shards. To send messages they'll need `raft_address_map`. Initially I tried to replicate all entries - expiring and non-expiring. The implementation turned out to be very complex - we need to handle dropping expired entries and refreshing expiring entries' timestamps across shards, and doing this correctly while accounting for possible races is quite problematic. Eventually I arrived at the conclusion that replicating only non-expiring entries, and furthermore allowing non-expiring entries to be added only on shard 0, is good enough for our use cases: - The direct failure detector is pinging group 0 members only; group 0 members correspond exactly to the non-expiring entries. - Group 0 configuration changes are handled on shard 0, so non-expiring entries are added/removed on shard 0. - When we have multiple Raft groups, we can reuse a single Raft server ID for all Raft servers running on a single node belonging to different groups; they are 'namespaced' by the group IDs. Furthermore, every node has a server that belongs to group 0. Thus for every Raft server in every group, it has a corresponding server in group 0 with the same ID, which has a non-expiring entry in `raft_address_map`, which is replicated to all shards; so every group will be able to deliver its messages. 
With these assumptions the implementation is short and simple. We can always complicate it in the future if we find that the assumptions are too strong.	2022-10-31 09:17:12 +01:00
Kamil Braun	acacbad465	service/raft: raft_address_map: assert when entry is missing in drop_expired_entries	2022-10-31 09:17:12 +01:00
Kamil Braun	159bb32309	service/raft: turn raft_address_map into a service	2022-10-31 09:17:10 +01:00
Botond Dénes	2c021affd1	Merge 'storage_service, repair: use per-shard abort_source' from Benny Halevy Prevent copying shared_ptr across shards in do_sync_data_using_repair by allocating a shared_ptr<abort_source> per shard in node_ops_meta_data and respectively in node_ops_info. Fixes #11826 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes #11827 * github.com:scylladb/scylladb: repair: use sharded abort_source to abort repair_info repair: node_ops_info: add start and stop methods storage_service: node_ops_abort_thread: abort all node ops on shutdown storage_service: node_ops_abort_thread: co_return only after printing log message storage_service: node_ops_meta_data: add start and stop methods repair: node_ops_info: prevent accidental copy	2022-10-31 09:43:34 +02:00
Benny Halevy	9ef2631ec2	api, service: storage_service: removenode: allow passing ignore_nodes as uuid:s Currently the api is inconsistent: requiring a uuid for the host_id of the node to be removed, while the ignored nodes list is given as comma-separated ip addresses. Instead, support identifying the ignored_nodes either by their host_id (uuid) or ip address. Also, require all ignore_nodes to be of the same kind: either UUIDs or ip addresses, as a mix of the 2 is likely indicating a user error. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-10-28 07:49:03 +03:00
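
The "all of the same kind" rule for `ignore_nodes` might be checked along these lines. This is a sketch under stated assumptions: `looks_like_uuid` is a simple shape check, anything else is treated as an IP address without validation, and the function and enum names are invented for the example rather than taken from the Scylla API layer.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>

enum class node_id_kind { host_id, ip_address };

// Shape check only: 36 characters, dashes at the canonical UUID positions,
// hexadecimal digits elsewhere.
bool looks_like_uuid(const std::string& s) {
    if (s.size() != 36) {
        return false;
    }
    for (std::size_t i = 0; i < s.size(); ++i) {
        const bool dash_pos = (i == 8 || i == 13 || i == 18 || i == 23);
        if (dash_pos) {
            if (s[i] != '-') {
                return false;
            }
        } else if (!std::isxdigit(static_cast<unsigned char>(s[i]))) {
            return false;
        }
    }
    return true;
}

// Classifies a comma-separated ignore list; throws if it is empty or mixes
// host IDs with IP addresses, since a mix likely indicates a user error.
node_id_kind classify_ignore_nodes(const std::string& csv) {
    std::istringstream in(csv);
    std::string item;
    bool seen_any = false;
    node_id_kind kind = node_id_kind::ip_address;
    while (std::getline(in, item, ',')) {
        if (item.empty()) {
            continue;
        }
        const node_id_kind k = looks_like_uuid(item) ? node_id_kind::host_id
                                                     : node_id_kind::ip_address;
        if (seen_any && k != kind) {
            throw std::invalid_argument("ignore_nodes mixes host IDs and IP addresses");
        }
        kind = k;
        seen_any = true;
    }
    if (!seen_any) {
        throw std::invalid_argument("ignore_nodes is empty");
    }
    return kind;
}
```
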
Benny Halevy	40cd685371	storage_service: get_ignore_dead_nodes_for_replace: use tm.parse_host_id_and_endpoint Allow specifying the dead node to ignore either as host_id or ip address. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-10-28 07:38:13 +03:00
Benny Halevy	340a5a0c94	api: storage_service: remove_node: validate host_id The node to be removed must be identified by its host_id. Validate that at the api layer and pass the parsed host_id down to storage_service::removenode. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-10-28 07:38:13 +03:00
Pavel Emelyanov	aa7a759ac9	messaging_service: Toss preferred ip cache management Make it call cache_preferred_ip() even when the cache is loaded from system_keyspace and move the connection reset there. This is mainly to prepare for the next patch, but also makes the code a bit shorter Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2022-10-27 14:25:43 +03:00
Benny Halevy	88f993e5ed	repair: node_ops_info: add start and stop methods Prepare for adding a sharded<abort_source> member. Wire start/stop in storage_service::node_ops_meta_data. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-10-27 12:18:30 +03:00
Benny Halevy	c2f384093d	storage_service: node_ops_abort_thread: abort all node ops on shutdown A later patch adds a sharded<abort_source> to node_ops_info. On shutdown, we must orderly stop it, so use node_ops_abort_thread shutdown path (where node_ops_singal_abort is called with a nullopt) to abort (and stop) all outstanding node_ops by passing a null_uuid to node_ops_abort, and let it iterate over all node ops to abort and stop them. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-10-27 12:14:06 +03:00
Benny Halevy	0efd290378	storage_service: node_ops_abort_thread: co_return only after printing log message Currently the function co_returns if (!uuid_opt) so the log info message indicating it's stopped is not printed. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-10-27 12:14:03 +03:00
Benny Halevy	47e4761b4e	storage_service: node_ops_meta_data: add start and stop methods Prepare for starting and stopping repair node_ops_info Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-10-27 12:14:03 +03:00
Benny Halevy	5c25066ea7	repair: node_ops_info: prevent accidental copy Delete node_ops_info copy and move constructors before we add a sharded<abort_source> member for the per-shard repairs in the next patch. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-10-27 12:14:03 +03:00
Pavel Emelyanov	64c9359443	storage_proxy: Don't use default-initialized endpoint in get_read_executor() After calling filter_for_query() the extra_replica to speculate to may be left default-initialized which is :0 ipv6 address. Later below this address is used as-is to check if it belongs to the same DC or not which is not nice, as :0 is not an address of any existing endpoint. Recent move of dc/rack data onto topology made this place reveal itself by emitting the internal error due to :0 not being present on the topology's collection of endpoints. Prior to this move the dc filter would count :0 as belonging to "default_dc" datacenter which may or may not match with the dc of the local node. The fix is to explicitly tell set extra_replica from unset one. fixes: #11825 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #11833	2022-10-25 09:16:50 +03:00


3124 Commits