scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-28 04:06:59 +00:00

Author	SHA1	Message	Date
Patryk Jędrzejczak	ed55261650	treewide: distinguish all nodes from all token owners In one of the following patches, we introduce support for zero-token nodes. From that point, getting all nodes and getting all token owners isn't equivalent. In this patch, we ensure that we consider only token owners when we want to consider only token owners (for example, in the replication logic), and we consider all nodes when we want to consider all nodes (for example, in the topology logic). The main purpose of this patch is to make the PR introducing zero-token nodes easier to review. The patch that introduces zero-token nodes is already complicated. We don't want trivial changes from this patch to make noise there. This patch introduces changes needed for zero-token nodes only in the Raft-based topology and in the recovery mode. Zero-token nodes are unsupported in the gossip-based topology outside recovery. Some functions added to `token_metadata` and `topology` are inefficient because they compute a new data structure in every call. They are never called in the hot path, so it's not a serious problem. Nevertheless, we should improve it somehow. Note that it's not obvious how to do it because we don't want to make `token_metadata` store topology-related data. Similarly, we don't want to make `topology` store token-related data. We can think of an improvement in a follow-up. We don't remove unused `topology::get_datacenter_rack_nodes` and `topology::get_datacenter_nodes`. These functions can be useful in the future. Also, `topology::_dc_nodes` is used internally in `topology`.	2024-08-29 10:37:07 +02:00
Patryk Jędrzejczak	366605224c	token_metadata: rename get_all_endpoints and get_all_ips In one of the following patches, we introduce support for zero-token nodes. A zero-token node that has successfully joined the cluster is in the normal state but is not a normal token owner. Hence, the names of `get_all_endpoints` and `get_all_ips` become misleading. They should specify that the functions return only IDs/IPs of token owners.	2024-08-29 10:37:07 +02:00
Benny Halevy	7c2bd8dc34	locator: host_id_or_endpoint: keep value as variant Rather than allowing to keep both host_id and endpoint, keep only one of them and provide resolve functions that use the token_metadata to resolve the host_id into an inet_address or vice versa. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2024-04-14 15:25:50 +03:00
Tomasz Grabiec	ef9e5e64a3	locator: token_metadata: Introduce topology barrier stall detector When topology barrier is blocked for longer than configured threshold (2s), stale versions are marked as stalled and when they get released they report backtrace to the logs. This should help to identify what was holding the token metadata pointer for too long. Example log: token_metadata - topology version 30 held for 299.159 [s] past expiry, released at: 0x2397ae1 0x23a36b6 ... Closes scylladb/scylladb#17427	2024-02-21 15:05:34 +02:00
Avi Kivity	605bf6e221	range.hh: retire range.hh was deprecated in `bd794629f9` (2020) since its names conflict with the C++ library concept of an iterator range. The name ::range also mapped to the dangerous wrapping_interval rather than nonwrapping_interval. Complete the deprecation by removing range.hh and replacing all the aliases by the names they point to from the interval library. Note this now exposes uses of wrapping intervals as they are now explicit. The unit tests are renamed and range.hh is deleted. Closes scylladb/scylladb#17428	2024-02-21 00:24:25 +02:00
Kefu Chai	76b9e4f4f4	locator: do not include unused headers these unused includes were identified by clangd. see https://clangd.llvm.org/guides/include-cleaner#unused-include-warning for more details on the "Unused include" warning. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16914	2024-01-23 09:12:23 +02:00
Petr Gusev	11a4908683	token_metadata: add_replacing_endpoint: forbid replacing node with itself This used to work before in replace-with-same-ip scenario, but with host_id-s it's no longer relevant. base_token_metadata has been removed from topology_change_info because the conditions needed for its creation are no longer met.	2023-12-12 23:19:54 +04:00
Petr Gusev	8c551f9104	dc_rack_fn: make it non-template	2023-12-12 23:19:54 +04:00
Petr Gusev	7b55ccbd8e	token_metadata: drop the template Replace token_metadata2 ->token_metadata, make token_metadata back non-template. No behavior changes, just compilation fixes.	2023-12-12 23:19:54 +04:00
Petr Gusev	799f747c8f	shared_token_metadata: switch to the new token_metadata	2023-12-12 23:19:54 +04:00
Petr Gusev	f53f34f989	storage_service: get_token_to_endpoint_map: use new token_metadata The token_metadata::get_normal_and_bootstrapping_token_to_endpoint_map method was used only here. It's inlined in this commit since it's too specific and incurs the overhead of creating an intermediate map.	2023-12-12 23:19:53 +04:00
Petr Gusev	5a1418fdba	token_metadata: get_endpoint_for_host_id -> get_endpoint_for_host_id_if_known This commit fixes an inconsistency in method names: get_host_id and get_host_id_if_known are (internal_error, returns null), but there was only one method for the opposite conversion - get_endpoint_for_host_id, and it returns null. In this commit we change it to on_internal_error if it can't find the argument and add another method get_endpoint_for_host_id_if_known which returns null in this case. We can't use get_endpoint_for_host_id/get_host_id in host_id_or_endpoint::resolve since it's called from storage_service::parse_node_list -> token_metadata::parse_host_id_and_endpoint, and exceptions are caught and handled in `storage_service::parse_node_list`.	2023-12-11 12:51:34 +04:00
Petr Gusev	39bbe5f457	token_metadata: add get_all_ips method This is convenient for migrating code that uses get_all_endpoints.	2023-12-11 12:51:34 +04:00
Petr Gusev	9edf0709e6	token_metadata: support host_id-based version In this commit we enhance token_metadata with a pointer to the new host_id-based generic_token_metadata specialisation (token_metadata2). The idea is that in the following commits we'll go over all token_metadata modifications and make the corresponding modifications to its new host_id-based alternative. The pointer to token_metadata2 is stored in the generic_token_metadata::_new_value field. The pointer can be mutable, immutable, or absent altogether (std::monostate). It's mutable if this generic_token_metadata owns it, meaning it was created using the generic_token_metadata(config cfg) constructor. It's immutable if the generic_token_metadata(lw_shared_ptr<const token_metadata2> new_value); constructor was used. This means this old token_metadata is a wrapper for new token_metadata and we can only use the get_new() method on it. The field _new_value is empty for the new host_id-based token_metadata version. The generic_token_metadata(std::unique_ptr<token_metadata_impl<NodeId>> impl, token_metadata2 new_value); constructor is used for clone methods. We clone both versions, and we need to pass a cloned token_metadata2 into constructor. There are two overloads of get_new, for mutable and immutable generic_token_metadata. Both of them throw an exception if they can't get the appropriate pointer. There is also a get_new_strong method, which returns an immutable owning pointer. This is convenient since a lot of APIs want an owning pointer. We can't make the get_new/get_new_strong API simpler and use get_new_strong everywhere since it mutates the original generic_token_metadata by incrementing the reference counter and this causes races when it's passed between shards in replicate_to_all_cores.	2023-12-11 12:51:34 +04:00
Petr Gusev	63f64f3303	token_metadata: make it a template with NodeId=inet_address/host_id NodeId is used in all internal token_metadata data structures, that previously used inet_address. We choose topology::key_kind based on the value of the template parameter. generic_token_metadata::update_topology overload with host_id parameter is added to make update_topology_change_info work, it now uses NodeId as a parameter type. topology::remove_endpoint(host_id) is added to make generic_token_metadata::remove_endpoint(NodeId) work. pending_endpoints_for and endpoints_for_reading are just removed - they are not used and not implemented. The declarations were left by mistake from a refactoring in which these methods were moved to erm. generic_token_metadata_base is extracted to contain declarations, common to both token_metadata versions. Templates are explicitly instantiated inside token_metadata.cc, since implementation part is also a template and it's not exposed to the header. There are no other behavioral changes in this commit, just syntax fixes to make token_metadata a template.	2023-12-11 12:51:34 +04:00
Petr Gusev	c9fbe3d377	locator: make dc_rack_fn a template In the next commits token_metadata will be made a template with NodeId=inet_address\|host_id parameter. This parameter will be passed to dc_rack_fn function, so it also should be made a template.	2023-12-11 12:51:33 +04:00
Petr Gusev	2f137776c3	token_metadata: topology_change_info: change field types to token_metadata_ptr In subsequent commits we'll need the following api for token_metadata: token_metadata(token_metadata2_ptr); get_new() -> token_metadata2* where token_metadata2 is the new version of token_metadata, based on host_id. In other words: * token_metadata knows the new version of itself and returns a pointer to it through get_new() * token_metadata can be constructed based solely on the new version, without its own implementation. In this case the only method we can use on it is get_new. This allows to pass token_metadata2 to API's with token_metadata in method signature, if these APIs are known to only use the get_new method on the passed token_metadata. And back to topology_change_info - if we got it from the new token_metadata we want to be able to construct token_metadata from token_metadata2 contained in it, and this requires it to be a ptr, not value.	2023-12-11 12:51:33 +04:00
Petr Gusev	f21f23483c	token_metadata: drop unused method get_endpoint_to_token_map_for_reading	2023-12-11 12:51:22 +04:00
Benny Halevy	a1acf6854b	everywhere: reduce dependencies on i_partitioner.hh Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-11-05 20:47:44 +02:00
Benny Halevy	6de1cc2993	locator: resolve the dependency of token_metadata.hh on token_range_splitter.hh define token_metadata_ptr in token_metadata_fwd.hh So that the declaration of `make_splitter` can be moved to token_range_splitter.hh, where it belongs, and so token_metadata.hh won't have to include it. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-11-05 20:01:29 +02:00
Benny Halevy	7119c1d8cc	token_metadata: update_topology: make endpoint_dc_rack arg optional It's better to pass a disengaged optional when the caller doesn't have the information rather than passing the default dc_rack location so the latter will never implicitly override a known endpoint dc/rack location. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes #15300	2023-09-11 16:16:19 +02:00
Tomasz Grabiec	f2fdf37415	token_metadata: Add non-const getter of tablet_metadata Needed for tests.	2023-07-25 21:08:51 +02:00
Tomasz Grabiec	e110167a2a	locator: Store node shard count in topology Will be needed by tablet allocator.	2023-06-21 00:58:25 +02:00
Petr Gusev	f6b019c229	raft topology: add fence_version It's stored outside of topology table, since it's updated not through RAFT, but with a new 'fence' raft command. The current value is cached in shared_token_metadata. An initial fence version is loaded in main during storage_service initialisation.	2023-06-15 15:48:00 +04:00
Petr Gusev	4f99302c2b	raft_topology: add barrier_and_drain cmd We use utils::phased_barrier. The new phase is started each time the version is updated. We track all instances of token_metadata, when an instance is destroyed the corresponding phased_barrier::operation is released.	2023-06-15 15:48:00 +04:00
Petr Gusev	253d8a8c65	token_metadata: add topology version It's stored as a static column in the topology table and will be updated at various steps of the topology change state machine. The initial value is 1; zero means that topology versions are not yet supported, which will be used in RPC handling.	2023-06-15 15:48:00 +04:00
Petr Gusev	5976277c2c	token_metadata: drop has_pending_ranges and migration_info Use the new erm::has_pending_ranges function, drop the old implementation from token_metadata.	2023-05-21 13:17:42 +04:00
Petr Gusev	8cb709d3d6	token_metadata: drop update_pending_ranges The function storage_service::update_pending_ranges is turned to update_topology_changes_info. The pending_endpoints and read_endpoints will be computed later, when the erms are rebuilt.	2023-05-21 13:17:42 +04:00
Petr Gusev	87307781c4	effective_replication_map: use new get_pending_endpoints and get_endpoints_for_reading We already use the new pending_endpoints from erm though the get_pending_ranges virtual function, in this commit we update all the remaining places to use the new implementation in erm, as well as remove the old implementation in token_metadata.	2023-05-21 13:17:42 +04:00
Petr Gusev	10bf8c7901	token_metadata: introduce topology_change_info We plan to move pending_endpoints and read_endpoints, along with their computation logic, from token_metadata to vnode_effective_replication_map. The vnode_effective_replication_map seems more appropriate for them since it contains functionally similar _replication_map and we will be able to reuse pending_endpoints/read_endpoints across keyspaces sharing the same factory_key. At present, pending_endpoints and read_endpoints are updated in the update_pending_ranges function. The update logic comprises two parts - preparing data common to all keyspaces/replication_strategies, and calculating the migration_info for specific keyspaces. In this commit, we introduce a new topology_change_info structure to hold the first part's data and create an update_topology_change_info function to update it. This structure will later be used in vnode_effective_replication_map to compute pending_endpoints and read_endpoints. This enables the reuse of topology_change_info across all keyspaces, unlike the current update_pending_ranges implementation, which is another benefit of this refactoring. The update_topology_change_info implementation is mostly derived from update_pending_ranges, there are a few differences though: * replacing async and thread with plain co_awaits; * adding a utils::clear_gently call for the previous value to mitigate reactor stalls if target_token_metadata grows large; * substituting immediately invoked lambdas with simple variables and blocks to reduce noise, as lambdas would need to be converted into coroutines. The original update_pending_ranges remains unchanged, and will be removed entirely upon transitioning to the new implementation. Meanwhile, we add an update_topology_change_info call to storage_service::update_pending_ranges so that we can iteratively switch the system to the new implementation.	2023-05-19 19:04:43 +04:00
Petr Gusev	51e80691ef	token_metadata: replace set_topology_transition_state with set_read_new This helps isolate topology::transition_state dependencies, token_metadata doesn't need the entire enum, just this boolean flag.	2023-05-19 19:04:43 +04:00
Petr Gusev	0e4e2df657	token_metadata: add endpoints for reading In this patch we add token_metadata::set_topology_transition_state method. If the current state is write_both_read_new update_pending_ranges will compute new ranges for read requests. The default value of topology_transition_state is null, meaning no read ranges are computed. We will add the appropriate set_topology_transition_state calls later. Also, we add endpoints_for_reading method to get read endpoints based on the computed ranges.	2023-05-09 18:41:59 +04:00
Petr Gusev	56c2b3e893	token_metadata_impl: refactor update_pending_ranges Now update_pending_ranges is quite complex, mainly because it tries to act efficiently and update only the affected intervals. However, it uses the function abstract_replication_strategy::get_ranges, which calls calculate_natural_endpoints for every token in the ring anyway. Our goal is to start reading from the new replicas for ranges in write_both_read_new state. In the current code structure this is quite difficult to do, so in this commit we first simplify update_pending_ranges. The main idea of the refactoring is to build a new version of token_metadata based on all planned changes (join, bootstrap, replace) and then for each token range compare the result of calculate_natural_endpoints on the old token_metadata and on the new one. Those endpoints that are in the new version and are not in the old version should be added to the pending_ranges. The add_mapping function is extracted for the future - we are going to use it to handle read mappings. Special care is taken when replacing with the same IP. The coordinator employs the get_natural_endpoints_without_node_being_replaced function, which excludes such endpoints from its result. If we compare the new (merged) and current token_metadata configurations, such endpoints will also be absent from pending_endpoints since they exist in both. To address this, we copy the current token_metadata and remove these endpoints prior to comparison. This ensures that nodes being replaced are treated like those being deleted.	2023-05-09 13:56:28 +04:00
Tomasz Grabiec	e6b76ac4b9	dht: token_metadata: Introduce get_my_id()	2023-04-24 10:49:37 +02:00
Tomasz Grabiec	fceb5f8cf6	locator: Introduce tablet_metadata token_metadata now stores tablet metadata with information about tablets in the system.	2023-04-24 10:49:37 +02:00
Tomasz Grabiec	34a9c62ae5	locator: token_metadata: Fix confusing comment on ring_range() It could be interpreted to mean that the search token is excluded.	2023-04-24 10:49:36 +02:00
Tomasz Grabiec	e4865bd4d1	dht, storage_proxy: Abstract token space splitting Currently, scans are splitting partition ranges around tokens. This will have to change with tablets, where we should split at tablet boundaries. This patch introduces token_range_splitter which abstracts this task. It is provided by effective_replication_map implementation.	2023-04-24 10:49:36 +02:00
Benny Halevy	e635aa30d6	token_metadata: get endpoint to node map from topology Don't maintain a "shadow" endpoint_to_host_id_map in token_metadata_impl. Instead, get the nodes_by_endpoint map from topology and use it to build the endpoint_to_host_id_map. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-11 15:48:30 +03:00
Benny Halevy	c17df1759e	topology: add node state Add a simple node state model with: `joining`, `normal`, `leaving`, and `left` states to help managing nodes during replace with the same ip address. Later on, this could also help prevent nodes that were decommissioned, removed, or replaced from rejoining the cluster. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-02 20:18:31 +03:00
Benny Halevy	f3d5df5448	locator: add class node And keep per node information (idx, host_id, endpoint, dc_rack, is_pending) in node objects, indexed by topology on several indices like: idx, host_id, endpoint, current/pending, per dc, per dc/rack. The node index is a shorthand identifier for the node. node* and index are valid while the respective topology instance is valid. To be used, the caller must hold on to the topology / token_metadata object (e.g. via a token_metadata_ptr or effective_replication_map) Refs #6403 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> topology: add node idx Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-02 20:13:02 +03:00
Benny Halevy	fd1a2591b5	shared_token_metadata: mutate_token_metadata: replicate to all shards storage_service::replicate_to_all_cores has a sophisticated way to mutate the token_metadata and effective_replication_map on shard 0 and cloning those to all other shards, applying the changes only once mutate and clone have succeeded on all shards so we don't end up with only some of the shards with the mutated copy if an error happened mid-way (and then we would need to roll-back the change for exception safety). shared_token_metadata::mutate_token_metadata is currently only called from a unit test that needs to mutate the token metadata only on shard 0, but a following patch will require doing that on all shards. This change adds this capability by enforcing the call to be on shard 0, mutating the token_metadata into a temporary pending copy and cloning it on all other shards. Only then, when all shards succeeded, set the modified token_metadata on all shards. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-04-02 20:07:17 +03:00
Benny Halevy	68141d0aac	topology: get rid of pending state Now, with `a44ca06906`, is_normal_token_owner that replaced is_member does not rely anymore on the pending status of endpoints in topology. With that we can get rid of this state and just keep all endpoints we know about in the topology. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-12-13 14:17:18 +02:00
Asias He	4571fcf9e7	token_metadata: Rename is_member to is_normal_token_owner The name is_normal_token_owner is more clear than is_member. The is_normal_token_owner reflects what it really checks.	2022-11-18 09:29:20 +08:00
Asias He	965097cde5	token_metadata: Add docs for is_member Make it clear, is_member checks if a node is part of the token ring and checks nothing else.	2022-11-18 09:28:56 +08:00
Avi Kivity	76be6402ed	Merge 'repair: harden effective replication map' from Benny Halevy As described in #11993 per-shard repair_info instances get the effective_replication_map on their own with no centralized synchronization. This series ensures that the effective replication maps used by repair (and other associated structures like the token metadata and topology) are all in sync with the one used to initiate the repair operation. While at it, the series includes other cleanups in this area in repair and view that are not fixes as the calls happen in synchronous functions that do not yield. Fixes #11993 Closes #11994 * github.com:scylladb/scylladb: repair: pass erm down to get_hosts_participating_in_repair and get_neighbors repair: pass effective_replication_map down to repair_info repair: coroutinize sync_data_using_repair repair: futurize do_repair_start effective_replication_map: add global_effective_replication_map shared_token_metadata: get_lock is const repair: sync_data_using_repair: require to run on shard 0 repair: require all node operations to be called on shard 0 repair: repair_info: keep effective_replication_map repair: do_repair_start: use keyspace erm to get keyspace local ranges repair: do_repair_start: use keyspace erm for get_primary_ranges repair: do_repair_start: use keyspace erm for get_primary_ranges_within_dc repair: do_repair_start: check_in_shutdown first repair: get_db().local() where needed repair: get topology from erm/token_metdata_ptr view: get_view_natural_endpoint: get topology from erm	2022-11-17 13:29:02 +02:00
Benny Halevy	2c677e294b	shared_token_metadata: get_lock is const The lock is acquired using a function that doesn't modify the shared_token_metadata object. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-17 07:58:21 +02:00
Benny Halevy	d0bd305d16	locator: refactor topology out of token_metadata Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-16 21:55:54 +02:00
Benny Halevy	297a4de4e4	locator: add types.hh To export low-level types that are used by other modules for the locator interfaces. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-16 21:53:05 +02:00
Benny Halevy	0c94ffcc85	topology: delete copy constructor Topology is copied only from token_metadata_impl::clone_only_token_map which copies the token_metadata_impl with yielding to prevent reactor stalls. This should apply to topology as well, so add a clone_gently function for cloning the topology from token_metadata_impl::clone_only_token_map. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-16 15:27:28 +02:00
Benny Halevy	b74807cb8a	locator: token_metadata: add parse_host_id_and_endpoint To be used for specifying nodes either by their host_id or ip address and using the token_metadata to resolve the mapping. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-10-28 07:38:13 +03:00


184 Commits