scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-02 14:15:46 +00:00

Author	SHA1	Message	Date
Juliusz Stasiewicz	b6fb5ee912	locator: Check DC names in NTS The same trick is used as in C*: `79e693e16e/src/java/org/apache/cassandra/locator/NetworkTopologyStrategy.java (L241)` Fixes #7595	2021-02-09 07:04:17 +01:00
Pavel Emelyanov	d3ee8774ad	storage-service: Subscribe to snitch to update topology Currently snitch explicitly calls storage service (if it's initialized) to update topology on snitch data change. Instead of it -- make storage service subscribe on the snitch reconfigure signal upon creation. This finally makes snitch fully independent from storage service. In tests the snitch instance is not created, so check for it before subscribing. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-01-13 16:41:34 +03:00
Pavel Emelyanov	d1a2d0f894	snitch: Introduce reconfiguration signal Add a notifier to snitch_base that gets triggered when the snitch configuration changes to which others may subscribe. For now only the gossiping-file-snitch triggers it when it re-reads its config file. Other existing snitches are kinda static in this sense. The subscribe-trigger engine is based on scoped connection from boost::signals2. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-01-13 16:41:34 +03:00
Pavel Emelyanov	ca336409d7	snitch: Always gossip snitch info itself The gossiping_property_file_snitch updates the gossip RACK and DC values upon config change. Right now this is done with the help of storage service, but the needed code to gossip rack and dc is already available in the snitch itself. Said that -- gossip snitch info by snitch helper and remove the storage_service's one. This makes the 2nd step decoupling snitch and storage service. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-01-13 16:41:34 +03:00
Pavel Emelyanov	99e71bd1f6	snitch: Do gossip DC and RACK itself This is the 2nd step in generalizing the snitch data gossiping and at the same time the 1st step in decoupling storage service and snitch. During start storage service starts gossiper, which notifies the snitch with .gossiper_starting() call, then the storage service calls gossip_snitch_info. This patch makes snitch itself do the last step. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-01-13 16:41:34 +03:00
Pavel Emelyanov	bc1a3a358d	snitch: Add generic gossiping helper Nowadays some snitch implementations gossip the INTERNAL_IP value and storage_service gossip RACK and DC for all of them. This functionality is going to be generalized and the first step is in making a common method for a snitch to gossip its data. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-01-13 16:41:34 +03:00
Benny Halevy	322aa2f8b5	token_metadata: add clear_gently clear_gently gently clears the token_metadata members. It uses continuations to allow yielding if needed to prevent reactor stalls. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-12-22 11:22:21 +02:00
Benny Halevy	56aa49ca81	token_metadata: shared_token_metadata: add mutate_token_metadata mutate_token_metadata acquires the shared_token_metadata lock, clones the token_metadata (using clone_async) and calls an asynchronous functor on the cloned copy of the token_metadata to mutate it. If the functor is successful, the mutated clone is set back to the shared_token_metadata, otherwise, the clone is destroyed. With that, get rid of shared_token_metadata::clone Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-12-22 11:22:19 +02:00
Benny Halevy	e089c22ec1	token_metdata: futurize update_normal_tokens The function complexity is O(#tokens) in the worst case, as for each endpoint token it traverses _token_to_endpoint_map linearly to erase the endpoint mapping if it exists. This change renames the current implementation of update_normal_tokens to update_normal_tokens_sync and clones the code as a coroutine that returns a future and may yield if needed. Eventually we should futurize the whole token_metadata and abstract_replication_strategy interface and get rid of the synchronous functions. Until then the sync version is still required from call sites that are neither returning a future nor run in a seastar thread. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-12-22 10:35:15 +02:00
Benny Halevy	e7f4cd89a9	abstract_replication_strategy: get_pending_address_ranges: invoke clone_only_token_map if can_yield Optimize the can_yield case by invoking the futurized version of clone_only_token_map. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-12-22 09:49:08 +02:00
Asias He	829b4c1438	repair: Make removenode safe by default Currently removenode works like below: - The coordinator node advertises the node to be removed in REMOVING_TOKEN status in gossip - Existing nodes learn the node in REMOVING_TOKEN status - Existing nodes sync data for the ranges they own - Existing nodes send notification to the coordinator - The coordinator node waits for notification and announces the node in REMOVED_TOKEN Current problems: - Existing nodes do not tell the coordinator if the data sync is ok or failed. - The coordinator can not abort the removenode operation in case of error - Failed removenode operation will make the node to be removed stay in REMOVING_TOKEN forever. - The removenode runs in best effort mode which may cause data consistency issues. It means if a node that owns the range after the removenode operation is down during the operation, the removenode operation will continue to succeed without requiring that node to perform data syncing. This can cause data consistency issues. For example, five nodes in the cluster, RF = 3, for a range, n1, n2, n3 are the old replicas, n2 is being removed, after the removenode operation, the new replicas are n1, n5, n3. If n3 is down during the removenode operation, only n1 will be used to sync data with the new owner n5. This will break QUORUM read consistency if n1 happens to miss some writes. Improvements in this patch: - This patch makes the removenode safe by default. We require all nodes in the cluster to participate in the removenode operation and sync data if needed. We fail the removenode operation if any of them is down or fails. If the user wants the removenode operation to succeed even if some of the nodes are not available, the user has to explicitly pass a list of nodes that can be skipped for the operation. 
$ nodetool removenode --ignore-dead-nodes <list_of_dead_nodes_to_ignore> <host_id> Example restful api: $ curl -X POST "http://127.0.0.1:10000/storage_service/remove_node/?host_id=7bd303e9-4c7b-4915-84f6-343d0dbd9a49&ignore_nodes=127.0.0.3,127.0.0.5" - The coordinator can abort data sync on existing nodes For example, if one of the nodes fails to sync data. It makes no sense for other nodes to continue to sync data because the whole operation will fail anyway. - The coordinator can decide which nodes to ignore and pass the decision to other nodes Previously, there is no way for the coordinator to tell existing nodes to run in strict mode or best effort mode. Users will have to modify config file or run a restful api cmd on all the nodes to select strict or best effort mode. With this patch, the cluster wide configuration is eliminated. Fixes #7359 Closes #7626	2020-12-10 10:14:39 +02:00
Benny Halevy	157a964a63	locator: extract can_yield to utils/maybe_yield.hh Move the definition of bool_class can_yield to a standalone header file and define there a maybe_yield(can_yield) helper. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-24 12:23:56 +02:00
Calle Wilund	9f48dc7dac	locator::ec2_multi_region_snitch: Handle ipv6 broadcast/public ip Fixes #7064 Iff broadcast address is set to ipv6 from main (meaning prefer ipv6), determine the "public" ipv6 address (which should be the same, but might not be), via aws metadata query. Closes #7633	2020-11-18 12:48:25 +02:00
Piotr Jastrzebski	661b52c7df	token_metadata: Remove std::iterator from tokens_iterator std::iterator is deprecated since C++17 so define all the required iterator_traits directly. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-11-17 16:53:20 +01:00
Piotr Jastrzebski	87bf577450	token_metadata: Remove std::iterator from tokens_iterator_impl std::iterator is deprecated since C++17 so define all the required iterator_traits directly. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-11-17 16:53:20 +01:00
Benny Halevy	275fe30628	token_metadata_impl: set_pending_ranges: add can_yield_param To prevent a > 10 ms stall when inserting to boost::icl::interval_map. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-11 14:20:24 +02:00
Benny Halevy	1e2138e8ef	abstract_replication_strategy: get rid of get_ranges_in_thread Use the can_yield param to get_ranges instead. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-11 14:20:24 +02:00
Benny Halevy	ba31350239	abstract_replication_strategy: add can_yield param to get_pending_ranges and friends To prevent reactor stalls as seen in #7313. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-11 14:20:24 +02:00
Benny Halevy	6c2a089a6f	abstract_replication_strategy: define can_yield bool_class To be used by convention by several other methods. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-11 14:20:24 +02:00
Benny Halevy	7fb489d338	token_metadata_impl: calculate_pending_ranges_for_* reindent Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-11 14:20:24 +02:00
Benny Halevy	6ce2436a4c	token_metadata_impl: calculate_pending_ranges_for_* pass new_pending_ranges by ref We can use the seastar thread to keep the vector rather than creating a lw_shared_ptr for it. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-11 14:20:24 +02:00
Benny Halevy	0ca423dcfc	token_metadata_impl: calculate_pending_ranges_for_* call in thread The functions can be simplified as they are all now being called from a seastar thread. Make them sequential, returning void, and yielding if necessary. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-11 14:20:24 +02:00
Benny Halevy	84d086dc77	token_metadata: update_pending_ranges: create seastar thread So we can yield in this path to prevent reactor stalls. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-11 14:20:24 +02:00
Benny Halevy	1e6c181678	abstract_replication_strategy: add get_address_ranges method for specific endpoint Some of the callers of get_address_ranges are interested in the ranges of a specific endpoint. Rather than building a map for all endpoints and then traversing it looking for this specific endpoint, build a multimap of token ranges relating only to the specified endpoint. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-11 14:20:24 +02:00
Benny Halevy	2ce6773dae	token_metadata_impl: clone_after_all_left: sort tokens only once Currently the sorted tokens are copied needlessly on this path by `clone_only_token_map` and then recalculated after calling remove_endpoint for each leaving endpoint. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-11 14:20:24 +02:00
Benny Halevy	0abd8e62cd	token_metadata: futurize clone_after_all_left Call the futurized clone_only_token_map and remove the _leaving_endpoints from the cloned token_metadata_impl. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-11 14:20:24 +02:00
Benny Halevy	4a622c14e1	token_metadata: futurize clone_only_token_map Does part of clone_async() using continuations to prevent stalls. Rename synchronous variant to clone_only_token_map_sync that is going to be deprecated once all its users will be futurized. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-11 14:20:24 +02:00
Benny Halevy	d1a73ec7b3	token_metadata: use mutable_token_metadata_ptr in calculate_pending_ranges_for_* Replacing old code using lw_shared_ptr<token_metadata> with the "modern" mutable_token_metadata_ptr alias. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-11 14:20:23 +02:00
Benny Halevy	4fc5997949	token_metadata: add clone_async Clone token_metadata object using async continuation to prevent reactor stalls. Refs https://github.com/scylladb/scylla/issues/7220 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-11 14:20:23 +02:00
Benny Halevy	5ab7b0b2ea	abstract_replication_strategy: accept a token_metadata_ptr in get_pending_address_ranges methods Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-11 14:20:23 +02:00
Benny Halevy	349aa966ba	abstract_replication_strategy: accept a token_metadata_ptr in get_ranges methods In preparation to returning future<dht::token_range_vector> from async variants. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-11 14:20:23 +02:00
Benny Halevy	41c7efd0c0	storage_service: convert to token_metadata_ptr clone _token_metadata for updating into _updated_token_metadata and use it to update the local token_metadata on all shards via do_update_pending_ranges(). Adjust get_token_metadata to get either the updated_token_metadata, if available, or the base token_metadata. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-11 14:20:23 +02:00
Benny Halevy	fa880439c9	storage_service: use token_metadata_lock to serialize updates to token_metadata Rather than using `serialized_action`, grab a lock before mutating _token_metadata and hold it until it's replicated to all shards. A following patch will use a mutable token_metadata_ptr that is updated out of line under the lock. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-11 14:20:23 +02:00
Benny Halevy	6d06853e6c	abstract_replication_strategy: convert to shared_token_metadata To facilitate that, keep a const shared_token_metadata& in class database rather than a const token_metadata& Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-11 14:20:23 +02:00
Benny Halevy	45fb57a2ec	abstract_replication_strategy: pass token_metadata& to get_cached_endpoints Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-11 14:20:23 +02:00
Benny Halevy	ade8c77a7c	abstract_replication_strategy: pass token_metadata& to do_get_natural_endpoints Rather than accessing abstract_replication_strategy::_token_metadata directly. In preparation to changing it to a shared_token_metadata. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-11 14:20:23 +02:00
Benny Halevy	29ed59f8c4	main: start a shared_token_metadata And use it to get a token_metadata& compatible with current usage, until the services are converted to use token_metadata_ptr. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-11 14:20:23 +02:00
Benny Halevy	0e739aa801	storage_service: add update_topology method Move the functionality from gossiping_property_file_snitch::reload_configuration to the storage_service class. With that we can make get_mutable_token_metadata private. TODO: update token_metadata on shard 0 and then replicate_to_all_cores rather than updating on all shards in parallel. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-11 14:20:23 +02:00
Benny Halevy	5a250f529f	token_metadata: get rid of unused calculate_pending_ranges_for_* methods They are only called internally by token_metadata_impl. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-09-30 23:16:23 +03:00
Benny Halevy	41e5a3a245	token_metadata: get rid of clone_after_all_settled It's unused. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-09-30 23:15:11 +03:00
Benny Halevy	105a2f5244	token_metadata_impl: remove_endpoint: do not sort tokens Call sort_tokens at the caller as all call sites from within token_metadata_impl call remove_endpoint for multiple endpoints so the tokens can be re-sorted only once, when done removing all tokens. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-09-30 23:12:32 +03:00
Benny Halevy	86303f4fdd	token_metadata_impl: always sort_tokens in place No need to return the sorted tokens vector as it's always assigned to _sorted_tokens. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-09-30 23:08:56 +03:00
Benny Halevy	f207cff73d	token_metadata: set_pending_ranges: prep new interval_map out of line And move-assign to _pending_ranges_interval_map[keyspace_name] only when done. This is more efficient since there's no need to look up _pending_ranges_interval_map[keyspace_name] for every insert to the interval_map. And it is exception safe in case we run out of memory mid-way. Refs #7220 Test: unit(dev) Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20200916115059.788606-1-bhalevy@scylladb.com>	2020-09-16 15:28:42 +03:00
Avi Kivity	64ebb9c052	Merge 'Remove _pending_ranges and _pending_ranges_map in token_metadata' from Asias " This PR removes _pending_ranges and _pending_ranges_map in token_metadata. This removal makes copying of token_metadata faster and reduces the chance to cause reactor stall. Refs: #7220 " * asias-token_metadata_replication_config_less_maps: token_metadata: Remove _pending_ranges token_metadata: Get rid of unused _pending_ranges_map	2020-09-15 17:16:35 +03:00
Benny Halevy	0dc45529c8	abstract_replication_strategy: get_ranges_in_thread: copy _token_metadata if func may yield Change `94995acedb` added yielding to abstract_replication_strategy::do_get_ranges. And `07e253542d` used get_ranges_in_thread in compaction_manager. However, there is nothing to prevent token_metadata, and in particular its `_sorted_tokens`, from changing while iterating over them in do_get_ranges if the latter yields. Therefore copy the replication strategy `_token_metadata` in `get_ranges_in_thread(inet_address ep)`. If the caller provides `token_metadata` to get_ranges_in_thread, then the caller must make sure that we can safely yield while accessing token_metadata (like in `do_rebuild_replace_with_repair`). Fixes #7044 Test: unit(dev) Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20200915074555.431088-1-bhalevy@scylladb.com>	2020-09-15 11:33:55 +03:00
Asias He	c38ec98c6e	token_metadata: Remove _pending_ranges - Remove get_pending_ranges and introduce has_pending_ranges, since the caller only needs to know if there is a pending range for the keyspace and the node. - Remove print_pending_ranges which is only used in logging. If we really want to log the new pending token ranges, we can log when we set the new pending token ranges. This removal of _pending_ranges makes copying of token_metadata faster and reduces the chance to cause reactor stall. Refs: #7220	2020-09-15 16:27:50 +08:00
Asias He	d38506fbf0	token_metadata: Get rid of unused _pending_ranges_map It is not used anymore. The size of _pending_ranges_map is O(number of keyspaces). It can be very big when we have lots of keyspaces. Refs: #7220	2020-09-15 14:47:00 +08:00
Rafael Ávila de Espíndola	d18af34205	everywhere: Use future::get0 when appropriate This works with current seastar and clears most of the way for updating to a version that doesn't use std::tuple in futures. Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com> Message-Id: <20200826231947.1145890-1-espindola@scylladb.com>	2020-08-27 15:05:51 +03:00
Avi Kivity	0dcb16c061	Merge "Constify access to token_metadata" from Benny " We keep references to locator::token_metadata in many places. Most of them are for read-only access and only a few want to modify the token_metadata. Recently, in `94995acedb`, we added yielding loops that access token_metadata in order to avoid cpu stalls. To make that possible we need to make sure the token_metadata object they are traversing won't change mid-loop. This series is a first step in ensuring the serialization of updates to shared token metadata to reading it. Test: unit(dev) Dtest: bootstrap_test:TestBootstrap.start_stop_test{,_node}, update_cluster_layout_tests.py -a next-gating(dev) " * tag 'constify-token-metadata-access-v2' of github.com:bhalevy/scylla: api/http_context: keep a const sharded<locator::token_metadata>& gossiper: keep a const token_metadata& storage_service: separate get_mutable_token_metadata range_streamer: keep a const token_metadata& storage_proxy: delete unused get_restricted_ranges declaration storage_proxy: keep a const token_metadata& storage_proxy: get rid of mutable get_token_metadata getter database: keep const token_metadata& database: keyspace_metadata: pass const locator::token_metadata& around everywhere_replication_strategy: move methods out of line replication_strategy: keep a const token_metadata& abstract_replication_strategy: get_ranges: accept const token_metadata& token_metadata: rename calculate_pending_ranges to update_pending_ranges token_metadata: mark const methods token_ranges: pending_endpoints_for: return empty vector if keyspace not found token_ranges: get_pending_ranges: return empty vector if keyspace not found token_ranges: get rid of unused get_pending_ranges variant replication_strategy: calculate_natural_endpoints: make token_metadata& param const token_metadata: add get_datacenter_racks() const variant	2020-08-22 20:47:45 +03:00
Benny Halevy	2f7c529c1c	storage_service: separate get_mutable_token_metadata Use a different getter for a token_metadata& that may be changed so we can better synchronize readers and writers of token_metadata and eventually allow them to yield in asynchronous loops. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-20 16:20:34 +03:00
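
Commits 661b52c7df and 87bf577450 in the list above drop the deprecated `std::iterator` base class and define the required iterator traits directly. A minimal sketch of that pattern follows. The class `tokens_view_iterator`, the `long` token type and the helper `sum_tokens` are hypothetical names chosen for illustration; this is not the actual `tokens_iterator` from the repository.

```cpp
#include <cstddef>
#include <iterator>
#include <numeric>
#include <type_traits>
#include <vector>

// Hypothetical iterator, for illustration only (not Scylla's tokens_iterator).
// Instead of inheriting from the deprecated std::iterator<...>, it declares
// the five member types that std::iterator_traits looks for.
class tokens_view_iterator {
    const std::vector<long>* _tokens = nullptr;
    std::size_t _pos = 0;
public:
    using iterator_category = std::forward_iterator_tag;
    using value_type = long;
    using difference_type = std::ptrdiff_t;
    using pointer = const long*;
    using reference = const long&;

    tokens_view_iterator() = default;
    tokens_view_iterator(const std::vector<long>& tokens, std::size_t pos)
        : _tokens(&tokens), _pos(pos) {}

    reference operator*() const { return (*_tokens)[_pos]; }
    pointer operator->() const { return &(*_tokens)[_pos]; }
    tokens_view_iterator& operator++() { ++_pos; return *this; }
    tokens_view_iterator operator++(int) { auto tmp = *this; ++_pos; return tmp; }
    bool operator==(const tokens_view_iterator& o) const {
        return _tokens == o._tokens && _pos == o._pos;
    }
    bool operator!=(const tokens_view_iterator& o) const { return !(*this == o); }
};

// Compile-time check that the traits resolve through the member types.
static_assert(std::is_same_v<
    std::iterator_traits<tokens_view_iterator>::iterator_category,
    std::forward_iterator_tag>);

// Sums all tokens through the custom iterator using a standard algorithm.
inline long sum_tokens(const std::vector<long>& tokens) {
    return std::accumulate(tokens_view_iterator(tokens, 0),
                           tokens_view_iterator(tokens, tokens.size()), 0L);
}
```

Declaring the member types directly keeps `std::iterator_traits` working with standard algorithms while avoiding the deprecation warning that C++17 compilers may emit for `std::iterator`.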


310 Commits