scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-27 03:45:11 +00:00

Author	SHA1	Message	Date
Avi Kivity	fcb8d040e8	treewide: use Software Package Data Exchange (SPDX) license identifiers Instead of lengthy blurbs, switch to single-line, machine-readable standardized (https://spdx.dev) license identifiers. The Linux kernel switched long ago, so there is strong precedent. Three cases are handled: AGPL-only, Apache-only, and dual licensed. For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0), reasoning that our changes are extensive enough to apply our license. The changes were applied mechanically with a script, except to licenses/README.md. Closes #9937	2022-01-18 12:15:18 +01:00
Benny Halevy	17e006106b	token_metadata: update_normal_tokens: avoid unneeded sort when token ownership doesn't change Currently, we first delete all existing token mappings for the endpoint from _token_to_endpoint_map and then we add all updated token mappings for it and set should_sort_tokens if the token is newly inserted, but since we removed all existing mappings for the endpoint unconditionally, we will sort the tokens even if the token existed and its ownership did not change. This is worthwhile since there are scenarios where none of the token ownerships change. Searching and erasing tokens from the tokens unordered_set runs in constant time on average, so doing it for n tokens is O(n), while sorting the tokens is O(n*log(n)). Test: unit(dev) DTest: replace_address_test.py::TestReplaceAddress::test_serve_writes_during_bootstrap(dev,debug) Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20220117101242.122512-2-bhalevy@scylladb.com>	2022-01-17 12:18:42 +02:00
Benny Halevy	25977db7b4	token_metadata: remove update_normal_token entry point It's currently used only by unit tests and it is dangerous to use on a populated token_metadata as update_normal_tokens assumes that the set of tokens owned by the given endpoint is complete, i.e. previous tokens owned by the endpoint are no longer owned by it, but the single-token update_normal_token interface seems cumulative (and has no documentation whatsoever). It is better to remove this interface and calculate a complete map of endpoint->tokens from the tests. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20220117101242.122512-1-bhalevy@scylladb.com>	2022-01-17 12:18:42 +02:00
Avi Kivity	bbad8f4677	replica: move ::database, ::keyspace, and ::table to replica namespace Move replica-oriented classes to the replica namespace. The main classes moved are ::database, ::keyspace, and ::table, but a few ancillary classes are also moved. There are certainly classes that should be moved but aren't (like distributed_loader) but we have to start somewhere. References are adjusted treewide. In many cases, it is obvious that a call site should not access the replica (but the data_dictionary instead), but that is left for separate work. scylla-gdb.py is adjusted to look for both the new and old names.	2022-01-07 12:04:38 +02:00
Avi Kivity	ae3a360725	database: Move database, keyspace, table classes to replica/ directory The database, keyspace, and table classes represent the replica-only part of the objects after which they are named. Reading from a table doesn't give you the full data, just the replica's view, and it is not consistent since reconciliation is applied on the coordinator. As a first step in acknowledging this, move the related files to a replica/ subdirectory.	2022-01-06 17:07:30 +02:00
Benny Halevy	044e4a6b72	token_metadata: delete private constructor It is not used. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20211205174306.450536-1-bhalevy@scylladb.com>	2021-12-05 19:49:29 +02:00
Benny Halevy	d953e7b01a	token_metadata: get rid of now-unused sync methods Now that abstract_replication_strategy methods are all async, clone_only_token_map_sync and update_normal_tokens_sync are unused. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-10-13 16:10:06 +03:00
Benny Halevy	bb0ea0b1c0	shared_token_metadata: set: check version monotonicity Setting the ring version backwards means it got out of sync. Possibly concurrent updates weren't serialized properly using token_metadata_lock / mutate_token_metadata. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-10-13 14:03:51 +03:00
Benny Halevy	685f5e7704	token_metadata: get rid of copy constructor and assignment operator Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-10-13 14:00:55 +03:00
Benny Halevy	3393df45eb	token_metadata, storage_service: unify token_metadata_lock and merge_lock. Serialize the metadata changes with keyspace create, update, or drop. This will become necessary in the following patch when we update the effective_replication_map on all keyspaces and we want instances on all shards to end up with the same replication map. Note that storage_service::keyspace_changed is called from the scheme_merge path so it already holds the merge_lock. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-10-13 13:01:25 +03:00
Avi Kivity	d2157dfea7	Merge 'locator: token_metadata: simplify `tokens_iterator`' from Michał Chojnowski `ring_range()`/`tokens_iterator` are more complicated than they need to be. The `include_min` parameter is not used anywhere, and `tokens_iterator` is pimplified without a good reason. Simplify that. Closes #8805 * github.com:scylladb/scylla: locator: token_metadata: depimplify tokens_iterator locator: token_metadata: remove _ring_pos from tokens_iterator_impl locator: token_metadata: remove tokens_end() locator: token_metadata: remove `include_min` from tokens_iterator_impl locator: token_metadata: remove the `include_min` parameter from `ring_range()`	2021-06-08 15:42:41 +03:00
Michał Chojnowski	3ea97e7a11	locator: token_metadata: depimplify tokens_iterator This class has no meaningful dependencies, so pimpl is unreasonable here.	2021-06-07 10:41:23 +02:00
Michał Chojnowski	30e5290cea	locator: token_metadata: remove tokens_end() It's an internal method of token_metadata_impl and doesn't have to exist.	2021-06-07 10:41:11 +02:00
Avi Kivity	a55b434a2b	treewide: extend copyright statements to present day	2021-06-06 19:18:49 +03:00
Michał Chojnowski	2a3bd2babe	locator: token_metadata: remove the `include_min` parameter from `ring_range()` `include_min` is always set to the default value. Remove it.	2021-06-05 17:40:35 +02:00
Kamil Braun	d71513d814	abstract_replication_strategy: avoid reactor stalls in `get_address_ranges` and friends The algorithm used in `get_address_ranges` and `get_range_addresses` calls `calculate_natural_endpoints` in a loop; the loop iterates over all tokens in the token ring. If the complexity of a particular implementation of `calculate_natural_endpoints` is large - say `θ(n)`, where `n` is the number of tokens - this results in an `θ(n^2)` algorithm (or worse). This case happens for `Everywhere` replication strategy. For small clusters this doesn't matter that much, but if `n` is, say, `20*255`, this may result in huge reactor stalls, as observed in practice. We avoid these stalls by inserting tactical yields. We hope that some day someone actually implements a subquadratic algorithm here. The commit also adds a comment on `abstract_replication_strategy::calculate_natural_endpoints` explaining that the interface does not give a complexity guarantee (at this point); the different implementations have different complexities. For example, `Everywhere` implementation always iterates over all tokens in the token ring, so it has `θ(n)` worst and best case complexity. On the other hand, `NetworkTopologyStrategy` implementation usually finishes after visiting a small part of the token ring (specifically, as soon as it finds a token for each node in the ring) and performs a constant number of operations for each visited token on average, but theoretically its worst case complexity is actually `O(n + k^2)`, where `n` is the number of all tokens and `k` is the number of endpoints (the `k^2` appears since for each endpoint we must perform finds and inserts on `unordered_set` of size `O(k)`; `unordered_set` operations have `O(1)` average complexity but `O(size of the set)` worst case complexity). Therefore it's not easy to put any complexity guarantee in the interface at this point.
Instead, we say that: - some implementations may yield - if their complexities force us to do so - but in general, there is no guarantee that the implementation may yield - e.g. the `Everywhere` implementation does not yield. Fixes #8555. Closes #8647	2021-05-25 11:53:28 +03:00
Nadav Har'El	fb0c4e469a	Merge 'token_metadata: Fix get_all_endpoints to return nodes in the ring' from Asias He The get_all_endpoints() should return the nodes that are part of the ring. A node inside _endpoint_to_host_id_map does not guarantee that the node is part of the ring. To fix, return from _token_to_endpoint_map. Fixes #8534 Closes #8536 * github.com:scylladb/scylla: token_metadata: Get rid of get_all_endpoints_count range_streamer: Handle everywhere_topology range_streamer: Adjust use_strict_sources_for_ranges token_metadata: Fix get_all_endpoints to return nodes in the ring	2021-05-11 18:39:10 +03:00
Asias He	5a410cb6e3	token_metadata: Get rid of get_all_endpoints_count It is now only a wrapper for count_normal_token_owners. Refs #8534	2021-05-06 15:36:20 +08:00
Avi Kivity	cea5493cb7	storage_proxy, treewide: introduce names for vectors of inet_address storage_proxy works with vectors of inet_addresses for replica sets and for topology changes (pending endpoints, dead nodes). This patch introduces new names for these (without changing the underlying type - it's still std::vector<gms::inet_address>). This is so that the following patch, that changes those types to utils::small_vector, will be less noisy and highlight the real changes that take place.	2021-05-05 18:36:48 +03:00
Benny Halevy	322aa2f8b5	token_metadata: add clear_gently clear_gently gently clears the token_metadata members. It uses continuations to allow yielding if needed to prevent reactor stalls. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-12-22 11:22:21 +02:00
Benny Halevy	56aa49ca81	token_metadata: shared_token_metadata: add mutate_token_metadata mutate_token_metadata acquires the shared_token_metadata lock, clones the token_metadata (using clone_async) and calls an asynchronous functor on the cloned copy of the token_metadata to mutate it. If the functor is successful, the mutated clone is set back to the shared_token_metadata, otherwise, the clone is destroyed. With that, get rid of shared_token_metadata::clone Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-12-22 11:22:19 +02:00
Benny Halevy	e089c22ec1	token_metadata: futurize update_normal_tokens The function complexity is O(#tokens) in the worst case, as for each endpoint token it traverses _token_to_endpoint_map linearly to erase the endpoint mapping if it exists. This change renames the current implementation of update_normal_tokens to update_normal_tokens_sync and clones the code as a coroutine that returns a future and may yield if needed. Eventually we should futurize the whole token_metadata and abstract_replication_strategy interface and get rid of the synchronous functions. Until then the sync version is still required from call sites that are neither returning a future nor run in a seastar thread. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-12-22 10:35:15 +02:00
Asias He	829b4c1438	repair: Make removenode safe by default Currently removenode works like below: - The coordinator node advertises the node to be removed in REMOVING_TOKEN status in gossip - Existing nodes learn the node in REMOVING_TOKEN status - Existing nodes sync data for the range it owns - Existing nodes send notification to the coordinator - The coordinator node waits for notification and announce the node in REMOVED_TOKEN Current problems: - Existing nodes do not tell the coordinator if the data sync is ok or failed. - The coordinator can not abort the removenode operation in case of error - Failed removenode operation will make the node to be removed in REMOVING_TOKEN forever. - The removenode runs in best effort mode which may cause data consistency issues. It means if a node that owns the range after the removenode operation is down during the operation, the removenode node operation will continue to succeed without requiring that node to perform data syncing. This can cause data consistency issues. For example, Five nodes in the cluster, RF = 3, for a range, n1, n2, n3 is the old replicas, n2 is being removed, after the removenode operation, the new replicas are n1, n5, n3. If n3 is down during the removenode operation, only n1 will be used to sync data with the new owner n5. This will break QUORUM read consistency if n1 happens to miss some writes. Improvements in this patch: - This patch makes the removenode safe by default. We require all nodes in the cluster to participate in the removenode operation and sync data if needed. We fail the removenode operation if any of them is down or fails. If the user want the removenode operation to succeed even if some of the nodes are not available, the user has to explicitly pass a list of nodes that can be skipped for the operation. 
$ nodetool removenode --ignore-dead-nodes <list_of_dead_nodes_to_ignore> <host_id> Example restful api: $ curl -X POST "http://127.0.0.1:10000/storage_service/remove_node/?host_id=7bd303e9-4c7b-4915-84f6-343d0dbd9a49&ignore_nodes=127.0.0.3,127.0.0.5" - The coordinator can abort data sync on existing nodes For example, if one of the nodes fails to sync data. It makes no sense for other nodes to continue to sync data because the whole operation will fail anyway. - The coordinator can decide which nodes to ignore and pass the decision to other nodes Previously, there is no way for the coordinator to tell existing nodes to run in strict mode or best effort mode. Users will have to modify config file or run a restful api cmd on all the nodes to select strict or best effort mode. With this patch, the cluster wide configuration is eliminated. Fixes #7359 Closes #7626	2020-12-10 10:14:39 +02:00
Piotr Jastrzebski	661b52c7df	token_metadata: Remove std::iterator from tokens_iterator std::iterator is deprecated since C++17 so define all the required iterator_traits directly. Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>	2020-11-17 16:53:20 +01:00
Benny Halevy	0abd8e62cd	token_metadata: futurize clone_after_all_left Call the futurized clone_only_token_map and remove the _leaving_endpoints from the cloned token_metadata_impl. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-11 14:20:24 +02:00
Benny Halevy	4a622c14e1	token_metadata: futurize clone_only_token_map Does part of clone_async() using continuations to prevent stalls. Rename synchronous variant to clone_only_token_map_sync that is going to be deprecated once all its users will be futurized. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-11 14:20:24 +02:00
Benny Halevy	4fc5997949	token_metadata: add clone_async Clone token_metadata object using async continuation to prevent reactor stalls. Refs https://github.com/scylladb/scylla/issues/7220 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-11 14:20:23 +02:00
Benny Halevy	41c7efd0c0	storage_service: convert to token_metadata_ptr Clone _token_metadata for updating into _updated_token_metadata and use it to update the local token_metadata on all shards via do_update_pending_ranges(). Adjust get_token_metadata to get either the updated_token_metadata, if available, or the base token_metadata. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-11 14:20:23 +02:00
Benny Halevy	fa880439c9	storage_service: use token_metadata_lock to serialize updates to token_metadata Rather than using `serialized_action`, grab a lock before mutating _token_metadata and hold it until it's replicated to all shards. A following patch will use a mutable token_metadata_ptr that is updated out of line under the lock. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-11 14:20:23 +02:00
Benny Halevy	6d06853e6c	abstract_replication_strategy: convert to shared_token_metadata To facilitate that, keep a const shared_token_metadata& in class database rather than a const token_metadata& Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-11 14:20:23 +02:00
Benny Halevy	29ed59f8c4	main: start a shared_token_metadata And use it to get a token_metadata& compatible with current usage, until the services are converted to use token_metadata_ptr. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-11-11 14:20:23 +02:00
Benny Halevy	5a250f529f	token_metadata: get rid of unused calculate_pending_ranges_for_* methods They are only called internally by token_metadata_impl. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-09-30 23:16:23 +03:00
Benny Halevy	41e5a3a245	token_metadata: get rid of clone_after_all_settled It's unused. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-09-30 23:15:11 +03:00
Asias He	c38ec98c6e	token_metadata: Remove _pending_ranges - Remove get_pending_ranges and introduce has_pending_ranges, since the caller only needs to know if there is a pending range for the keyspace and the node. - Remove print_pending_ranges which is only used in logging. If we really want to log the new pending token ranges, we can log when we set the new pending token ranges. This removal of _pending_ranges makes copying of token_metadata faster and reduces the chance to cause reactor stall. Refs: #7220	2020-09-15 16:27:50 +08:00
Benny Halevy	8b63523fb7	token_metadata: rename calculate_pending_ranges to update_pending_ranges Since it sets the token_metadata_impl's pending ranges. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-20 16:20:34 +03:00
Benny Halevy	22275e579e	token_metadata: mark const methods Many token_metadata methods do not modify the object and can be marked as const. The motivation is to better control who may modify token_metadata. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-20 16:20:21 +03:00
Benny Halevy	65d89512d0	token_ranges: pending_endpoints_for: return empty vector if keyspace not found Rather than creating a bogus empty entry. With that, it can be marked as const. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-20 16:16:14 +03:00
Benny Halevy	ca61c2797a	token_ranges: get_pending_ranges: return empty vector if keyspace not found Rather than creating a bogus empty entry. With that, it can be marked as const. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-20 16:14:44 +03:00
Benny Halevy	23a0625998	token_ranges: get rid of unused get_pending_ranges variant Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-20 14:46:53 +03:00
Benny Halevy	78f40cac8d	token_metadata: add get_datacenter_racks() const variant Needed for passing a const token_metadata& to calculate_natural_endpoints methods. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2020-08-20 14:38:45 +03:00
Asias He	bd6691301e	token_metadata: Calculate pending ranges for replacing node It will be needed soon for making replace node take writes. Refs: #5482	2020-04-29 16:02:10 +08:00
Avi Kivity	91c4409376	locator: token_metadata: remove unused include "query-request.hh" sstable_datafile_test.cc lost access to interval_map (via position_in_partition.hh), so it now includes that directly.	2020-02-14 20:46:25 +02:00
Avi Kivity	bee1cc42fe	locator: token_metadata: move implementation classes to .cc With pimplification complete, move the implementation classes to .cc and remove boost/icl includes.	2020-02-14 20:34:44 +02:00
Avi Kivity	ef41b45142	locator: token_metadata: pimplify tokens_iterator Because tokens_iterator refers to token_metadata_impl, the latter cannot be moved out-of-line. So this patch pimplifies tokens_iterator as well.	2020-02-14 20:29:14 +02:00
Avi Kivity	9425e9c13d	locator: token_metadata: make token_metadata_impl::tokens_iterator a non-nested class In order to pimplify token_metadata_impl::tokens_iterator, we must make it a non-nested class, since eventually token_metadata_impl will be an incomplete class for users and nested classes cannot be forward declared. So this patch makes it a non-nested class. Two inline functions that referred to it were moved out of class scope so they can see the definition. No functional changes.	2020-02-14 20:29:13 +02:00
Avi Kivity	6d53f240d1	locator: token_metadata: pimplify token_metadata is a heavyweight class, with heavyweight include dependencies (icl, which has tens of thousands of lines in headers), heavyweight methods, but it is rarely used. So it is a classic candidate for pimplification. This patch splits off the implementation into token_metadata_impl and leaves token_metadata as a forwarding class. Actual movement of the code is left to a later patch to ease review. Notes: - some constructors were made public due to limitations of std::make_unique - a few token_metadata methods pass *this along to external functions, so we now pass the holder object as "unpimplified_this" to support this.	2020-02-14 20:29:12 +02:00
Avi Kivity	90a3670952	locator: token_metadata: use non-deduced return type for ring_range() Deduced return types are user hostile as the user has to look at the implementation in order to understand what the return type is.	2020-02-14 15:44:46 +02:00
Kamil Braun	96e5d6c924	token_metadata: add count_normal_token_owners method	2020-01-30 11:10:08 +01:00
Pavel Emelyanov	6e06c88b4c	token_metadata: Remove unused helper There are two _identical_ methods in token_metadata class: get_all_endpoints_count() and number_of_endpoints(). The former one is used (called), the latter one is not used, so let's remove it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2019-12-23 14:22:43 +02:00
Kamil Braun	e4ac4db1c5	token_metadata::update_normal_tokens: take tokens by const ref	2019-10-21 10:38:45 +02:00
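
The idea described in commit 17e006106b (skip the O(n*log(n)) sort when an endpoint's token ownership did not change) can be illustrated with a standalone sketch. This is not the ScyllaDB implementation: `ring_sketch`, its members, the `token` and `endpoint` aliases, and the `sort_count` counter are hypothetical names chosen for illustration, and plain standard-library containers stand in for the real `token_metadata` types.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical stand-ins for the real token and endpoint types.
using token = std::int64_t;
using endpoint = std::string;

struct ring_sketch {
    std::unordered_map<token, endpoint> token_to_endpoint;
    std::vector<token> sorted_tokens;
    int sort_count = 0;  // illustration only: counts how often we re-sort

    // `tokens` is the complete set of tokens owned by `ep` after the update.
    void update_normal_tokens(const std::unordered_set<token>& tokens, const endpoint& ep) {
        bool should_sort = false;

        // Erase only the mappings that `ep` no longer owns. A mapping that
        // stays with `ep` is left in place and does not trigger a sort.
        for (auto it = token_to_endpoint.begin(); it != token_to_endpoint.end();) {
            if (it->second == ep && tokens.count(it->first) == 0) {
                it = token_to_endpoint.erase(it);
                should_sort = true;
            } else {
                ++it;
            }
        }

        // Insert or reassign; only a newly inserted token changes the ring.
        for (token t : tokens) {
            auto [pos, inserted] = token_to_endpoint.insert_or_assign(t, ep);
            (void)pos;
            should_sort |= inserted;
        }

        if (should_sort) {
            sorted_tokens.clear();
            for (const auto& kv : token_to_endpoint) {
                sorted_tokens.push_back(kv.first);
            }
            std::sort(sorted_tokens.begin(), sorted_tokens.end());
            ++sort_count;
        }
        // The sorted view always mirrors the map's key set.
        assert(sorted_tokens.size() == token_to_endpoint.size());
    }
};
```

Calling `update_normal_tokens` twice with the same token set for the same endpoint sorts only on the first call in this sketch; a call that adds or drops a token sorts again.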


113 Commits