Commit Graph

134 Commits

Author SHA1 Message Date
Benny Halevy
3cee0f8bd9 shared_token_metadata: mutate_token_metadata: bump cloned copy ring_version
Currently this is done only in
storage_service::get_mutable_token_metadata_ptr
but it needs to be done here as well for code paths
calling mutate_token_metadata directly.

Currently, this it is only called from network_topology_strategy_test.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20220130152157.2596086-1-bhalevy@scylladb.com>
2022-01-30 18:15:08 +02:00
Avi Kivity
fcb8d040e8 treewide: use Software Package Data Exchange (SPDX) license identifiers
Instead of lengthy blurbs, switch to single-line, machine-readable
standardized (https://spdx.dev) license identifiers. The Linux kernel
switched long ago, so there is strong precedent.

Three cases are handled: AGPL-only, Apache-only, and dual licensed.
For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0),
reasoning that our changes are extensive enough to apply our license.

The changes we applied mechanically with a script, except to
licenses/README.md.

Closes #9937
2022-01-18 12:15:18 +01:00
Benny Halevy
17e006106b token_metadata: update_normal_tokens: avoid unneeded sort when token ownership doesn't change
Currently, we first delete all existing token mappings
for the endpoint from _token_to_endpoint_map and then
we add all updated token mappings for it and set should_sort_tokens
if the token is newly inserted, but since we removed all
existing mappings for the endpoint unconditionally, we
will sort the tokens even if the token existed and
its ownership did not change.

This is worthwhile since there are scenarios where
none of the token ownership change.  Searching and
erasing tokens from the tokens unordered_set runs
at constant time on average so doing it for n tokens
is O(n), while sorting the tokens is O(n*log(n)).

Test: unit(dev)
DTest: replace_address_test.py::TestReplaceAddress::test_serve_writes_during_bootstrap(dev,debug)

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20220117101242.122512-2-bhalevy@scylladb.com>
2022-01-17 12:18:42 +02:00
Benny Halevy
25977db7b4 token_metadata: remove update_normal_token entry point
It's currently used only by unit tests
and it is dangerous to use on a populated token_metadata
as update_normal_tokens assumes that the set of tokens
owned by the given endpoint is compelte, i.e. previous
tokens owned by the endpoint are no longer owned by it,
but the single-token update_normal_token interface
seems commulative (and has no documentation whatsoever).

It is better to remove this interface and calculate a
complete map of endpoint->tokens from the tests.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20220117101242.122512-1-bhalevy@scylladb.com>
2022-01-17 12:18:42 +02:00
Benny Halevy
044e4a6b72 token_metadata: delete private constructor
It is not used.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20211205174306.450536-1-bhalevy@scylladb.com>
2021-12-05 19:49:29 +02:00
Benny Halevy
9d2631daaf token_metadata: calculate_pending_ranges_for_leaving: maybe yield
We see long stalls as reported in
https://github.com/scylladb/scylla/issues/8030#issuecomment-974783526

everywhere_replication_strategy::calculate_natural_endpoints
is synchronous and doesn't yield, so add maybe_yield() calls
when looping over many token ranges.

Refs #8030

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20211121090339.3955278-1-bhalevy@scylladb.com>
Message-Id: <20211121102606.76700-1-bhalevy@scylladb.com>
2021-11-22 10:48:25 +02:00
Benny Halevy
d953e7b01a token_metadata: get rid of now-unused sync methods
Now that abstract_replication_strategy methods are all async
clone_only_token_map_sync, and update_normal_tokens_sync
are unused.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-10-13 16:10:06 +03:00
Benny Halevy
cbe58345b9 abstract_replication_strategy: futurize get_*address_ranges
Remaining callers of get_address_ranges and get_pending_address_ranges
are all either from a seastar thread or from a coroutine
so we can make the methods always async and drop the
can_yield param.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-10-13 16:10:06 +03:00
Benny Halevy
bb0ea0b1c0 shared_token_metadata: set: check version monotonicity
Setting the ring version backwards means it got out of sync.
Possibly concurrent updates weren't serialized properly
using token_metadata_lock / mutate_token_metadata.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-10-13 14:03:51 +03:00
Benny Halevy
43160abaec token_metadata: use static ring version
For generating unique _ring_version.

Currently when we clone a mutable token_metadata_ptr
it remains with the same _ring_version
and the ring version is updated only when the topology changes.

To be able to distinguish these traqnsient copies
from the ones that got applied, be stricter about
the ring version and change it to a unique number
using a static counter.

Next patch will update the ring version
(and consequently invalidate the cached_endpoints
on the replication strategy) every time the token_metadata
changes, not only when the topology changes.

Note that the _cached_endpoints will go away
once the transition to effective_replication_map
is finished, so this will not degrade performance.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-10-13 14:03:17 +03:00
Benny Halevy
685f5e7704 token_metadata: get rid of copy constructor and assignment operator
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-10-13 14:00:55 +03:00
Benny Halevy
3393df45eb token_metadata, storage_service: unify token_metadata_lock and merge_lock.
Serialize the metadata changes with
keyspace create, update, or drop.

This will become necessary in the following patch
when we update the effective_replication_map
on all keyspaces and we want instances on all shards
end up with the same replication map.

Note that storage_service::keyspace_changed is called
from the scheme_merge path so it already holds
the merge_lock.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-10-13 13:01:25 +03:00
Benny Halevy
a1c573e6d3 abstract_replication_strategy: make calculate_natural_endpoints_sync private
And with that rename calculate_natural_endpoints(const token& search_token, const token_metadata&, can_yield)
to do_calculate_natural_endpoints and make it protected,

With this patch, all its external users call the async version, so
rename it back to calculate_natural_endpoints, and make
calculate_natural_endpoints_sync private since it's being called
only within abstract_replication_strategy.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-10-13 12:39:36 +03:00
Avi Kivity
369afe3124 treewide: use coroutine::maybe_yield() instead of co_await make_ready_future()
The dedicated API shows the intent, and may be a tiny bit faster.

Closes #9382
2021-09-23 12:28:56 +02:00
Benny Halevy
4ffdafe6dc token_metadata: delete old java code
We no longer need to keep it for reference.
It's just causing confusion at this point.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20210826095457.994834-1-bhalevy@scylladb.com>
2021-08-26 13:03:59 +03:00
Avi Kivity
222ef17305 build, treewide: enable -Wredundant-move
Returning a function parameter guarantees copy elision and does not
require a std::move().  Enable -Wredundant-move to warn us that the
move is unneeded, and gain slightly more readable code. A few violations
are trivially adjusted.

Closes #9004
2021-07-11 12:53:02 +03:00
Benny Halevy
612793c2d4 locator: token_metadata: reuse utils::stall_free clear_gently helpers
Use the generic clear_gently functions that were added
in eca9f45c59.

Test: unit(dev)

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20210706090243.1776466-1-bhalevy@scylladb.com>
2021-07-06 12:06:43 +03:00
Avi Kivity
d2157dfea7 Merge 'locator: token_metadata: simplify tokens_iterator' from Michał Chojnowski
`ring_range()`/`tokens_iterator` are more complicated than they need to be. The `include_min` parameter is not used anywhere, and `tokens_iterator` is pimplified without a good reason. Simplify that.

Closes #8805

* github.com:scylladb/scylla:
  locator: token_metadata: depimplify tokens_iterator
  locator: token_metadata: remove _ring_pos from tokens_iterator_impl
  locator: token_metadata: remove tokens_end()
  locator: token_metadata: remove `include_min` from tokens_iterator_impl
  locator: token_metadata: remove the `include_min` parameter from `ring_range()`
2021-06-08 15:42:41 +03:00
Michał Chojnowski
3ea97e7a11 locator: token_metadata: depimplify tokens_iterator
This class has no meaningful dependencies, so pimpl is unreasonable here.
2021-06-07 10:41:23 +02:00
Michał Chojnowski
baaac5bb7c locator: token_metadata: remove _ring_pos from tokens_iterator_impl
_ring_pos is slightly confusing. I thought at first that it doesn't do anything
since operator== doesn't use it.
This cosmetic patch tries to improve the readability, and also removes
operator!= which is generated automatically in C++20.
2021-06-07 10:41:22 +02:00
Michał Chojnowski
30e5290cea locator: token_metadata: remove tokens_end()
It's an internal method of token_metadata_impl and doesn't have to exist.
2021-06-07 10:41:11 +02:00
Avi Kivity
a55b434a2b treewide: extent copyright statements to present day 2021-06-06 19:18:49 +03:00
Michał Chojnowski
81c1a7f7e9 locator: token_metadata: remove include_min from tokens_iterator_impl
`include_min` is always set to the default value. Get rid of it.
2021-06-05 17:40:35 +02:00
Michał Chojnowski
2a3bd2babe locator: token_metadata: remove the include_min parameter from ring_range()
`include_min` is always set to the default value. Remove it.
2021-06-05 17:40:35 +02:00
Nadav Har'El
fb0c4e469a Merge 'token_metadata: Fix get_all_endpoints to return nodes in the ring' from Asias He
The get_all_endpoints() should return the nodes that are part of the ring.

A node inside _endpoint_to_host_id_map does not guarantee that the node
is part of the ring.

To fix, return from _token_to_endpoint_map.

Fixes #8534

Closes #8536

* github.com:scylladb/scylla:
  token_metadata: Get rid of get_all_endpoints_count
  range_streamer: Handle everywhere_topology
  range_streamer: Adjust use_strict_sources_for_ranges
  token_metadata: Fix get_all_endpoints to return nodes in the ring
2021-05-11 18:39:10 +03:00
Asias He
5a410cb6e3 token_metadata: Get rid of get_all_endpoints_count
It is now only a wrapper for count_normal_token_owners.

Refs #8534
2021-05-06 15:36:20 +08:00
Asias He
ddeabba6aa token_metadata: Fix get_all_endpoints to return nodes in the ring
The get_all_endpoints() should return the nodes that are part of the ring.

A node inside _endpoint_to_host_id_map does not guarantee that the node
is part of the ring.

To fix, return from _token_to_endpoint_map.

Fixes #8534
2021-05-06 10:02:11 +08:00
Avi Kivity
cea5493cb7 storage_proxy, treewide: introduce names for vectors of inet_address
storage_proxy works with vectors of inet_addresses for replica sets
and for topology changes (pending endpoints, dead nodes). This patch
introduces new names for these (without changing the underlying
type - it's still std::vector<gms::inet_address>). This is so that
the following patch, that changes those types to utils::small_vector,
will be less noisy and highlight the real changes that take place.
2021-05-05 18:36:48 +03:00
Benny Halevy
322aa2f8b5 token_metadata: add clear_gently
clear_gently gently clears the token_metadata members.
It uses continuations to allow yielding if needed
to prevent reactor stalls.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-12-22 11:22:21 +02:00
Benny Halevy
56aa49ca81 token_metadata: shared_token_metadata: add mutate_token_metadata
mutate_token_metadata acquires the shared_token_metadata lock,
clones the token_metadata (using clone_async)
and calls an asynchronous functor on
the cloned copy of the token_metadata to mutate it.

If the functor is successful, the mutated clone
is set back to to the shared_token_metadata,
otherwise, the clone is destroyed.

With that, get rid of shared_token_metadata::clone

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-12-22 11:22:19 +02:00
Benny Halevy
e089c22ec1 token_metdata: futurize update_normal_tokens
The function complexity if O(#tokens) in the worst case
as for each endpoint token to traverses _token_to_endpoint_map
lineraly to erase the endpoint mapping if it exists.

This change renames the current implementation of
update_normal_tokens to update_normal_tokens_sync
and clones the code as a coroutine that returns a future
and may yield if needed.

Eventually we should futurize the whole token_metadata
and abstract_replication_strategy interface and get rid
of the synchronous functions.  Until then the sync
version is still required from call sites that
are neither returning a future nor run in a seastar thread.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-12-22 10:35:15 +02:00
Asias He
829b4c1438 repair: Make removenode safe by default
Currently removenode works like below:

- The coordinator node advertises the node to be removed in
  REMOVING_TOKEN status in gossip

- Existing nodes learn the node in REMOVING_TOKEN status

- Existing nodes sync data for the range it owns

- Existing nodes send notification to the coordinator

- The coordinator node waits for notification and announce the node in
  REMOVED_TOKEN

Current problems:

- Existing nodes do not tell the coordinator if the data sync is ok or failed.

- The coordinator can not abort the removenode operation in case of error

- Failed removenode operation will make the node to be removed in
  REMOVING_TOKEN forever.

- The removenode runs in best effort mode which may cause data
  consistency issues.

  It means if a node that owns the range after the removenode
  operation is down during the operation, the removenode node operation
  will continue to succeed without requiring that node to perform data
  syncing. This can cause data consistency issues.

  For example, Five nodes in the cluster, RF = 3, for a range, n1, n2,
  n3 is the old replicas, n2 is being removed, after the removenode
  operation, the new replicas are n1, n5, n3. If n3 is down during the
  removenode operation, only n1 will be used to sync data with the new
  owner n5. This will break QUORUM read consistency if n1 happens to
  miss some writes.

Improvements in this patch:

- This patch makes the removenode safe by default.

We require all nodes in the cluster to participate in the removenode operation and
sync data if needed. We fail the removenode operation if any of them is down or
fails.

If the user want the removenode operation to succeed even if some of the nodes
are not available, the user has to explicitly pass a list of nodes that can be
skipped for the operation.

$ nodetool removenode --ignore-dead-nodes <list_of_dead_nodes_to_ignore> <host_id>

Example restful api:

$ curl -X POST "http://127.0.0.1:10000/storage_service/remove_node/?host_id=7bd303e9-4c7b-4915-84f6-343d0dbd9a49&ignore_nodes=127.0.0.3,127.0.0.5"

- The coordinator can abort data sync on existing nodes

For example, if one of the nodes fails to sync data. It makes no sense for
other nodes to continue to sync data because the whole operation will
fail anyway.

- The coordinator can decide which nodes to ignore and pass the decision
  to other nodes

Previously, there is no way for the coordinator to tell existing nodes
to run in strict mode or best effort mode. Users will have to modify
config file or run a restful api cmd on all the nodes to select strict
or best effort mode. With this patch, the cluster wide configuration is
eliminated.

Fixes #7359

Closes #7626
2020-12-10 10:14:39 +02:00
Piotr Jastrzebski
87bf577450 token_metadata: Remove std::iterator from tokens_iterator_impl
std::iterator is deprecated since C++17 so define all the required
iterator_traits directly.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-11-17 16:53:20 +01:00
Benny Halevy
275fe30628 token_metadata_impl: set_pending_ranges: add can_yield_param
To prevent a > 10 ms stall when inserting to boost::icl::interval_map.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-11 14:20:24 +02:00
Benny Halevy
ba31350239 abstract_replication_strategy: add can_yield param to get_pending_ranges and friends
To prevent reactor stalls as seen in #7313.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-11 14:20:24 +02:00
Benny Halevy
7fb489d338 token_metadata_impl: calculate_pending_ranges_for_* reindent
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-11 14:20:24 +02:00
Benny Halevy
6ce2436a4c token_metadata_impl: calculate_pending_ranges_for_* pass new_pending_ranges by ref
We can use the seastar thread to keep the vector rather thna creating
a lw_shared_ptr for it.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-11 14:20:24 +02:00
Benny Halevy
0ca423dcfc token_metadata_impl: calculate_pending_ranges_for_* call in thread
The functions can be simplified as they are all now being called
from a seastar thread.

Make them sequential, returning void, and yielding if necessary.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-11 14:20:24 +02:00
Benny Halevy
84d086dc77 token_metadata: update_pending_ranges: create seastar thread
So we can yield in this path to prevent reactor stalls.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-11 14:20:24 +02:00
Benny Halevy
1e6c181678 abstract_replication_strategy: add get_address_ranges method for specific endpoint
Some of the callers of get_address_ranges are interested in the ranges
of a specific endpoint.

Rather than building a map for all endpoints and then traversing
it looking for this specific endpoint, build a multimap of token ranges
relating only to the specified endpoint.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-11 14:20:24 +02:00
Benny Halevy
2ce6773dae token_metadata_impl: clone_after_all_left: sort tokens only once
Currently the sorted tokens are copied needlessly by on this path
by `clone_only_token_map` and then recalculated after calling
remove_endpoint for each leaving endpoint.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-11 14:20:24 +02:00
Benny Halevy
0abd8e62cd token_metadata: futurize clone_after_all_left
Call the futurized clone_only_token_map and
remove the _leaving_endpoints from the cloned token_metadata_impl.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-11 14:20:24 +02:00
Benny Halevy
4a622c14e1 token_metadata: futurize clone_only_token_map
Does part of clone_async() using continuations to prevent stalls.

Rename synchronous variant to clone_only_token_map_sync
that is going to be deprecated once all its users will be futurized.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-11 14:20:24 +02:00
Benny Halevy
d1a73ec7b3 token_metadata: use mutable_token_metadata_ptr in calculate_pending_ranges_for_*
Replacing old code using lw_shared_ptr<token_metadata> with the "modern"
mutable_token_metadata_ptr alias.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-11 14:20:23 +02:00
Benny Halevy
4fc5997949 token_metadata: add clone_async
Clone token_metadata object using async continuation to
prevent reactor stalls.

Refs https://github.com/scylladb/scylla/issues/7220

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-11 14:20:23 +02:00
Benny Halevy
fa880439c9 storage_service: use token_metadata_lock to serialize updates to token_metadata
Rather than using `serialized_action`, grab a lock before mutating
_token_metadata and hold it until its replicated to all shards.

A following patch will use a mutable token_metadata_ptr
that is updated out of line under the lock.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-11 14:20:23 +02:00
Benny Halevy
5a250f529f token_metadata: get rid of unused calculate_pending_ranges_for_* methods
They are only called inernally by token_metadata_impl.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-09-30 23:16:23 +03:00
Benny Halevy
41e5a3a245 token_metadata: get rid of clone_after_all_settled
It's unused.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-09-30 23:15:11 +03:00
Benny Halevy
105a2f5244 token_metadata_impl: remove_endpoint: do not sort tokens
Call sort_tokens at the caller as all call sites from
within token_metadata_impl call remove_endpoint for multiple
endpoints so the tokens can be re-sorted only once, when done
removing all tokens.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-09-30 23:12:32 +03:00
Benny Halevy
86303f4fdd token_metadata_impl: always sort_tokens in place
No need to return the sorted tokens vector as it's
always assigned to _sorted_tokens.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-09-30 23:08:56 +03:00