Commit Graph

354 Commits

Author SHA1 Message Date
Pavel Emelyanov
ffc9cc9aec range-streamer: Remove global storage service reference
The reference is used by range streamer and (!) storage
service itself to find out if the consistent_rangemovement
option is ON/OFF.

Both places already have the database with config at hands
and can be simplified.

v2: spellchecking

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20210212095403.22662-1-xemul@scylladb.com>
2021-02-12 15:50:30 +01:00
Benny Halevy
322aa2f8b5 token_metadata: add clear_gently
clear_gently gently clears the token_metadata members.
It uses continuations to allow yielding if needed
to prevent reactor stalls.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-12-22 11:22:21 +02:00
Benny Halevy
e089c22ec1 token_metdata: futurize update_normal_tokens
The function complexity if O(#tokens) in the worst case
as for each endpoint token to traverses _token_to_endpoint_map
lineraly to erase the endpoint mapping if it exists.

This change renames the current implementation of
update_normal_tokens to update_normal_tokens_sync
and clones the code as a coroutine that returns a future
and may yield if needed.

Eventually we should futurize the whole token_metadata
and abstract_replication_strategy interface and get rid
of the synchronous functions.  Until then the sync
version is still required from call sites that
are neither returning a future nor run in a seastar thread.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-12-22 10:35:15 +02:00
Benny Halevy
37e971ad87 dht/i_partitioner: to_partition_ranges: support yielding
Allow yielding to prevent reactor stalls
when called with a long vector of ranges.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-24 12:23:56 +02:00
Benny Halevy
157a964a63 locator: extract can_yield to utils/maybe_yield.hh
Move the definition of bool_class can_yield to a standalone
header file and define there a maybe_yield(can_yield) helper.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-24 12:23:56 +02:00
Benny Halevy
ba31350239 abstract_replication_strategy: add can_yield param to get_pending_ranges and friends
To prevent reactor stalls as seen in #7313.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-11 14:20:24 +02:00
Benny Halevy
4a622c14e1 token_metadata: futurize clone_only_token_map
Does part of clone_async() using continuations to prevent stalls.

Rename synchronous variant to clone_only_token_map_sync
that is going to be deprecated once all its users will be futurized.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-11 14:20:24 +02:00
Benny Halevy
5ab7b0b2ea abstract_replication_strategy: accept a token_metadata_ptr in get_pending_address_ranges methods
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-11 14:20:23 +02:00
Benny Halevy
1cbe54a9cf boot_strapper: get_*_tokens: use token_metadata_ptr
To facilitate preempting of long running loops if needed.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-11 14:20:23 +02:00
Benny Halevy
63137b35ea range_streamer: convert to token_metadata_ptr
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-11 14:20:23 +02:00
Avi Kivity
6f5ef5a5f5 dht: document incremental partition_range and token_range sharders
Closes #6210
2020-10-18 12:24:49 +03:00
Benny Halevy
569f2830c1 range_streamer: keep a const token_metadata&
range_streamer doesn't need to modify toekn_metadata.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-08-20 16:20:34 +03:00
Piotr Jastrzebski
c001374636 codebase wide: replace count with contains
C++20 introduced `contains` member functions for maps and sets for
checking whether an element is present in the collection. Previously
`count` function was often used in various ways.

`contains` does not only express the intend of the code better but also
does it in more unified way.

This commit replaces all the occurences of the `count` with the
`contains`.

Tests: unit(dev)

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <b4ef3b4bc24f49abe04a2aba0ddd946009c9fcb2.1597314640.git.piotr@scylladb.com>
2020-08-15 20:26:02 +03:00
Piotr Jastrzebski
52ec0c683e codebase wide: replace erase + remove_if with erase_if
C++20 introduced std::erase_if which simplifies removal of elements
from the collection. Previously the code pattern looked like:

<collection>.erase(
        std::remove_if(<collection>.begin(), <collection>.end(), <predicate>),
        <collection>.end());

In C++20 the same can be expressed with:

std::erase_if(<collection>, <predicate>);

This commit replaces all the occurences of the old pattern with the new
approach.

Tests: unit(dev)

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <6ffcace5cce79793ca6bd65c61dc86e6297233fd.1597064990.git.piotr@scylladb.com>
2020-08-10 18:17:38 +03:00
Pavel Emelyanov
35a22ac48a bptree: Special intra-node key search when possible
If the key type is int64_t and the less-comparator is "natural" (i.e. it's
literally 'a < b') we may use the SIMD instructions to search for the key
on a node. Before doing so, the maybe_key and the searcher should be prepared
for that, in particular:

1. maybe_key should set unused keys to the minimal value
2. the searcher for this case should call the gt() helper with
   primitive types -- int64_t search key and array of int64_t values

To tell to B+ code that the key-less pair is such the less-er should define
the simplify_key() method converting search keys to int64_t-s.

This searcher is selected automatically, if any mismatch happens it silently
falls back to default one. Thus also add a static assertion to the row-cache
to mitigate this.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-08-06 15:41:31 +03:00
Pavel Emelyanov
61d4a73ed0 token: Restrict TokenCarrier concept with noexcept
The <search-key>.token() is noexcept currently and will have
to be explicitly such for future optimized key searcher, so
restrict the constraint and update the related classes.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-08-06 15:41:31 +03:00
Piotr Sarna
bd2d48e99c streaming: make stream_plan::abort noexcept
Aborting a stream plan is used in deinitialization code
ran in noexcept environment, so it should be noexcept itself.
Tested on a not-merged-yet Seastar patch with hardened noexcept
checks for abort_source.

Message-Id: <6eada033bb394d725b83a7e0f92381cb792ef6a1.1596446857.git.sarna@scylladb.com>
2020-08-03 14:00:19 +03:00
Pavel Emelyanov
8618a02815 migration_manager: Remove db/schema_tables.hh inclustion into header
The schema_tables.hh -> migration_manager.hh couple seems to work as one
of "single header for everyhing" creating big blot for many seemingly
unrelated .hh's.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-07-17 17:54:43 +03:00
Pavel Emelyanov
ae28814b1c token: Introduce raw() helper and raw comparator
In next patches the entries having token on-board will be
moved onto B+-tree rails. For this the int64_t value of the
token will be used as B+ key, so prepare for this.

One corner case -- the after_all_keys tokens must be resolved
to int64::max value to appear at the "end" of the tree. This
is not the same as "before_all_keys" case, which maps to the
int64::min value which is not allowed for regular tokens. But
for the sake of B+ switch this is OK, the conflicts of token
raw values are explicitly resolved in next patches.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-07-14 16:30:02 +03:00
Pavel Emelyanov
7b2754cf5f row-cache: Use ring_position_comparator in some places
The row cache (and memtable) code uses own comparators built on top
of the ring_position_comparator for collections of partitions. These
collections will be switched from the key less-compare to the pair
of token less-compare + key tri-compare.

Prepare for the switch by generalizing the ring_partition_comparator
and by patching all the non-collections usage of less-compare to use
one.

The memtable code doesn't use it outside of collections, but patch it
anyway as a part of preparations.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-07-14 16:30:02 +03:00
Pavel Emelyanov
1e15c06889 dht: Detach ring_position_comparator_for_sstables
Next patches will generalize ring_position_comparator with templates
to replace cache_entry's and memtable_entry's comparators. The overload
of operator() for sstables has its own implementation, that differs from
the "generic" one, for smoother generalization it's better to detach it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-07-14 16:30:02 +03:00
Pavel Emelyanov
6d7ae4ead1 dht: Mark noexcept methods
These are either trivially noexcept already, or call each-other, thus becoming noexcept too

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-07-09 14:41:03 +03:00
Botond Dénes
6ae8e0bc7d dht::sharder: add virtual destructor
This is a class with virtual methods, it should have a virtual
destructor too.
2020-07-01 10:15:49 +03:00
Asias He
9abaf9bc2e boot_strapper: Ignore node to be replaced explicitly as stream source
After commit 7d86a3b208 (storage_service:
Make replacing node take writes), during replace operation, tokens in
_token_metadata for node being replaced are updated only after the replace
operation is finished. As a result, in range_streamer::add_ranges, the
node being replaced will be considered as a source to stream data from.

Before commit 7d86a3b208, the node being
replaced will not be considered as a source node because it is already
replaced by the replacing node before the replace operation is finished.
This is the reason why it works in the past.

To fix, filter out the node being replaced as a source node explicitly.

Tests: replace_first_boot_test and replace_stopped_node_test
Backports: 4.1
Fixes: #6728
2020-06-30 12:40:19 +03:00
Avi Kivity
9afd599d7c Merge 'range_streamer: Handle table of RF 1 in get_range_fetch_map' from Asias
"
After "Make replacing node take writes" series, with repair based node
operations disabled, we saw the replace operation fail like:

```
[shard 0] init - Startup failed: std::runtime_error (unable to find
sufficient sources for streaming range (9203926935651910749, +inf) in
keyspace system_auth)
```
The reason is the system_auth keyspace has default RF of 1. It is
impossible to find a source node to stream from for the ranges owned by
the replaced node.

In the past, the replace operation with keyspace of RF 1 passes, because
the replacing node calls token_metadata.update_normal_tokens(tokens,
ip_of_replacing_node) before streaming. We saw:

```
[shard 0] range_streamer - Bootstrap : keyspace system_auth range
(-9021954492552185543, -9016289150131785593] exists on {127.0.0.6}
```

Node 127.0.0.6 is the replacing node 127.0.0.5. The source node check in
range_streamer::get_range_fetch_map will pass if the source is the node
itself. However, it will not stream from the node itself. As a result,
the system_auth keyspace will not get any data.

After the "Make replacing node take writes" series, the replacing node
calls token_metadata.update_normal_tokens(tokens, ip_of_replacing_node)
after the streaming finishes. We saw:

```
[shard 0] range_streamer - Bootstrap : keyspace system_auth range
(-9049647518073030406, -9048297455405660225] exists on {127.0.0.5}
```

Since 127.0.0.5 was dead, the source node check failed, so the bootstrap
operation.

Ta fix, we ignore the table of RF 1 when it is unable to find a source
node to stream.

Fixes #6351
"

* asias-fix_bootstrap_with_rf_one_in_range_streamer:
  range_streamer: Handle table of RF 1 in get_range_fetch_map
  streaming: Use separate streaming reason for replace operation
2020-06-10 16:03:13 +03:00
Rafael Ávila de Espíndola
555d8fe520 build: Be consistent about system versus regular headers
We were not consistent about using '#include "foo.hh"' instead of
'#include <foo.hh>' for scylla's own headers. This patch fixes that
inconsistency and, to enforce it, changes the build to use -iquote
instead of -I to find those headers.

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200608214208.110216-1-espindola@scylladb.com>
2020-06-10 15:49:51 +03:00
Pavel Emelyanov
67d5fad65f storage_service: Remove some inclusions of its header
GC pass over .cc files. Some really do not need it, some need for features/gossiper

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2020-06-01 09:08:40 +03:00
Asias He
81f0260816 range_streamer: Handle table of RF 1 in get_range_fetch_map
After "Make replacing node take writes" series, with repair based node
operations disabled, we saw the replace operation fail like:

```
[shard 0] init - Startup failed: std::runtime_error (unable to find
sufficient sources for streaming range (9203926935651910749, +inf) in
keyspace system_auth)
```
The reason is the system_auth keyspace has default RF of 1. It is
impossible to find a source node to stream from for the ranges owned by
the replaced node.

In the past, the replace operation with keyspace of RF 1 passes, because
the replacing node calls token_metadata.update_normal_tokens(tokens,
ip_of_replacing_node) before streaming. We saw:

```
[shard 0] range_streamer - Bootstrap : keyspace system_auth range
(-9021954492552185543, -9016289150131785593] exists on {127.0.0.6}
```

Node 127.0.0.6 is the replacing node 127.0.0.5. The source node check in
range_streamer::get_range_fetch_map will pass if the source is the node
itself. However, it will not stream from the node itself. As a result,
the system_auth keyspace will not get any data.

After the "Make replacing node take writes" series, the replacing node
calls token_metadata.update_normal_tokens(tokens, ip_of_replacing_node)
after the streaming finishes. We saw:

```
[shard 0] range_streamer - Bootstrap : keyspace system_auth range
(-9049647518073030406, -9048297455405660225] exists on {127.0.0.5}
```

Since 127.0.0.5 was dead, the source node check failed, so the bootstrap
operation.

Ta fix, we ignore the keyspace of RF 1 when it is unable to find a source
node to stream.

Fixes #6351
2020-05-22 09:30:52 +08:00
Asias He
fa9ee234a0 streaming: Use separate streaming reason for replace operation
Currently, replace and bootstrap share the same streaming reason,
stream_reason::bootstrap, because they share most of the code
in boot_strapper.

In order to distinguish the two, we need to introduce a new stream
reason, stream_reason::replace. It is safe to do so in a mixed cluster
because current code only check if the stream_reason is
stream_reason::repair.

Refs: #6351
2020-05-22 09:30:52 +08:00
Avi Kivity
6f1a8cfeea Merge 'Use special partitioner for CDC Log' from Piotr
"
CDC has to create CDC streams that are co-located with corresponding BaseTable data. This is not always easy. Especially for small vnodes. This PR introduces new partitioner which allows us to easily find such stream ids that the stream belongs to a given vnode and shard.

The idea is that a partitioner accepts only keys that are a blob composed of two int64 numbers. The first number is the token of the key.

Tests: unit(dev), dtests(CDC)
"

* haaawk-cdc_partitioner:
  cdc:use CDCPartitioner for CDC Log
  dht: Add find_first_token_for_shard
  dht: use long_token in token::to_int64
  cdc: add CDCPartitioner
  stream_id: add token_from_bytes static function
  i_partitioner: Stop distinguishing whether keys order is preserved
2020-05-06 20:29:27 +03:00
Pavel Emelyanov
108a944e7b ring_position_ext: Add formatter
It's not currently used, but helped when debugging reworked
row cache lookups.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20200427144712.14794-1-xemul@scylladb.com>
2020-04-27 18:01:01 +03:00
Piotr Jastrzebski
1d1c6af72a dht: Add find_first_token_for_shard
This new function finds the first token in range (start, end] that
belongs to given shard.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-04-22 18:24:54 +02:00
Piotr Jastrzebski
c82adb7906 dht: use long_token in token::to_int64
Previous implementation of to_int64 wasn't handling dht::minimum_token
and dht::maximum_token.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-04-22 16:12:00 +02:00
Piotr Jastrzebski
ae1f14095f i_partitioner: Stop distinguishing whether keys order is preserved
Scylla inherited a concept of partitioners that preserve order of keys from
the origin but it is not used for anything. Moreover, none of the existing
partitioners preserves keys order. The only partitioner that did this in the
past was ByteOrderedPartitioner and Scylla does not support it any more.

For a partitioner to preserve an order of the keys means that if there are two
keys A and B such that A < B then token(A) < token(B) where token(X) isa token
the partitioner assignes to key X.

This patch removes dht::i_partitioner::preserves_order with all its overrides.
The only place that was using this member function was a check in thrift server
and it is safe to remove the check because the check was only done
to differentiate the error message for partitioners that do and do not preserve
the order of the keys.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-04-21 15:50:22 +02:00
Piotr Jastrzebski
2aaf81bf7c dht: Exclude -2^63 value from get_random_token
-2^63 is a value reserved for min/max token boundaries and shouldn't be used for
regular tokens. This patch fixes get_random_token to never create token with
value -2^63. On the way dht::get_random_number template method is removed
because it was exclusively used by get_random_token.

Also use uniform_int_distribution with int64_t instead of uint64_t by using
correct constructor parameter that guarantees values between -2^63+1 and 2^63-1
inclusively.

Tests: unit(dev)

Fixes #6237.
Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <0a1a939355f5005039d5c2c7c513bad94cf60be2.1587302093.git.piotr@scylladb.com>
2020-04-19 18:17:35 +03:00
Rafael Ávila de Espíndola
f3fd466156 dht: Use get_random_number<uint64_t> instead of int64_t in token::get_random_token
I bisect the opposite change in
9c202b52da as the cause of issue 6193. I
don't know why. Maybe get_random_number<signed_type> is buggy?

In any case, reverting to uint64_t solves the issue.

Fixes #6193

Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20200418001611.440733-1-espindola@scylladb.com>
2020-04-19 09:46:06 +03:00
Avi Kivity
88ade3110f treewide: replace calls to engine().some_api() with some_api()
This removes the need to include reactor.hh, a source of compile
time bloat.

In some places, the call is qualified with seastar:: in order
to resolve ambiguities with a local name.

Includes are adjusted to make everything compile. We end up
having 14 translation units including reactor.hh, primarily for
deprecated things like reactor::at_exit().

Ref #1
2020-04-05 12:46:04 +03:00
Piotr Jastrzebski
a15b32c9d9 token: relax the condition of the sanity check
When we switched token representation to int64_t
we added some sanity checks that byte representation
is always 8 bytes long.

It turns out that for token_kind::before_all_keys and
token_kind::after_all_keys bytes can sometimes be empty
because for those tokens they are just ignored. The check
introduced with the change is too strict and sometimes
throws the exception for tokens before/after all keys
created with empty bytes.

This patch relaxes the condition of the check and always
uses 0 as value of _data for special before/after all keys
tokens.

Fixes #6131

Tests: unit(dev, sct)

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-04-04 15:50:10 +03:00
Piotr Jastrzebski
e72696a8e6 sharding_info: rename the class to sharder
Also rename all variables that were named si or sinfo
to sharder.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-03-30 18:42:33 +02:00
Piotr Jastrzebski
2e850421a0 i_partitioner:remove embeded sharding_info
sharding_info embeded into partitioner is no longer
used anywhere and can be removed.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-03-30 18:42:33 +02:00
Piotr Jastrzebski
b46b35c55a i_partitioner: remove unused get_sharding_info
Previous patches has removed all the usages of
this function.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-03-30 18:42:33 +02:00
Piotr Jastrzebski
79adee2fae i_partitioner: remove unused shard_count
Previous patches have switched all the calls to
i_partitioner::shard_count to sharding_info::shard_count
and this function can now be removed.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-03-30 18:42:33 +02:00
Piotr Jastrzebski
b7834634ee i_partitioner: remove sharding_ignore_msb
Every place that has previously called this method is now
using sharding_info::sharding_ignore_msb and this function
can be removed.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-03-30 18:42:33 +02:00
Piotr Jastrzebski
2aaa33d02e i_partitioner: remove unused split_ranges_to_shards
The function is never called so it can be safely removed.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-03-30 18:42:33 +02:00
Piotr Jastrzebski
bdb7e89048 i_partitioner: remove unused shard_of function
Previous patches switched all the places that called
i_partitioner::shard_of to use sharding_info::shard_of
so i_partitioner::shard_of is no longer used and can
be removed.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-03-30 18:42:33 +02:00
Piotr Jastrzebski
88364b6c30 dht::shard_of: use schema::get_sharding_info
i_partitioner::shard_of will be removed so we should
use sharding_info::shard_of instead.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-03-30 18:42:33 +02:00
Piotr Jastrzebski
8b6be90310 i_partitioner: remove unused token_for_next_shard
Previous patches have switched all the places that was
using i_partitioner::token_for_next_shard to
sharding_info::token_for_next_shard. Now the function
can be removed from i_partitioner.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-03-30 18:42:33 +02:00
Piotr Jastrzebski
8a6c377352 split_range_to_single_shard: use sharding info instead of partitioner
The function relies only on i_partitioner::shard_count
and i_partitioner::token_fon_next_shard. Both are really implemented
in sharding_info so the method can use them directly.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-03-30 18:42:33 +02:00
Piotr Jastrzebski
dffa9fc880 dht: remove unimplemented split_range_to_single_shard
This method is not implemented anywhere not to mention the usage.
It is the only resonable thing to remove it instead of keeping
an unused and unimplemented declaration.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-03-30 09:36:22 +02:00
Piotr Jastrzebski
94ff653b99 selective_token_range_sharder: replace i_partitioner with sharding_info
The class does not depend on partitioning logic but only uses
sharding logic. This means it is possible and desirable to limit its
dependency to only sharding_info.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
2020-03-30 09:36:22 +02:00