Commit Graph

36 Commits

Author SHA1 Message Date
Avi Kivity
a55b434a2b treewide: extent copyright statements to present day 2021-06-06 19:18:49 +03:00
Pavel Emelyanov
ffc9cc9aec range-streamer: Remove global storage service reference
The reference is used by range streamer and (!) storage
service itself to find out if the consistent_rangemovement
option is ON/OFF.

Both places already have the database with config at hands
and can be simplified.

v2: spellchecking

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20210212095403.22662-1-xemul@scylladb.com>
2021-02-12 15:50:30 +01:00
Benny Halevy
63137b35ea range_streamer: convert to token_metadata_ptr
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-11-11 14:20:23 +02:00
Benny Halevy
569f2830c1 range_streamer: keep a const token_metadata&
range_streamer doesn't need to modify toekn_metadata.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2020-08-20 16:20:34 +03:00
Piotr Jastrzebski
c001374636 codebase wide: replace count with contains
C++20 introduced `contains` member functions for maps and sets for
checking whether an element is present in the collection. Previously
`count` function was often used in various ways.

`contains` does not only express the intend of the code better but also
does it in more unified way.

This commit replaces all the occurences of the `count` with the
`contains`.

Tests: unit(dev)

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <b4ef3b4bc24f49abe04a2aba0ddd946009c9fcb2.1597314640.git.piotr@scylladb.com>
2020-08-15 20:26:02 +03:00
Asias He
81f0260816 range_streamer: Handle table of RF 1 in get_range_fetch_map
After "Make replacing node take writes" series, with repair based node
operations disabled, we saw the replace operation fail like:

```
[shard 0] init - Startup failed: std::runtime_error (unable to find
sufficient sources for streaming range (9203926935651910749, +inf) in
keyspace system_auth)
```
The reason is the system_auth keyspace has default RF of 1. It is
impossible to find a source node to stream from for the ranges owned by
the replaced node.

In the past, the replace operation with keyspace of RF 1 passes, because
the replacing node calls token_metadata.update_normal_tokens(tokens,
ip_of_replacing_node) before streaming. We saw:

```
[shard 0] range_streamer - Bootstrap : keyspace system_auth range
(-9021954492552185543, -9016289150131785593] exists on {127.0.0.6}
```

Node 127.0.0.6 is the replacing node 127.0.0.5. The source node check in
range_streamer::get_range_fetch_map will pass if the source is the node
itself. However, it will not stream from the node itself. As a result,
the system_auth keyspace will not get any data.

After the "Make replacing node take writes" series, the replacing node
calls token_metadata.update_normal_tokens(tokens, ip_of_replacing_node)
after the streaming finishes. We saw:

```
[shard 0] range_streamer - Bootstrap : keyspace system_auth range
(-9049647518073030406, -9048297455405660225] exists on {127.0.0.5}
```

Since 127.0.0.5 was dead, the source node check failed, so the bootstrap
operation.

Ta fix, we ignore the keyspace of RF 1 when it is unable to find a source
node to stream.

Fixes #6351
2020-05-22 09:30:52 +08:00
Avi Kivity
c32f9a8f7b dht: check for aborts during streaming
Propagate the abort_source from main() into boot_strapper and range_stream and
check for aborts at strategic points. This includes aborting running stream_plans
and aborting sleeps between retries.

Fixes #4674
2019-08-18 20:41:07 +03:00
Asias He
b2c110699e gms: Remove i_failure_detector.hh
It is not used any more.
2019-03-22 09:08:51 +08:00
Asias He
2b6a4050c2 dht: Do not use failure_detector::is_alive in failure_detector_source_filter
Switch failure_detector_source_filter to use get_local_gossiper::is_alive
directly since we are going to remove the static
gms::get_local_failure_detector object soon.
Pass the nodes that are down to the filter direclty, to avoid the
range_streamer to depends on gossiper at all.
2019-03-22 08:26:47 +08:00
Asias He
7f826d3343 streaming: Expose reason for streaming
On receiving a mutation_fragment or a mutation triggered by a streaming
operation, we pass an enum stream_reason to notify the receiver what
the streaming is used for. So the receiver can decide further operation,
e.g., send view updates, beyond applying the streaming data on disk.

Fixes #3276
Message-Id: <f15ebcdee25e87a033dcdd066770114a499881c0.1539498866.git.asias@scylladb.com>
2018-10-15 22:03:28 +01:00
Asias He
8edf3defdf range_streamer: Futurize add_ranges
It might take long time for get_all_ranges_with_sources_for and
get_all_ranges_with_strict_sources_for to calculate which cause reactor
stall. To fix, run them in a thread and yield. Those functions are used in
the slow path, it is ok to yield more than needed.

Fixes #3639

Message-Id: <63aa7794906ac020c9d9b2984e1351a8298a249b.1536135617.git.asias@scylladb.com>
2018-10-09 09:46:50 +03:00
Asias He
95849371aa range_streamer: Remove unordered_multimap usage
We need the mapping between dht::token_range to
std::vector<inet_address> and inet_address to dht::token_range_vector in
various places. Currently, we use std::unordered_multimap and convert to
std::unordered_map. It is better to use std::unordered_map in the first
place. The changes like below:

- Change from

  std::unordered_multimap<dht::token_range, inet_address>

to

  std::unordered_map<dht::token_range, std::vector<inet_address>>

- Change from

   std::unordered_multimap<inet_address, dht::token_range>

to

   std::unordered_map<inet_address, dht::token_range_vector>

Message-Id: <b8ecc41775e46ec064db3ee07510c404583390aa.1533106019.git.asias@scylladb.com>
2018-08-01 13:01:41 +03:00
Asias He
4a0b561376 storage_service: Get rid of moving operation
The moving operation changes a node's token to a new token. It is
supported only when a node has one token. The legacy moving operation is
useful in the early days before the vnode is introduced where a node has
only one token. I don't think it is useful anymore.

In the future, we might support adjusting the number of vnodes to reblance
the token range each node owns.

Removing it simplifies the cluster operation logic and code.

Fixes #3475

Message-Id: <144d3bea4140eda550770b866ec30e961933401d.1533111227.git.asias@scylladb.com>
2018-08-01 11:18:17 +03:00
Asias He
1f06ee3960 range_streamer: Limit nr of nodes to stream in parallel
For example, to bootstrap a 50th node in a cluster

 [shard 0] range_streamer - Bootstrap with
 [127.0.0.8, 127.0.0.2, 127.0.0.24, 127.0.0.21, 127.0.0.49, 127.0.0.44,
 127.0.0.9, 127.0.0.7, 127.0.0.47, 127.0.0.15, 127.0.0.5, 127.0.0.30,
 127.0.0.14, 127.0.0.12, 127.0.0.36, 127.0.0.11, 127.0.0.48, 127.0.0.28,
 127.0.0.33, 127.0.0.10, 127.0.0.41, 127.0.0.4, 127.0.0.40, 127.0.0.3,
 127.0.0.6, 127.0.0.43, 127.0.0.22, 127.0.0.26, 127.0.0.42, 127.0.0.25,
 127.0.0.17, 127.0.0.37, 127.0.0.23, 127.0.0.13, 127.0.0.38, 127.0.0.1,
 127.0.0.18, 127.0.0.20, 127.0.0.39, 127.0.0.27, 127.0.0.34, 127.0.0.32,
 127.0.0.19, 127.0.0.16, 127.0.0.31, 127.0.0.45, 127.0.0.29, 127.0.0.35,
 127.0.0.46]
 for keyspace=keyspace1 started, nodes_to_stream=49, nodes_in_parallel=49

the new node will get data from 49 existing nodes.

Currently, it will stream from all the 49 existing nodes at the same
time. It is not a good idea to stream from all the nodes in parallel
which can overwhelm the bootstrap node, i.e., 49 nodes sending, 1 node
receiving.

To fix this, limit the nr of nodes to stream in parallel. We should have
a better control over the memory usage and parallelism. But for now,
limit the nr of nodes to a maximum of 16 as a starter. With this limit,
each shard can work with as many as 16 remote nodes in parallel, I think
this has enough parallelism for streaming in terms of performance.

This change have effect on the bootstrap/decommission/removenode node
operations, and do not have effect on repair.

Refs #2782

Message-Id: <980610dc97490d4f16281a0c3203b9bee73e04e4.1531989557.git.asias@scylladb.com>
2018-07-19 11:44:05 +03:00
Asias He
d23dafa7ac dht: Remove column_families parameter in add_rx_ranges and add_tx_ranges
In 4b1034b (storage_service: Remove the stream_hints), we removed the
only user of the api with the column_families parameter.

std::vector column_families = { db::system_keyspace::HINTS };
streamer->add_tx_ranges(keyspace, std::move(ranges_per_endpoint),
column_families);

We can simplify the code range_streamer a bit by removing it.

Fixes #3476

Tests: dtest update_cluster_layout_tests.py
Message-Id: <c81d79c5e6dbc8dd78c1242837de892e39d6abd2.1528356342.git.asias@scylladb.com>
2018-06-10 14:53:40 +03:00
Asias He
9b5585ebd5 range_streamer: Stream 10% of ranges instead of 10 ranges per time
If there are a lot of ranges, e.g., num_tokens=2048, 10 ranges per
stream plan will cause tons of stream plan to be created to stream data,
each having very few data. This cause each stream plan has low transfer
bandwidth, so that the total time to complete the streaming increases.

It makes more sense to send a percentage of the total ranges per stream
plan than a fixed ranges.

Here is an example to stream a keyspace with 513 ranges in
total, 10 ranges v.s. 10% ranges:

Before:
[shard 0] range_streamer - Bootstrap with 127.0.0.1 for
keyspace=system_traces, 510 out of 513 ranges: ranges = 51
[shard 0] range_streamer - Bootstrap with ks for keyspace=127.0.0.1
succeeded, took 107 seconds

After:
[shard 0] range_streamer - Bootstrap with 127.0.0.1 for
keyspace=system_traces, 510 out of 513 ranges: ranges = 10
[shard 0] range_streamer - Bootstrap with ks for keyspace=127.0.0.1
succeeded, took 22 seconds

Message-Id: <a890b84fbac0f3c3cc4021e30dbf4cdf135b93ea.1520992228.git.asias@scylladb.com>
2018-03-14 10:12:12 +02:00
Asias He
6810031ba7 dht: Extend range_streamer interface
After this patch and the following patches to use the new
range_streamder interface, all the following cluster operations:

- bootstrap
- rebuild
- decommission
- removenode

will use the same code to do the streaming.

The range_streamer is now extended to support both fetch from and push
to peer node. Another big change is now the range_streamer will stream
less ranges at a time, so less data, per stream_plan and range_streamer
will remember which ranges are failed to stream and can retry later.

The retry policy is very simple at the moment it retries at most 5 times
and sleep 1 minutes, 1.5^2 minutes, 1.5^3 minutes ....

Later, we can introduce api for user to decide when to stop retrying and
the retry interval.

The benefits:

- All the cluster operation shares the same code to stream

- We can know the operation progress, e.g., we can know total number of
  ranges need to be streamed and number of ranges finished in
  bootstrap, decommission and etc.
- All the cluster operation can survive peer node down during the
  operation which usually takes long time to complete, e.g., when adding
  a new node, currently if any of the existing node which streams data to
  the new node had issue sending data to the new node, the whole bootstrap
  process will fail. After this patch, we can fix the problematic node
  and restart it, the joining node will retry streaming from the node
  again.
- We can fail streaming early and timeout early and retry less because
  all the operations use stream can survive failure of a single
  stream_plan. It is not that important for now to have to make a single
  stream_plan successful. Note, another user of streaming, repair, is now
  using small stream_plan as well and can rerun the repair for the
  failed ranges too.

This is one step closer to supporting the resumable add/remove node
opeartions.
2017-08-07 16:31:47 +08:00
Asias He
937f28d2f1 Convert to use dht::partition_range_vector and dht::token_range_vector 2016-12-19 14:08:50 +08:00
Asias He
d1178fa299 Convert to use dht::token_range 2016-12-19 08:04:29 +08:00
Avi Kivity
a35136533d Convert ring_position and token ranges to be nonwrapping
Wrapping ranges are a pain, so we are moving wrap handling to the edges.

Since cql can't generate wrapping ranges, this means thrift and the ring
maintenance code; also range->ring transformations need to merge the first
and last ranges.

Message-Id: <1478105905-31613-1-git-send-email-avi@scylladb.com>
2016-11-02 21:04:11 +02:00
Pekka Enberg
38a54df863 Fix pre-ScyllaDB copyright statements
People keep tripping over the old copyrights and copy-pasting them to
new files. Search and replace "Cloudius Systems" with "ScyllaDB".

Message-Id: <1460013664-25966-1-git-send-email-penberg@scylladb.com>
2016-04-08 08:12:47 +03:00
Asias He
bdd6a69af7 streaming: Drop unused parameters
- int connections_per_host

Scylla does not create connections per stream_session, instead it uses
rpc, thus connections_per_host is not relevant to scylla.

- bool keep_ss_table_level
- int repaired_at

Scylla does not stream sstable files. They are not relevant to scylla.
2016-01-25 11:38:13 +08:00
Asias He
0af7fb5509 range_streamer: Kill FIXME in use_strict_consistency for consistent_rangemovement 2015-11-30 09:15:42 +08:00
Asias He
6aa5bfe59f range_streamer: Add virtual destructor to i_source_filter
Found by debug build

==10190==ERROR: AddressSanitizer: new-delete-type-mismatch on 0x602000084430 in thread T0:
  object passed to delete has wrong type:
  size of the allocated type:   16 bytes;
  size of the deallocated type: 8 bytes.
    #0 0x7fe244add512 in operator delete(void*, unsigned long) (/lib64/libasan.so.2+0x9a512)
    #1 0x3c674fe in std::default_delete<dht::range_streamer::i_source_filter>::operator()(dht::range_streamer::i_source_filter*)
       const /usr/include/c++/5.1.1/bits/unique_ptr.h:76
    #2 0x3c60584 in std::unique_ptr<dht::range_streamer::i_source_filter, std::default_delete<dht::range_streamer::i_source_filter> >::~unique_ptr()
       /usr/include/c++/5.1.1/bits/unique_ptr.h:236
    #3 0x3c7ac22 in void __gnu_cxx::new_allocator<std::unique_ptr<dht::range_streamer::i_source_filter,
       std::default_delete<dht::range_streamer::i_source_filter> > >::destroy<std::unique_ptr<dht::range_streamer::i_source_filter,
       std::default_delete<dht::range_streamer::i_source_filter> > >(std::unique_ptr<dht::range_streamer::i_source_filter,
       std::default_delete<dht::range_streamer::i_source_filter> >*) /usr/include/c++/5.1.1/ext/new_allocator.h:124
...
2015-11-12 11:19:22 +02:00
Asias He
d166b0f3fa range_streamer: Add get_work_map 2015-11-09 08:43:04 +08:00
Asias He
b172146223 range_streamer: Introduce single_datacenter_filter 2015-10-23 16:13:30 +08:00
Asias He
dc02d76aee range_streamer: Implement fetch_async
It is used by boot_strapper::bootstrap() in bootstrap process to start
the streaming.
2015-10-13 15:45:56 +08:00
Asias He
887c0a36ec range_streamer: Implement add_ranges
It is used by boot_strapper::bootstrap() in bootstrap process.
2015-10-13 15:45:56 +08:00
Asias He
1c1f9bed09 range_streamer: Implement use_strict_sources_for_ranges 2015-10-13 15:45:55 +08:00
Asias He
43d4e62b5a range_streamer: Implement add_source_filter 2015-10-13 15:45:55 +08:00
Asias He
d47ea88aa8 range_streamer: Implement get_all_ranges_with_strict_sources_for 2015-10-13 15:45:55 +08:00
Asias He
84de936e43 range_streamer: Implement get_all_ranges_with_sources_for 2015-10-13 15:45:55 +08:00
Asias He
944e28cd6c range_streamer: Implement get_range_fetch_map 2015-10-13 15:45:55 +08:00
Asias He
d986a4d875 range_streamer: Add constructor 2015-10-13 15:45:55 +08:00
Asias He
1d6c081766 range_streamer: Add i_source_filter and failure_detector_source_filter
It is used to filter out unwanted node.
2015-10-13 15:45:55 +08:00
Asias He
c8b9a6fa06 dht: Convert RangeStreamer to C++ 2015-10-13 15:45:55 +08:00