Commit Graph

455 Commits

Author SHA1 Message Date
Avi Kivity
9c37fdaca3 Revert "dht: incremental_owned_ranges_checker: use lower_bound()"
This reverts commit d85af3dca4. It
restores the linear search algorithm, as we expect the search to
terminate near the origin. In this case linear search is O(1)
while binary search is O(log n).

A comment is added so we don't repeat the mistake.
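
The reasoning can be illustrated with a minimal sketch. This is not the actual ScyllaDB code: `range`, `incremental_checker` and the integer tokens are hypothetical stand-ins. It assumes tokens are queried in non-decreasing order, so the cursor only moves a step or two from where the previous query left it:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a token range: closed interval [start, end].
struct range {
    int64_t start;
    int64_t end;
};

// Sketch of an incremental checker. Assumes `sorted` is ordered by start,
// non-overlapping, and outlives the checker; tokens must be queried in
// non-decreasing order.
class incremental_checker {
    const std::vector<range>& _sorted;
    std::vector<range>::const_iterator _it;
public:
    explicit incremental_checker(const std::vector<range>& sorted)
        : _sorted(sorted), _it(sorted.begin()) {}

    bool owns(int64_t token) {
        // Linear scan from the previous position. The match is expected
        // within a step or two, so this is O(1) amortized per query,
        // whereas a fresh binary search would cost O(log n) every time.
        while (_it != _sorted.end() && _it->end < token) {
            ++_it;
        }
        return _it != _sorted.end() && _it->start <= token;
    }
};
```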

Closes #13704
2023-05-02 08:01:44 +03:00
Kamil Braun
30cc07b40d Merge 'Introduce tablets' from Tomasz Grabiec
This PR introduces an experimental feature called "tablets". Tablets are
a way to distribute data in the cluster, which is an alternative to the
current vnode-based replication. Vnode-based replication strategy tries
to evenly distribute the global token space shared by all tables among
nodes and shards. With tablets, the aim is to start from a different
side. Divide resources of replica-shard into tablets, with a goal of
having a fixed target tablet size, and then assign those tablets to
serve fragments of tables (also called tablets). This will allow us to
balance the load in a more flexible manner, by moving individual tablets
around. Also, unlike with vnode ranges, tablet replicas live on a
particular shard on a given node, which will allow us to bind raft
groups to tablets. Those goals are not yet achieved with this PR, but it
lays the ground for this.

Things achieved in this PR:

  - You can start a cluster and create a keyspace whose tables will use
    tablet-based replication. This is done by setting `initial_tablets`
    option:

    ```
        CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy',
                        'replication_factor': 3,
                        'initial_tablets': 8};
    ```

    All tables created in such a keyspace will be tablet-based.

    Tablet-based replication is a trait, not a separate replication
    strategy. Tablets don't change the spirit of a replication strategy;
    they just alter the way in which data ownership is managed. In theory,
    we could use them for other strategies as well, like
    EverywhereReplicationStrategy. Currently, only NetworkTopologyStrategy
    is augmented to support tablets.

  - You can create and drop tablet-based tables (no DDL language changes)

  - DML / DQL work with tablet-based tables

    Replicas for tablet-based tables are chosen from tablet metadata
    instead of token metadata

Things which are not yet implemented:

  - handling of views, indexes, CDC created on tablet-based tables
  - sharding is done using the old method, it ignores the shard allocated in tablet metadata
  - node operations (topology changes, repair, rebuild) are not handling tablet-based tables
  - not integrated with compaction groups
  - tablet allocator piggy-backs on tokens to choose replicas.
    Eventually we want to allocate based on current load, not statically

Closes #13387

* github.com:scylladb/scylladb:
  test: topology: Introduce test_tablets.py
  raft: Introduce 'raft_server_force_snapshot' error injection
  locator: network_topology_strategy: Support tablet replication
  service: Introduce tablet_allocator
  locator: Introduce tablet_aware_replication_strategy
  locator: Extract maybe_remove_node_being_replaced()
  dht: token_metadata: Introduce get_my_id()
  migration_manager: Send tablet metadata as part of schema pull
  storage_service: Load tablet metadata when reloading topology state
  storage_service: Load tablet metadata on boot and from group0 changes
  db, migration_manager: Notify about tablet metadata changes via migration_listener::on_update_tablet_metadata()
  migration_notifier: Introduce before_drop_keyspace()
  migration_manager: Make prepare_keyspace_drop_announcement() return a future<>
  test: perf: Introduce perf-tablets
  test: Introduce tablets_test
  test: lib: Do not override table id in create_table()
  utils, tablets: Introduce external_memory_usage()
  db: tablets: Add printers
  db: tablets: Add persistence layer
  dht: Use last_token_of_compaction_group() in split_token_range_msb()
  locator: Introduce tablet_metadata
  dht: Introduce first_token()
  dht: Introduce next_token()
  storage_proxy: Improve trace-level logging
  locator: token_metadata: Fix confusing comment on ring_range()
  dht, storage_proxy: Abstract token space splitting
  Revert "query_ranges_to_vnodes_generator: fix for exclusive boundaries"
  db: Exclude keyspace with per-table replication in get_non_local_strategy_keyspaces_erms()
  db: Introduce get_non_local_vnode_based_strategy_keyspaces()
  service: storage_proxy: Avoid copying keyspace name in write handler
  locator: Introduce per-table replication strategy
  treewide: Use replication_strategy_ptr as a shorter name for abstract_replication_strategy::ptr_type
  locator: Introduce effective_replication_map
  locator: Rename effective_replication_map to vnode_effective_replication_map
  locator: effective_replication_map: Abstract get_pending_endpoints()
  db: Propagate feature_service to abstract_replication_strategy::validate_options()
  db: config: Introduce experimental "TABLETS" feature
  db: Log replication strategy for debugging purposes
  db: Log full exception on error in do_parse_schema_tables()
  db: keyspace: Remove non-const replication strategy getter
  config: Reformat
2023-04-27 09:40:18 +02:00
Kefu Chai
f5b05cf981 treewide: use defaulted operator!=() and operator==()
in C++20, the compiler generates operator!=() if the corresponding
operator==() is already defined; the language now understands
that the comparison is symmetric.

fortunately, our operator!=() is always equivalent to
`! operator==()`, this matches the behavior of the default
generated operator!=(). so, in this change, all `operator!=`
are removed.

in addition to the defaulted operator!=, C++20 also brings us
the defaulted operator==() -- the compiler is able to generate
operator==() as a member-wise comparison.
under some circumstances, this is exactly what we need. so,
in this change, if the operator==() is also implemented as
a lexicographical comparison of all member variables of the
class/struct in question, it is implemented using the default
generated one by removing its body and marking the function as
`default`. moreover, if the class happens to have other comparison
operators which are implemented using lexicographical comparison,
the default generated `operator<=>` is used in place of
the defaulted `operator==`.

sometimes, we fail to mark the operator== with the `const`
specifier, in this change, to fulfil the need of C++ standard,
and to be more correct, the `const` specifier is added.

also, to generate the defaulted operator==, the operand should
be `const class_name&`, but this is not always the case. in the
class `version`, we use `version` as the parameter type; to
fulfill the need of the C++ standard, the parameter type is
changed to `const version&` instead. this does not change
the semantics of the comparison operator, and is a more idiomatic
way to pass a non-trivial struct as a function parameter.

please note, because in C++20, both operator== and operator<=> are
symmetric, some of the operators in `multiprecision` are removed.
they are the symmetric form of another variant. if they were
not removed, the compiler would, for instance, find an ambiguous
overloaded operator '=='.

this change is a cleanup to modernize the code base with C++20
features.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13687
2023-04-27 10:24:46 +03:00
Kefu Chai
5a11d67709 dht: token: s/tri_compare/operator<=>/
now that C++20 is able to generate the comparison operators for us,
there is no need to define them manually. also,
`std::rel_ops::*` are deprecated in C++20.

also, use `foo <=> bar` instead of `tri_compare(foo, bar)` for better
readability.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-04-26 14:09:57 +08:00
Kefu Chai
cc87e10f40 dht: print pk in decorated_key with "pk" prefix
this change ensures that `dk._key` is formatted with the "pk" prefix.
as in 3738fcb, the `operator<<` for partition_key was removed. so the
compiler has to find an alternative when trying to fulfill the needs
when this operator<< is called. fortunately, from the compiler's
perspective, `partition_key` has an `operator managed_bytes_view`, and
this operator does not have the explicit specifier, and,
`managed_bytes_view` does support `operator<<`. so this ends up with a
change in the format of `decorated_key` when it is printed using
`operator<<`. the code compiles. but unfortunately, the behavior is
changed, and it breaks scylla-dtest/cdc_tracing_info_test.py where the
partition_key is supposed to be printed like "pk{010203}" instead of
"010203". the latter is how `managed_bytes_view` is formatted.
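
the mechanism can be reproduced with a minimal sketch. `bytes_view` and `key` below are hypothetical stand-ins for `managed_bytes_view` and `partition_key`, and `format_key()` is an assumed helper, not the fix applied in this commit:

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for managed_bytes_view.
struct bytes_view {
    std::string hex;
};

std::ostream& operator<<(std::ostream& os, const bytes_view& v) {
    return os << v.hex;
}

// Hypothetical stand-in for partition_key.
struct key {
    std::string hex;
    // Non-explicit conversion: `os << key` compiles even though no
    // operator<< is declared for key, by silently converting to bytes_view.
    operator bytes_view() const { return bytes_view{hex}; }
};

// Shows what the implicit conversion path prints.
std::string format_via_conversion(const key& k) {
    std::ostringstream os;
    os << k;  // resolves to operator<<(std::ostream&, const bytes_view&)
    return os.str();
}

// Printing through a dedicated helper keeps the "pk{...}" format explicit.
std::string format_key(const key& k) {
    std::ostringstream os;
    os << "pk{" << k.hex << "}";
    return os.str();
}
```

marking the conversion operator `explicit` in this sketch would turn the silent format change into a compile error.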

a test is added accordingly to avoid future changes which break the
dtest.

Fixes scylladb#13628
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13653
2023-04-25 09:53:47 +02:00
Tomasz Grabiec
fa8ad9a585 dht: Use last_token_of_compaction_group() in split_token_range_msb() 2023-04-24 10:49:37 +02:00
Tomasz Grabiec
fceb5f8cf6 locator: Introduce tablet_metadata
token_metadata now stores tablet metadata with information about
tablets in the system.
2023-04-24 10:49:37 +02:00
Tomasz Grabiec
241f7febec dht: Introduce first_token() 2023-04-24 10:49:36 +02:00
Tomasz Grabiec
462e3ffd36 dht: Introduce next_token() 2023-04-24 10:49:36 +02:00
Tomasz Grabiec
d3c9ad4ed6 locator: Rename effective_replication_map to vnode_effective_replication_map
In preparation for introducing a more abstract
effective_replication_map which can describe replication maps which
are not based on vnodes.
2023-04-24 10:49:36 +02:00
Kamil Braun
55f43e532c Merge 'get rid of gms/failure_detector' from Benny Halevy
Move gms::arrival_window to api/failure_detector, which is its only user,
and get rid of the rest, which is not used now that we use direct_failure_detector instead.

TODO: integrate direct_failure_detector with failure_detector api.

Closes #13576

* github.com:scylladb/scylladb:
  gms: get rid of unused failure_detector
  api: failure_detector: remove false dependency on failure_detector::arrival_window
  test: rest_api: add test_failure_detector
2023-04-21 11:47:44 +02:00
Benny Halevy
3f1ac846d8 gms: get rid of unused failure_detector
The legacy failure_detector is now unused and can be removed.

TODO: integrate direct_failure_detector with failure_detector api.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-21 09:08:27 +03:00
Kefu Chai
fe9f41bd84 dht: remove unnecessarily forward declaration
it turns out the declaration of `operator<<(ostream&, const
dht::token&)` is unnecessary. so let's drop it.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-04-21 11:41:54 +08:00
Kefu Chai
53dedca8cd dht: specialize fmt::formatter<dht::token>
this is part of a series migrating from `operator<<(ostream&, ..)`
based formatting to fmtlib based formatting. the goal here is to enable
fmtlib to print `dht::token` without the help of `operator<<`.

the corresponding `operator<<()` is preserved in this change, as it
has lots of users in this project, we will tackle them case-by-case in
follow-up changes.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-04-21 11:41:54 +08:00
Gleb Natapov
fd6d45e178 bootstrapper: Add get_random_bootstrap_tokens function
Does the same as get_bootstrap_tokens() but does not consult
initial token config option. Will be used later.
2023-03-21 16:06:43 +02:00
Kefu Chai
c37f4e5252 treewide: use fmt::join() when appropriate
now that fmtlib provides fmt::join() (see
https://fmt.dev/latest/api.html#_CPPv4I0EN3fmt4joinE9join_viewIN6detail10iterator_tI5RangeEEN6detail10sentinel_tI5RangeEEERR5Range11string_view),
there is no need to reinvent the wheel. so in this change, the homebrew
join() is replaced with fmt::join().

as fmt::join() returns a join_view, this could improve the
performance under certain circumstances where the fully materialized
string is not needed.

please note, the goal of this change is to use fmt::join(), and this
change does not intend to improve the performance of existing
implementation based on "operator<<" unless the new implementation is
much more complicated. we will address the unnecessarily materialized
strings in a follow-up commit.

some noteworthy things related to this change:

* unlike the existing `join()`, `fmt::join()` returns a view. so we
  have to materialize the view if what we expect is a `sstring`
* `fmt::format()` does not accept a view, so we cannot pass the
  return value of `fmt::join()` to `fmt::format()`
* fmtlib does not format a typed pointer, i.e., it does not format,
  for instance, a `const std::string*`. but operator<<() always prints
  a typed pointer. so if we want to format a typed pointer, we either
  need to cast the pointer to `void*` or use `fmt::ptr()`.
* fmtlib is not able to pick up the overload of
  `operator<<(std::ostream& os, const column_definition* cd)`, so we
  have to use a wrapper class of `maybe_column_definition` for printing
  a pointer to `column_definition`. since the overload is only used
  by the two overloads of
  `statement_restrictions::add_single_column_parition_key_restriction()`,
  the operator<< for `const column_definition*` is dropped.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-16 20:34:18 +08:00
Kamil Braun
fe14d14ce9 Merge 'Eliminate extraneous copies of dht::token_range_vector' from Benny Halevy
In several places we copy token range vectors where we could move them and eliminate unnecessary memory copies.

Ref #11005

Closes #12344

* github.com:scylladb/scylladb:
  dht/range_streamer: stream_async: move ranges_to_stream to do_streaming
  streaming: stream_session: maybe_yield
  streaming: stream_session: prepare: move token ranges to add_transfer_ranges
  streaming: stream_plan: transfer_ranges: move token ranges towards add_transfer_ranges
  dht/range_streamer: stream_async: do_streaming: move ranges downstream
  dht/range_streamer: add_ranges: clear_gently ranges_for_keyspace
  dht/range_streamer: get_range_fetch_map: reduce copies
  dht/range_streamer: add_ranges: move ranges down-stream
  dht/boot_strapper: move ranges to add_ranges
  dht/range_streamer: stream_async: incrementally update _nr_ranges_remaining
  dht/range_streamer: stream_async: erase from range_vec only after do_streaming success
2023-03-07 13:46:33 +01:00
Kefu Chai
c5d1a69859 build: cmake: link couple libraries as whole archive
it turns out we are using static variables to register entries in
global registries, and these variables are not directly referenced,
so the linker just drops them when linking the executables or shared
libraries. to address this problem, we just link the whole archive.
another option would be to create a linker script or pass
--undefined=<symbol> to the linker. neither of them is straightforward.

a helper function is introduced to do this, as we cannot use CMake
3.24 yet.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-04 13:11:25 +08:00
Kefu Chai
563fbb2d11 build: cmake: extract more subsystem out into its own CMakeLists.txt
namely, cdc, compaction, dht, gms, lang, locator, mutation_writer, raft, readers, replica,
service, tools, tracing and transport.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-03-02 10:15:25 +08:00
Kefu Chai
d85af3dca4 dht: incremental_owned_ranges_checker: use lower_bound()
instead of using a while loop for finding the lower bound,
just use std::lower_bound() for finding if the current node owns the given
token. this has three advantages:

* better readability: as lower_bound is exactly what this loop
  calculates.
* lower_bound uses binary search for searching the element,
  this algorithm should be faster than linear under most
  circumstances.
* lower_bound uses std::advance() and prefix increment operator,
  this should be more performant than the postfix increment operator,
  as it does not create a temporary instance of iterator.
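
a minimal sketch of the lookup described above, using hypothetical names (`range`, `owns`) and integer tokens rather than the real `dht` types. note that commit 9c37fdaca3 at the top of this log later reverted the change in favour of the linear scan:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a token range: closed interval [start, end].
struct range {
    int64_t start;
    int64_t end;
};

// Returns true if `token` falls in one of the ranges. Assumes `sorted`
// is ordered by start and non-overlapping.
bool owns(const std::vector<range>& sorted, int64_t token) {
    // Binary search for the first range whose end is not less than token.
    auto it = std::lower_bound(sorted.begin(), sorted.end(), token,
        [](const range& r, int64_t t) { return r.end < t; });
    return it != sorted.end() && it->start <= token;
}
```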

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #13008
2023-03-01 11:29:46 +02:00
Benny Halevy
06a0902708 dht/range_streamer: stream_async: move ranges_to_stream to do_streaming
Currently the ranges_to_stream variable lives
on the caller state, and do_streaming() moves its
contents down to request_ranges/transfer_ranges
and then calls clear() to make it ready for reuse.

This works in principle but it makes it harder
for an occasional reader of this code to figure out
what is going on.

This change transfers control of the ranges_to_stream vector
to do_streaming, by calling it with (std::exchange(ranges_to_stream, {})).
With that, the moved vector doesn't need to be cleared by
do_streaming, and the caller is responsible for readying
the variable for reuse in its for loop.
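
A minimal sketch of the pattern, assuming a simplified synchronous `do_streaming` that takes the batch by value. The function bodies and `stream_in_batches` are hypothetical, not the range_streamer code:

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical consumer: takes ownership of the batch by value.
std::size_t do_streaming(std::vector<int> ranges_to_stream) {
    return ranges_to_stream.size();  // the batch is destroyed on return
}

// Sends `ranges` in batches of at most `batch_size`; returns the total sent.
std::size_t stream_in_batches(const std::vector<int>& ranges, std::size_t batch_size) {
    std::size_t sent = 0;
    std::vector<int> ranges_to_stream;
    for (int r : ranges) {
        ranges_to_stream.push_back(r);
        if (ranges_to_stream.size() == batch_size) {
            // std::exchange moves the batch out and leaves an empty vector
            // behind, so the loop reuses the variable without clear().
            sent += do_streaming(std::exchange(ranges_to_stream, {}));
        }
    }
    if (!ranges_to_stream.empty()) {
        sent += do_streaming(std::exchange(ranges_to_stream, {}));
    }
    return sent;
}
```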

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-02-28 17:38:34 +02:00
Benny Halevy
775c6b9697 dht/range_streamer: stream_async: do_streaming: move ranges downstream
The ranges can be moved rather than copied to both
`request_ranges` and `transfer_ranges` as they are only cleared
after this point.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-02-28 16:56:55 +02:00
Benny Halevy
3cd8838a09 dht/range_streamer: add_ranges: clear_gently ranges_for_keyspace
After calling get_range_fetch_map, ranges_for_keyspace
is not used anymore.
Synchronously destroying it may potentially stall in large clusters
so use utils::clear_gently to gently clear the map.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-02-28 16:52:30 +02:00
Benny Halevy
a80c2d16dd dht/range_streamer: get_range_fetch_map: reduce copies
Use const& to refer to the input ranges and endpoints
rather than copying them individually along the way
more than needed.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-02-28 16:52:30 +02:00
Benny Halevy
9d6e5d50d1 dht/range_streamer: add_ranges: move ranges down-stream
Eliminate extraneous copy.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-02-28 16:52:27 +02:00
Benny Halevy
c61f058aa5 dht/boot_strapper: move ranges to add_ranges
Eliminate extraneous copy.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-02-28 16:50:40 +02:00
Benny Halevy
27b382dcce dht/range_streamer: stream_async: incrementally update _nr_ranges_remaining
Do this rather than calling nr_ranges_to_stream() inside `do_streaming`,
as nr_ranges_to_stream depends on `_to_stream`, which will be updated
only later on, after the next patch.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-02-28 16:50:40 +02:00
Benny Halevy
c3c7efffb1 dht/range_streamer: stream_async: erase from range_vec only after do_streaming success
range_vec is used for calculating nr_ranges_to_stream.
Currently, the ranges_to_stream that were
moved out of range_vec are pushed back on exception,
but this isn't safe, since they may have moved already
to request_ranges or transfer_ranges.

Instead, erase the ranges we pass to do_streaming
only after it succeeds, so on exception range_vec
will not need adjusting.
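
A minimal sketch of the erase-after-success idea. `stream_front` and the `fail` flag are hypothetical, and copying the batch out of range_vec is a simplification made for this example:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical consumer that may throw after taking ownership of its batch.
void do_streaming(std::vector<int> batch, bool fail) {
    if (fail) {
        throw std::runtime_error("streaming failed");
    }
}

// Streams up to `n` ranges from the front of `range_vec`. The ranges are
// erased only after do_streaming() returns, so an exception leaves
// range_vec untouched and nothing has to be pushed back.
void stream_front(std::vector<int>& range_vec, std::size_t n, bool fail) {
    const auto count = static_cast<std::ptrdiff_t>(std::min(n, range_vec.size()));
    std::vector<int> batch(range_vec.begin(), range_vec.begin() + count);
    do_streaming(std::move(batch), fail);
    range_vec.erase(range_vec.begin(), range_vec.begin() + count);
}
```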

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-02-28 16:50:40 +02:00
Kefu Chai
df63e2ba27 types: move types.{cc,hh} into types
they are part of the CQL type system, and are "closer" to types.
let's move them into "types" directory.

the building systems are updated accordingly.

the source files referencing `types.hh` were updated using the following
command:

```
find . \( -name "*.cc" -o -name "*.hh" \) -exec sed -i 's/\"types.hh\"/\"types\/types.hh\"/' {} +
```

the source files under sstables include "types.hh", which is
indeed the one located under "sstables", so include "sstables/types.hh"
instead, to be more explicit.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #12926
2023-02-19 21:05:45 +02:00
Kefu Chai
0cb842797a treewide: do not define/capture unused variables
these warnings are found by Clang-17 after removing
`-Wno-unused-lambda-capture` and '-Wno-unused-variable' from
the list of disabled warnings in `configure.py`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-15 22:57:18 +02:00
Botond Dénes
c927eea1d5 Merge 'table: trim ranges for compaction group cleanup' from Benny Halevy
This series contains the following changes for trimming the ranges passed to the cleanup of a compaction group to the token_range owned by that compaction group.

table: compaction_group_for_token: use signed arithmetic
Fixes #12595

table: make_compaction_groups: calculate compaction_group token ranges
table: perform_cleanup_compaction: trim owned ranges on compaction_group boundaries
Fixes #12594

Closes #12598

* github.com:scylladb/scylladb:
  table: perform_cleanup_compaction: trim owned ranges on compaction_group boundaries
  table: make_compaction_groups: calculate compaction_group token ranges
  dht: range_streamer: define logger as static
2023-01-30 13:11:28 +02:00
Benny Halevy
82011fc489 dht: incremental_owned_ranges_checker: belongs_to_current_node: mark as const
Its _it member keeps state about the current range.
Although it's modified by the method, this is an implementation
detail that is irrelevant to the caller, hence mark the
belongs_to_current_node method as const (and noexcept while
at it).

This allows the caller, cleanup_compaction, to use it from
inside a const method, without having to mark
its respective member as mutable too.
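
A minimal sketch of this arrangement, with hypothetical names (`owned_ranges_checker`, `cleanup_filter`) and integer tokens. The cursor is `mutable` inside the checker, so the owning class can hold it as an ordinary member:

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical checker; the cursor only caches the last position.
class owned_ranges_checker {
    std::vector<int> _sorted_ends;  // upper bounds of the owned ranges
    mutable std::size_t _pos = 0;   // internal cursor, invisible to callers
public:
    explicit owned_ranges_checker(std::vector<int> ends)
        : _sorted_ends(std::move(ends)) {}

    // Logically const: callers observe only the answer. Assumes tokens
    // are queried in non-decreasing order.
    bool has_range_at_or_after(int token) const noexcept {
        while (_pos < _sorted_ends.size() && _sorted_ends[_pos] < token) {
            ++_pos;
        }
        return _pos < _sorted_ends.size();
    }
};

// Hypothetical caller: no `mutable` needed on the member.
struct cleanup_filter {
    owned_ranges_checker checker;
    bool keep(int token) const { return checker.has_range_at_or_after(token); }
};
```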

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #12634
2023-01-25 14:52:21 +02:00
Benny Halevy
95a8e0b21d table: make_compaction_groups: calculate compaction_group token ranges
Add dht::split_token_range_msb that returns a token_range_vector
with ranges split using a given number of most-significant bits.

When creating the table's compaction groups, use dht::split_token_range_msb
to calculate the token_range owned by each compaction_group.

Refs #12594

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-01-22 22:54:26 +02:00
Benny Halevy
912b56ebcf dht: range_streamer: define logger as static
dht::logger can't be global in this case,
as it's too generic, but should be static
to range_streamer.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-01-22 22:54:26 +02:00
Benny Halevy
8009585e7d table: compaction_group_for_token: use signed arithmetic
Add and use dht::compaction_group_of that computes the
compaction_group index by unbiasing the token,
similar to dht::shard_of.

This way, all tokens in `_compaction_groups[i]` are ordered
before `_compaction_groups[j]` iff i < j.
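
A minimal sketch of the unbiasing idea, assuming 64-bit signed tokens. `unbias` and `group_of` are illustrative names and not necessarily the signatures used in dht; `msb` is assumed to be at most 64:

```cpp
#include <cassert>
#include <cstdint>

// Maps a signed token to an unsigned value that preserves ordering:
// INT64_MIN -> 0, -1 -> 2^63 - 1, 0 -> 2^63, INT64_MAX -> 2^64 - 1.
constexpr uint64_t unbias(int64_t token) {
    return static_cast<uint64_t>(token) + (uint64_t{1} << 63);
}

// Index of the group owning `token` when the token space is split into
// 2^msb equal groups by the most-significant bits of the unbiased token.
constexpr uint64_t group_of(unsigned msb, int64_t token) {
    // Shifting a 64-bit value by 64 is undefined, so msb == 0 is handled
    // separately.
    return msb == 0 ? 0 : unbias(token) >> (64 - msb);
}
```

Without the unbiasing step, the raw bit pattern of a negative token has its top bit set, so negative tokens would land in the highest-numbered groups and break the ordering property.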

Fixes #12595

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #12599
2023-01-22 11:27:07 +02:00
Botond Dénes
50b155e706 dht/i_partitioner.hh: ring_position_ext: add weight() accessor 2023-01-09 09:46:57 -05:00
Benny Halevy
57ff3f240f dht: optimize subtract_ranges
Take advantage of the fact that both ranges and
ranges_to_subtract are deoverlapped and sorted
to reduce the calculation complexity from
quadratic to linear.

Fixes #11922

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-11-21 15:48:28 +02:00
Benny Halevy
8b81635d95 compaction: refactor dht::subtract_ranges out of get_ranges_for_invalidation
The algorithm is generic and can be used elsewhere.

Add a unit test for the function before it gets
optimized in the following patch.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-11-21 15:48:26 +02:00
Benny Halevy
10f8f13b90 db: view_update_generator: always clean up staging sstables
Since they are currently not cleaned up by cleanup compaction,
filter their tokens, processing only tokens owned by the
current node (based on the keyspace replication strategy).

Refs #9559

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-11-09 07:38:22 +02:00
Benny Halevy
fd3e66b0cc compaction: extract incremental_owned_ranges_checker out to dht
It is currently used by cleanup_compaction partition filter.
Factor it out so it can be used to filter staging sstables in
the next patch.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-11-09 07:32:56 +02:00
Botond Dénes
169a8a66f2 compatible_ring_position_or_view: make it cheap to copy
This class exists for one purpose only: to serve as glue code between
dht::ring_position and boost::icl::interval_map. The latter requires
that keys in its intervals are:
* default constructible
* copyable
* have standalone compare operations

For this reason we have to wrap `dht::ring_position` in a class,
together with a schema to provide all this. This is
`compatible_ring_position`. There is one further requirement by code
using the interval map: it wants to do lookups without copying the
lookup key(s). To solve this, we came up with
`compatible_ring_position_or_view` which is a union of a key or a key
view + schema. As we recently found out, boost::icl copies its keys **a
lot**. It seems to assume these keys are cheap to copy and carelessly
copies them around even when iterating over the map. But
`compatible_ring_position_or_view` is not cheap to copy as it copies a
`dht::ring_position` which allocates, and it does that via an
`std::optional` and `std::variant` to add insult to injury.
This patch makes said class cheap to copy, by getting rid of the variant
and storing the `dht::ring_position` via a shared pointer. The view is
stored separately and either points to the ring position stored in the
shared pointer or to an outside ring position (for lookups).

Fixes: #11669

Closes #11670
2022-10-04 12:00:21 +03:00
Asias He
9ed401c4b2 streaming: Add finished percentage metrics for node ops using streaming
We have added the finished percentage for repair based node operations.

This patch adds the finished percentage for node ops using the old
streaming.

Example output:

scylla_streaming_finished_percentage{ops="bootstrap",shard="0"} 1.000000
scylla_streaming_finished_percentage{ops="decommission",shard="0"} 1.000000
scylla_streaming_finished_percentage{ops="rebuild",shard="0"} 0.561945
scylla_streaming_finished_percentage{ops="removenode",shard="0"} 1.000000
scylla_streaming_finished_percentage{ops="repair",shard="0"} 1.000000
scylla_streaming_finished_percentage{ops="replace",shard="0"} 1.000000

In addition to the metrics, the log now shows the finished percentage.

[shard 0] range_streamer - Finished 2698 out of 2817 ranges for rebuild, finished percentage=0.95775646

Fixes #11600

Closes #11601
2022-09-22 14:19:34 +03:00
Pavel Emelyanov
b6fdea9a79 code: Call sort_endpoints_by_proximity() via topology
The method is about to be moved from snitch to topology, this patch
prepares the rest of the code to use the latter to call it. The
topology's method just calls snitch, but it's going to change in the
next patch.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-09-05 15:14:01 +03:00
Pavel Emelyanov
4184091f1c snitch, code: Remove get_sorted_list_by_proximity()
There are two sorting methods in snitch -- one sorts the list of
addresses in place, the other one creates a sorted copy of the passed
const list (in fact -- the passed reference is not const, but it's not
modified by the method). However, both callers of the latter create
their own temporary list of addresses anyway, so they don't really benefit
from snitch generating another copy.

So this patch leaves just one sorting method -- the in-place one.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-09-05 15:11:37 +03:00
Pavel Emelyanov
6dedc69608 topology: Do not add bootstrapping nodes to topology
A recent change in topology (commit 4cbe6ee9 titled
"topology: Require entry in the map for update_normal_tokens()")
made token_metadata::update_normal_tokens() require the entry's presence
in the embedded topology object. Accordingly, the commit in question
equipped most callers of update_normal_tokens() with a preceding
topology update call to satisfy the requirement.

However, tokens are put into token_metadata not only for normal state,
but also for bootstrapping, and one place that added bootstrapping
tokens erroneously got a topology update. This is wrong -- a node must
not be present in the topology until switching into normal state. As
a result, several tests with bootstrapping nodes started to fail.

The fix removes the topology update for bootstrapping nodes, but this
change reveals a few other places that piggy-backed on this mistaken
update, so now _they_ need to update topology themselves.

tests: https://jenkins.scylladb.com/job/releng/job/Scylla-CI/2040/
       update_cluster_layout_tests.py::test_simple_add_new_node_while_schema_changes_with_repair
       update_cluster_layout_tests.py::test_simple_kill_new_node_while_bootstrapping_with_parallel_writes_in_multidc
       repair_based_node_operations_test.py::test_lcs_reshape_efficiency

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20220902082753.17827-1-xemul@scylladb.com>
2022-09-04 13:53:38 +03:00
Pavel Emelyanov
7305061674 replication_strategy: Accept dc-rack as get_pending_address_ranges argument
The method creates a copy of token metadata and pushes an endpoint (with
some tokens) into it. Next patches will require providing dc/rack info
together with the endpoint, this patch prepares for that.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-08-26 09:39:44 +03:00
Pavel Emelyanov
360c4f8608 dht: Carry dc-rack over boot_strapper and range_streamer
Both classes may populate (temporary clones of) the token metadata object
with endpoint:tokens pairs for the endpoint they work with. Next patches
will require that endpoint comes with the dc/rack info. This patch makes
sure dht classes have the necessary information at hand (for now it's
just empty pair of strings).

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-08-26 09:37:02 +03:00
Benny Halevy
91ab8ee1c3 effective_replication_map: make get_range_addresses asynchronous
So it may yield, preventing reactor stalls as seen in #11005.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-08 17:31:01 +03:00
Benny Halevy
9b2af3f542 range_streamer: add_ranges and friends: get erm as param
Rather than getting it in the callee, let the caller
(e.g.  storage_service)
hold the erm and pass it down to potentially multiple
async functions.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-08 17:31:01 +03:00
Benny Halevy
7ee6048255 database: add get_non_local_strategy_keyspaces
For node operations, we currently call get_non_system_keyspaces
but really want to work on all keyspaces that have a non-local
replication strategy, as they are replicated on other nodes.

Reflect that in the replica::database function name.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-08 17:31:01 +03:00