114 Commits

Author SHA1 Message Date
Benny Halevy
6dbbb80aae locator: abstract_replication_strategy: implement local_replication_strategy
Derive both vnode_effective_replication_map
and local_effective_replication_map from
static_effective_replication_map as both are static and per-keyspace.

However, local_effective_replication_map does not need vnodes
for the mapping of all tokens to the local node.

Note that everywhere_replication_strategy is not abstracted in a similar
way, although it could, since the plan is to get rid of it
once all system keyspaces areconverted to local or tablets replication
(and propagated everywhere if needed using raft group0)

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-08-06 16:05:11 +03:00
Benny Halevy
cbad497859 locator: abstract_replication_strategy: rename vnode_effective_replication_map_ptr et. al
to static_effective_replication_map_ptr, in preparation
for separating local_effective_replication_map from
vnode_effective_replication_map.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-08-06 16:03:53 +03:00
Benny Halevy
bd62421c05 keyspace: rename get_vnode_effective_replication_map
to get_static_effective_replication_map, in preparation
for separating local_effective_replication_map from
vnode_effective_replication_map (both are per-keyspace).

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-08-06 13:40:43 +03:00
Benny Halevy
33f34c8c32 dht: range_streamer: use naked e_r_m pointers
Prepare for following patch that will separate
the local effective replication map from
vnode_effective_replication_map.

The caller is responsible to keep the
effective_replication_map_ptr alive while
in use by low-level async functions.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-08-06 13:34:23 +03:00
Avi Kivity
f3eade2f62 treewide: relicense to ScyllaDB-Source-Available-1.0
Drop the AGPL license in favor of a source-available license.
See the blog post [1] for details.

[1] https://www.scylladb.com/2024/12/18/why-were-moving-to-a-source-available-license/
2024-12-18 17:45:13 +02:00
Gleb Natapov
41a57ed2e8 streaming: move streaming code to use host ids instead of host ips
The patch is rather large, but it is a straightforward conversion from
one type to another.
2024-12-15 11:31:11 +02:00
Kefu Chai
6ead5a4696 treewide: move log.hh into utils/log.hh
the log.hh under the root of the tree was created keep the backward
compatibility when seastar was extracted into a separate library.
so log.hh should belong to `utils` directory, as it is based solely
on seastar, and can be used all subsystems.

in this change, we move log.hh into utils/log.hh to that it is more
modularized. and this also improves the readability, when one see
`#include "utils/log.hh"`, it is obvious that this source file
needs the logging system, instead of its own log facility -- please
note, we do have two other `log.hh` in the tree.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-10-22 06:54:46 +03:00
Kefu Chai
5cd619a60c treewide: s/boost::adaptors::map_keys/std::views::keys/
now that we are allowed to use C++23. we now have the luxury of using
`std::views::keys`.

in this change, we:

- replace `boost::adaptors::map_keys` with `std::views::keys`
- update affected code to work with `std::views::keys`

to reduce the dependency to boost for better maintainability, and
leverage standard library features for better long-term support.

this change is part of our ongoing effort to modernize our codebase
and reduce external dependencies where possible.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#21198
2024-10-21 12:47:52 +03:00
Avi Kivity
c3be2489ce treewide: drop includes of <boost/range/adaptors.hpp>
This includes way too much, including <boost/regex.hpp>, which is huge.
Drop includes of adaptors.hpp and replace by what is needed.

Closes scylladb/scylladb#21187
2024-10-20 17:17:11 +03:00
Patryk Jędrzejczak
366605224c token_metadata: rename get_all_endpoints and get_all_ips
In one of the following patches, we introduce support for zero-token
nodes. A zero-token node that has successfully joined the cluster is
in the normal state but is not a normal token owner. Hence, the names
of `get_all_endpoints` and `get_all_ips` become misleading. They
should specify that the functions return only IDs/IPs of token owners.
2024-08-29 10:37:07 +02:00
Avi Kivity
aa1270a00c treewide: change assert() to SCYLLA_ASSERT()
assert() is traditionally disabled in release builds, but not in
scylladb. This hasn't caused problems so far, but the latest abseil
release includes a commit [1] that causes a 1000 insn/op regression when
NDEBUG is not defined.

Clearly, we must move towards a build system where NDEBUG is defined in
release builds. But we can't just define it blindly without vetting
all the assert() calls, as some were written with the expectation that
they are enabled in release mode.

To solve the conundrum, change all assert() calls to a new SCYLLA_ASSERT()
macro in utils/assert.hh. This macro is always defined and is not conditional
on NDEBUG, so we can later (after vetting Seastar) enable NDEBUG in release
mode.

[1] 66ef711d68

Closes scylladb/scylladb#20006
2024-08-05 08:23:35 +03:00
Benny Halevy
adc1d7f68f token: move ordering operator inline
Token comparisons are abundant.
The equality operator is defined inline
in dht/token.hh by calling `t1 <=> t2`,
and so is `tri_compare_raw`, which `operator<=>`
calls in the common path, but `operator<=>` itself
is defined out of line, losing the benefits of inlining.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-07-20 21:21:42 +03:00
Kefu Chai
a439ebcfce treewide: include fmt/ranges.h and/or fmt/std.h
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we include `fmt/ranges.h` and/or `fmt/std.h`
for formatting the container types, like vector, map
optional and variant using {fmt} instead of the homebrew
formatter based on operator<<.
with this change, the changes adding fmt::formatter and
the changes using ostream formatter explicitly, we are
allowed to drop `FMT_DEPRECATED_OSTREAM` macro.

Refs scylladb#13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-04-19 22:56:16 +08:00
Avi Kivity
605bf6e221 range.hh: retire
range.hh was deprecated in bd794629f9 (2020) since its names
conflict with the C++ library concept of an iterator range. The name
::range also mapped to the dangerous wrapping_interval rather than
nonwrapping_interval.

Complete the deprecation by removing range.hh and replacing all the
aliases by the names they point to from the interval library. Note
this now exposes uses of wrapping intervals as they are now explicit.

The unit tests are renamed and range.hh is deleted.

Closes scylladb/scylladb#17428
2024-02-21 00:24:25 +02:00
Patryk Wrobel
a3fb44cbca Rename keyspace::get_effective_replication_map()
This commit renames keyspace::get_effective_replication_map()
to keyspace::get_vnode_effective_replication_map(). This change
is required to ease the analysis of the usage of this function.

When tablets are enabled, then this function shall not be used.
Instead of per-keyspace, per-table replication map should be used.
The rename was performed to distinguish between those two calls.
The next step will be an audit of usages of
keyspace::get_vnode_effective_replication_map().

Refs: scylladb#16626
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>

Closes scylladb/scylladb#17314
2024-02-13 20:22:02 +02:00
Avi Kivity
7cb1c10fed treewide: replace seastar::future::get0() with seastar::future::get()
get0() dates back from the days where Seastar futures carried tuples, and
get0() was a way to get the first (and usually only) element. Now
it's a distraction, and Seastar is likely to deprecate and remove it.

Replace with seastar::future::get(), which does the same thing.
2024-02-02 22:12:57 +08:00
Kefu Chai
1ce58595aa dht: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16891
2024-01-21 16:56:16 +02:00
Petr Gusev
7b55ccbd8e token_metadata: drop the template
Replace token_metadata2 ->token_metadata,
make token_metadata back non-template.

No behavior changes, just compilation fixes.
2023-12-12 23:19:54 +04:00
Petr Gusev
93263bf9e7 bootstrap: use new token_metadata
Just mechanical changes to the new token_metadata.

All the boost and topology tests pass with this change.
2023-12-12 23:19:53 +04:00
Avi Kivity
9c0f05efa1 Merge 'Track tablet streaming under global sessions to prevent side-effects of failed streaming' from Tomasz Grabiec
Tablet streaming involves asynchronous RPCs to other replicas which transfer writes. We want side-effects from streaming only within the migration stage in which the streaming was started. This is currently not guaranteed on failure. When streaming master fails (e.g. due to RPC failing), it can be that some streaming work is still alive somewhere (e.g. RPC on wire) and will have side-effects at some point later.

This PR implements tracking of all operations involved in streaming which may have side-effects, which allows the topology change coordinator to fence them and wait for them to complete if they were already admitted.

The tracking and fencing is implemented by using global "sessions", created for streaming of a single tablet. Session is globally identified by UUID. The identifier is assigned by the topology change coordinator, and stored in system.tablets. Sessions are created and closed based on group0 state (tablet metadata) by the barrier command sent to each replica, which we already do on transitions between stages. Also, each barrier waits for sessions which have been closed to be drained.

The barrier is blocked only if there is some session with work which was left behind by unsuccessful streaming. In which case it should not be blocked for long, because streaming process checks often if the guard was left behind and stops if it was.

This mechanism of tracking is fault-tolerant: session id is stored in group0, so coordinator can make progress on failover. The barriers guarantee that session exists on all replicas, and that it will be closed on all replicas.

Closes scylladb/scylladb#15847

* github.com:scylladb/scylladb:
  test: tablets: Add test for failed streaming being fenced away
  error_injection: Introduce poll_for_message()
  error_injection: Make is_enabled() public
  api: Add API to kill connection to a particular host
  range_streamer: Do not block topology change barriers around streaming
  range_streamer, tablets: Do not keep token metadata around streaming
  tablets: Fail gracefully when migrating tablet has no pending replica
  storage_service, api: Add API to disable tablet balancing
  storage_service, api: Add API to migrate a tablet
  storage_service, raft topology: Run streaming under session topology guard
  storage_service, tablets: Use session to guard tablet streaming
  tablets: Add per-tablet session id field to tablet metadata
  service: range_streamer: Propagate topology_guard to receivers
  streaming: Always close the rpc::sink
  storage_service: Introduce concept of a topology_guard
  storage_service: Introduce session concept
  tablets: Fix topology_metadata_guard holding on to the old erm
  docs: Document the topology_guard mechanism
2023-12-07 16:29:02 +02:00
Tomasz Grabiec
c228f2c940 range_streamer, tablets: Do not keep token metadata around streaming
It holds back global token metadata barrier during streaming, which
limits parallelism of load balancing.
2023-12-06 18:36:17 +01:00
Tomasz Grabiec
fd3c089ccc service: range_streamer: Propagate topology_guard to receivers 2023-12-06 18:36:16 +01:00
Benny Halevy
e1239e63bf dht/range_streamer: use locator::topology rather than fb_utilities
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-05 09:01:31 +02:00
Avi Kivity
78fc3b5f56 config: rename stream_plan_ranges_percentage to *_fraction
The value is specified as a fraction between 0 and 1, so don't
mislead users into specifying a value between 0 and 100.

Closes #15261
2023-09-03 23:24:29 +03:00
Tomasz Grabiec
6d545b2f9e storage_service: Implement stream_tablet RPC
Performs streaming of data for a single tablet between two tablet
replicas. The node which gets the RPC is the receiving replica.
2023-07-25 21:08:51 +02:00
Asias He
dad5caf141 streaming: Add stream_plan_ranges_percentage
This option allows user to change the number of ranges to stream in
batch per stream plan.

Currently, each stream plan streams 10% of the total ranges.

With more ranges per stream plan, it reduces the waiting time between
two stream plans. For example,

stream_plan1: shard0 (t0), shard1 (t1)
stream_plan2: shard0 (t2), shard1 (t3)

We start stream_plan2 after all shards finish streaming in stream_plan1.
If shard0 and shard1 in stream_plan1 finishes at different time. One of
the shards will be idle.

If we stream more ranges in a single stream plan, the waiting time will
be reduced.

Previously, we retry the stream plan if one of the stream plans is
failed. That's one of the reasons we want more stream plans. With RBNO
and 1f8b529e08 (range_streamer: Disable restream logic), the
restream factor is not important anymore.

Also, more ranges in a single stream plan will create bigger but fewer
sstables on the receiver side.

The default value is the same as before: 10% percentage of total ranges.

Fixes #14191

Closes #14402
2023-07-14 09:03:01 +03:00
Kamil Braun
30cc07b40d Merge 'Introduce tablets' from Tomasz Grabiec
This PR introduces an experimental feature called "tablets". Tablets are
a way to distribute data in the cluster, which is an alternative to the
current vnode-based replication. Vnode-based replication strategy tries
to evenly distribute the global token space shared by all tables among
nodes and shards. With tablets, the aim is to start from a different
side. Divide resources of replica-shard into tablets, with a goal of
having a fixed target tablet size, and then assign those tablets to
serve fragments of tables (also called tablets). This will allow us to
balance the load in a more flexible manner, by moving individual tablets
around. Also, unlike with vnode ranges, tablet replicas live on a
particular shard on a given node, which will allow us to bind raft
groups to tablets. Those goals are not yet achieved with this PR, but it
lays the ground for this.

Things achieved in this PR:

  - You can start a cluster and create a keyspace whose tables will use
    tablet-based replication. This is done by setting `initial_tablets`
    option:

    ```
        CREATE KEYSPACE test WITH replication = {'class': 'NetworkTopologyStrategy',
                        'replication_factor': 3,
                        'initial_tablets': 8};
    ```

    All tables created in such a keyspace will be tablet-based.

    Tablet-based replication is a trait, not a separate replication
    strategy. Tablets don't change the spirit of replication strategy, it
    just alters the way in which data ownership is managed. In theory, we
    could use it for other strategies as well like
    EverywhereReplicationStrategy. Currently, only NetworkTopologyStrategy
    is augmented to support tablets.

  - You can create and drop tablet-based tables (no DDL language changes)

  - DML / DQL work with tablet-based tables

    Replicas for tablet-based tables are chosen from tablet metadata
    instead of token metadata

Things which are not yet implemented:

  - handling of views, indexes, CDC created on tablet-based tables
  - sharding is done using the old method, it ignores the shard allocated in tablet metadata
  - node operations (topology changes, repair, rebuild) are not handling tablet-based tables
  - not integrated with compaction groups
  - tablet allocator piggy-backs on tokens to choose replicas.
    Eventually we want to allocate based on current load, not statically

Closes #13387

* github.com:scylladb/scylladb:
  test: topology: Introduce test_tablets.py
  raft: Introduce 'raft_server_force_snapshot' error injection
  locator: network_topology_strategy: Support tablet replication
  service: Introduce tablet_allocator
  locator: Introduce tablet_aware_replication_strategy
  locator: Extract maybe_remove_node_being_replaced()
  dht: token_metadata: Introduce get_my_id()
  migration_manager: Send tablet metadata as part of schema pull
  storage_service: Load tablet metadata when reloading topology state
  storage_service: Load tablet metadata on boot and from group0 changes
  db, migration_manager: Notify about tablet metadata changes via migration_listener::on_update_tablet_metadata()
  migration_notifier: Introduce before_drop_keyspace()
  migration_manager: Make prepare_keyspace_drop_announcement() return a future<>
  test: perf: Introduce perf-tablets
  test: Introduce tablets_test
  test: lib: Do not override table id in create_table()
  utils, tablets: Introduce external_memory_usage()
  db: tablets: Add printers
  db: tablets: Add persistence layer
  dht: Use last_token_of_compaction_group() in split_token_range_msb()
  locator: Introduce tablet_metadata
  dht: Introduce first_token()
  dht: Introduce next_token()
  storage_proxy: Improve trace-level logging
  locator: token_metadata: Fix confusing comment on ring_range()
  dht, storage_proxy: Abstract token space splitting
  Revert "query_ranges_to_vnodes_generator: fix for exclusive boundaries"
  db: Exclude keyspace with per-table replication in get_non_local_strategy_keyspaces_erms()
  db: Introduce get_non_local_vnode_based_strategy_keyspaces()
  service: storage_proxy: Avoid copying keyspace name in write handler
  locator: Introduce per-table replication strategy
  treewide: Use replication_strategy_ptr as a shorter name for abstract_replication_strategy::ptr_type
  locator: Introduce effective_replication_map
  locator: Rename effective_replication_map to vnode_effective_replication_map
  locator: effective_replication_map: Abstract get_pending_endpoints()
  db: Propagate feature_service to abstract_replication_strategy::validate_options()
  db: config: Introduce experimental "TABLETS" feature
  db: Log replication strategy for debugging purposes
  db: Log full exception on error in do_parse_schema_tables()
  db: keyspace: Remove non-const replication strategy getter
  config: Reformat
2023-04-27 09:40:18 +02:00
Kefu Chai
5a11d67709 dht: token: s/tri_compare/operator<=>/
now that C++20 is able to generate the default-generated comparing
operators for us. there is no need to define them manually. and,
`std::rel_ops::*` are deprecated in C++20.

also, use `foo <=> bar` instead of `tri_compare(foo, bar)` for better
readability.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-04-26 14:09:57 +08:00
Tomasz Grabiec
d3c9ad4ed6 locator: Rename effective_replication_map to vnode_effective_replication_map
In preparation for introducing a more abstract
effective_replication_map which can describe replication maps which
are not based on vnodes.
2023-04-24 10:49:36 +02:00
Benny Halevy
3f1ac846d8 gms: get rid of unused failure_detector
The legacy failure_detector is now unused and can be removed.

TODO: integare direct_failure_detector with failure_detector api.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-21 09:08:27 +03:00
Benny Halevy
06a0902708 dht/range_streamer: stream_async: move ranges_to_stream to do_streaming
Currently the ranges_to_stream variable lives
on the caller state, and do_streaming() moves its
contents down to request_ranges/transfer_ranges
and then calls clear() to make it ready for reuse.

This works in principle but it makes it harder
for an occasional reader of this code to figure out
what going on.

This change transfers control of the ranges_to_stream vector
to do_streaming, by calling it with (std::exchange(do_streaming, {}))
and with that that moved vector doesn't need to be cleared by
do_streaming, and the caller is reponsible for readying
the variable for reuse in its for loop.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-02-28 17:38:34 +02:00
Benny Halevy
775c6b9697 dht/range_streamer: stream_async: do_streaming: move ranges downstream
The ranges can be moved rather than copied to both
`request_ranges` and `transfer_ranges` as they are only cleared
after this point.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-02-28 16:56:55 +02:00
Benny Halevy
3cd8838a09 dht/range_streamer: add_ranges: clear_gently ranges_for_keyspace
After calling get_range_fetch_map, ranges_for_keyspace
is not used anymore.
Synchronously destroying it may potentially stall in large clusters
so use utils::clear_gently to gently clear the map.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-02-28 16:52:30 +02:00
Benny Halevy
a80c2d16dd dht/range_streamer: get_range_fetch_map: reduce copies
Use const& to refer to the input ranges and endpoints
rather than copying them individually along the way
more than needed to.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-02-28 16:52:30 +02:00
Benny Halevy
9d6e5d50d1 dht/range_streamer: add_ranges: move ranges down-stream
Eliminate extraneous copy.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-02-28 16:52:27 +02:00
Benny Halevy
27b382dcce dht/range_streamer: stream_async: incrementally update _nr_ranges_remaining
Rather than calling nr_ranges_to_stream() inside `do_streaming`.
As nr_ranges_to_stream depends on the `_to_stream` that will be updated
only later on after the next patch.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-02-28 16:50:40 +02:00
Benny Halevy
c3c7efffb1 dht/range_streamer: stream_async: erase from range_vec only after do_streaming success
range_vec is used for calculating nr_ranges_to_stream.
Currently, the ranges_to_stream that were
moved out of range_vec are push back on exception,
but this isn't safe, since they may have moved already
to request_ranges or transfer_ranges.

Instead, erase the ranges we pass to do_streaming
only after it succeeds so on exception, range_vec
will not need adjusting.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-02-28 16:50:40 +02:00
Kefu Chai
0cb842797a treewide: do not define/capture unused variables
these warnings are found by Clang-17 after removing
`-Wno-unused-lambda-capture` and '-Wno-unused-variable' from
the list of disabled warnings in `configure.py`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-15 22:57:18 +02:00
Benny Halevy
912b56ebcf dht: range_streamer: define logger as static
dht::logger can't be global in this case,
as it's too generic, but should be static
to range_streamer.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-01-22 22:54:26 +02:00
Asias He
9ed401c4b2 streaming: Add finished percentage metrics for node ops using streaming
We have added the finished percentage for repair based node operations.

This patch adds the finished percentage for node ops using the old
streaming.

Example output:

scylla_streaming_finished_percentage{ops="bootstrap",shard="0"} 1.000000
scylla_streaming_finished_percentage{ops="decommission",shard="0"} 1.000000
scylla_streaming_finished_percentage{ops="rebuild",shard="0"} 0.561945
scylla_streaming_finished_percentage{ops="removenode",shard="0"} 1.000000
scylla_streaming_finished_percentage{ops="repair",shard="0"} 1.000000
scylla_streaming_finished_percentage{ops="replace",shard="0"} 1.000000

In addition to the metrics, log shows the percentage is added.

[shard 0] range_streamer - Finished 2698 out of 2817 ranges for rebuild, finished percentage=0.95775646

Fixes #11600

Closes #11601
2022-09-22 14:19:34 +03:00
Pavel Emelyanov
b6fdea9a79 code: Call sort_endpoints_by_proximity() via topology
The method is about to be moved from snitch to topology, this patch
prepares the rest of the code to use the latter to call it. The
topology's method just calls snitch, but it's going to change in the
next patch.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-09-05 15:14:01 +03:00
Pavel Emelyanov
4184091f1c snitch, code: Remove get_sorted_list_by_proximity()
There are two sorting methods in snitch -- one sorts the list of
addresses in place, the other one creates a sorted copy of the passed
const list (in fact -- the passed reference is not const, but it's not
modified by the method). However, both callers of the latter anyway
create their own temporary list of address, so they don't really benefit
from snitch generating another copy.

So this patch leaves just one sorting method -- the in-place one.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-09-05 15:11:37 +03:00
Pavel Emelyanov
6dedc69608 topology: Do not add bootstrapping nodes to topology
Recent change in topology (commit 4cbe6ee9 titled
"topology: Require entry in the map for update_normal_tokens()")
made token_metadata::update_normal_tokens() require the entry presense
in the embedded topology object. Respectively, the commit in question
equipped most callers of update_normal_tokens() with preceeding
topology update call to satisfy the requirement.

However, tokens are put into token_metadata not only for normal state,
but also for bootstrapping, and one place that added bootstrapping
tokens errorneously got topology update. This is wrong -- node must
not be present in the topology until switching into normal state. As
the result several tests with bootstrapping nodes started to fail.

The fix removes topology update for bootstrapping nodes, but this
change reveals few other places that piggy-backed this mistaken
update, so noy _they_ need to update topology themselves.

tests: https://jenkins.scylladb.com/job/releng/job/Scylla-CI/2040/
       update_cluster_layout_tests.py::test_simple_add_new_node_while_schema_changes_with_repair
       update_cluster_layout_tests.py::test_simple_kill_new_node_while_bootstrapping_with_parallel_writes_in_multidc
       repair_based_node_operations_test.py::test_lcs_reshape_efficiency

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Message-Id: <20220902082753.17827-1-xemul@scylladb.com>
2022-09-04 13:53:38 +03:00
Benny Halevy
91ab8ee1c3 effective_replication_map: make get_range_addresses asynchronous
So it may yield, preenting reactor stalls as seen in #11005.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-08 17:31:01 +03:00
Benny Halevy
9b2af3f542 range_streamer: add_ranges and friends: get erm as param
Rather than getting it in the callee, let the caller
(e.g.  storage_service)
hold the erm and pass it down to potentially multiple
async functions.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-08 17:31:01 +03:00
Pavel Emelyanov
5e2fa32c8c range_streamer: Get rack/datacenter from topology
It's needed in source filter classes so range-streamer passes the
topology reference into its methods.

Nice side effect -- snitch header goes away from range-streamer one.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-06-22 11:47:26 +03:00
Asias He
1f8b529e08 range_streamer: Disable restream logic
Consider:
- n1 and n2 in the cluster
- n3 bootstraps to join
- n1 does not hear gossip update from n3 due to network issue
- n1 removes n3 from gossip and pending node list
- stream between n1 and n3 fails
- n1 and n3 network issue is fixed
- n3 retry the stream with n1
- n3 finishes the stream with n1
- n3 advertises normal to join the cluster

The problem is that n1 will not treat n3 as the pending node so writes
will not route to n3 once n1 removes n3.

Another problem is that when n1 gets normal gossip status update from
n3. The gossip listener will fail because n1 has removed n3 so n1 could
not find the host id for n3. This will cause n1 to abort.

To fix, disable the retry logic in range_streamer so that once a stream
with existing fails the bootstrap fails.

The downside is that we lose the ability to restream caused by temporary
network issue but since we have repair based node operation. We can use
it to resume the previous failed node operations.

Fixes: #9805

Closes #9806
2022-05-24 11:24:25 +03:00
Pavel Emelyanov
469ded71a9 bootstrapper: Get 'is-replacing' via argument too
This also removes the only usage of this helper outside of the storage
service. The place that needs it is the use_strict_sources_for_ranges()
checker and all the callers of it are aware of whether it's replacing
happenning or not.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-02-07 12:41:02 +03:00
Avi Kivity
fcb8d040e8 treewide: use Software Package Data Exchange (SPDX) license identifiers
Instead of lengthy blurbs, switch to single-line, machine-readable
standardized (https://spdx.dev) license identifiers. The Linux kernel
switched long ago, so there is strong precedent.

Three cases are handled: AGPL-only, Apache-only, and dual licensed.
For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0),
reasoning that our changes are extensive enough to apply our license.

The changes we applied mechanically with a script, except to
licenses/README.md.

Closes #9937
2022-01-18 12:15:18 +01:00
Avi Kivity
ae3a360725 database: Move database, keyspace, table classes to replica/ directory
The database, keyspace, and table classes represent the replica-only
part of the objects after which they are named. Reading from a table
doesn't give you the full data, just the replica's view, and it is not
consistent since reconciliation is applied on the coordinator.

As a first step in acknowledging this, move the related files to
a replica/ subdirectory.
2022-01-06 17:07:30 +02:00