Commit Graph

690 Commits

Author SHA1 Message Date
Benny Halevy
53fdf75cf9 repair: pass erm down to get_hosts_participating_in_repair and get_neighbors
Now that it is available in repair_info.

Fixes #11993

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-11-17 08:07:30 +02:00
Benny Halevy
b69be61f41 repair: pass effective_replication_map down to repair_info
And make sure the token_metadata ring version is the same as the
reference one (from the erm on shard 0) when starting the
repair on each shard.

Refs #11993

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-11-17 08:07:29 +02:00
Benny Halevy
c47d36b53d repair: coroutinize sync_data_using_repair
Prepare for the next patch, which will co_await
make_global_effective_replication_map.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-11-17 08:07:04 +02:00
Benny Halevy
58b1c17f5d repair: futurize do_repair_start
Turn it into a coroutine to prepare for the next patch,
which will co_await make_global_effective_replication_map.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-11-17 08:07:04 +02:00
Benny Halevy
d6b2124903 repair: sync_data_using_repair: require to run on shard 0
And with that do_sync_data_using_repair can be folded into
sync_data_using_repair.

This will simplify using the effective_replication_map
throughout the operation.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-11-17 07:58:21 +02:00
Benny Halevy
0c56c75cf8 repair: require all node operations to be called on shard 0
To simplify use of the effective_replication_map / token_metadata_ptr
throughout the operation.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-11-17 07:58:21 +02:00
Benny Halevy
64b0756adc repair: repair_info: keep effective_replication_map
Sampled when repair info is constructed.
To be used throughout the repair process.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-11-17 07:58:21 +02:00
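The commit above samples the effective_replication_map once, at repair_info construction, and uses that same snapshot for the whole operation. A minimal standalone sketch of that pattern (replication_snapshot and repair_context are illustrative stand-ins, not Scylla's actual types):

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for an effective_replication_map: an immutable
// snapshot of replica placement at one ring version.
struct replication_snapshot {
    long ring_version;
    std::vector<std::string> replicas;
};

// Sample the snapshot once at construction and keep using it, even if
// the live pointer is later swapped to a newer version.
class repair_context {
    std::shared_ptr<const replication_snapshot> _erm;
public:
    explicit repair_context(std::shared_ptr<const replication_snapshot> erm)
        : _erm(std::move(erm)) {}
    long ring_version() const { return _erm->ring_version; }
    const std::vector<std::string>& replicas() const { return _erm->replicas; }
};
```

Because the context holds a shared pointer to an immutable snapshot, topology changes during the repair cannot shift the view mid-operation.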
Benny Halevy
c7d753cd44 repair: do_repair_start: use keyspace erm to get keyspace local ranges
Rather than calling db.get_keyspace_local_ranges that
looks up the keyspace and its erm again.

We want all the information derived from the erm to
be based on the same source.

The function is synchronous, so this change doesn't
fix anything; it just cleans up the code.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-11-17 07:58:21 +02:00
Benny Halevy
aaf74776c2 repair: do_repair_start: use keyspace erm for get_primary_ranges
Ensure that the primary ranges are in sync with the
keyspace erm.

The function is synchronous so this change doesn't fix anything,
it just cleans up the code.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-11-17 07:58:21 +02:00
Benny Halevy
9200e6b005 repair: do_repair_start: use keyspace erm for get_primary_ranges_within_dc
Ensure the erm and topology are in sync.

The function is synchronous so this change doesn't fix
anything, just cleans up the code.

Fix mistake in comment while at it.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-11-17 07:57:56 +02:00
Benny Halevy
59dc2567fd repair: do_repair_start: check_in_shutdown first
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-11-17 07:56:34 +02:00
Benny Halevy
881eb0df83 repair: get_db().local() where needed
In several places we get the sharded database using get_db()
and then we only use db.local().  Simplify the code by keeping
reference only to the local database upfront.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-11-17 07:56:34 +02:00
Benny Halevy
c22c4c8527 repair: get topology from erm/token_metadata_ptr
We want the topology to be synchronized with the respective
effective_replication_map / token_metadata.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-11-17 07:56:34 +02:00
Aleksandra Martyniuk
f2fe586f03 repair: check shutdown with abort source in repair module
In the repair module, shutdown can be checked using the abort_source.
Thus, we can get rid of the shutdown flag.
2022-10-31 10:57:29 +01:00
Aleksandra Martyniuk
2d878cc9b5 repair: use generic module gate for repair module operations
The repair module uses a gate to prevent starting new tasks on shutdown.
The generic module's gate serves the same purpose, so we can
use it in the repair-specific context as well.
2022-10-31 10:56:36 +01:00
Aleksandra Martyniuk
4aae7e9026 repair: move tracker to repair module
Since the tracker and repair_module serve a similar purpose,
it is confusing where to look for the methods connected to them.
Thus, to make it more transparent, the tracker class is deleted and all
of its attributes and methods are moved to repair_module.
2022-10-31 10:55:36 +01:00
Aleksandra Martyniuk
a5c05dcb60 repair: move next_repair_command to repair_module
The repair operation number was counted twice: with
next_repair_command from the tracker and with the sequence number
from task_manager::module.

To get rid of the redundancy, next_repair_command was deleted and all
methods using its value were moved to repair_module.
2022-10-31 10:54:39 +01:00
Aleksandra Martyniuk
c81260fb8b repair: generate repair id in repair module
repair_uniq_id for repair task can be generated in repair module
and accessed from the task.
2022-10-31 10:54:24 +01:00
Aleksandra Martyniuk
6432a26ccf repair: keep shard number in repair_uniq_id
The execution shard is one of the traits specific to repair tasks.
A child task should be able to freely access the shard id of its
parent. Thus, the shard number is kept in the repair_uniq_id struct.
2022-10-31 10:41:17 +01:00
Aleksandra Martyniuk
e2c7c1495d repair: change UUID to task_id
Change type of repair id from utils::UUID to task_id to distinguish
them from ids of other entities.
2022-10-31 10:07:08 +01:00
Aleksandra Martyniuk
dc80af33bc repair: add task_manager::module to repair_service
repair_service keeps a shared pointer to repair_module.
2022-10-31 10:04:50 +01:00
Aleksandra Martyniuk
576277384a repair: create repair module and task
Create repair_task_impl and repair_module inheriting from respectively
task manager task_impl and module to integrate repair operations with
task manager.
2022-10-31 10:04:48 +01:00
Benny Halevy
0ea8250e83 repair: use sharded abort_source to abort repair_info
Currently we use a single shared_ptr<abort_source>
that can't be copied across shards.

Instead, use a sharded<abort_source> in node_ops_info so that each
repair_info instance will use an (optional) abort_source*
on its own shard.

Added the respective start and stop methods, plus a local_abort_source
getter to get the shard-local abort_source (if available).

Fixes #11826

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-10-27 12:18:30 +03:00
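The commit above replaces one shared abort_source with a per-shard set plus a shard-local getter. A simplified single-threaded sketch of that shape (the classes here are illustrative stand-ins, not Seastar's actual sharded<> or abort_source API):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Minimal stand-in for seastar::abort_source.
class abort_source {
    bool _aborted = false;
public:
    void request_abort() { _aborted = true; }
    bool abort_requested() const { return _aborted; }
};

// Sketch of the sharded<abort_source> idea: one instance per shard, so
// each shard's repair_info consults a local object instead of copying
// a shared_ptr across shards.
class sharded_abort_source {
    std::vector<abort_source> _instances;  // index == shard id
    bool _started = false;
public:
    void start(std::size_t n_shards) {
        _instances.assign(n_shards, abort_source{});
        _started = true;
    }
    void stop() {
        _instances.clear();
        _started = false;
    }
    // shard-local getter; returns nullptr when not started
    abort_source* local(std::size_t shard) {
        return _started ? &_instances.at(shard) : nullptr;
    }
};
```

Each shard only ever touches its own slot, which is what makes the optional `abort_source*` safe to hand out per shard.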
Benny Halevy
88f993e5ed repair: node_ops_info: add start and stop methods
Prepare for adding a sharded<abort_source> member.

Wire start/stop in storage_service::node_ops_meta_data.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-10-27 12:18:30 +03:00
Benny Halevy
5c25066ea7 repair: node_ops_info: prevent accidental copy
Delete node_ops_info copy and move constructors before
we add a sharded<abort_source> member for the per-shard repairs
in the next patch.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-10-27 12:14:03 +03:00
Pavel Emelyanov
3dc7c33847 repair: Remove ops_uuid
It used to be used to abort repair_info by the corresponding node-ops
uuid, but this code is no longer there, so the uuid can be dropped as
well.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-10-18 20:04:23 +03:00
Pavel Emelyanov
b835c3573c repair: Remove abort_repair_node_ops() altogether
This code is dead after the previous patch.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-10-18 20:04:23 +03:00
Pavel Emelyanov
8231b4ec1b repair: Subscribe on node_ops_info::as abortion
When node_ops_meta_data aborts, it also kicks repair to find and abort
all relevant repair_infos. This can now be simplified by subscribing
repair_meta to the abort source and aborting it without the explicit kick.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-10-18 20:04:23 +03:00
Pavel Emelyanov
bf5825daac repair: Keep abort source on node_ops_info
Next patches will need to subscribe on node_ops_meta_data's abort source
inside repair code, so keep the pointer on node_ops_info too. At the
same time, node_ops_info::abort becomes obsolete, because the same
check can be performed via abort_source->abort_requested().

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-10-18 20:04:23 +03:00
Pavel Emelyanov
bbb7fca09c repair: Pass node_ops_info arg to do_sync_data_using_repair()
Next patches will need to know more than the ops_uuid. The needed info
is (well -- will be) sitting on node_ops_info, so pass it along

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-10-18 20:04:23 +03:00
Pavel Emelyanov
5e9c3c65b5 repair: Mark repair_info::abort() noexcept
Next patch will call it inside abort_source subscription callback which
requires the calling code to be noexcept

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-10-18 20:04:23 +03:00
Asias He
c194c811df repair: Yield in repair_service::do_decommission_removenode_with_repair
When walking through the ranges, we should yield to prevent stalls. We
do a similar yield in other node operations.

Fix a stall in 5.1.dev.20220724.f46b207472a3 with build-id
d947aaccafa94647f71c1c79326eb88840c5b6d2

```
!INFO | scylla[6551]: Reactor stalled for 10 ms on shard 0. Backtrace:
0x4bbb9d2 0x4bba630 0x4bbb8e0 0x7fd365262a1f 0x2face49 0x2f5caff
0x36ca29f 0x36c89c3 0x4e3a0e1
```

Fixes #11146

Closes #11160
2022-09-28 18:21:35 +03:00
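The stall fix above inserts periodic yields into a long loop over token ranges. A standalone sketch of the pattern (Seastar's real primitive is seastar::maybe_yield(); yield_counter here is an illustrative stand-in that just counts yield points):

```cpp
#include <cassert>
#include <cstddef>

// Illustrative cooperative-yield bookkeeping; budget is the number of
// iterations allowed between yields (must be >= 1).
struct yield_counter {
    std::size_t yields = 0;
    std::size_t budget = 1;
    std::size_t since_last = 0;
    void maybe_yield() {
        if (++since_last >= budget) {  // give the reactor a turn
            ++yields;
            since_last = 0;
        }
    }
};

// Sketch of the fix: when walking a long list of token ranges,
// periodically yield so a single loop cannot stall the shard.
inline std::size_t walk_ranges(std::size_t n_ranges, yield_counter& yc) {
    std::size_t processed = 0;
    for (std::size_t i = 0; i < n_ranges; ++i) {
        // ... per-range repair bookkeeping would happen here ...
        ++processed;
        yc.maybe_yield();
    }
    return processed;
}
```

The reactor-stall warning in the log is exactly what this prevents: without the yield call, the whole loop runs as one uninterruptible task.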
Benny Halevy
6a11c410fd repair: row_level: repair_update_system_table_handler: get get_tombstone_gc_state from the db compaction_manager
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-09-06 23:04:16 +03:00
Benny Halevy
5dd15aa3c8 tombstone_gc: introduce tombstone_gc_state
and use it to access the repair history maps.

In this introductory patch, we use a default-constructed
tombstone_gc_state to access the thread-local maps
temporarily and those use sites will be replaced
in following patches that will gradually pass
the tombstone_gc_state down from the compaction_manager
to where it's used.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-09-06 23:02:54 +03:00
Benny Halevy
b2b211568e repair_service: simplify update_repair_time error handling
There's no need for per-shard try/catch here.
Just catch exceptions from the overall sharded operation
to update_repair_time.

Also, update warning to indicate that only updating the repair history
time failed, not "Loading repair history".

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-09-06 22:43:08 +03:00
Benny Halevy
7d13811297 tombstone_gc: update_repair_time: get table_id rather than schema_ptr
The function doesn't need access to the whole schema.
The table_id is just enough to get by.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-09-06 22:43:08 +03:00
Pavel Emelyanov
b6fdea9a79 code: Call sort_endpoints_by_proximity() via topology
The method is about to be moved from snitch to topology, this patch
prepares the rest of the code to use the latter to call it. The
topology's method just calls snitch, but it's going to change in the
next patch.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-09-05 15:14:01 +03:00
Pavel Emelyanov
43e83c5415 storage_service,dht,repair: Provide local dc/rack from system ks
When a node starts it adds itself to the topology. Mostly it's done in
the storage_service::join_cluster() and whoever it calls. In all those
places the dc/rack for the added node is taken from the system keyspace
(its cache was populated with the local dc/rack by the previous patch).

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-08-26 09:52:16 +03:00
Pavel Emelyanov
4cbe6ee9f4 topology: Require entry in the map for update_normal_tokens()
The method in question tries to be on the safest side and adds the
endpoint for which it updates the tokens into the topology. From now on
it's up to the caller to put the endpoint into topology in advance.

So most of what this patch does is places topology.update_endpoint()
into the relevant places of the code.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-08-26 09:44:08 +03:00
Pavel Emelyanov
7305061674 replication_strategy: Accept dc-rack as get_pending_address_ranges argument
The method creates a copy of token metadata and pushes an endpoint (with
some tokens) into it. Next patches will require providing dc/rack info
together with the endpoint, this patch prepares for that.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-08-26 09:39:44 +03:00
Avi Kivity
be44fd63f9 Merge 'Make get_range_addresses async and hold effective_replication_map_ptr around it' from Benny Halevy
This series converts the synchronous `effective_replication_map::get_range_addresses` to async
by calling the replication strategy async entry point with the same name, as its callers are already async
or can be made so easily.

To allow it to yield and work on a coherent view of the token_metadata / topology / replication_map,
let the callers of this patch hold a effective_replication_map per keyspace and pass it down
to the (now asynchronous) functions that use it (making affected storage_service methods static where possible
if they no longer depend on the storage_service instance).

Also, the repeated calls to everywhere_replication_strategy::calculate_natural_endpoints
are optimized in this series by introducing a virtual abstract_replication_strategy::has_static_natural_endpoints predicate
that is true for local_strategy and everywhere_replication_strategy, and is false otherwise.
With it, functions that repeatedly call calculate_natural_endpoints in a loop, once per token, will call it only once, since it returns the same result every time anyhow.

Refs #11005

Doesn't fix the issue, as the large allocation remains until we change dht::token_range_vector to a chunked representation (chunked_vector cannot be used as is at the moment, since we require the ability to also push to the front when unwrapping).

Closes #11009

* github.com:scylladb/scylladb:
  effective_replication_map: make get_range_addresses asynchronous
  range_streamer: add_ranges and friends: get erm as param
  storage_service: get_new_source_ranges: get erm as param
  storage_service: get_changed_ranges_for_leaving: get erm as param
  storage_service: get_ranges_for_endpoint: get erm as param
  repair: use get_non_local_strategy_keyspaces_erms
  database: add get_non_local_strategy_keyspaces_erms
  database: add get_non_local_strategy_keyspaces
  storage_service: coroutinize update_pending_ranges
  effective_replication_map: add get_replication_strategy
  effective_replication_map: get_range_addresses: use the precalculated replication_map
  abstract_replication_strategy: get_pending_address_ranges: prevent extra vector copies
  abstract_replication_strategy: reindent
  utils: sequenced_set: expose set and `contains` method
  abstract_replication_strategy: calculate_natural_endpoints: return endpoint_set
  utils: sequenced_set: templatize VectorType
  utils: sanitize sequenced_set
  utils: sequenced_set: delete mutable get_vector method
2022-08-09 13:25:53 +03:00
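The has_static_natural_endpoints optimization in the merge above can be sketched standalone: a strategy whose endpoints do not depend on the token reports that fact, so per-token loops compute the endpoint set once. The class names below are simplified stand-ins for Scylla's abstract_replication_strategy hierarchy:

```cpp
#include <cassert>
#include <string>
#include <vector>

struct strategy {
    virtual ~strategy() = default;
    // true when calculate_natural_endpoints ignores the token
    virtual bool has_static_natural_endpoints() const = 0;
    virtual std::vector<std::string> calculate_natural_endpoints(long token) const = 0;
};

struct everywhere_strategy : strategy {
    std::vector<std::string> nodes{"n1", "n2", "n3"};
    mutable int calls = 0;  // instrumented for this sketch only
    bool has_static_natural_endpoints() const override { return true; }
    std::vector<std::string> calculate_natural_endpoints(long) const override {
        ++calls;
        return nodes;  // same answer for every token
    }
};

// Per-token loop that consults the predicate to avoid recomputing.
inline std::vector<std::vector<std::string>>
endpoints_for_tokens(const strategy& s, const std::vector<long>& tokens) {
    std::vector<std::vector<std::string>> out;
    std::vector<std::string> cached;
    bool have_cached = false;
    for (long t : tokens) {
        if (!have_cached || !s.has_static_natural_endpoints()) {
            cached = s.calculate_natural_endpoints(t);
            have_cached = true;
        }
        out.push_back(cached);
    }
    return out;
}
```

For everywhere/local strategies the expensive calculation runs once per loop instead of once per token; for token-dependent strategies the loop behaves as before.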
Benny Halevy
cffe00cc58 repair: use get_non_local_strategy_keyspaces_erms
Use get_non_local_strategy_keyspaces_erms to get
a coherent set of keyspace names and their respective
effective replication maps.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-08 17:31:01 +03:00
Benny Halevy
7ee6048255 database: add get_non_local_strategy_keyspaces
For node operations, we currently call get_non_system_keyspaces,
but we really want to work on all keyspaces that have a non-local
replication strategy, as they are replicated on other nodes.

Reflect that in the replica::database function name.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-08 17:31:01 +03:00
Benny Halevy
ebe1edc091 utils: sequenced_set: expose set and contains method
And use it in call sites that use the endpoint set
returned by abstract_replication_strategy::calculate_natural_endpoints.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-08 17:31:00 +03:00
Benny Halevy
7017ad6822 abstract_replication_strategy: calculate_natural_endpoints: return endpoint_set
So it can also be used to easily search for an endpoint.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-08 17:31:00 +03:00
Benny Halevy
257d74bb34 schema, everywhere: define and use table_id as a strong type
Define table_id as a distinct utils::tagged_uuid modeled after raft
tagged_id, so it can be differentiated from other uuid-class types,
in particular from table_schema_version.

Fixes #11207

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-08 08:09:41 +03:00
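The tagged-uuid idea above makes same-representation ids distinct types so they cannot be swapped accidentally. A minimal sketch of the mechanism (utils::tagged_uuid wraps a UUID; a uint64_t stands in here, and the tag names are illustrative):

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Same underlying value, wrapped per tag, so unrelated ids are
// distinct types to the compiler.
template <typename Tag>
struct tagged_id {
    std::uint64_t value = 0;
    friend bool operator==(const tagged_id& a, const tagged_id& b) {
        return a.value == b.value;
    }
};

struct table_id_tag {};
struct table_schema_version_tag {};

using table_id = tagged_id<table_id_tag>;
using table_schema_version = tagged_id<table_schema_version_tag>;

// Passing a table_schema_version where a table_id is expected now
// fails to compile, which is the point of the strong type.
inline std::uint64_t lookup(table_id id) { return id.value; }

static_assert(!std::is_same_v<table_id, table_schema_version>,
              "the tag makes the two id types distinct");
```

Previously both were bare utils::UUID, so a mixed-up argument compiled silently; with the tag, the mix-up is a type error.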
Benny Halevy
2948a4feb6 repair: delete unused include of utils/bit_cast.hh
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-08 08:02:27 +03:00
Asias He
d3c6e72c69 repair: Allow abort repair jobs in early stage
Consider this:

- User starts a repair job with http api
- User aborts all repair
- The repair_info object for the repair job is created
- The repair job is not aborted

In this patch, the repair uuid is recorded before the repair_info object is
created, so that repair jobs can now be aborted at an early stage.

Fixes #10384

Closes #10428
2022-06-27 16:39:36 +03:00
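The race fixed above is register-before-construct: record the job id in the tracker first, so an "abort all" that arrives while repair_info is still being built can still catch the job. A single-threaded sketch of that ordering (repair_tracker and start_job are hypothetical names for illustration):

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>

// Sketch of the fix: the id becomes visible to abort_all() before the
// (potentially slow) repair state is constructed.
class repair_tracker {
    std::unordered_set<std::uint64_t> _pending;  // ids known; state may not exist yet
    std::unordered_set<std::uint64_t> _aborted;
public:
    void register_job(std::uint64_t id) { _pending.insert(id); }
    void abort_all() { _aborted.insert(_pending.begin(), _pending.end()); }
    bool is_aborted(std::uint64_t id) const { return _aborted.count(id) != 0; }
};

// The job checks for an early abort once its state is finally built;
// returns false when it was aborted before it really began.
inline bool start_job(repair_tracker& t, std::uint64_t id) {
    t.register_job(id);        // visible to abort_all() from here on
    // ... repair_info construction would happen here (may take a while) ...
    return !t.is_aborted(id);
}
```

With the old order (construct first, register after), an abort_all() issued during construction would find nothing to abort and the job would run anyway.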
Avi Kivity
3131cbea62 Merge 'query: allow replica to provide arbitrary continue position' from Botond Dénes
Currently, we use the last row in the query result set as the position where the query is continued from on the next page. Since only live rows make it into the query result set, this mandates that the query be stopped on a live row on the replica; otherwise, any dead rows or tombstones processed after the live rows would have to be re-processed on the next page (and the saved reader would have to be thrown away due to position mismatch). This requirement of having to stop on a live row is problematic with datasets that have lots of dead rows or tombstones, especially if these form a prefix. In the extreme case, a query can time out before it can process a single live row, and the data set becomes effectively unreadable until compaction gets rid of the tombstones.
This series prepares the way for the solution: it allows the replica to determine what position the query should continue from on the next page. This position can be that of a dead row, if the query stopped on a dead row. For now, the replica supplies the same position that would have been obtained with looking at the last row in the result set, this series merely introduces the infrastructure for transferring a position together with the query result, and it prepares the paging logic to make use of this position. If the coordinator is not prepared for the new field, it will simply fall-back to the old way of looking at the last row in the result set. As I said for now this is still the same as the content of the new field so there is no problem in mixed clusters.

Refs: https://github.com/scylladb/scylla/issues/3672
Refs: https://github.com/scylladb/scylla/issues/7689
Refs: https://github.com/scylladb/scylla/issues/7933

Tests: manual upgrade test.
I wrote a data set with:
```
./scylla-bench -mode=write -workload=sequential -replication-factor=3 -nodes 127.0.0.1,127.0.0.2,127.0.0.3 -clustering-row-count=10000 -clustering-row-size=8096 -partition-count=1000
```
This creates large, 80MB partitions, which should fill many pages if read in full. Then I started a read workload:
```
./scylla-bench -mode=read -workload=uniform -replication-factor=3 -nodes 127.0.0.1,127.0.0.2,127.0.0.3 -clustering-row-count=10000 -duration=10m -rows-per-request=9000 -page-size=100
```
I confirmed that paging is happening as expected, then upgraded the nodes one-by-one to this PR (while the read-load was ongoing). I observed no read errors or any other errors in the logs.

Closes #10829

* github.com:scylladb/scylla:
  query: have replica provide the last position
  idl/query: add last_position to query_result
  multishard_mutation_query: propagate compaction state to result builder
  multishard_mutation_query: defer creating result builder until needed
  querier: use full_position instead of ad-hoc struct
  querier: rely on compactor for position tracking
  mutation_compactor: add current_full_position() convenience accessor
  mutation_compactor: s/_last_clustering_pos/_last_pos/
  mutation_compactor: add state accessor to compact_mutation
  introduce full_position
  idl: move position_in_partition into own header
  service/paging: use position_in_partition instead of clustering_key for last row
  alternator/serialization: extract value object parsing logic
  service/pagers/query_pagers.cc: fix indentation
  position_in_partition: add to_string(partition_region) and parse_partition_region()
  mutation_fragment.hh: move operator<<(partition_region) to position_in_partition.hh
2022-06-27 12:23:21 +03:00
Benny Halevy
9c231ad0ce repair_reader: construct _reader_handle before _reader
Currently, the `_reader` member is explicitly
initialized with the result of the call to `make_reader`.
And `make_reader`, as a side effect, assigns a value
to the `_reader_handle` member.

Since C++ initializes class members sequentially,
in the order they are defined, the assignment to `_reader_handle`
in `make_reader()` happens before `_reader_handle` is initialized.

This patch fixes that by changing the definition order,
and consequently, the member initialization order
in the constructor so that `_reader_handle` will be (default-)initialized
before the call to `make_reader()`, avoiding the undefined behavior.

Fixes #10882

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #10883
2022-06-26 20:17:47 +03:00
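The fix above hinges on a C++ rule: non-static data members initialize in declaration order, not mem-initializer-list order. A self-contained sketch of the corrected layout (repair_reader_sketch and its int members are illustrative stand-ins for the real reader and handle types):

```cpp
#include <cassert>

// Members initialize in *declaration* order. make_reader() assigns
// _reader_handle as a side effect, so _reader_handle must be declared
// before _reader; otherwise it would be written before its lifetime
// begins (the undefined behavior the commit fixes).
class repair_reader_sketch {
    int _reader_handle = -1;  // declared first => initialized first (the fix)
    int _reader;              // initialized second, via make_reader()

    int make_reader() {
        _reader_handle = 7;   // side effect on the earlier-declared member: now safe
        return 42;
    }
public:
    repair_reader_sketch() : _reader(make_reader()) {}
    int reader() const { return _reader; }
    int handle() const { return _reader_handle; }
};
```

With the original declaration order reversed, the assignment inside make_reader() would have targeted a member whose initialization had not yet happened; reordering the declarations makes the constructor well-defined without changing any call.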