Commit Graph

626 Commits

Author SHA1 Message Date
Pavel Solodovnikov
47834313d8 repair: avoid infinite recursion on stringifying unknown node_ops_cmd
Cast the cmd representation to underlying type and
avoid infinite recursion in the `operator <<(node_ops_cmd)`.

Tests: unit(dev)

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
Message-Id: <20220430180701.1012190-1-pa.solodovnikov@scylladb.com>
2022-05-02 10:08:34 +03:00
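A minimal sketch of the technique described in 47834313d8, assuming a simplified, hypothetical `node_ops_cmd` with only a few enumerators and a `uint32_t` underlying type (the real enum differs). For an unknown value, the fallback path casts to the underlying integer type so that the built-in integer `operator<<` is chosen instead of this same overload:

```cpp
#include <cassert>
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>
#include <type_traits>

// Hypothetical subset of node_ops_cmd; the real enumerators may differ.
enum class node_ops_cmd : std::uint32_t {
    removenode_prepare,
    removenode_heartbeat,
    removenode_done,
};

std::ostream& operator<<(std::ostream& out, node_ops_cmd cmd) {
    switch (cmd) {
    case node_ops_cmd::removenode_prepare:   return out << "removenode_prepare";
    case node_ops_cmd::removenode_heartbeat: return out << "removenode_heartbeat";
    case node_ops_cmd::removenode_done:      return out << "removenode_done";
    }
    // Unknown value: `out << cmd` here would call this operator again and
    // recurse forever. Casting to the underlying integer type selects the
    // built-in integer overload instead.
    return out << "unknown node_ops_cmd ("
               << static_cast<std::underlying_type_t<node_ops_cmd>>(cmd) << ")";
}

// Convenience helper (hypothetical) used to check the printed form.
std::string to_string(node_ops_cmd cmd) {
    std::ostringstream ss;
    ss << cmd;
    return ss.str();
}
```

Printing `static_cast<node_ops_cmd>(42)` with this sketch yields `unknown node_ops_cmd (42)` rather than a stack overflow.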
Botond Dénes
f527956cdb readers: remove v1 empty_reader
The only user is row level repair: it is replaced with
downgrade_to_v1(make_empty_flat_reader_v2()). The row level reader has
lots of downgrade_to_v1() calls, we will deal with these later all at
once.
Another use is the empty mutation source, this is trivially converted to
use the v2 variant.
2022-04-28 14:12:24 +03:00
Botond Dénes
b061acb668 Merge 'Remove queue reader v1' from Mikołaj Sielużycki
The patchset embeds the mutation_fragment upgrading logic from v1 to v2 into the mutation_fragment_queue. This way the mutation fragments coming to the mutation_fragment_queue can be v1, but the underlying queue_reader receives mutation_fragment_v2, eliminating the last usage of queue_reader (v1). The last commit removes queue_reader, queue_reader_handle and associated factory functions.

tests: unit(dev), dtest(incremental_repair_test, read_repair_test, repair_additional_test, repair_test)

Closes #10371

* github.com:scylladb/scylla:
  readers: Remove queue_reader v1 and associated code.
  repair: Make mutation_fragment_queue internally upgrade fragments to v2
  repair: Make mutation_fragment_queue::impl a seastar::shared_ptr
2022-04-21 12:34:48 +03:00
Mikołaj Sielużycki
339b60e5b0 repair: Make mutation_fragment_queue internally upgrade fragments to v2 2022-04-20 17:55:58 +02:00
Mikołaj Sielużycki
eeb2b458de repair: Make mutation_fragment_queue::impl a seastar::shared_ptr
It makes mutation_fragment_queue copyable and makes the pointer to
pending mutation fragments (introduced in the next commit) stable. This
allows moving the mutation_fragment_queue without breaking the underlying
upgrading_consumer.
2022-04-20 17:51:58 +02:00
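A rough sketch of why holding the impl behind a shared pointer keeps the pending-fragments pointer stable, assuming a hypothetical, simplified `fragment` type and using `std::shared_ptr` as a stand-in for `seastar::shared_ptr`. Because the impl lives on the heap, copying or moving the queue object never relocates it, so a consumer holding a pointer into it stays valid:

```cpp
#include <cassert>
#include <deque>
#include <memory>
#include <utility>

// Hypothetical, simplified fragment type.
struct fragment { int value; };

// Queue state lives in a heap-allocated impl behind a shared_ptr, so the
// queue is cheap to copy/move and the impl's address never changes.
class fragment_queue {
    struct impl {
        std::deque<fragment> pending;
    };
    std::shared_ptr<impl> _impl = std::make_shared<impl>();
public:
    void push(fragment f) { _impl->pending.push_back(std::move(f)); }
    // A consumer (e.g. an upgrading consumer) may keep this pointer;
    // it remains valid across copies and moves of the queue object.
    std::deque<fragment>* pending() { return &_impl->pending; }
};
```

Note that copies share state in this design: pushing through any copy is visible through all of them.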
Avi Kivity
5da586271f repair: explicitly ignore tombstone gc update response
The response struct is empty and we have nothing to do with it. Cast
it to void to avoid a gcc warning.
2022-04-18 12:27:18 +03:00
Botond Dénes
75786c42cb Merge 'Add repair unit tests/v1' from Mikołaj Sielużycki
This patch series splits up parts of the repair pipeline to allow unit testing
various bits of code without having to run the full dtest suite. The reason why the
repair pipeline has no unit tests is that by definition repair requires multiple
nodes, while the unit test environment works only for a single node.

However, it is possible to explicitly define interfaces between various parts of the
pipeline, inject dependencies and test them individually. This patch series is focused
on taking repair_rows_on_wire (frozen mutation representation of changes coming from
another node) and flushing them to an sstable.

The commits are split into the following parts:
- pulling out classes to separate headers so that they can be included (potentially indirectly) from the test,
- pulling out repair_meta::to_repair_rows_list and part of repair_meta::flush_rows_in_working_row_buf so that they can be tested,
- refactoring repair_writer so that the actual writing logic can be injected as dependency,
- creating the unit test.

tests: unit(dev), dtest(incremental_repair_test, read_repair_test, repair_additional_test, repair_test)

Closes #10345

* github.com:scylladb/scylla:
  repair: Add unit test for flushing repair_rows_on_wire to disk.
  repair: Extract mutation_fragment_queue and repair_writer::impl interfaces.
  repair: Make parts of repair_writer interface private.
  repair: Rename inputs to flush_rows.
  repair: Make repair_meta::flush_rows a free function.
  repair: Split flush_rows_in_working_row_buf to two functions and make one static.
  repair: Rename inputs to to_repair_rows_list.
  repair: Make to_repair_rows_list a free function.
  repair: Make repair_meta::to_repair_rows_list a static function
  repair: Fix indentation in repair_writer.
  repair: Move repair_writer to separate header.
  repair: Move repair_row to a separate header.
  repair: Move repair_sync_boundary to a separate header.
  repair: Move decorated_key_with_hash to separate header.
  repair: Move row_repair hashing logic to separate class and file.
2022-04-14 18:17:03 +03:00
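The series injects the actual writing logic into repair_writer so that it can be replaced in a unit test. A minimal sketch of that dependency-injection shape, assuming hypothetical `repair_row`, `writer_impl` and `recording_writer` types (the real repair_writer::impl interface and row types are richer and asynchronous):

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical row type standing in for rows decoded from repair_rows_on_wire.
struct repair_row { std::string key; std::string value; };

// Interface for the part that actually persists rows. Production code would
// write sstables; a test can record rows in memory instead.
class writer_impl {
public:
    virtual ~writer_impl() = default;
    virtual void write(const repair_row& row) = 0;
};

// The writer only orchestrates; the sink is injected through the constructor.
class repair_writer {
    std::unique_ptr<writer_impl> _impl;
public:
    explicit repair_writer(std::unique_ptr<writer_impl> impl) : _impl(std::move(impl)) {}
    void flush(const std::vector<repair_row>& rows) {
        for (const auto& r : rows) {
            _impl->write(r);
        }
    }
};

// Test double: captures rows instead of touching disk.
class recording_writer final : public writer_impl {
public:
    std::vector<repair_row>* out;
    explicit recording_writer(std::vector<repair_row>* o) : out(o) {}
    void write(const repair_row& row) override { out->push_back(row); }
};
```

A test constructs `repair_writer` with a `recording_writer` and inspects the captured rows, which needs no cluster at all.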
Pavel Emelyanov
05eb9c9416 repair, system_keyspace: Query repair_history with a helper
Currently the table is queried directly with the help of qctx. This
patch replaces that with a querying helper that calls the consumer
function with the entry struct as the argument.

After this change, repair code can stop including query_context and
messing with untyped_result_set.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-04-12 14:04:21 +03:00
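A sketch of the typed-entry helpers described in this and the following commits, assuming a hypothetical `repair_history_entry` layout and an in-memory `system_keyspace_stub` in place of the real system keyspace (the real helpers return futures and run CQL). The query helper feeds typed entries to a consumer, so callers never see an untyped result set:

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical typed entry for one row of system.repair_history;
// the real struct's fields may differ.
struct repair_history_entry {
    std::string table_uuid;
    std::string repair_uuid;
    std::int64_t range_start;
    std::int64_t range_end;
};

// In-memory stand-in for the system keyspace (assumption for illustration).
class system_keyspace_stub {
    std::vector<repair_history_entry> _rows;
public:
    // Update helper: callers pass a typed entry.
    void update_repair_history(repair_history_entry e) {
        _rows.push_back(std::move(e));
    }
    // Query helper: the consumer is called once per matching entry.
    void get_repair_history(const std::string& table_uuid,
                            const std::function<void(const repair_history_entry&)>& consumer) const {
        for (const auto& e : _rows) {
            if (e.table_uuid == table_uuid) {
                consumer(e);
            }
        }
    }
};
```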
Pavel Emelyanov
59f4aa0934 repair: Update loader code to use system_keyspace entry
Patch the history entry loader to use the recently introduced
history entry. This is just to reduce the churn in the next patch.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-04-12 13:59:55 +03:00
Pavel Emelyanov
9940016e05 repair, system_keyspace: Update repair_history with a helper
Current code works directly on the qctx, which is not nice. Instead,
make it use the system keyspace reference. To make it work, the patch
adds a helper method and introduces a helper struct for the table
entry. This struct will also be used to query the table (next patch).

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-04-12 13:57:57 +03:00
Pavel Emelyanov
e501ebd6c2 repair: Keep system keyspace reference
Repair updates (and queries on start) the system.repair_history table
and thus depends on the system_keyspace object.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2022-04-12 13:57:08 +03:00
Mikołaj Sielużycki
39205917a8 repair: Extract mutation_fragment_queue and repair_writer::impl interfaces. 2022-04-12 09:22:03 +02:00
Mikołaj Sielużycki
a52126d861 repair: Make parts of repair_writer interface private. 2022-04-12 09:20:14 +02:00
Mikołaj Sielużycki
826e0e9d8a repair: Rename inputs to flush_rows. 2022-04-12 09:20:14 +02:00
Mikołaj Sielużycki
4dd32064a3 repair: Make repair_meta::flush_rows a free function. 2022-04-12 09:20:14 +02:00
Mikołaj Sielużycki
046e8c31db repair: Split flush_rows_in_working_row_buf to two functions and make one static.
It allows pulling out the logic of writing internal representation
of repair mutations to disk. This in turn is needed to unit test
this functionality without spinning up clusters, which significantly
improves developer iteration time.
2022-04-12 09:20:14 +02:00
Mikołaj Sielużycki
ca53a7fcc9 repair: Rename inputs to to_repair_rows_list. 2022-04-12 09:20:14 +02:00
Mikołaj Sielużycki
c7a7680c7d repair: Make to_repair_rows_list a free function. 2022-04-12 09:20:14 +02:00
Mikołaj Sielużycki
69fc74ffbe repair: Make repair_meta::to_repair_rows_list a static function
It allows pulling out the logic of converting on-the-wire representation
of repair mutations to an internal representation used later for
flushing repair mutations to disk. This in turn is needed to unit test
the functionality without spinning up clusters, which significantly
improves developer iteration time.
2022-04-12 09:20:14 +02:00
Mikołaj Sielużycki
4ba48e5739 repair: Fix indentation in repair_writer. 2022-04-12 09:20:14 +02:00
Mikołaj Sielużycki
3ff738db6b repair: Move repair_writer to separate header. 2022-04-12 09:20:03 +02:00
Mikołaj Sielużycki
04986e8c8e repair: Move repair_row to a separate header. 2022-04-12 08:50:34 +02:00
Mikołaj Sielużycki
7b0cbdeac5 repair: Move repair_sync_boundary to a separate header. 2022-04-12 08:50:34 +02:00
Mikołaj Sielużycki
f9c75952ea repair: Move decorated_key_with_hash to separate header. 2022-04-12 08:50:34 +02:00
Mikołaj Sielużycki
0fa703de3e repair: Move row_repair hashing logic to separate class and file. 2022-04-12 08:50:34 +02:00
Botond Dénes
11c378a175 mutation_reader: move queue reader to readers/ 2022-03-30 15:42:51 +03:00
Botond Dénes
d0ea895671 readers: move multishard reader & friends to reader/multishard.cc
Since the multishard reader family weighs more than 1K SLOC, it gets
its own .cc file.
2022-03-30 15:42:51 +03:00
Mikołaj Sielużycki
1d84a254c0 flat_mutation_reader: Split readers by file and remove unnecessary includes.
The flat_mutation_reader files were conflated and contained multiple
readers, which were not strictly necessary. Splitting improves both
iterative compilation times, as touching rarely used readers no longer
recompiles large chunks of the codebase, and total compilation times,
as the sizes of flat_mutation_reader.hh and flat_mutation_reader_v2.hh
have been reduced and those files are included by many files in the
codebase.

With changes

real	29m14.051s
user	168m39.071s
sys	5m13.443s

Without changes

real	30m36.203s
user	175m43.354s
sys	5m26.376s

Closes #10194
2022-03-14 13:20:25 +02:00
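Reading the timings of 1d84a254c0 as relative savings (time without the change minus time with it, divided by time without it):

```latex
\begin{aligned}
\text{real:}\quad & 1836.203\,\text{s} - 1754.051\,\text{s} = 82.152\,\text{s}, & \tfrac{82.152}{1836.203} &\approx 4.5\% \\
\text{user:}\quad & 10543.354\,\text{s} - 10119.071\,\text{s} = 424.283\,\text{s}, & \tfrac{424.283}{10543.354} &\approx 4.0\% \\
\text{sys:}\quad & 326.376\,\text{s} - 313.443\,\text{s} = 12.933\,\text{s}, & \tfrac{12.933}{326.376} &\approx 4.0\%
\end{aligned}
```

Here 30m36.203s = 1836.203 s, 29m14.051s = 1754.051 s, 175m43.354s = 10543.354 s, 168m39.071s = 10119.071 s, 5m26.376s = 326.376 s and 5m13.443s = 313.443 s.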
Botond Dénes
ad1b157452 streaming/consumer: convert to v2
At least on the API level; internally there are still conversions, but
these are going to be sorted out in the next patches too.
2022-03-02 09:55:09 +02:00
Asias He
ec59f7a079 repair: Do not flush hints and batchlog if tombstone_gc_mode is not repair
Flushing hints and batchlog is needed only for tables with
tombstone_gc_mode set to repair mode. We should skip the flush if the
tombstone_gc_mode is not repair mode.

Fixes #10004

Closes #10124
2022-02-25 07:26:11 +02:00
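A minimal sketch of the check in ec59f7a079, assuming a hypothetical `table_info` type and the four tombstone_gc modes commonly documented for Scylla (timeout, disabled, immediate, repair); the real code path and types differ. The flush is requested only if at least one repaired table uses repair mode:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Assumed set of modes; only `repair` matters for the decision below.
enum class tombstone_gc_mode { timeout, disabled, immediate, repair };

// Hypothetical per-table description.
struct table_info { tombstone_gc_mode gc_mode; };

// Hints and batchlog must be flushed before repair only when tombstone GC
// depends on repair having completed, i.e. for tables in repair mode.
bool needs_hints_batchlog_flush(const std::vector<table_info>& tables) {
    return std::any_of(tables.begin(), tables.end(), [](const table_info& t) {
        return t.gc_mode == tombstone_gc_mode::repair;
    });
}
```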
Asias He
680195564d repair: Unify repair uuid report in the log
More and more places are using the repair[uuid]: format for logging
repair jobs with the uuid. Convert more places to use the new format to
unify the log format.

This makes it easier to grep a specific repair job in the log.

Closes #10125
2022-02-23 09:13:12 +02:00
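One way to keep the `repair[uuid]:` format consistent is to build the prefix in a single helper. A small sketch, where `repair_log_prefix` and `format_repair_log` are hypothetical names (Scylla's actual logging goes through its logger with format strings):

```cpp
#include <cassert>
#include <string>

// Hypothetical helper producing the "repair[<uuid>]: " prefix so every
// log line of one repair job can be grepped by its id.
std::string repair_log_prefix(const std::string& repair_uuid) {
    return "repair[" + repair_uuid + "]: ";
}

// Hypothetical convenience wrapper combining prefix and message.
std::string format_repair_log(const std::string& repair_uuid, const std::string& message) {
    return repair_log_prefix(repair_uuid) + message;
}
```

With every call site using the helper, `grep 'repair\[<uuid>\]'` finds all lines of one job.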
Botond Dénes
4aa9b90ba9 repair/row_level: use evictable reader v2 2022-02-21 12:29:24 +02:00
Benny Halevy
f8db9e1bd8 repair_service: deglobalize get_next_repair_meta_id
Rather than using a static uint32_t next_id,
move the next_id variable into repair_service shard 0
and manage it there.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-01-27 11:34:21 +02:00
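A sketch of the deglobalization in f8db9e1bd8, assuming a heavily simplified `repair_service`: the counter moves from a process-wide static into service state. Routing other shards' requests to shard 0 is omitted here:

```cpp
#include <cassert>
#include <cstdint>

// Before (sketch): a static shared by the whole process.
//   std::uint32_t get_next_repair_meta_id() { static std::uint32_t next_id = 0; return next_id++; }

// After (sketch): the counter is a member of the service instance.
// In Scylla it lives on shard 0; cross-shard calls are not modeled.
class repair_service {
    std::uint32_t _next_repair_meta_id = 0;
public:
    std::uint32_t get_next_repair_meta_id() { return _next_repair_meta_id++; }
};
```

A side benefit of this shape is that each test can construct a fresh service and get predictable ids.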
Benny Halevy
90ba9013be repair_service: deglobalize repair_meta_map
Move the static repair_meta_map into the repair_service
and expose it from there.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-01-27 11:01:47 +02:00
Benny Halevy
e6b6fdc9a0 repair_service: pass reference to service to row_level_repair_gossip_helper
Note that we can't pass the repair_service container()
from its ctor since it's not populated until all shards start.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-01-27 11:00:26 +02:00
Benny Halevy
3008ecfd4e repair_meta: define repair_meta_ptr
Keep repair_meta in repair_meta_map as shared_ptr<repair_meta>
rather than lw_shared_ptr<repair_meta> so it can be defined
in the header file and use only forward-declared
class repair_meta.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-01-27 09:18:14 +02:00
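The point of 3008ecfd4e is that the pointer type can be named while `repair_meta` is only forward-declared. A sketch of that header/source split, using `std::shared_ptr` as a stand-in for `seastar::shared_ptr` (both allow an incomplete type at the point of declaration, since the deleter is captured where the object is created); `make_repair_meta` and `repair_meta_id` are hypothetical helpers:

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <unordered_map>

// "Header" part: only a forward declaration is visible.
class repair_meta;
using repair_meta_ptr = std::shared_ptr<repair_meta>;
using repair_meta_map = std::unordered_map<std::uint32_t, repair_meta_ptr>;

// Declared against the incomplete type; defined below.
repair_meta_ptr make_repair_meta(std::uint32_t id);
std::uint32_t repair_meta_id(const repair_meta& rm);

// "Source file" part: the full definition.
class repair_meta {
    std::uint32_t _id;
public:
    explicit repair_meta(std::uint32_t id) : _id(id) {}
    std::uint32_t id() const { return _id; }
};

repair_meta_ptr make_repair_meta(std::uint32_t id) { return std::make_shared<repair_meta>(id); }
std::uint32_t repair_meta_id(const repair_meta& rm) { return rm.id(); }
```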
Benny Halevy
fdc0a9602c repair_meta: move static repair_meta map functions out of line
Define the static {get,insert,remove}_repair_meta functions out
of the repair_meta class definition, on the way to moving them,
along with the repair_meta_map itself, to repair_service.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-01-27 09:15:09 +02:00
Benny Halevy
b5427cc6d1 repair_meta: make get_set_diff a free function
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-01-27 09:13:09 +02:00
Benny Halevy
224e7497e0 repair: repair_meta: no need to keep sharded<netw::messaging_service>
All repair_meta needs is the local instance.
Should other shards be needed, it's a peering service,
so its container() can be used.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-01-27 09:13:09 +02:00
Benny Halevy
c4ac92b2b7 repair: repair_meta: derive subordinate services from repair_service
Use repair_service as the authoritative source for
the database, messaging_service, system_distributed_keyspace,
and view_update_generator, similar to repair_info.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-01-27 09:12:53 +02:00
Benny Halevy
a71d6333e4 repair: pass repair_service to repair_meta
Prepare for holding the repair_meta_map in repair_service.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-01-27 09:12:51 +02:00
Avi Kivity
fcb8d040e8 treewide: use Software Package Data Exchange (SPDX) license identifiers
Instead of lengthy blurbs, switch to single-line, machine-readable
standardized (https://spdx.dev) license identifiers. The Linux kernel
switched long ago, so there is strong precedent.

Three cases are handled: AGPL-only, Apache-only, and dual licensed.
For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0),
reasoning that our changes are extensive enough to apply our license.

The changes were applied mechanically with a script, except to
licenses/README.md.

Closes #9937
2022-01-18 12:15:18 +01:00
Avi Kivity
52b7778ae6 Merge "repair: make sure there is one permit per repair with count res" from Botond
"
Repair obtains a permit for each repair-meta instance it creates. This
permit is supposed to track all resources consumed by that repair as
well as ensure concurrency limit is respected. However when the
non-local reader path is used (shard config of master != shard config of
follower), a second permit will be obtained -- for the shard reader of
the multishard reader. This creates a situation where the repair-meta's
permit can block the shard permit, resulting in a deadlock.
This patch solves this by dropping the count resource on the
repair-meta's permit when a non-local reader path is executed -- that is
a multishard reader is created.

Fixes: #9751
"

* 'repair-double-permit-block/v4' of https://github.com/denesb/scylla:
  repair: make sure there is one permit per repair with count res
  reader_permit: add release_base_resource()
2022-01-16 18:22:29 +02:00
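A toy model of the deadlock and the fix, assuming a semaphore that tracks only the count resource and a count limit of 1; `count_semaphore` and `permit` are hypothetical simplifications, and `release_base_resource()` only mirrors the name of the real `reader_permit` method. A failed `try_admit()` stands in for a wait that would never finish:

```cpp
#include <cassert>

// Toy semaphore tracking only the "count" resource.
class count_semaphore {
    int _available;
public:
    explicit count_semaphore(int count) : _available(count) {}
    bool try_admit() {
        if (_available == 0) {
            return false; // in the real system this would block
        }
        --_available;
        return true;
    }
    void release() { ++_available; }
};

// Toy permit: remembers whether it still holds its base count unit.
class permit {
    count_semaphore* _sem;
    bool _holds_count;
public:
    explicit permit(count_semaphore& sem) : _sem(&sem), _holds_count(sem.try_admit()) {}
    permit(const permit&) = delete;
    permit& operator=(const permit&) = delete;
    ~permit() { release_base_resource(); }
    bool holds_count() const { return _holds_count; }
    // Give the count unit back while keeping the permit itself alive.
    void release_base_resource() {
        if (_holds_count) {
            _sem->release();
            _holds_count = false;
        }
    }
};
```

In this model, the repair-meta permit drops its count unit before the multishard reader asks for a shard permit, so the semaphore's concurrency limit is still respected but only one permit holds the count.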
Botond Dénes
b6828e899a Merge "Postpone reshape of SSTables created by repair" from Raphael
"
SSTables created by repair will potentially not conform to the
compaction strategy layout goal. If the node shuts down before
off-strategy compaction has a chance to reshape those files, the node
will be forced to reshape them on restart. That causes unexpected
downtime. It turns out we can skip reshape of those files on boot, and
allow them to be reshaped after the node becomes online, as if the node
never went down. Those files will go through the same procedure as
files created by repair-based ops. They will be placed in the
maintenance set, and be reshaped iteratively until ready for
integration into the main set.
"

Fixes #9895.

tests: UNIT(dev).

* 'postpone_reshape_on_repair_originated_files' of https://github.com/raphaelsc/scylla:
  distributed_loader: postpone reshape of repair-originated sstables
  sstables: Introduce filter for sstable_directory::reshape
  table: add fast path when offstrategy is not needed
  sstables: add constant for repair origin
2022-01-14 14:05:09 +02:00
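A sketch of the filter idea behind this merge: a single named constant for the repair origin, and a boot-time selection that leaves repair-originated sstables for off-strategy compaction. The `sstable_info` type, the `select_for_boot_reshape` function and the exact origin string `"repair"` are assumptions for illustration, not the actual `sstable_directory::reshape` interface:

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <string>
#include <string_view>
#include <vector>

// One named constant instead of repeated string literals (value assumed).
constexpr std::string_view repair_origin = "repair";

// Hypothetical sstable description.
struct sstable_info {
    std::string name;
    std::string origin;
};

// Boot-time reshape skips repair-originated sstables; they are reshaped
// later, once the node is online.
std::vector<sstable_info> select_for_boot_reshape(const std::vector<sstable_info>& all) {
    std::vector<sstable_info> out;
    std::copy_if(all.begin(), all.end(), std::back_inserter(out), [](const sstable_info& sst) {
        return sst.origin != repair_origin;
    });
    return out;
}
```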
Avi Kivity
63d254a8d2 Merge 'gms, service: futurize and coroutinize gossiper-related code' from Pavel Solodovnikov
This series greatly reduces the gossiper's dependence on `seastar::async` (though not completely).

`i_endpoint_state_change_subscriber` callbacks are converted to return futures (again, to get rid of `seastar::async` dependency), all users are adjusted appropriately (e.g. `storage_service`, `cdc::generation_service`, `streaming::stream_manager`, `view_update_backlog_broker` and `migration_manager`).
This includes futurizing and coroutinizing the whole function call chain up to the `i_endpoint_state_change_subscriber` callback functions.

To aid the conversion process, a non-`seastar::async` dependent variant of `utils::atomic_vector::for_each` is introduced (`for_each_futurized`). A different name is used to clearly distinguish converted and non-converted code, so that the last step (remove `seastar::async()` wrappers around callback-calling code in gossiper) is easier. This is left for a follow-up series, though.

Tests: unit(dev)

Closes #9844

* github.com:scylladb/scylla:
  service: storage_service: coroutinize `set_gossip_tokens`
  service: storage_service: coroutinize `leave_ring`
  service: storage_service: coroutinize `handle_state_left`
  service: storage_service: coroutinize `handle_state_leaving`
  service: storage_service: coroutinize `handle_state_removing`
  service: storage_service: coroutinize `do_drain`
  service: storage_service: coroutinize `shutdown_protocol_servers`
  service: storage_service: coroutinize `excise`
  service: storage_service: coroutinize `remove_endpoint`
  service: storage_service: coroutinize `handle_state_replacing`
  service: storage_service: coroutinize `handle_state_normal`
  service: storage_service: coroutinize `update_peer_info`
  service: storage_service: coroutinize `do_update_system_peers_table`
  service: storage_service: coroutinize `update_table`
  service: storage_service: coroutinize `handle_state_bootstrap`
  service: storage_service: futurize `notify_*` functions
  service: storage_service: coroutinize `handle_state_replacing_update_pending_ranges`
  repair: row_level_repair_gossip_helper: coroutinize `remove_row_level_repair`
  locator: reconnectable_snitch_helper: coroutinize `reconnect`
  gms: i_endpoint_state_change_subscriber: make callbacks to return futures
  utils: atomic_vector: introduce future-returning `for_each` function
  utils: atomic_vector: rename `for_each` to `thread_for_each`
  gms: gossiper: coroutinize `start_gossiping`
  gms: gossiper: coroutinize `force_remove_endpoint`
  gms: gossiper: coroutinize `do_status_check`
  gms: gossiper: coroutinize `remove_endpoint`
2022-01-13 23:09:02 +02:00
Raphael S. Carvalho
34be8842ad sstables: add constant for repair origin
Make comparisons easy and avoid duplication

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-01-12 11:13:58 -03:00
Botond Dénes
97d74de8fc Merge "flat_mutation_reader: clone evictable_reader & convert some others" from Michael Livshin
"
The first patch introduces evictable_reader_v2, and the second one
further simplifies it.  We clone instead of converting because there
is at least one downstream (by way of multishard_combining_reader) use
that is not itself straightforward to convert at the moment
(multishard_mutation_query), and because evictable_reader instances
cannot be {up,down}graded (since users also access the underlying
buffers).  This also means that shard_reader, reader_lifecycle_policy
and multishard_combining_reader have to be cloned.
"

* tag 'clone-evictable-reader-to-v2/v3' of https://github.com/cmm/scylla:
  convert make_multishard_streaming_reader() to flat_mutation_reader_v2
  convert table::make_streaming_reader() to flat_mutation_reader_v2
  convert make_flat_multi_range_reader() to flat_mutation_reader_v2
  view_update_generator: remove unneeded call to downgrade_to_v1()
  introduce multishard_combining_reader_v2
  introduce shard_reader_v2
  introduce the reader_lifecycle_policy_v2 abstract base
  evictable_reader_v2: further code simplifications
  introduce evictable_reader_v2 & friends
2022-01-11 17:01:08 +02:00
Michael Livshin
1f27e12dc6 convert make_multishard_streaming_reader() to flat_mutation_reader_v2
All changes are mechanical.

Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2022-01-11 10:49:26 +02:00
Pavel Solodovnikov
4fcf31f11c repair: row_level_repair_gossip_helper: coroutinize remove_row_level_repair
Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
2022-01-11 09:29:12 +03:00
Pavel Solodovnikov
5dcfb94d5a gms: i_endpoint_state_change_subscriber: make callbacks to return futures
Coroutinize a few simple callbacks in the process.

Signed-off-by: Pavel Solodovnikov <pa.solodovnikov@scylladb.com>
2022-01-11 09:29:12 +03:00