Commit Graph

147 Commits

Author SHA1 Message Date
Benny Halevy
2e24b05122 compaction: make_partition_filter: do not assert shard ownership
Now, with f1bbf705f9
(Cleanup sstables in resharding and other compaction types),
we may filter sstables as part of resharding compaction
and the assertion that all tokens are owned by the current
shard when filtering is no longer true.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-23 15:24:20 +03:00
Benny Halevy
9105f9800c sstables: add a printer for shared_sstable
Refactor the printing logic in compaction::formatted_sstables_list
out to sstables::to_string(const shared_sstable&, bool include_origin)
and operator<<(const shared_sstable) on top of it.

This lets us easily print std::vector<shared_sstable>
from compaction_manager in the next patch.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-10 23:31:35 +03:00
Benny Halevy
0c6ce5af74 compaction: move owned ranges filtering to base class
Move the token filtering logic down from cleanup_compaction
to regular_compaction and class compaction so it can be
reused by other compaction types.

Create a _owned_ranges_checker in class compaction
when _owned_ranges is engaged, and use it in
compaction::setup to filter partitions based on the owned ranges.

Ref scylladb/scylladb#12998

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-10 22:55:09 +03:00
Benny Halevy
09df04c919 compaction: move owned_ranges into descriptor
Move the owned_ranges_ptr, currently used only by
cleanup and upgrade compactions, to the generic
compaction descriptor so we apply cleanup in other
compaction types.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-04-10 22:52:12 +03:00
Pavel Emelyanov
8a061bd862 sstables, code: Introduce and use change_state() call
The call moves the sstable to the specified state.

The state change is translated into a storage driver state change,
which for today's filesystem storage means moving between directories.
The "normal" state maps to the base dir of the table; there's no
dedicated subdir for this state, and this causes some trouble.

To check whether an sstable is already in the "normal" state, it is
impossible to compare the filename of its path to any pre-defined
values, as tables' basedirs are dynamic. To overcome this, the
change-state call checks whether the sstable is in one of the "known"
sub-states, and assumes that it's in the normal state otherwise.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-02-21 17:39:34 +03:00
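The "known sub-states, otherwise normal" check described in 8a061bd862 can be sketched roughly as below. This is only an illustration: the `sstable_state` enum, the sub-directory names, and the helpers `state_from_dir` and `needs_move` are assumptions made for the example, not the actual ScyllaDB code.

```cpp
#include <cassert>
#include <filesystem>
#include <string>

// Hypothetical state set and sub-directory names, for illustration only.
enum class sstable_state { normal, staging, quarantine, upload };

// Classify an sstable directory: a known sub-state directory name wins,
// anything else is assumed to be the table's (dynamic) base directory.
inline sstable_state state_from_dir(const std::filesystem::path& dir) {
    const std::string leaf = dir.filename().string();
    if (leaf == "staging") { return sstable_state::staging; }
    if (leaf == "quarantine") { return sstable_state::quarantine; }
    if (leaf == "upload") { return sstable_state::upload; }
    return sstable_state::normal;
}

// A move is only needed when the target state differs from the current one.
inline bool needs_move(const std::filesystem::path& dir, sstable_state target) {
    assert(!dir.empty());
    return state_from_dir(dir) != target;
}
```

The fallback to `normal` is what makes the check work without knowing the table's base directory name in advance.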
Kefu Chai
0cb842797a treewide: do not define/capture unused variables
These warnings are found by Clang-17 after removing
`-Wno-unused-lambda-capture` and `-Wno-unused-variable` from
the list of disabled warnings in `configure.py`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-02-15 22:57:18 +02:00
Avi Kivity
69a385fd9d Introduce schema/ module
Schema related files are moved there. This excludes schema files that
also interact with mutations, because the mutation module depends on
the schema. Those files will have to go into a separate module.

Closes #12858
2023-02-15 11:01:50 +02:00
Avi Kivity
c5e4bf51bd Introduce mutation/ module
Move mutation-related files to a new mutation/ directory. The names
are kept in the global namespace to reduce churn; the names are
unambiguous in any case.

mutation_reader remains in the readers/ module.

mutation_partition_v2.cc was missing from CMakeLists.txt; it's added in this
patch.

This is a step forward towards librarization or modularization of the
source base.

Closes #12788
2023-02-14 11:19:03 +02:00
Raphael S. Carvalho
5a784c3c6d treewide: Use new sstable_set::size() wherever possible
That's the preferred alternative because it's zero copy.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-02-03 10:38:04 -03:00
Benny Halevy
82011fc489 dht: incremental_owned_ranges_checker: belongs_to_current_node: mark as const
Its _it member keeps state about the current range.
Although it's modified by the method, this is an implementation
detail that is irrelevant to the caller, hence mark the
belongs_to_current_node method as const (and noexcept while
at it).

This allows the caller, cleanup_compaction, to use it from
inside a const method, without having to mark
its respective member as mutable too.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #12634
2023-01-25 14:52:21 +02:00
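The idea in 82011fc489, a logically-const query that advances an internal cursor, can be sketched as follows. This is a simplified illustration, not the real `dht::incremental_owned_ranges_checker`: the class name `incremental_checker`, the integer tokens, the inclusive ranges, and the use of a `mutable` index are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Simplified stand-in: ranges are sorted, disjoint, inclusive [first, second].
class incremental_checker {
    std::vector<std::pair<int64_t, int64_t>> _ranges;
    // Cursor into _ranges. It changes during lookups, but that is an
    // implementation detail, so it is marked mutable.
    mutable std::size_t _pos = 0;
public:
    explicit incremental_checker(std::vector<std::pair<int64_t, int64_t>> ranges)
        : _ranges(std::move(ranges)) {}

    // Tokens must be passed in non-decreasing order across calls.
    bool belongs_to_current_node(int64_t token) const noexcept {
        // Skip ranges that end before the token; they can never match again.
        while (_pos < _ranges.size() && _ranges[_pos].second < token) {
            ++_pos;
        }
        return _pos < _ranges.size() && _ranges[_pos].first <= token;
    }
};
```

Because the method is const, a caller can hold the checker as an ordinary member and still use it from its own const methods.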
Tomasz Grabiec
23e4c83155 position_in_partition: Make after_key() work with non-full keys
This fixes a long standing bug related to handling of non-full
clustering keys, issue #1446.

after_key() was creating a position which is after all keys prefixed
by a non-full key, rather than a position which is right after that
key.

This issue will be caught by cql_query_test::test_compact_storage
in debug mode when mutation_partition_v2 merging starts inserting
sentinels at position after_key() on preemption.

It probably already causes problems for such keys.
2022-12-14 14:47:33 +01:00
Benny Halevy
8b81635d95 compaction: refactor dht::subtract_ranges out of get_ranges_for_invalidation
The algorithm is generic and can be used elsewhere.

Add a unit test for the function before it gets
optimized in the following patch.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-11-21 15:48:26 +02:00
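A generic range subtraction of the kind factored out in 8b81635d95 might look like the sketch below. It is not the actual `dht::subtract_ranges`: it works on half-open integer intervals instead of token ranges, and the `interval` alias and the function signature are assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using interval = std::pair<int, int>;  // half-open: [first, second)

// Returns the parts of `from` not covered by `remove`.
// Both inputs must be sorted and internally non-overlapping.
inline std::vector<interval> subtract_ranges(const std::vector<interval>& from,
                                             const std::vector<interval>& remove) {
    std::vector<interval> out;
    std::size_t j = 0;
    for (auto [lo, hi] : from) {
        assert(lo <= hi);
        // Drop removals that end before this interval starts.
        while (j < remove.size() && remove[j].second <= lo) {
            ++j;
        }
        // Walk the removals that overlap [lo, hi), emitting the gaps.
        for (std::size_t k = j; k < remove.size() && remove[k].first < hi; ++k) {
            if (remove[k].first > lo) {
                out.emplace_back(lo, remove[k].first);
            }
            lo = std::max(lo, remove[k].second);
        }
        if (lo < hi) {
            out.emplace_back(lo, hi);
        }
    }
    return out;
}
```

The inner loop starts from `j` rather than advancing it, so a removal that spans two input intervals is applied to both.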
Avi Kivity
994603171b Merge 'Add validator to the mutation compactor' from Botond Dénes
Fragment reordering and fragment dropping bugs have been plaguing us since forever. To fight them we added a validator to the sstable write path to prevent really messed up sstables from being written.
This series adds validation to the mutation compactor. This will cover reads and compaction among others, hopefully ridding us of such bugs on the read path too.
This series fixes some benign looking issues found by unit tests after the validator was added -- although how benign a producer emitting two partition-ends is depends entirely on how the consumer reacts to it, so no such bug is actually benign.

Fixes: https://github.com/scylladb/scylladb/issues/11174

Closes #11532

* github.com:scylladb/scylladb:
  mutation_compactor: add validator
  mutation_fragment_stream_validator: add a 'none' validation level
  test/boost/mutation_query_test: test_partition_limit: sort input data
  querier: consume_page(): use partition_start as the sentinel value
  treewide: use ::for_partition_end() instead of ::end_of_partition_tag_t{}
  treewide: use ::for_partition_start() instead of ::partition_start_tag_t{}
  position_in_partition: add for_partition_{start,end}()
2022-11-20 20:33:26 +02:00
Raphael S. Carvalho
8e1e30842d compaction: Use table_state's backlog tracker in compaction_read_monitor_generator
A step closer towards a separate backlog tracker for each compaction group.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-11-11 09:17:37 -03:00
Botond Dénes
f1a039fc2b treewide: use ::for_partition_start() instead of ::partition_start_tag_t{}
We just added a convenience static factory method for partition start,
change the present users of the clunky constructor+tag to use it
instead.
2022-11-11 09:58:18 +02:00
Benny Halevy
fd3e66b0cc compaction: extract incremental_owned_ranges_checker out to dht
It is currently used by cleanup_compaction partition filter.
Factor it out so it can be used to filter staging sstables in
the next patch.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-11-09 07:32:56 +02:00
Taras Borodin
c155ae1182 add utf8:validate to operator<< partition_key with_schema.
2022-09-22 16:42:31 +03:00
Raphael S. Carvalho
e2ccafbe38 compaction: Add support to split large partitions
Adds support for splitting large partitions during compaction.

Large partitions introduce many problems, like memory overhead and
breaking the incremental compaction promise. We want to split large
partitions across fixed-size fragments. We'll allow a partition
to exceed size limit by 10%, as we don't want to unnecessarily split
partitions that just crossed the limit boundary.

To avoid having to open a minimum of 2 fragments in a read, the partition
tombstone will be replicated to every fragment storing the
partition.

The splitting isn't enabled by default, and can be used by
strategies that are run aware like ICS. LCS still cannot support
it as it's still using physical level metadata, not run id.

An incremental reader for sstable runs will follow soon.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-09-14 13:23:16 -03:00
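The 10% tolerance mentioned in e2ccafbe38 can be read as a simple threshold check, sketched below. The helper name `should_split` and its parameters are hypothetical, and the real compaction code may apply the limit differently.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: split only once the bytes written for the current
// partition exceed the fragment size limit by more than 10%.
constexpr bool should_split(uint64_t written_bytes, uint64_t max_fragment_bytes) {
    return written_bytes > max_fragment_bytes + max_fragment_bytes / 10;
}
```

With a limit of 1000 bytes, a partition of up to 1100 bytes stays in one fragment, which avoids splitting partitions that have only just crossed the boundary.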
Benny Halevy
d86810d22c mutation_partition: compact_for_compaction_v2: get tombstone_gc_state
To be passed down to compact_mutation_state in a following patch.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-09-07 07:43:15 +03:00
Benny Halevy
7e4612d3aa mutation_readers: pass tombstone_gc_state to compating_reader
To be passed further down to `compact_mutation_state` in
a following patch.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-09-07 07:43:14 +03:00
Benny Halevy
572d534d0d sstables: get_gc_before_*: get tombstone_gc_state from caller
Pass the tombstone_gc_state from the compaction_strategy
to sstables get_gc_before_* functions using the table state
to get to the tombstone_gc_state.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-09-06 23:05:39 +03:00
Botond Dénes
b89b84ad3c compaction: scrub/abort: be more verbose
Currently abort-mode scrub exits with a message which basically says
"some problem was found", with no details on what problem it found. Add
a detailed error report on the found problem before aborting the scrub.

Closes #11418
2022-09-06 11:42:34 +03:00
Benny Halevy
7747b8fa33 sstables: define run_identifier as a strong tagged_uuid type
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #11321
2022-08-18 19:03:10 +03:00
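The point of a strong tagged type, as in 7747b8fa33, is that identifiers of different kinds cannot be mixed up by accident. A minimal sketch of the technique, assuming a hypothetical `tagged_id` template and tag structs rather than ScyllaDB's actual `tagged_uuid`:

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical strong ID: the Tag parameter only distinguishes types.
template <typename Tag>
struct tagged_id {
    std::array<uint8_t, 16> bytes{};
    bool operator==(const tagged_id& o) const { return bytes == o.bytes; }
    bool operator!=(const tagged_id& o) const { return !(*this == o); }
};

struct run_id_tag {};
struct table_id_tag {};
using run_id = tagged_id<run_id_tag>;
using table_id = tagged_id<table_id_tag>;

// Same representation, but the compiler treats them as unrelated types.
static_assert(!std::is_same_v<run_id, table_id>);
static_assert(!std::is_convertible_v<table_id, run_id>);
```

Passing a `table_id` where a `run_id` is expected then fails at compile time instead of silently compiling.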
Benny Halevy
e1fe598760 compaction: cleanup, upgrade: use a lw_shared_ptr for owned token ranges
Currently they are copied for the get_sstables function
so this change reduces copies.

Also, it will allow further decoupling of compaction_manager
from replica::database, by letting the caller of
perform_cleanup and perform_sstable_upgrade get the
owned token ranges from db and pass it to the perform_*
functions in the following patch.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-08-02 07:57:41 +03:00
Aleksandra Martyniuk
6ea5bc96d7 scrub compaction: return status indicating aborted operations
over the rest api

When performing a scrub compaction, the user did not know whether
an operation was aborted.

If a scrub compaction is aborted, the return status the user gets over
the REST API is set to 1.
2022-07-29 09:35:20 +02:00
Aleksandra Martyniuk
3a805a9d9b compaction: extract statistics in compaction_result
Statistics from compaction_result are extracted to new struct
compaction_stats and stored as a field of compaction_result.
2022-07-29 09:35:20 +02:00
Aleksandra Martyniuk
ab85dab05d scrub compaction: count validation errors
The number of validation errors encountered during scrub compaction
is counted.
2022-07-29 09:35:20 +02:00
Raphael S. Carvalho
f52ad722f3 compaction_manager: rename table_state's get_sstable_set to main_sstable_set
With compaction_manager switching to table_state, we'll need to
introduce a method in table_state to return maintenance set.
So better to have a descriptive name for main set.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-07-13 11:12:33 -03:00
Raphael S. Carvalho
aa667e590e sstable_set: Fix partitioned_sstable_set constructor
The sstable set param isn't being used anywhere, and it's also buggy
as the sstable run list isn't being updated accordingly. So it could happen
that the set contains sstables but the run list is empty, introducing
inconsistency.

We're fortunate that the bug wasn't activated, as it would've been
a hard one to catch. Found this while auditing the code.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20220617203438.74336-1-raphaelsc@scylladb.com>
2022-06-21 11:58:13 +03:00
Avi Kivity
8edb79ea80 Merge 'Reduce compaction serialization' from Mikołaj Sielużycki
update_history can take a long time compared to compaction, as a call
issued on shard S1 can be handled on shard S2. If the other shard is
under heavy load, we may unnecessarily block kicking off a new
compaction. Normally it isn't a problem, as compactions aren't super
frequent, but there were edge cases where the described behaviour caused
compaction to fail to keep up with excessive flushing, leading to too
many sstables on disk and OOM during a read.

There is no need to delay the next compaction until history is updated,
so release the weight earlier to remove unnecessary serialization.

Changelog:
v3:
- explicitly call deregister instead of moving the weight RAII object to release weight
- mark compaction as finished when sstables are compacted, without waiting for history to update
v2:
- Split the patches differently for easier review
- Rebased against newer master, which contains fixes that failed the debug version of the test
- Removed the test, as it will be provided by [PR#10717](https://github.com/scylladb/scylla/pull/10717)

Closes #10507

* github.com:scylladb/scylla:
  compaction: Release compaction weight before updating history.
  compaction: Inline compact_sstables_and_update_history call.
  compaction: Extract compact_sstables function
  compaction: Rename compact_sstables to compact_sstables_and_update_history
  compaction: Extract update_history function
  compaction: Extract should_update_history function.
  compaction: Fetch start_size from compaction_result
  compaction: Add tracking start_size in compaction_result.
2022-06-13 16:04:20 +03:00
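The v3 changelog note above, calling deregister explicitly instead of moving the RAII object, can be illustrated with the sketch below. The `weight_registration` class and the `std::set<int>` registry are assumptions made for the example and do not mirror the real compaction_manager types.

```cpp
#include <cassert>
#include <set>

// Hypothetical RAII guard over a registry of in-use compaction weights.
class weight_registration {
    std::set<int>* _registry;
    int _weight;
public:
    weight_registration(std::set<int>& registry, int weight)
        : _registry(&registry), _weight(weight) {
        const bool inserted = registry.insert(weight).second;
        assert(inserted);  // the weight must not already be in use
        (void)inserted;
    }
    weight_registration(const weight_registration&) = delete;
    weight_registration& operator=(const weight_registration&) = delete;

    // Release early, e.g. right after the sstables are compacted and
    // before the (possibly slow) history update. Safe to call twice.
    void deregister() noexcept {
        if (_registry) {
            _registry->erase(_weight);
            _registry = nullptr;
        }
    }
    ~weight_registration() { deregister(); }
};
```

The destructor still releases the weight on error paths, while the explicit call lets the next compaction start without waiting for the history update.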
Benny Halevy
593a192664 compaction: setup: reserve space for _input_sstable_generations
We know in advance the maximum number of
sstable generations to track, so reserve space for it
to prevent vector reallocation for a large number of sstables.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-06-08 10:18:24 +03:00
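Reserving up front, as 593a192664 describes, is a standard `std::vector` idiom. A minimal sketch, where the function name `collect_generations` and the `int64_t` generation type are assumptions for the example:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// The number of entries is known in advance, so reserve once instead of
// letting push_back grow (and reallocate) the vector repeatedly.
inline std::vector<int64_t> collect_generations(const std::vector<int64_t>& input) {
    std::vector<int64_t> tracked;
    tracked.reserve(input.size());
    for (int64_t generation : input) {
        tracked.push_back(generation);
    }
    return tracked;
}
```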
Benny Halevy
4fac6e0b27 compaction: coroutinize setup and maybe yield
To prevent reactor stalls with a large number of sstables.

Fixes #10738

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-06-08 10:12:41 +03:00
Mikołaj Sielużycki
2edf137f61 compaction: Add tracking start_size in compaction_result.
2022-06-07 12:55:28 +02:00
Raphael S. Carvalho
2a7eb16c02 sstables: Use generation_type for compaction ancestors
Let's also use generation_type for compaction ancestors, so once we
support something other than integer for SSTable generation, we
won't have discrepancy about what the generation type is.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-05-31 15:28:02 -03:00
Raphael S. Carvalho
0307cdd2bf compaction: Fix incremental compaction logging
The message only dumps the last sealed fragment, but it should dump
all the output fragments replacing the exhausted input ones.

Let's print the origin of output fragments, so we can differentiate between
files with compaction and garbage-collection origin.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20220524232232.119520-1-raphaelsc@scylladb.com>
2022-05-30 15:58:14 +03:00
Piotr Sarna
209c2f5d99 sstables: define generation_type for sstables
No functional changes intended - this series is quite verbose,
but after it's in, it should be considerably easier to change
the type of SSTable generations to something else - e.g. a string
or timeUUID.

Closes #10533
2022-05-11 14:46:30 +02:00
Benny Halevy
78d6f6a519 compaction: sanitize headers from flat_mutation_reader v1
flat_mutation_reader make_scrubbing_reader no longer exists
and there is no need to include flat_mutation_reader.hh
nor forward declare the class.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-04-28 17:23:04 +03:00
Benny Halevy
f634b6d3be compaction: cleanup_compaction: make_partition_filter: return flat_mutation_reader_v2::filter
We filter only on the partition key, so it doesn't matter,
but we want to get rid of flat_mutation_reader v1.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-04-28 17:16:47 +03:00
Raphael S. Carvalho
f05ae92849 compaction: move compaction::enable_garbage_collected_sstable_writer() into protected namespace
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20220411181322.192830-2-raphaelsc@scylladb.com>
2022-04-12 11:21:18 +03:00
Botond Dénes
b029bd3db7 tree: remove mutation_reader.hh include
In most files it was unused. We should move these to the patch which
moved out the last interesting reader from mutation_reader.hh (and added
the corresponding new header include) but it's probably not worth the
effort.
Some other files still relied on mutation_reader.hh to provide reader
concurrency semaphore and some other misc reader related definitions.
2022-03-30 15:42:51 +03:00
Botond Dénes
b7954138ac mutation_reader: move compacting reader into readers/
2022-03-30 15:42:51 +03:00
Botond Dénes
f24f2f726a mutation_reader: move filtering reader into readers/
2022-03-30 15:42:51 +03:00
Botond Dénes
2ae0e0093e compaction/compaction: abort scrub when attempting to rectify stream with active tombstone
2022-03-29 13:19:05 +03:00
Raphael S. Carvalho
25be958ab9 compaction: Introduce compaction_descriptor::sstables_size
This method can be reused in manager, and will be useful for upcoming
cleanup task.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-03-21 12:55:10 -03:00
Raphael S. Carvalho
67a7b7a3f4 compaction: rename interrupt() to a descriptive name
interrupt() makes it sound like it's interrupting the compaction, but it's
actually called *on* interrupt, to handle the interrupt scenario.
Let's rename it to on_interrupt().

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20220311000128.189840-1-raphaelsc@scylladb.com>
2022-03-11 10:16:34 +02:00
Botond Dénes
7e0b51ff23 Merge 'Overhaul compaction_manager::task' from Benny Halevy
The series overhauls the compaction_manager::task design and implementation
by properly layering the functionality between the compaction_manager
that deals with generic task execution, and the per-task business logic that is defined
in a set of classes derived from the generic task class.

While at it, the series introduces `task::state` and a set of helper functions to manage it
to prevent leaks in the statistics, fixing #9974.

Two more stats counters were exposed: `completed_tasks` and a new `postponed_tasks`.

Test: sstable_compaction_test
Dtest: compaction_test.py compaction_additional_test.py

Fixes #9974

Closes #10122

* github.com:scylladb/scylla:
  compaction_manager: use coroutine::switch_to
  compaction_manager::task: drop _compaction_running
  compaction_manager: move per-type logic to derived task
  compaction_manager: task: add state enum
  compaction_manager: task: add maybe_retry
  compaction_manager: reevaluate_postponed_compactions: mark as noexcept
  compaction_manager: define derived task types
  compaction_manager: register_metrics: expose postponed_compactions
  compaction_manager: register_metrics: expose failed_compactions
  compaction_manager: register_metrics: expose _stats.completed_tasks
  compaction: add documentation for compaction_type to string conversions
  compaction: expose to_string(compaction_type)
  compaction_manager: task: standardize task description in log messages
  compaction_manager: refactor can_proceed
  compaction_manager: pass compaction_manager& to task ctor
  compaction_manager: use shared_ptr<task> rather than lw_shared_ptr
  compaction_manager: rewrite_sstables: acquire _maintenance_ops_sem once
  compaction_manager: use compaction_state::lock only to synchronize major and regular compaction
2022-03-10 13:33:56 +02:00
Benny Halevy
28a74a2e90 compaction: expose to_string(compaction_type)
To be used in the next patch to generate
a string description from the compaction_type.

In theory, we could use compaction_name()
but the latter returns the compaction type
in all-upper case and that is very different from
what we print to the log today.  The all-upper
strings are used for the api layer, e.g. to
stop tasks of a particular compaction type.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-03-10 08:39:18 +02:00
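The distinction drawn in 28a74a2e90, one spelling for log messages and an all-upper-case one for the API layer, can be sketched as two separate conversion functions. The enum members and every string literal below are hypothetical and were chosen only to show the idea; they are not the values ScyllaDB uses.

```cpp
#include <cassert>
#include <string_view>

// Hypothetical subset of compaction types.
enum class compaction_type { compaction, cleanup, scrub };

// Log-friendly spelling (hypothetical strings).
constexpr std::string_view to_string(compaction_type t) {
    switch (t) {
    case compaction_type::compaction: return "Compact";
    case compaction_type::cleanup: return "Cleanup";
    case compaction_type::scrub: return "Scrub";
    }
    return "Unknown";
}

// API-layer spelling, all upper case (hypothetical strings).
constexpr std::string_view compaction_name(compaction_type t) {
    switch (t) {
    case compaction_type::compaction: return "COMPACTION";
    case compaction_type::cleanup: return "CLEANUP";
    case compaction_type::scrub: return "SCRUB";
    }
    return "UNKNOWN";
}
```

Keeping the two functions separate lets log wording change without affecting the strings that API clients match on.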
Botond Dénes
105bf8888a sstables: convert mx writer to v2
The sstables::sstable class has two methods for writing sstables:
1) sstable_writer get_writer(...);
2) future<> write_components(flat_mutation_reader, ...);

(1) directly exposes the writer type, so we have to update all users of
it (there are not that many) in this same patch. We defer updating
users of (2) to follow-up commits.
2022-03-10 07:03:49 +02:00
Botond Dénes
7a37e30310 mutation_reader: convert compacting reader v2
Its input was already a v2 reader; now it is itself a v2 reader as well.
With this commit, compaction.cc is finally v2 all-the-way.
2022-03-10 07:03:46 +02:00
Benny Halevy
c7de2e0682 compaction: log info message when interrupting compaction
Info messages are logged when compaction jobs start and finish
but there is no message logged when the job is interrupted, e.g.
when stopped by the compaction_manager.

Refs scylladb/scylla-dtest#2468

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-03-07 11:43:58 +02:00