Commit Graph

50 Commits

Author SHA1 Message Date
Yaniv Kaul
c658bdb150 Typos: fix typos in comments
Fixes some typos as found by codespell run on the code.
In this commit, I was hoping to fix only comments, not user-visible alerts, output, etc.
Follow-up commits will take care of them.

Refs: https://github.com/scylladb/scylladb/issues/16255
Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>
2023-12-02 22:37:22 +02:00
Aleksandra Martyniuk
0c6a3f568a compaction: delete default_compaction_progress_monitor
default_compaction_progress_monitor returns a reference to a static
object. So, it should be read-only, but its users need to modify it.

Delete default_compaction_progress_monitor and use one's own
compaction_progress_monitor instance where it's needed.

Closes scylladb/scylladb#15800
2023-10-23 16:03:34 +03:00
Aleksandra Martyniuk
39e96c6521 compaction: find total compaction size 2023-10-12 17:03:46 +02:00
Aleksandra Martyniuk
7b3e0ab1f2 compaction: sstables: monitor validation scrub with compaction_read_generator
Validation scrub bypasses the usual compaction machinery, though it
still needs to be tracked with compaction_progress_monitor so that
we could reach its progress from compaction task executor.

Track sstable scrub in validate mode with read monitors.
2023-10-12 17:03:46 +02:00
Aleksandra Martyniuk
3553556708 compaction: keep compaction_progress_monitor in compaction_task_executor
Keep compaction_progress_monitor in compaction_task_executor and pass a reference
to it further, so that the compaction progress could be retrieved out of it.
2023-10-12 17:03:46 +02:00
Aleksandra Martyniuk
22bf3c03df compaction: add compaction_progress_monitor
In the following patches compaction_read_monitor_generator will be used
to find progress of compaction_task_executor's. To avoid unnecessary life
prolongation and exposing internals of the class out of compaction.cc,
compaction_progress_monitor is created.

Compaction class keeps a reference to the compaction_progress_monitor.
Inheriting classes which actually use compaction_read_monitor_generator,
need to set it with set_generator method.
2023-10-12 17:03:46 +02:00
Raphael S. Carvalho
83c70ac04f utils: Extract pretty printers into a header
Can be easily reused elsewhere.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-06-26 21:58:20 -03:00
Pavel Emelyanov
66e43912d6 code: Switch to seastar API level 7
In that level no io_priority_class-es exist. Instead, all the IO happens
in the context of current sched-group. File API no longer accepts prio
class argument (and makes io_intent arg mandatory to impls).

So the change consists of
- removing all usage of io_priority_class
- patching file_impl's inheritants to updated API
- priority manager goes away altogether
- IO bandwidth update is performed on respective sched group
- tune-up scylla-gdb.py io_queues command

The first change is huge and was made semi-autimatically by:
- grep io_priority_class | default_priority_class
- remove all calls, found methods' args and class' fields

Patching file_impl-s is smaller, but also mechanical:
- replace io_priority_class& argument with io_intent* one
- pass intent to lower file (if applicatble)

Dropping the priority manager is:
- git-rm .cc and .hh
- sed out all the #include-s
- fix configure.py and cmakefile

The scylla-gdb.py update is a bit hairry -- it needs to use task queues
list for IO classes names and shares, but to detect it should it checks
for the "commitlog" group is present.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #13963
2023-06-06 13:29:16 +03:00
Raphael S. Carvalho
38b226f997 Resurrect optimization to avoid bloom filter checks during compaction
Commit 8c4b5e4283 introduced an optimization which only
calculates max purgeable timestamp when a tombstone satisfy the
grace period.

Commit 'repair: Get rid of the gc_grace_seconds' inverted the order,
probably under the assumption that getting grace period can be
more expensive than calculating max purgeable, as repair-mode GC
will look up into history data in order to calculate gc_before.

This caused a significant regression on tombstone heavy compactions,
where most of tombstones are still newer than grace period.
A compaction which used to take 5s, now takes 35s. 7x slower.

The reason is simple, now calculation of max purgeable happens
for every single tombstone (once for each key), even the ones that
cannot be GC'ed yet. And each calculation has to iterate through
(i.e. check the bloom filter of) every single sstable that doesn't
participate in compaction.

Flame graph makes it very clear that bloom filter is a heavy path
without the optimization:
    45.64%    45.64%  sstable_compact  sstable_compaction_test_g
        [.] utils::filter::bloom_filter::is_present

With its resurrection, the problem is gone.

This scenario can easily happen, e.g. after a deletion burst, and
tombstones becoming only GC'able after they reach upper tiers in
the LSM tree.

Before this patch, a compaction can be estimated to have this # of
filter checks:
(# of keys containing *any* tombstone) * (# of uncompacting sstable
runs[1])

[1] It's # of *runs*, as each key tend to overlap with only one
fragment of each run.

After this patch, the estimation becomes:
(# of keys containing a GC'able tombstone) * (# of uncompacting
runs).

With repair mode for tombstone GC, the assumption, that retrieval
of gc_before is more expensive than calculating max purgeable,
is kept. We can revisit it later. But the default mode, which
is the "timeout" (i.e. gc_grace_seconds) one, we still benefit
from the optimization of deferring the calculation until
needed.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes #13908
2023-05-18 09:01:50 +03:00
Botond Dénes
10fe76a0fe compaction/compaction: remove now unused scrub_validate_mode_validate_reader() 2023-05-02 09:42:42 -04:00
Aleksandra Martyniuk
7ead1a7857 compaction: request abort only once in compaction_data::stop
compaction_manager::task (and thus compaction_data) can be stopped
because of many different reasons. Thus, abort can be requested more
than once on compaction_data abort source causing a crash.

To prevent this before each request_abort() we check whether an abort
was requested before.

Closes #12004
2022-11-17 12:44:59 +02:00
Aleksandra Martyniuk
3a805a9d9b compaction: extract statistics in compaction_result
Statistics from compaction_result are extracted to new struct
compaction_stats and stored as a field of compaction_result.
2022-07-29 09:35:20 +02:00
Aleksandra Martyniuk
ab85dab05d scrub compaction: count validation errors
The number of validation errors encountered during scrub compaction
is counted.
2022-07-29 09:35:20 +02:00
Mikołaj Sielużycki
2edf137f61 compaction: Add tracking start_size in compaction_result. 2022-06-07 12:55:28 +02:00
Benny Halevy
78d6f6a519 compaction: sanitize headers from flat_mutation_reader v1
flat_mutation_reader make_scrubbing_reader no longer exists
and there is no need to include flat_mutation_reader.hh
nor forward declare the class.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-04-28 17:23:04 +03:00
Benny Halevy
ffc314d506 compaction: add documentation for compaction_type to string conversions
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-03-10 08:39:18 +02:00
Benny Halevy
28a74a2e90 compaction: expose to_string(compaction_type)
To be used in the next patch to generate
a string dscription from the compaction_type.

In theory, we could use compaction_name()
btu the latter returns the compaction type
in all-upper case and that is very different from
what we print to the log today.  The all-upper
strings are used for the api layer, e.g. to
stop tasks of a particular compaction type.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-03-10 08:39:18 +02:00
Benny Halevy
57f97046a7 compaction_manager: allow stopping sleeping tasks
Use exponential_backoff_retry::retry(abort_source&)
when sleeping between retries and request abort
when the task is stopped.

Fixes #10112

Test: unit(dev)

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2022-02-21 21:01:56 +02:00
Avi Kivity
fcb8d040e8 treewide: use Software Package Data Exchange (SPDX) license identifiers
Instead of lengthy blurbs, switch to single-line, machine-readable
standardized (https://spdx.dev) license identifiers. The Linux kernel
switched long ago, so there is strong precedent.

Three cases are handled: AGPL-only, Apache-only, and dual licensed.
For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0),
reasoning that our changes are extensive enough to apply our license.

The changes we applied mechanically with a script, except to
licenses/README.md.

Closes #9937
2022-01-18 12:15:18 +01:00
Botond Dénes
a7f4ab6b14 compaction/compaction: remove v1 version of validate and scrub reader factory methods 2022-01-14 10:19:56 +02:00
Botond Dénes
b315d17c2a compaction: migrate scrub and validate to v2
We add v2 version of external API but leave the old v1 in place to help
incremental migration. The implementation is migrated to v2.
2022-01-14 08:54:26 +02:00
Benny Halevy
07c5ddf182 sstables: add is_eligible_for_compaction
Currently compaction_manager tracks sstables
based on !requires_view_building() and similarly,
table::in_strategy_sstables picks up only sstables
that are not in staging.

is_eligible_for_compaction() generalizes this condition
in preparation for adding a quarantine subdirectory for
invalid sstables that should not be compacted as well.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-12-05 18:00:44 +02:00
Raphael S. Carvalho
06405729ce compaction: stop including database.hh
after switching to table_state, compaction code can finally stop
including database.hh

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-11-19 22:06:03 -03:00
Raphael S. Carvalho
69ab5c9dff compaction: switch to table_state in get_fully_expired_sstables()
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-11-19 22:06:02 -03:00
Raphael S. Carvalho
d89edad9fb compaction: switch to table_state
Make compaction procedure switch to table_state. Only function in
compaction.cc still directly using table is
get_fully_expired_sstables(T,...), but subsequently we'll make it
switch to table_state and then we can finally stop including database.hh
in the compaction code.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-11-19 22:06:01 -03:00
Raphael S. Carvalho
29df862f57 compaction: make table param of get_fully_expired_sstables() const
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-11-09 10:41:54 -03:00
Raphael S. Carvalho
132a840ed5 compaction: fix outdated doc of compact_sstables()
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-11-04 11:09:24 -03:00
Raphael S. Carvalho
0d745912d0 compaction: fix indentantion in compaction.hh
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-11-04 09:50:46 -03:00
Raphael S. Carvalho
b344db1696 compaction: give a more descriptive name to compaction_data
info is no longer descriptive, as compaction now works with
compaction_data instead of compaction_info.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-11-04 09:43:08 -03:00
Raphael S. Carvalho
ab0217e30e compaction: Improve overall efficiency by not diluting it with relatively inefficient jobs
Compaction efficiency can be defined as how much backlog is reduced per
byte read or written.
We know a few facts about efficiency:
1) the more files are compacted together (the fan-in) the higher the
efficiency will be, however...
2) the bigger the size difference of input files the worse the
efficiency, i.e. higher write amplification.

so compactions with similar-sized files are the most efficient ones,
and its efficiency increases with a higher number of files.

However, in order to not have bad read amplification, number of files
cannot grow out of bounds. So we have to allow parallel compaction
on different tiers, but to avoid "dilution" of overall efficiency,
we will only allow a compaction to proceed if its efficiency is
greater than or equal to the efficiency of ongoing compactions.

By the time being, we'll assume that strategies don't pick candidates
with wildly different sizes, so efficiency is only calculated as a
function of compaction fan-in.

Now when system is under heavy load, then fan-in threshold will
automatically grow to guarantee that overall efficiency remains
stable.

Please note that fan-in is defined in number of runs. LCS compaction
on higher levels will have a fan-in of 2. Under heavy load, it
may happen that LCS will temporarily switch to size-tiered mode
for compaction to keep up with amount of data being produced.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20211103215110.135633-2-raphaelsc@scylladb.com>
2021-11-03 20:03:23 +02:00
Raphael S. Carvalho
9067a13eac compaction: split compaction info and data for control
compaction_info must only contain info data to be exported to the
outside world, whereas compaction_data will contain data for
controlling compaction behavior and stats which change as
compaction progresses.
This separation makes the interface clearer, also allowing for
future improvements like removing direct references to table
in compaction.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-09-30 13:16:57 -03:00
Raphael S. Carvalho
cbd78be2dd compaction: remove start_size and end_size from compaction_info
those stats aren't used in compaction stats API and therefore they
can be removed. end_size is added to compaction_result (needed for
updating history) and start_size can be calculated in advance.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-09-30 13:16:45 -03:00
Raphael S. Carvalho
38df9c68f8 compaction: kill sstables field in compaction_info
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-09-30 13:16:33 -03:00
Raphael S. Carvalho
90cfe895d4 compaction: kill table pointer in compaction_info
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-09-30 13:16:29 -03:00
Raphael S. Carvalho
efed06e2e4 compaction: move management of compaction_info to compaction_manager
Today, compaction is calling compaction manager to register / deregister
the compaction_info created by it.

This is a layer violation because manager sits one layer above
compaction, so manager should be responsible for managing compaction
info.

From now on, compaction_info will be created and managed by
compaction_manager. compaction will only have a reference to info,
which it can use to update the world about compaction progress.

This will allow compaction_manager to be simplified as info can be
coupled with its respective task, allowing duplication to be removed
and layer violation to be fixed.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-09-30 13:15:00 -03:00
Raphael S. Carvalho
1f5b17fdc5 compaction: move output run id from compaction_info into task
this run id is used to track partial runs that are being written to.
let's move it from info into task, as this is not an external info,
but rather one that belongs to compaction_manager.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-09-30 13:13:20 -03:00
Raphael S. Carvalho
afd45b9f49 compaction: Don't leak backlog of input sstable when compaction strategy is changed
The generic backlog formula is: ALL + PARTIAL - COMPACTING

With transfer_ongoing_charges() we already ignore the effect of
ongoing compactions on COMPACTING as we judge them to be pointless.

But ongoing compactions will run to completion, meaning that output
sstables will be added to ALL anyway, in the formula above.

With stop_tracking_ongoing_compactions(), input sstables are never
removed from the tracker, but output sstables are added, which means
we end up with duplicate backlog in the tracker.

By removing this tracking mechanism, pointless ongoing compaction
will be ignored as expected and the leaks will be fixed.

Later, the intention is to force a stop on ongoing compactions if
strategy has changed as they're pointless anyway.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-09-27 14:03:28 -03:00
Avi Kivity
d7ac699a55 Revert "Merge "compaction: Update backlog tracker correctly when schema is updated" from Raphael"
This reverts commit b5cf0b4489, reversing
changes made to e8493e20cb. It causes
segmentation faults when sstable readers are closed.

Fixes #9388.
2021-09-26 18:31:49 +03:00
Avi Kivity
bf94c06fc7 Revert "Merge "simplifications and layer violation fix for compaction manager" from Raphael"
This reverts commit 7127c92acc, reversing
changes made to 88480ac504. We need to
revert b5cf0b4489 to fix #9388, and this stands
in the way.

Ref #9388.
2021-09-26 18:30:36 +03:00
Raphael S. Carvalho
5bf51ced14 compaction: split compaction info and data for control
compaction_info must only contain info data to be exported to the
outside world, whereas compaction_data will contain data for
controlling compaction behavior and stats which change as
compaction progresses.
This separation makes the interface clearer, also allowing for
future improvements like removing direct references to table
in compaction.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-09-23 10:56:18 -03:00
Raphael S. Carvalho
6d1170ac94 compaction: remove start_size and end_size from compaction_info
those stats aren't used in compaction stats API and therefore they
can be removed. end_size is added to compaction_result (needed for
updating history) and start_size can be calculated in advance.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-09-23 10:41:13 -03:00
Raphael S. Carvalho
d73a241a4e compaction: kill sstables field in compaction_info
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-09-23 10:38:32 -03:00
Raphael S. Carvalho
b6b4042faf compaction: kill table pointer in compaction_info
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-09-23 10:38:11 -03:00
Raphael S. Carvalho
0885376a85 compaction: move management of compaction_info to compaction_manager
Today, compaction is calling compaction manager to register / deregister
the compaction_info created by it.

This is a layer violation because manager sits one layer above
compaction, so manager should be responsible for managing compaction
info.

From now on, compaction_info will be created and managed by
compaction_manager. compaction will only have a reference to info,
which it can use to update the world about compaction progress.

This will allow compaction_manager to be simplified as info can be
coupled with its respective task, allowing duplication to be removed
and layer violation to be fixed.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-09-23 10:00:49 -03:00
Raphael S. Carvalho
7688d0432c compaction: move output run id from compaction_info into task
this run id is used to track partial runs that are being written to.
let's move it from info into task, as this is not an external info,
but rather one that belongs to compaction_manager.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-09-23 09:56:01 -03:00
Raphael S. Carvalho
0a3049908c compaction: Don't leak backlog of input sstable when compaction strategy is changed
The generic back formula is: ALL + PARTIAL - COMPACTING

With transfer_ongoing_charges() we already ignore the effect of
ongoing compactions on COMPACTING as we judge them to be pointless.

But ongoing compactions will run to completion, meaning that output
sstables will be added to ALL anyway, in the formula above.

With stop_tracking_ongoing_compactions(), input sstables are never
removed from the tracker, but output sstables are added, which means
we end up with duplicate backlog in the tracker.

By removing this tracking mechanism, pointless ongoing compaction
will be ignored as expected and the leaks will be fixed.

Later, the intention is to force a stop on ongoing compactions if
strategy has changed as they're pointless anyway.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2021-09-20 15:36:05 -03:00
Raphael S. Carvalho
acba3bd3c4 sstables: give a more descriptive name to compaction_options
the name compaction_options is confusing as it overlaps in meaning
with compaction_descriptor. hard to reason what are the exact
difference between them, without digging into the implementation.

compaction_options is intended to only carry options specific to
a give compaction type, like a mode for scrub, so let's rename
it to compaction_type_options to make it clearer for the
readers.

[avi: adjust for scrub changes]
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20210908003934.152054-1-raphaelsc@scylladb.com>
2021-09-12 11:21:33 +03:00
Botond Dénes
a258f5639b compaction: validation compaction -> scrub compaction (validate mode)
Fold validation compaction into scrub compaction (validate mode). Only
on the interface level though: to initiate validation compaction one now
has to use `compaction_options::make_scrub(compaction_options::scrub::mode::validate)`.
The implementation code stays as-is -- separate.
2021-08-05 07:32:05 +03:00
Botond Dénes
a57caf5229 sstables/compaction: implement validation compaction type
Validation just reads all the passed-in sstables and runs the mutation
stream through a mutation fragment stream validator, logging all errors
found, and finally also logging whether all the sstables are valid or
not. Validation is not really a compaction as it doesn't write any
output. As such it bypasses most of the usual compaction machinery, so
the latter doesn't have to be adapted to this outlier.
This patch only adds the implementation, but it still cannot be started
via `compact_sstables()`, that will be implemented by the next patches.
2021-07-12 10:25:15 +03:00
Raphael S. Carvalho
1924e8d2b6 treewide: Move compaction code into a new top-level compaction dir
Since compaction is layered on top of sstables, let's move all compaction code
into a new top-level directory.
This change will give me extra motivation to remove all layer violations, like
sstable calling compaction-specific code, and compaction entanglement with
other components like table and storage service.

Next steps:
- remove all layer violations
- move compaction code in sstables namespace into a new one for compaction.
- move compaction unit tests into its own file

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20210707194058.87060-1-raphaelsc@scylladb.com>
2021-07-07 23:21:51 +03:00