Commit Graph

15 Commits

Author SHA1 Message Date
Raphael S. Carvalho
25be958ab9 compaction: Introduce compaction_descriptor::sstables_size
This method can be reused in manager, and will be useful for upcoming
cleanup task.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-03-21 12:55:10 -03:00
Raphael S. Carvalho
c25d8f6770 compaction: Move decision of garbage collection from strategy to task type
For compaction to be able to purge expired data, like tombstones, a
sstable set snapshot is set in the compaction descriptor.

That's a decision that belongs to task type. For example, all regular
compaction enable GC, whereas scrub for example doesn't for safety
reasons.

The problem is that the decision is being made by every instantiation
of compaction_descriptor in the strategies, which is both unnecessary
and also adds lots of boilerplate to the code, making it hard to
understand and work with.

As sstable set snapshot is an implementation detail, a new method
is being added to compaction_descriptor to make the intention
clearer, making the interface easier to understand.

can_purge_tombstones, used previously by rewrite task only, is being
reused for communicating GC intention into task::compact_sstables().

The boilerplate was a pain when adding a new strategy method for
the ongoing work on cleanup, described by issue #10097.
Another benefit is that we'll now only create a set snapshot when
compaction will really run. Before, it could happen that the snapshot
would be discarded if the compaction attempt had to be postponed,
which is a waste of cpu cycles.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2022-03-21 12:14:04 -03:00
Raphael S. Carvalho
1a2332a0ba compaction: Move release_exhausted out of the compaction descriptor
With compact_sstables() now living in compaction_manager::task,
release_exhausted no longer has to live inside compaction_descriptor,
which is a good direction because implementation detail is being
removed from the interface.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20220311023410.250149-2-raphaelsc@scylladb.com>
2022-03-14 15:39:23 +02:00
Avi Kivity
fcb8d040e8 treewide: use Software Package Data Exchange (SPDX) license identifiers
Instead of lengthy blurbs, switch to single-line, machine-readable
standardized (https://spdx.dev) license identifiers. The Linux kernel
switched long ago, so there is strong precedent.

Three cases are handled: AGPL-only, Apache-only, and dual licensed.
For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0),
reasoning that our changes are extensive enough to apply our license.

The changes we applied mechanically with a script, except to
licenses/README.md.

Closes #9937
2022-01-18 12:15:18 +01:00
Benny Halevy
cc122984d6 compaction: scrub: add quarantine_mode option
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-12-05 18:29:04 +02:00
Raphael S. Carvalho
ab0217e30e compaction: Improve overall efficiency by not diluting it with relatively inefficient jobs
Compaction efficiency can be defined as how much backlog is reduced per
byte read or written.
We know a few facts about efficiency:
1) the more files are compacted together (the fan-in) the higher the
efficiency will be, however...
2) the bigger the size difference of input files the worse the
efficiency, i.e. higher write amplification.

so compactions with similar-sized files are the most efficient ones,
and its efficiency increases with a higher number of files.

However, in order to not have bad read amplification, number of files
cannot grow out of bounds. So we have to allow parallel compaction
on different tiers, but to avoid "dilution" of overall efficiency,
we will only allow a compaction to proceed if its efficiency is
greater than or equal to the efficiency of ongoing compactions.

By the time being, we'll assume that strategies don't pick candidates
with wildly different sizes, so efficiency is only calculated as a
function of compaction fan-in.

Now when system is under heavy load, then fan-in threshold will
automatically grow to guarantee that overall efficiency remains
stable.

Please note that fan-in is defined in number of runs. LCS compaction
on higher levels will have a fan-in of 2. Under heavy load, it
may happen that LCS will temporarily switch to size-tiered mode
for compaction to keep up with amount of data being produced.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20211103215110.135633-2-raphaelsc@scylladb.com>
2021-11-03 20:03:23 +02:00
Vlad Zolotarov
79b0654d60 time_window_compaction_strategy: put expired sstables in a separate compaction task
It's much more efficient to have a separate compaction task that consists
completely from expired sstables and make sure it gets a unique "weight" than
mixing expired sstables with non-expired sstables adding an unpredictable
latency to an eviction event of an expired sstable.

This change also improves the visibility of eviction events because now
they are always going to appear in the log as compactions that compact into
an empty set.

Fixes #9533

Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>

Closes #9534
2021-10-31 17:54:40 +02:00
Benny Halevy
5483269dfb compaction_manager: pass owned_ranges via cleanup/upgrade options
So they can be easily computed using an async task
before constructing the compaction object
in a following patch.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2021-10-13 14:17:46 +03:00
Raphael S. Carvalho
acba3bd3c4 sstables: give a more descriptive name to compaction_options
the name compaction_options is confusing as it overlaps in meaning
with compaction_descriptor. hard to reason what are the exact
difference between them, without digging into the implementation.

compaction_options is intended to only carry options specific to
a give compaction type, like a mode for scrub, so let's rename
it to compaction_type_options to make it clearer for the
readers.

[avi: adjust for scrub changes]
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20210908003934.152054-1-raphaelsc@scylladb.com>
2021-09-12 11:21:33 +03:00
Botond Dénes
76f2790c24 compaction/compaction_descriptor: add comment to Validation compaction type
Add a note explaining what Origin uses this for, to deter future
attempts at reusing this for something else.
2021-08-05 07:36:45 +03:00
Botond Dénes
ab7a2cabb3 compaction/compaction_descriptor: compaction_options: remove validate
It is unused now.
2021-08-05 07:36:45 +03:00
Botond Dénes
8b64a6caa7 compaction/compaction_descriptor: compaction_options: add options() accessor 2021-08-03 09:34:17 +03:00
Botond Dénes
f01b799a30 compaction/compaction_descriptor: compaction_options::scrub::mode: add validate
To replace compaction_type::Validation.
2021-08-03 09:34:15 +03:00
Botond Dénes
891921377d sstables/compaction_descriptor: compaction_options: add validation compaction type
This enables starting validation compaction via `compact_sstables()`.
2021-07-12 10:25:15 +03:00
Raphael S. Carvalho
1924e8d2b6 treewide: Move compaction code into a new top-level compaction dir
Since compaction is layered on top of sstables, let's move all compaction code
into a new top-level directory.
This change will give me extra motivation to remove all layer violations, like
sstable calling compaction-specific code, and compaction entanglement with
other components like table and storage service.

Next steps:
- remove all layer violations
- move compaction code in sstables namespace into a new one for compaction.
- move compaction unit tests into its own file

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20210707194058.87060-1-raphaelsc@scylladb.com>
2021-07-07 23:21:51 +03:00