scylladb

Author	SHA1	Message	Date
Raphael S. Carvalho	25be958ab9	compaction: Introduce compaction_descriptor::sstables_size This method can be reused in manager, and will be useful for upcoming cleanup task. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-03-21 12:55:10 -03:00
Raphael S. Carvalho	c25d8f6770	compaction: Move decision of garbage collection from strategy to task type For compaction to be able to purge expired data, like tombstones, an sstable set snapshot is set in the compaction descriptor. That's a decision that belongs to the task type. For example, all regular compactions enable GC, whereas scrub doesn't, for safety reasons. The problem is that the decision is being made by every instantiation of compaction_descriptor in the strategies, which is both unnecessary and adds lots of boilerplate to the code, making it hard to understand and work with. As the sstable set snapshot is an implementation detail, a new method is being added to compaction_descriptor to make the intention clearer, making the interface easier to understand. can_purge_tombstones, previously used only by the rewrite task, is being reused to communicate GC intention to task::compact_sstables(). The boilerplate was a pain when adding a new strategy method for the ongoing work on cleanup, described by issue #10097. Another benefit is that we'll now create a set snapshot only when compaction will actually run. Before, the snapshot could be discarded if the compaction attempt had to be postponed, which is a waste of CPU cycles. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-03-21 12:14:04 -03:00
Raphael S. Carvalho	1a2332a0ba	compaction: Move release_exhausted out of the compaction descriptor With compact_sstables() now living in compaction_manager::task, release_exhausted no longer has to live inside compaction_descriptor, which is a good direction because implementation details are being removed from the interface. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20220311023410.250149-2-raphaelsc@scylladb.com>	2022-03-14 15:39:23 +02:00
Avi Kivity	fcb8d040e8	treewide: use Software Package Data Exchange (SPDX) license identifiers Instead of lengthy blurbs, switch to single-line, machine-readable standardized (https://spdx.dev) license identifiers. The Linux kernel switched long ago, so there is strong precedent. Three cases are handled: AGPL-only, Apache-only, and dual licensed. For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0), reasoning that our changes are extensive enough to apply our license. The changes were applied mechanically with a script, except to licenses/README.md. Closes #9937	2022-01-18 12:15:18 +01:00
Benny Halevy	cc122984d6	compaction: scrub: add quarantine_mode option Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-12-05 18:29:04 +02:00
Raphael S. Carvalho	ab0217e30e	compaction: Improve overall efficiency by not diluting it with relatively inefficient jobs Compaction efficiency can be defined as how much backlog is reduced per byte read or written. We know a few facts about efficiency: 1) the more files are compacted together (the fan-in), the higher the efficiency will be; however... 2) the bigger the size difference between input files, the worse the efficiency, i.e. the higher the write amplification. So compactions of similar-sized files are the most efficient ones, and their efficiency increases with the number of files. However, to avoid bad read amplification, the number of files cannot grow without bound. So we have to allow parallel compaction on different tiers, but to avoid "dilution" of overall efficiency, we will only allow a compaction to proceed if its efficiency is greater than or equal to the efficiency of ongoing compactions. For the time being, we'll assume that strategies don't pick candidates with wildly different sizes, so efficiency is calculated only as a function of compaction fan-in. Now, when the system is under heavy load, the fan-in threshold will automatically grow to guarantee that overall efficiency remains stable. Please note that fan-in is defined in number of runs. LCS compactions on higher levels will have a fan-in of 2. Under heavy load, LCS may temporarily switch to size-tiered mode so that compaction keeps up with the amount of data being produced. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20211103215110.135633-2-raphaelsc@scylladb.com>	2021-11-03 20:03:23 +02:00
Vlad Zolotarov	79b0654d60	time_window_compaction_strategy: put expired sstables in a separate compaction task It's much more efficient to have a separate compaction task that consists entirely of expired sstables, and to make sure it gets a unique "weight", than to mix expired sstables with non-expired ones, which adds unpredictable latency to the eviction of an expired sstable. This change also improves the visibility of eviction events, because they will now always appear in the log as compactions that compact into an empty set. Fixes #9533 Signed-off-by: Vlad Zolotarov <vladz@scylladb.com> Closes #9534	2021-10-31 17:54:40 +02:00
Benny Halevy	5483269dfb	compaction_manager: pass owned_ranges via cleanup/upgrade options So they can be easily computed using an async task before constructing the compaction object in a following patch. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-10-13 14:17:46 +03:00
Raphael S. Carvalho	acba3bd3c4	sstables: give a more descriptive name to compaction_options The name compaction_options is confusing, as it overlaps in meaning with compaction_descriptor. It's hard to reason about the exact difference between them without digging into the implementation. compaction_options is intended to carry only options specific to a given compaction type, like a mode for scrub, so let's rename it to compaction_type_options to make it clearer for readers. [avi: adjust for scrub changes] Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20210908003934.152054-1-raphaelsc@scylladb.com>	2021-09-12 11:21:33 +03:00
Botond Dénes	76f2790c24	compaction/compaction_descriptor: add comment to Validation compaction type Add a note explaining what Origin uses this for, to deter future attempts at reusing this for something else.	2021-08-05 07:36:45 +03:00
Botond Dénes	ab7a2cabb3	compaction/compaction_descriptor: compaction_options: remove validate It is unused now.	2021-08-05 07:36:45 +03:00
Botond Dénes	8b64a6caa7	compaction/compaction_descriptor: compaction_options: add options() accessor	2021-08-03 09:34:17 +03:00
Botond Dénes	f01b799a30	compaction/compaction_descriptor: compaction_options::scrub::mode: add validate To replace compaction_type::Validation.	2021-08-03 09:34:15 +03:00
Botond Dénes	891921377d	sstables/compaction_descriptor: compaction_options: add validation compaction type This enables starting validation compaction via `compact_sstables()`.	2021-07-12 10:25:15 +03:00
Raphael S. Carvalho	1924e8d2b6	treewide: Move compaction code into a new top-level compaction dir Since compaction is layered on top of sstables, let's move all compaction code into a new top-level directory. This change will give me extra motivation to remove all layer violations, like sstable calling compaction-specific code, and compaction entanglement with other components like table and storage service. Next steps: - remove all layer violations - move compaction code in sstables namespace into a new one for compaction. - move compaction unit tests into its own file Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20210707194058.87060-1-raphaelsc@scylladb.com>	2021-07-07 23:21:51 +03:00

15 Commits
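Commit ab0217e30e describes an admission rule: compaction efficiency is approximated by fan-in (number of runs), and a new compaction may proceed only if its efficiency is at least that of the ongoing compactions. The C++ sketch below illustrates that rule under two stated assumptions: "the efficiency of ongoing compactions" is interpreted as the largest fan-in among them, and the class and method names (`fan_in_admission`, `can_proceed`, `on_start`, `on_finish`) are hypothetical, not ScyllaDB's compaction manager API.

```cpp
#include <cassert>
#include <cstddef>
#include <map>

// Hypothetical admission controller sketching the rule from commit ab0217e30e.
// Assumption: efficiency == fan-in, and the threshold is the largest fan-in
// among running compactions. Not the actual ScyllaDB implementation.
class fan_in_admission {
    // Fan-ins of ongoing compactions, kept as fan-in -> number of jobs,
    // so several jobs with the same fan-in can run and finish independently.
    std::map<unsigned, std::size_t> _ongoing;
public:
    // Largest fan-in among ongoing compactions, or 0 when nothing is running.
    unsigned threshold() const {
        return _ongoing.empty() ? 0u : _ongoing.rbegin()->first;
    }

    // A job may start only if it would not dilute overall efficiency,
    // i.e. its fan-in is >= the fan-in of every ongoing compaction.
    bool can_proceed(unsigned fan_in) const {
        return fan_in >= threshold();
    }

    // Record that a compaction with this fan-in has started.
    void on_start(unsigned fan_in) {
        ++_ongoing[fan_in];
    }

    // Record that a compaction with this fan-in has finished.
    void on_finish(unsigned fan_in) {
        auto it = _ongoing.find(fan_in);
        if (it != _ongoing.end() && --it->second == 0) {
            _ongoing.erase(it);
        }
    }
};
```

For example, while one compaction of fan-in 4 is running, a job of fan-in 2 (such as an LCS compaction on a higher level) would have to wait, whereas another fan-in-4 job could start. As wider jobs start, the threshold rises, which matches the commit's observation that the fan-in threshold grows automatically under heavy load.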