scylladb

Author	SHA1	Message	Date
Yaniv Kaul	c658bdb150	Typos: fix typos in comments Fixes some typos as found by codespell run on the code. In this commit, I was hoping to fix only comments, not user-visible alerts, output, etc. Follow-up commits will take care of them. Refs: https://github.com/scylladb/scylladb/issues/16255 Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>	2023-12-02 22:37:22 +02:00
Aleksandra Martyniuk	0c6a3f568a	compaction: delete default_compaction_progress_monitor default_compaction_progress_monitor returns a reference to a static object. So, it should be read-only, but its users need to modify it. Delete default_compaction_progress_monitor and use one's own compaction_progress_monitor instance where it's needed. Closes scylladb/scylladb#15800	2023-10-23 16:03:34 +03:00
Aleksandra Martyniuk	39e96c6521	compaction: find total compaction size	2023-10-12 17:03:46 +02:00
Aleksandra Martyniuk	7b3e0ab1f2	compaction: sstables: monitor validation scrub with compaction_read_generator Validation scrub bypasses the usual compaction machinery, though it still needs to be tracked with compaction_progress_monitor so that we could reach its progress from compaction task executor. Track sstable scrub in validate mode with read monitors.	2023-10-12 17:03:46 +02:00
Aleksandra Martyniuk	3553556708	compaction: keep compaction_progress_monitor in compaction_task_executor Keep compaction_progress_monitor in compaction_task_executor and pass a reference to it further, so that the compaction progress could be retrieved out of it.	2023-10-12 17:03:46 +02:00
Aleksandra Martyniuk	22bf3c03df	compaction: add compaction_progress_monitor In the following patches compaction_read_monitor_generator will be used to find progress of compaction_task_executor's. To avoid unnecessary life prolongation and exposing internals of the class out of compaction.cc, compaction_progress_monitor is created. Compaction class keeps a reference to the compaction_progress_monitor. Inheriting classes which actually use compaction_read_monitor_generator, need to set it with set_generator method.	2023-10-12 17:03:46 +02:00
Raphael S. Carvalho	83c70ac04f	utils: Extract pretty printers into a header Can be easily reused elsewhere. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-06-26 21:58:20 -03:00
Pavel Emelyanov	66e43912d6	code: Switch to seastar API level 7 In that level no io_priority_class-es exist. Instead, all the IO happens in the context of current sched-group. File API no longer accepts prio class argument (and makes io_intent arg mandatory to impls). So the change consists of - removing all usage of io_priority_class - patching file_impl's inheritors to the updated API - priority manager goes away altogether - IO bandwidth update is performed on respective sched group - tune-up scylla-gdb.py io_queues command The first change is huge and was made semi-automatically by: - grep io_priority_class \| default_priority_class - remove all calls, found methods' args and class' fields Patching file_impl-s is smaller, but also mechanical: - replace io_priority_class& argument with io_intent* one - pass intent to lower file (if applicable) Dropping the priority manager is: - git-rm .cc and .hh - sed out all the #include-s - fix configure.py and cmakefile The scylla-gdb.py update is a bit hairy -- it needs to use task queues list for IO classes names and shares, but to detect whether it should, it checks whether the "commitlog" group is present. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #13963	2023-06-06 13:29:16 +03:00
Raphael S. Carvalho	38b226f997	Resurrect optimization to avoid bloom filter checks during compaction Commit `8c4b5e4283` introduced an optimization which only calculates max purgeable timestamp when a tombstone satisfies the grace period. Commit 'repair: Get rid of the gc_grace_seconds' inverted the order, probably under the assumption that getting grace period can be more expensive than calculating max purgeable, as repair-mode GC will look up into history data in order to calculate gc_before. This caused a significant regression on tombstone heavy compactions, where most of tombstones are still newer than grace period. A compaction which used to take 5s, now takes 35s. 7x slower. The reason is simple, now calculation of max purgeable happens for every single tombstone (once for each key), even the ones that cannot be GC'ed yet. And each calculation has to iterate through (i.e. check the bloom filter of) every single sstable that doesn't participate in compaction. Flame graph makes it very clear that bloom filter is a heavy path without the optimization: 45.64% 45.64% sstable_compact sstable_compaction_test_g [.] utils::filter::bloom_filter::is_present With its resurrection, the problem is gone. This scenario can easily happen, e.g. after a deletion burst, and tombstones becoming only GC'able after they reach upper tiers in the LSM tree. Before this patch, a compaction can be estimated to have this # of filter checks: (# of keys containing any tombstone) * (# of uncompacting sstable runs[1]) [1] It's # of runs, as each key tends to overlap with only one fragment of each run. After this patch, the estimation becomes: (# of keys containing a GC'able tombstone) * (# of uncompacting runs). With repair mode for tombstone GC, the assumption, that retrieval of gc_before is more expensive than calculating max purgeable, is kept. We can revisit it later. But the default mode, which is the "timeout" (i.e. gc_grace_seconds) one, we still benefit from the optimization of deferring the calculation until needed. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes #13908	2023-05-18 09:01:50 +03:00
Botond Dénes	10fe76a0fe	compaction/compaction: remove now unused scrub_validate_mode_validate_reader()	2023-05-02 09:42:42 -04:00
Aleksandra Martyniuk	7ead1a7857	compaction: request abort only once in compaction_data::stop compaction_manager::task (and thus compaction_data) can be stopped because of many different reasons. Thus, abort can be requested more than once on compaction_data abort source causing a crash. To prevent this before each request_abort() we check whether an abort was requested before. Closes #12004	2022-11-17 12:44:59 +02:00
Aleksandra Martyniuk	3a805a9d9b	compaction: extract statistics in compaction_result Statistics from compaction_result are extracted to new struct compaction_stats and stored as a field of compaction_result.	2022-07-29 09:35:20 +02:00
Aleksandra Martyniuk	ab85dab05d	scrub compaction: count validation errors The number of validation errors encountered during scrub compaction is counted.	2022-07-29 09:35:20 +02:00
Mikołaj Sielużycki	2edf137f61	compaction: Add tracking start_size in compaction_result.	2022-06-07 12:55:28 +02:00
Benny Halevy	78d6f6a519	compaction: sanitize headers from flat_mutation_reader v1 flat_mutation_reader make_scrubbing_reader no longer exists and there is no need to include flat_mutation_reader.hh nor forward declare the class. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-04-28 17:23:04 +03:00
Benny Halevy	ffc314d506	compaction: add documentation for compaction_type to string conversions Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-03-10 08:39:18 +02:00
Benny Halevy	28a74a2e90	compaction: expose to_string(compaction_type) To be used in the next patch to generate a string description from the compaction_type. In theory, we could use compaction_name() but the latter returns the compaction type in all-upper case and that is very different from what we print to the log today. The all-upper strings are used for the api layer, e.g. to stop tasks of a particular compaction type. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-03-10 08:39:18 +02:00
Benny Halevy	57f97046a7	compaction_manager: allow stopping sleeping tasks Use exponential_backoff_retry::retry(abort_source&) when sleeping between retries and request abort when the task is stopped. Fixes #10112 Test: unit(dev) Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-02-21 21:01:56 +02:00
Avi Kivity	fcb8d040e8	treewide: use Software Package Data Exchange (SPDX) license identifiers Instead of lengthy blurbs, switch to single-line, machine-readable standardized (https://spdx.dev) license identifiers. The Linux kernel switched long ago, so there is strong precedent. Three cases are handled: AGPL-only, Apache-only, and dual licensed. For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0), reasoning that our changes are extensive enough to apply our license. The changes we applied mechanically with a script, except to licenses/README.md. Closes #9937	2022-01-18 12:15:18 +01:00
Botond Dénes	a7f4ab6b14	compaction/compaction: remove v1 version of validate and scrub reader factory methods	2022-01-14 10:19:56 +02:00
Botond Dénes	b315d17c2a	compaction: migrate scrub and validate to v2 We add v2 version of external API but leave the old v1 in place to help incremental migration. The implementation is migrated to v2.	2022-01-14 08:54:26 +02:00
Benny Halevy	07c5ddf182	sstables: add is_eligible_for_compaction Currently compaction_manager tracks sstables based on !requires_view_building() and similarly, table::in_strategy_sstables picks up only sstables that are not in staging. is_eligible_for_compaction() generalizes this condition in preparation for adding a quarantine subdirectory for invalid sstables that should not be compacted as well. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-12-05 18:00:44 +02:00
Raphael S. Carvalho	06405729ce	compaction: stop including database.hh after switching to table_state, compaction code can finally stop including database.hh Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-11-19 22:06:03 -03:00
Raphael S. Carvalho	69ab5c9dff	compaction: switch to table_state in get_fully_expired_sstables() Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-11-19 22:06:02 -03:00
Raphael S. Carvalho	d89edad9fb	compaction: switch to table_state Make compaction procedure switch to table_state. Only function in compaction.cc still directly using table is get_fully_expired_sstables(T,...), but subsequently we'll make it switch to table_state and then we can finally stop including database.hh in the compaction code. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-11-19 22:06:01 -03:00
Raphael S. Carvalho	29df862f57	compaction: make table param of get_fully_expired_sstables() const Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-11-09 10:41:54 -03:00
Raphael S. Carvalho	132a840ed5	compaction: fix outdated doc of compact_sstables() Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-11-04 11:09:24 -03:00
Raphael S. Carvalho	0d745912d0	compaction: fix indentation in compaction.hh Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-11-04 09:50:46 -03:00
Raphael S. Carvalho	b344db1696	compaction: give a more descriptive name to compaction_data info is no longer descriptive, as compaction now works with compaction_data instead of compaction_info. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-11-04 09:43:08 -03:00
Raphael S. Carvalho	ab0217e30e	compaction: Improve overall efficiency by not diluting it with relatively inefficient jobs Compaction efficiency can be defined as how much backlog is reduced per byte read or written. We know a few facts about efficiency: 1) the more files are compacted together (the fan-in) the higher the efficiency will be, however... 2) the bigger the size difference of input files the worse the efficiency, i.e. higher write amplification. so compactions with similar-sized files are the most efficient ones, and their efficiency increases with a higher number of files. However, in order to not have bad read amplification, number of files cannot grow out of bounds. So we have to allow parallel compaction on different tiers, but to avoid "dilution" of overall efficiency, we will only allow a compaction to proceed if its efficiency is greater than or equal to the efficiency of ongoing compactions. For the time being, we'll assume that strategies don't pick candidates with wildly different sizes, so efficiency is only calculated as a function of compaction fan-in. Now when system is under heavy load, then fan-in threshold will automatically grow to guarantee that overall efficiency remains stable. Please note that fan-in is defined in number of runs. LCS compaction on higher levels will have a fan-in of 2. Under heavy load, it may happen that LCS will temporarily switch to size-tiered mode for compaction to keep up with amount of data being produced. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20211103215110.135633-2-raphaelsc@scylladb.com>	2021-11-03 20:03:23 +02:00
Raphael S. Carvalho	9067a13eac	compaction: split compaction info and data for control compaction_info must only contain info data to be exported to the outside world, whereas compaction_data will contain data for controlling compaction behavior and stats which change as compaction progresses. This separation makes the interface clearer, also allowing for future improvements like removing direct references to table in compaction. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-09-30 13:16:57 -03:00
Raphael S. Carvalho	cbd78be2dd	compaction: remove start_size and end_size from compaction_info those stats aren't used in compaction stats API and therefore they can be removed. end_size is added to compaction_result (needed for updating history) and start_size can be calculated in advance. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-09-30 13:16:45 -03:00
Raphael S. Carvalho	38df9c68f8	compaction: kill sstables field in compaction_info Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-09-30 13:16:33 -03:00
Raphael S. Carvalho	90cfe895d4	compaction: kill table pointer in compaction_info Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-09-30 13:16:29 -03:00
Raphael S. Carvalho	efed06e2e4	compaction: move management of compaction_info to compaction_manager Today, compaction is calling compaction manager to register / deregister the compaction_info created by it. This is a layer violation because manager sits one layer above compaction, so manager should be responsible for managing compaction info. From now on, compaction_info will be created and managed by compaction_manager. compaction will only have a reference to info, which it can use to update the world about compaction progress. This will allow compaction_manager to be simplified as info can be coupled with its respective task, allowing duplication to be removed and layer violation to be fixed. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-09-30 13:15:00 -03:00
Raphael S. Carvalho	1f5b17fdc5	compaction: move output run id from compaction_info into task this run id is used to track partial runs that are being written to. let's move it from info into task, as this is not an external info, but rather one that belongs to compaction_manager. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-09-30 13:13:20 -03:00
Raphael S. Carvalho	afd45b9f49	compaction: Don't leak backlog of input sstable when compaction strategy is changed The generic backlog formula is: ALL + PARTIAL - COMPACTING With transfer_ongoing_charges() we already ignore the effect of ongoing compactions on COMPACTING as we judge them to be pointless. But ongoing compactions will run to completion, meaning that output sstables will be added to ALL anyway, in the formula above. With stop_tracking_ongoing_compactions(), input sstables are never removed from the tracker, but output sstables are added, which means we end up with duplicate backlog in the tracker. By removing this tracking mechanism, pointless ongoing compaction will be ignored as expected and the leaks will be fixed. Later, the intention is to force a stop on ongoing compactions if strategy has changed as they're pointless anyway. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-09-27 14:03:28 -03:00
Avi Kivity	d7ac699a55	Revert "Merge "compaction: Update backlog tracker correctly when schema is updated" from Raphael" This reverts commit `b5cf0b4489`, reversing changes made to `e8493e20cb`. It causes segmentation faults when sstable readers are closed. Fixes #9388.	2021-09-26 18:31:49 +03:00
Avi Kivity	bf94c06fc7	Revert "Merge "simplifications and layer violation fix for compaction manager" from Raphael" This reverts commit `7127c92acc`, reversing changes made to `88480ac504`. We need to revert `b5cf0b4489` to fix #9388, and this stands in the way. Ref #9388.	2021-09-26 18:30:36 +03:00
Raphael S. Carvalho	5bf51ced14	compaction: split compaction info and data for control compaction_info must only contain info data to be exported to the outside world, whereas compaction_data will contain data for controlling compaction behavior and stats which change as compaction progresses. This separation makes the interface clearer, also allowing for future improvements like removing direct references to table in compaction. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-09-23 10:56:18 -03:00
Raphael S. Carvalho	6d1170ac94	compaction: remove start_size and end_size from compaction_info those stats aren't used in compaction stats API and therefore they can be removed. end_size is added to compaction_result (needed for updating history) and start_size can be calculated in advance. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-09-23 10:41:13 -03:00
Raphael S. Carvalho	d73a241a4e	compaction: kill sstables field in compaction_info Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-09-23 10:38:32 -03:00
Raphael S. Carvalho	b6b4042faf	compaction: kill table pointer in compaction_info Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-09-23 10:38:11 -03:00
Raphael S. Carvalho	0885376a85	compaction: move management of compaction_info to compaction_manager Today, compaction is calling compaction manager to register / deregister the compaction_info created by it. This is a layer violation because manager sits one layer above compaction, so manager should be responsible for managing compaction info. From now on, compaction_info will be created and managed by compaction_manager. compaction will only have a reference to info, which it can use to update the world about compaction progress. This will allow compaction_manager to be simplified as info can be coupled with its respective task, allowing duplication to be removed and layer violation to be fixed. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-09-23 10:00:49 -03:00
Raphael S. Carvalho	7688d0432c	compaction: move output run id from compaction_info into task this run id is used to track partial runs that are being written to. let's move it from info into task, as this is not an external info, but rather one that belongs to compaction_manager. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-09-23 09:56:01 -03:00
Raphael S. Carvalho	0a3049908c	compaction: Don't leak backlog of input sstable when compaction strategy is changed The generic backlog formula is: ALL + PARTIAL - COMPACTING With transfer_ongoing_charges() we already ignore the effect of ongoing compactions on COMPACTING as we judge them to be pointless. But ongoing compactions will run to completion, meaning that output sstables will be added to ALL anyway, in the formula above. With stop_tracking_ongoing_compactions(), input sstables are never removed from the tracker, but output sstables are added, which means we end up with duplicate backlog in the tracker. By removing this tracking mechanism, pointless ongoing compaction will be ignored as expected and the leaks will be fixed. Later, the intention is to force a stop on ongoing compactions if strategy has changed as they're pointless anyway. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-09-20 15:36:05 -03:00
Raphael S. Carvalho	acba3bd3c4	sstables: give a more descriptive name to compaction_options the name compaction_options is confusing as it overlaps in meaning with compaction_descriptor. It is hard to reason about what the exact differences between them are without digging into the implementation. compaction_options is intended to only carry options specific to a given compaction type, like a mode for scrub, so let's rename it to compaction_type_options to make it clearer for the readers. [avi: adjust for scrub changes] Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20210908003934.152054-1-raphaelsc@scylladb.com>	2021-09-12 11:21:33 +03:00
Botond Dénes	a258f5639b	compaction: validation compaction -> scrub compaction (validate mode) Fold validation compaction into scrub compaction (validate mode). Only on the interface level though: to initiate validation compaction one now has to use `compaction_options::make_scrub(compaction_options::scrub::mode::validate)`. The implementation code stays as-is -- separate.	2021-08-05 07:32:05 +03:00
Botond Dénes	a57caf5229	sstables/compaction: implement validation compaction type Validation just reads all the passed-in sstables and runs the mutation stream through a mutation fragment stream validator, logging all errors found, and finally also logging whether all the sstables are valid or not. Validation is not really a compaction as it doesn't write any output. As such it bypasses most of the usual compaction machinery, so the latter doesn't have to be adapted to this outlier. This patch only adds the implementation, but it still cannot be started via `compact_sstables()`, that will be implemented by the next patches.	2021-07-12 10:25:15 +03:00
Raphael S. Carvalho	1924e8d2b6	treewide: Move compaction code into a new top-level compaction dir Since compaction is layered on top of sstables, let's move all compaction code into a new top-level directory. This change will give me extra motivation to remove all layer violations, like sstable calling compaction-specific code, and compaction entanglement with other components like table and storage service. Next steps: - remove all layer violations - move compaction code in sstables namespace into a new one for compaction. - move compaction unit tests into its own file Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20210707194058.87060-1-raphaelsc@scylladb.com>	2021-07-07 23:21:51 +03:00

50 Commits
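
Note on commit 38b226f997: the filter-check estimate in that message can be illustrated with a small sketch. This is not ScyllaDB code; the function name, the timestamps, and the run count below are hypothetical and chosen only to show how deferring the max-purgeable calculation until a tombstone is past the grace period shrinks the number of bloom filter checks.

```python
def count_filter_checks(tombstone_deletion_times, gc_before, uncompacting_runs, defer):
    """Estimate bloom filter checks for one compaction (illustrative model only).

    Each max-purgeable calculation is assumed to check the filter of every
    uncompacting sstable run once, as the commit message estimates.
    """
    checks = 0
    for deletion_time in tombstone_deletion_times:
        # With deferral, tombstones still inside the grace period are skipped,
        # because they cannot be garbage-collected yet anyway.
        if defer and deletion_time >= gc_before:
            continue
        checks += uncompacting_runs
    return checks


# Hypothetical workload: four tombstones, two of them older than gc_before.
deletion_times = [10, 20, 30, 40]
without_deferral = count_filter_checks(deletion_times, gc_before=25, uncompacting_runs=3, defer=False)
with_deferral = count_filter_checks(deletion_times, gc_before=25, uncompacting_runs=3, defer=True)
print(without_deferral, with_deferral)
```

In this toy model the count without deferral is (keys with any tombstone) * (runs), and with deferral it is (keys with a GC'able tombstone) * (runs), matching the two estimates given in the commit message.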