scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-02 06:05:53 +00:00

Author	SHA1	Message	Date
Botond Dénes	55bb70a878	Merge "Make sure TWCS per-window major includes all files" from Raphael " TWCS perform STCS on a window as long as it's the most recent one. From there on, TWCS will compact all files in the past window into a single file. With some moderate write load, it could happen that there's still some compaction activity in that past window, meaning that per-window major may miss some files being currently compacted. As a result, a past window may contain more than 1 file after all compaction activity is done on its behalf, which may increase read amplification. To avoid that, TWCS will now make sure that per-window major is serialized, to make sure no files are missed. Fixes #9553. tests: unit(dev). " * 'fix_twcs_per_window_major_v3' of https://github.com/raphaelsc/scylla: TWCS: Make sure major on past window is done on all its sstables TWCS: remove needless param for STCS options TWCS: kill unused param in newest_bucket() compaction: Implement strategy control and wire it compaction: Add interface to control strategy behavior.	2021-12-20 17:12:50 +02:00
Nadav Har'El	252ce8afd4	Merge 'Extend stop compaction api' from Benny Halevy Allow stopping compaction by type on a given keyspace and list of tables. Also add api unit test suite that tests the existing `stop_compaction` api and the new `stop_keyspace_compaction` api. Fixes #9700 Closes #9746 * github.com:scylladb/scylla: api: storage_service: validate_keyspace: improve exception error message api: compaction_manager: add stop_keyspace_compaction api: storage_service: expose validate_keyspace and parse_tables api: compaction_manager: stop_compaction: fix type description compaction_manager: stop_compaction: expose optional table* test: api: add basic compaction_manager test	2021-12-20 00:18:46 +02:00
Raphael S. Carvalho	49f40c8791	compaction: Implement strategy control and wire it This implements strategy control interface for both manager and tests, and wire it. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-12-13 16:05:23 -03:00
Raphael S. Carvalho	e0758fded1	compaction_manager: make get_compaction_state() private internal method that should never be directly used by the outside world. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20211210120806.19233-1-raphaelsc@scylladb.com>	2021-12-10 17:19:24 +03:00
Benny Halevy	fed7319698	compaction_manager: stop_compaction: expose optional table* To be used by api layer. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-12-09 14:14:49 +02:00
Raphael S. Carvalho	6737c88045	compaction_manager: use single semaphore for serialization of maintenance compactions We have three semaphores for serialization of maintenance ops. 1) _rewrite_sstables_sem: for scrub, cleanup and upgrade. 2) _major_compaction_sem: for major 3) _custom_job_sem: for reshape, resharding and offstrategy scrub, cleanup and upgrade should be serialized with major, so rewrite sem should be merged into major one. offstrategy is also a maintenance op that should be serialized with others, to reduce compaction aggressiveness and space requirement. resharding is one-off operation, so can be merged there too. the same applies for reshape, which can take long and not serializing it with other maintenance activity can lead to exhaustion of resources and high space requirement. let's have a single semaphore to guarantee their serialization. deadlock isn't an issue because locks are always taken in same order. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20211201182046.100942-1-raphaelsc@scylladb.com>	2021-12-07 12:18:07 +02:00
Benny Halevy	60ff28932c	compaction_manager: perform_sstable_scrub: get the whole compaction_type_options::scrub So we can pass additional options on top of the scrub mode. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-12-05 18:21:37 +02:00
Raphael S. Carvalho	6d750d4f59	compaction_manager: move check_for_cleanup into perform_cleanup() Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-12-02 14:39:31 -03:00
Raphael S. Carvalho	760cfd93fb	compaction_manager: make consistent usage of type and name table new code in manager adopted name and type table, whereas historical code still uses name and type column family. let's make it consistent for newcomers to not get confused. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-12-02 14:39:27 -03:00
Raphael S. Carvalho	f23e0d7f2d	compaction_manager: Disconsider inactive tasks when filtering sstables After commit `1f5b17f`, overlapping can be introduced in level 1 because procedure that filters out sstables from partial runs is considering inactive tasks, so L1 sstables can be incorrectly filtered out from next compaction attempt. When L0 is merged into L1, overlapping is then introduced in L1 because old L1 sstables weren't considered in L0 -> L1 compaction. From now on, compaction_manager::get_candidates() will only consider active tasks, to make sure actual partial runs are filtered out. Fixes #9693. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20211129180459.125847-1-raphaelsc@scylladb.com>	2021-12-01 16:11:44 +02:00
Benny Halevy	957003e73f	compaction_manager: stop_compaction: wait for ongoing compactions to stop Similar to #9313, stop_compaction should also reuse the stop_ongoing_compactions() infrastructure and wait on ongoing compactions of the given type to stop. Fixes #9695 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-11-30 16:09:11 +02:00
Benny Halevy	03e969dbef	compaction_manager: unify stop_ongoing_compactions implementations Now stop_ongoing_compactions(reason) is equivalent to stop_ongoing_compactions(reason, nullptr, std::nullopt) so share the code of the latter for the former entry point. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-11-30 16:09:07 +02:00
Benny Halevy	94011bdcca	compaction_manager: stop_ongoing_compactions: add compaction_type option And make the table optional as well, so it can be used by stop_compaction() to a particular compaction type on all tables. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-11-30 16:07:47 +02:00
Benny Halevy	a419759835	compaction_manager: get_compactions: get a table* parameter Optionally get running compaction on the provided table. This is required for stop_ongoing_compactions on a given table. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-11-30 16:06:34 +02:00
Benny Halevy	3c721eb228	compaction_manager: make stop_ongoing_compactions public So it can be used directly by table code in the next patch. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-11-30 16:06:29 +02:00
Raphael S. Carvalho	80a1ebf0f3	compaction_manager: Fix race when selecting sstables for rewrite operations Rewrite operations are scrub, cleanup and upgrade. Race can happen because 'selection of sstables' and 'mark sstables as compacting' are decoupled. So any deferring point in between can lead to a parallel compaction picking the same files. After commit `2cf0c4bbf`, files are marked as compacting before rewrite starts, but it didn't take into account the commit `c84217ad` which moved retrieval of candidates to a deferring thread, before rewrite_sstables() is even called. Scrub isn't affected by this because it uses a coarse grained approach where whole operation is run with compaction disabled, which isn't good because regular compaction cannot run until its completion. From now on, selection of files and marking them as compacting will be serialized by running them with compaction disabled. Now cleanup will also retrieve sstables with compaction disabled, meaning it will no longer leave uncleaned files behind, which is important to avoid data resurrection if node regains ownership of data in uncleaned files. Fixes #8168. Refs #8155. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20211129133107.53011-1-raphaelsc@scylladb.com>	2021-11-29 16:27:29 +02:00
Benny Halevy	0a33762fb1	compaction_manager: add compaction_state when table is constructed With that, it is always expected that _compaction_state[cf] exists when compaction jobs are submitted. Otherwise, throw std::out_of_range exception. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-11-23 09:40:06 +02:00
Benny Halevy	3940ffb085	compaction_manager: task: keep a reference on compaction_state And hold its gate to make sure the compaction_state outlives the task and can be used to wait on all tasks and functions using it. With that, don't access _compaction_state[cf] to acquire shared/exclusive locks but rather get to it via task->compaction_state so it can be detached from _compaction_state while task is running, if needed. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-11-23 09:40:06 +02:00
Benny Halevy	e7ab1f8581	compaction_manager: compaction_state: use counter for compaction_disabled We'd like to use compaction_state::gate both for functions running with compaction disabled and for tasks referring to the compaction_state so that stop_ongoing_compactions could wait on all functions referring to the state structure. This is also cleaner with respect to not relying on gate::use_count() when re-submitting regular compaction when compaction is re-enabled. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-11-22 22:08:42 +02:00
Benny Halevy	3268c94e72	compaction_manager: task: delete move and copy constructors We use a lw_shared_ptr<task> everywhere. So prevent moving or copying task objects. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-11-22 22:00:18 +02:00
Avi Kivity	d2e02ea7aa	Merge " Abstract table for compaction layer with table_state" from Raphael " table_state is being introduced for compaction subsystem, to remove table dependency from compaction interface, fix layer violations, and also make unit testing easier as table_state is an abstraction that can be implemented even with no actual table backing it. In this series, compaction strategy interfaces are switching to table_state, and eventually, we'll make compact_sstables() switch to it too. The idea is that no compaction code will directly reference a table object, but only work with the abstraction instead. So compaction subdirectory can stop including database.hh altogether, which is a great step forward. " * 'table_state_v5' of https://github.com/raphaelsc/scylla: sstable_compaction_test: switch to table_state compaction: stop including database.hh for compaction_strategy compaction: switch to table_state in estimated_pending_compactions() compaction: switch to table_state in compaction_strategy::get_major_compaction_job() compaction: switch to table_state in compaction_strategy::get_sstables_for_compaction() DTCS: reduce table dependency for task estimation LCS: reduce table dependency for task estimation table: Implement table_state compaction: make table param of get_fully_expired_sstables() const compaction_manager: make table param of has_table_ongoing_compaction() const Introduce table_state	2021-11-09 19:21:57 +02:00
Raphael S. Carvalho	ff4953206b	compaction_manager: make table param of has_table_ongoing_compaction() const Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-11-09 10:41:52 -03:00
Raphael S. Carvalho	33b39a2bfc	compaction: move run_with_compaction_disabled() from table into compaction_manager That's intended to fix a bad layer violation as table was given the responsibility of disabling compaction for a given table T, but that logic clearly belongs to compaction_manager instead. Additionally, gate will be used instead of counter, as former provides manager with a way to synchronize with functions running under run_with_compaction_disabled. so remove() can wait for their termination. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-11-08 15:12:46 -03:00
Raphael S. Carvalho	aa9b1c1fa3	compaction_manager: add struct for per table compaction state This will make it easier to pack all state data for a given table T. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-11-08 14:24:33 -03:00
Raphael S. Carvalho	c0047bb9c0	compaction_manager: introduce stop_ongoing_compactions() for a table New variant of stop_ongoing_compactions() which will stop all compactions for a given table. Will be reused in both remove() and by run_with_compaction_disabled() which will soon be moved into the compaction_manager. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-11-08 14:24:14 -03:00
Raphael S. Carvalho	0643faafd7	compaction_manager: extract "stop tasks" from stop_ongoing_compactions() into new function Procedure will be reused to stop a list of tasks Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-11-08 14:23:37 -03:00
Raphael S. Carvalho	8ce9cda391	compaction_manager: rename submit_major_compaction to perform_major_compaction for symmetry, let's call it perform_* as it doesn't work like submission functions which don't wait for result, like the one for minor compaction. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-11-04 09:54:00 -03:00
Raphael S. Carvalho	63dc4e2107	compaction_manager: simplify creation of compaction_data there's no need for wrapping compaction_data in shared_ptr, also let's kill unused params in create_compaction_data to simplify its creation. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-11-04 09:35:49 -03:00
Raphael S. Carvalho	ab0217e30e	compaction: Improve overall efficiency by not diluting it with relatively inefficient jobs Compaction efficiency can be defined as how much backlog is reduced per byte read or written. We know a few facts about efficiency: 1) the more files are compacted together (the fan-in) the higher the efficiency will be, however... 2) the bigger the size difference of input files the worse the efficiency, i.e. higher write amplification. so compactions with similar-sized files are the most efficient ones, and their efficiency increases with a higher number of files. However, in order to not have bad read amplification, number of files cannot grow out of bounds. So we have to allow parallel compaction on different tiers, but to avoid "dilution" of overall efficiency, we will only allow a compaction to proceed if its efficiency is greater than or equal to the efficiency of ongoing compactions. For the time being, we'll assume that strategies don't pick candidates with wildly different sizes, so efficiency is only calculated as a function of compaction fan-in. Now when system is under heavy load, then fan-in threshold will automatically grow to guarantee that overall efficiency remains stable. Please note that fan-in is defined in number of runs. LCS compaction on higher levels will have a fan-in of 2. Under heavy load, it may happen that LCS will temporarily switch to size-tiered mode for compaction to keep up with amount of data being produced. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20211103215110.135633-2-raphaelsc@scylladb.com>	2021-11-03 20:03:23 +02:00
Avi Kivity	1bac93e075	Merge "simplifications and layer violation fix for compaction manager" from Raphael "This series removes layer violation in compaction, and also simplifies compaction manager and how it interacts with compaction procedure." * 'compaction_manager_layer_violation_fix/v4' of github.com:raphaelsc/scylla: compaction: split compaction info and data for control compaction_manager: use task when stopping a given compaction type compaction: remove start_size and end_size from compaction_info compaction_manager: introduce helpers for task compaction_manager: introduce explicit ctor for task compaction: kill sstables field in compaction_info compaction: kill table pointer in compaction_info compaction: simplify procedure to stop ongoing compactions compaction: move management of compaction_info to compaction_manager compaction: move output run id from compaction_info into task	2021-10-04 13:09:31 +03:00
Raphael S. Carvalho	9067a13eac	compaction: split compaction info and data for control compaction_info must only contain info data to be exported to the outside world, whereas compaction_data will contain data for controlling compaction behavior and stats which change as compaction progresses. This separation makes the interface clearer, also allowing for future improvements like removing direct references to table in compaction. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-09-30 13:16:57 -03:00
Raphael S. Carvalho	18f703e94b	compaction_manager: introduce helpers for task Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-09-30 13:16:41 -03:00
Raphael S. Carvalho	d4572a1bb5	compaction_manager: introduce explicit ctor for task Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-09-30 13:16:37 -03:00
Raphael S. Carvalho	4ce745e0b6	compaction: simplify procedure to stop ongoing compactions Today, compactions are tracked by both _compactions and _tasks, where _compactions refer to actual ongoing compaction tasks, whereas _tasks refer to manager tasks which is responsible for spawning new compactions, retry them on failure, etc. As each task can only have one ongoing compaction at a time, let's move compaction into task, such that manager won't have to look at both when deciding to do something like stopping a task. So stopping a task becomes simpler, and duplication is naturally gone. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-09-30 13:16:21 -03:00
Raphael S. Carvalho	efed06e2e4	compaction: move management of compaction_info to compaction_manager Today, compaction is calling compaction manager to register / deregister the compaction_info created by it. This is a layer violation because manager sits one layer above compaction, so manager should be responsible for managing compaction info. From now on, compaction_info will be created and managed by compaction_manager. compaction will only have a reference to info, which it can use to update the world about compaction progress. This will allow compaction_manager to be simplified as info can be coupled with its respective task, allowing duplication to be removed and layer violation to be fixed. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-09-30 13:15:00 -03:00
Raphael S. Carvalho	1f5b17fdc5	compaction: move output run id from compaction_info into task this run id is used to track partial runs that are being written to. let's move it from info into task, as this is not an external info, but rather one that belongs to compaction_manager. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-09-30 13:13:20 -03:00
Raphael S. Carvalho	52302c3238	compaction_manager: prevent unbounded growth of pending tasks There will be unbounded growth of pending tasks if they are submitted faster than retiring them. That can potentially happen if memtables are frequently flushed too early. It was observed that this unbounded growth caused task queue violations as the queue will be filled with tons of tasks being reevaluated. By avoiding duplication in pending task list for a given table T, growth is no longer unbounded and consequently reevaluation is no longer aggressive. Refs #9331. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20210930125718.41243-1-raphaelsc@scylladb.com>	2021-09-30 16:49:52 +03:00
Raphael S. Carvalho	afd45b9f49	compaction: Don't leak backlog of input sstable when compaction strategy is changed The generic backlog formula is: ALL + PARTIAL - COMPACTING With transfer_ongoing_charges() we already ignore the effect of ongoing compactions on COMPACTING as we judge them to be pointless. But ongoing compactions will run to completion, meaning that output sstables will be added to ALL anyway, in the formula above. With stop_tracking_ongoing_compactions(), input sstables are never removed from the tracker, but output sstables are added, which means we end up with duplicate backlog in the tracker. By removing this tracking mechanism, pointless ongoing compaction will be ignored as expected and the leaks will be fixed. Later, the intention is to force a stop on ongoing compactions if strategy has changed as they're pointless anyway. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-09-27 14:03:28 -03:00
Avi Kivity	d7ac699a55	Revert "Merge "compaction: Update backlog tracker correctly when schema is updated" from Raphael" This reverts commit `b5cf0b4489`, reversing changes made to `e8493e20cb`. It causes segmentation faults when sstable readers are closed. Fixes #9388.	2021-09-26 18:31:49 +03:00
Avi Kivity	bf94c06fc7	Revert "Merge "simplifications and layer violation fix for compaction manager" from Raphael" This reverts commit `7127c92acc`, reversing changes made to `88480ac504`. We need to revert `b5cf0b4489` to fix #9388, and this stands in the way. Ref #9388.	2021-09-26 18:30:36 +03:00
Raphael S. Carvalho	5bf51ced14	compaction: split compaction info and data for control compaction_info must only contain info data to be exported to the outside world, whereas compaction_data will contain data for controlling compaction behavior and stats which change as compaction progresses. This separation makes the interface clearer, also allowing for future improvements like removing direct references to table in compaction. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-09-23 10:56:18 -03:00
Raphael S. Carvalho	2353f40f63	compaction_manager: introduce helpers for task Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-09-23 10:38:39 -03:00
Raphael S. Carvalho	6820fbf460	compaction_manager: introduce explicit ctor for task Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-09-23 10:38:36 -03:00
Raphael S. Carvalho	98f8673d4e	compaction: simplify procedure to stop ongoing compactions Today, compactions are tracked by both _compactions and _tasks, where _compactions refer to actual ongoing compaction tasks, whereas _tasks refer to manager tasks which is responsible for spawning new compactions, retry them on failure, etc. As each task can only have one ongoing compaction at a time, let's move compaction into task, such that manager won't have to look at both when deciding to do something like stopping a task. So stopping a task becomes simpler, and duplication is naturally gone. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-09-23 10:25:51 -03:00
Raphael S. Carvalho	0885376a85	compaction: move management of compaction_info to compaction_manager Today, compaction is calling compaction manager to register / deregister the compaction_info created by it. This is a layer violation because manager sits one layer above compaction, so manager should be responsible for managing compaction info. From now on, compaction_info will be created and managed by compaction_manager. compaction will only have a reference to info, which it can use to update the world about compaction progress. This will allow compaction_manager to be simplified as info can be coupled with its respective task, allowing duplication to be removed and layer violation to be fixed. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-09-23 10:00:49 -03:00
Raphael S. Carvalho	7688d0432c	compaction: move output run id from compaction_info into task this run id is used to track partial runs that are being written to. let's move it from info into task, as this is not an external info, but rather one that belongs to compaction_manager. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-09-23 09:56:01 -03:00
Raphael S. Carvalho	0a3049908c	compaction: Don't leak backlog of input sstable when compaction strategy is changed The generic backlog formula is: ALL + PARTIAL - COMPACTING With transfer_ongoing_charges() we already ignore the effect of ongoing compactions on COMPACTING as we judge them to be pointless. But ongoing compactions will run to completion, meaning that output sstables will be added to ALL anyway, in the formula above. With stop_tracking_ongoing_compactions(), input sstables are never removed from the tracker, but output sstables are added, which means we end up with duplicate backlog in the tracker. By removing this tracking mechanism, pointless ongoing compaction will be ignored as expected and the leaks will be fixed. Later, the intention is to force a stop on ongoing compactions if strategy has changed as they're pointless anyway. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-09-20 15:36:05 -03:00
Raphael S. Carvalho	acba3bd3c4	sstables: give a more descriptive name to compaction_options the name compaction_options is confusing as it overlaps in meaning with compaction_descriptor. it is hard to reason about the exact difference between them without digging into the implementation. compaction_options is intended to only carry options specific to a given compaction type, like a mode for scrub, so let's rename it to compaction_type_options to make it clearer for the readers. [avi: adjust for scrub changes] Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20210908003934.152054-1-raphaelsc@scylladb.com>	2021-09-12 11:21:33 +03:00
Raphael S. Carvalho	6849ec46b8	compaction: Don't purge tombstones in scrub Scrub is supposed to not remove anything from input, write it as is while fixing any corruption it might have. It shouldn't have any assumption on the input. Additionally, a data shadowed by a tombstone might be in another corrupted sstable, so expired tombstones should not be purged in order to prevent data resurrection from occurring. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20210904165908.135044-1-raphaelsc@scylladb.com>	2021-09-05 17:10:34 +03:00
Botond Dénes	5f6468d7d7	compaction/compaction_manager: hide perform_sstable_validation() We are folding validation compaction into scrub (at least on the interface level), so remove the validation entry point accordingly and have users go through `perform_sstable_scrub()` instead.	2021-08-05 07:36:44 +03:00
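
The admission rule described in commit ab0217e30e (a compaction may proceed only if its efficiency is greater than or equal to that of ongoing compactions, with efficiency taken as a function of fan-in alone) can be sketched roughly as below. This is an illustrative Python sketch, not the Scylla implementation: the function name `may_start` and its parameters are hypothetical, and fan-in is modeled as a plain integer count of runs.

```python
from typing import Iterable


def may_start(candidate_fan_in: int, ongoing_fan_ins: Iterable[int]) -> bool:
    """Hypothetical admission check, assuming efficiency == fan-in.

    Fan-in is counted in runs, as the commit message states. A candidate
    job is admitted only if its fan-in is at least as high as the fan-in
    of every compaction that is currently running, so a low fan-in job
    cannot dilute overall efficiency.
    """
    # all() over an empty iterable is True: with nothing running,
    # any candidate is admitted.
    return all(candidate_fan_in >= fan_in for fan_in in ongoing_fan_ins)


# Idle system: any job may start.
idle_ok = may_start(2, [])
# A fan-in 2 job is held back while a fan-in 4 job is running.
diluting = may_start(2, [4])
# A job that matches or beats every ongoing job is admitted.
equal_ok = may_start(4, [4, 2])
```

Under this reading, the effective threshold is whatever the most efficient running job sets, which is one way to interpret the message's remark that the fan-in threshold grows automatically under heavy load.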

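The backlog leak described in commit afd45b9f49 (output sstables are added to the tracker while input sstables are never removed) can be illustrated with a toy model of the formula ALL + PARTIAL - COMPACTING. Everything here is an assumption made for illustration: the class `ToyBacklogTracker`, the helper names, and the choice to treat each term as a plain byte count. The real tracker in Scylla is strategy-specific and more involved.

```python
class ToyBacklogTracker:
    """Toy model of backlog = ALL + PARTIAL - COMPACTING (byte counts)."""

    def __init__(self):
        self.all = {}        # sstable name -> size in bytes (ALL)
        self.partial = 0     # bytes written so far to unfinished outputs
        self.compacting = 0  # bytes already read from inputs

    def add(self, name, size):
        self.all[name] = size

    def remove(self, name):
        self.all.pop(name, None)

    def backlog(self):
        return sum(self.all.values()) + self.partial - self.compacting


def finish_compaction(tracker, inputs, output, remove_inputs):
    """Register the output; optionally forget the inputs it replaced."""
    name, size = output
    tracker.add(name, size)
    if remove_inputs:
        for input_name in inputs:
            tracker.remove(input_name)


def backlog_after_compaction(remove_inputs):
    tracker = ToyBacklogTracker()
    tracker.add("a", 100)
    tracker.add("b", 100)
    # Compaction of a + b into c completes; no charges are in flight.
    finish_compaction(tracker, ["a", "b"], ("c", 200), remove_inputs)
    return tracker.backlog()


healthy = backlog_after_compaction(remove_inputs=True)   # only c is counted
leaked = backlog_after_compaction(remove_inputs=False)   # a, b and c counted
```

In the leaking case the same 200 bytes are counted twice, once as the inputs and once as the output, which matches the "duplicate backlog" the commit message describes.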

55 Commits