scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-26 11:30:36 +00:00

Author	SHA1	Message	Date
Raphael S. Carvalho	a319085870	compaction: Check for key presence in memtable when calculating max purgeable timestamp It was observed that some use cases might append old data constantly to memtable, blocking GC of expired tombstones. That's because timestamp of memtable is unconditionally used for calculating max purgeable, even when the memtable doesn't contain the key of the tombstone we're trying to GC. The idea is to treat memtable as we treat L0 sstables, i.e. it will only prevent GC if it contains data that is possibly shadowed by the expired tombstone (after checking for key presence and timestamp). Memtable will usually have a small subset of keys in largest tier, so after this change, a large fraction of keys containing expired tombstones can be GCed when memtable contains old data. Fixes #17599. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> (cherry picked from commit `38699f6c3d`) Closes scylladb/scylladb#19551	2024-07-10 07:30:40 +03:00
Raphael S. Carvalho	67be26ff7d	compaction: Reduce twcs off-strategy space overhead to 10% of free space TWCS off-strategy suffers with 100% space overhead, so a big TWCS table can cause scylla to run out of disk space during node ops. To not penalize TWCS tables, that take a small percentage of disk, with increased write ampl, TWCS off-strategy will be restricted to 10% of free disk space. Then small tables can still compact all disjoint sstables in a single round. Fixes #16514. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> (cherry picked from commit `ace4e5111e`)	2024-06-29 11:29:59 -03:00
Raphael S. Carvalho	97893a4f6d	compaction: wire storage free space into reshape procedure After this, TWCS reshape procedure can be changed to limit job to 10% of available space. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> (cherry picked from commit `0ce8ee03f1`)	2024-06-29 11:29:59 -03:00
Lakshmi Narayanan Sreethar	4b0c60cdc3	compaction: improve partition estimates for garbage collected sstables When a compaction strategy uses garbage collected sstables to track expired tombstones, do not use complete partition estimates for them, instead, use a fraction of it based on the droppable tombstone ratio estimate. Fixes #18283 Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com> Closes scylladb/scylladb#18465 (cherry picked from commit `d39adf6438`) Closes scylladb/scylladb#18656	2024-05-14 07:53:07 +03:00
Raphael S. Carvalho	db1c8e8754	Fix potential data resurrection when another compaction type does cleanup work Since commit `f1bbf70`, many compaction types can do cleanup work, but turns out we forgot to invalidate cache on their completion. So if a node regains ownership of token that had partition deleted in its previous owner (and tombstone is already gone), data can be resurrected. Tablet is not affected, as it explicitly invalidates cache during migration cleanup stage. Scylla 5.4 is affected. Fixes #17501. Fixes #17452. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes scylladb/scylladb#17502 (cherry picked from commit `f07c233ad5`)	2024-03-13 14:10:05 +02:00
Botond Dénes	6c625e8cd3	Merge '[Backport 5.4] tasks: compaction: drop regular compaction tasks after they are finished' from Aleksandra Martyniuk Make compaction tasks internal. Drop all internal tasks without parents immediately after they are done. Fixes: https://github.com/scylladb/scylladb/issues/16735 Refs: https://github.com/scylladb/scylladb/issues/16694. Closes scylladb/scylladb#16798 * github.com:scylladb/scylladb: compaction: make regular compaction tasks internal tasks: don't keep internal root tasks after they complete	2024-01-17 09:34:08 +02:00
Aleksandra Martyniuk	081a36e34f	compaction: make regular compaction tasks internal Regular compaction tasks are internal. Adjust test_compaction_task accordingly: modify test_regular_compaction_task, delete test_running_compaction_task_abort (relying on regular compaction), whose checks are already achieved by test_not_created_compaction_task_abort. Rename the latter. (cherry picked from commit `6b87778ef2`)	2024-01-16 11:15:41 +01:00
Benny Halevy	3ff8051532	api: add /storage_service/compact For major compacting all tables in the database. The advantage of this api is that `commitlog->force_new_active_segment` happens only once in `database::flush_all_tables` rather than once per keyspace (when `nodetool compact` translates to a sequence of `/storage_service/keyspace_compaction` calls). Signed-off-by: Benny Halevy <bhalevy@scylladb.com> (cherry picked from commit `b12b142232`)	2024-01-12 15:57:39 +08:00
Benny Halevy	5d88e997ef	compaction_manager: flush_all_tables before major compaction Major compaction already flushes each table to make sure it considers any mutations that are present in the memtable for the purpose of tombstone purging. See `64ec1c6ec6` However, tombstone purging may be inhibited by data in commitlog segments based on `gc_time_min` in the `tombstone_gc_state` (See `f42eb4d1ce`). Flushing all sstables in the database release all references to commitlog segments and there it maximizes the potential for tombstone purging, which is typically the reason for running major compaction. However, flushing all tables too frequently might result in tiny sstables. Since when flushing all keyspaces using `nodetool flush` the `force_keyspace_compaction` api is invoked for keyspace successively, we need a mechanism to prevent too frequent flushes by major compaction. Hence a `compaction_flush_all_tables_before_major_seconds` interval configuration option is added (defaults to 24 hours). In the case that not all tables are flushed prior to major compaction, we revert to the old behavior of flushing each table in the keyspace before major-compacting it. Fixes scylladb/scylladb#15777 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> (cherry picked from commit `66ba983fe0`)	2024-01-12 15:57:39 +08:00
Benny Halevy	993e6997c0	api: compaction: add flush_memtables option When flushing is done externally, e.g. by running `nodetool flush` prior to `nodetool compact`, flush_memtables=false can be passed to skip flushing of tables right before they are major-compacted. This is useful to prevent creation of small sstables due to excessive memtable flushing. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> (cherry picked from commit `1fd85bd37b`)	2024-01-12 15:57:39 +08:00
Nadav Har'El	ff596f9d9d	Merge 'Fix partition estimation with TWCS tables during streaming' from Raphael "Raph" Carvalho TWCS tables require partition estimation adjustment as incoming streaming data can be segregated into the time windows. Turns out we had two problems in this area that leads to suboptimal bloom filters. 1) With off-strategy enabled, data segregation is postponed, but partition estimation was adjusted as if segregation wasn't postponed. Solved by not adjusting estimation if segregation is postponed. 2) With off-strategy disabled, data segregation is not postponed, but streaming didn't feed any metadata into partition estimation procedure, meaning it had to assume the max windows input data can be segregated into (100). Solved by using schema's default TTL for a precise estimation of window count. For the future, we want to dynamically size filters (see https://github.com/scylladb/scylladb/issues/2024), especially for TWCS that might have SSTables that are left uncompacted until they're fully expired, meaning that the system won't heal itself in a timely manner through compaction on a SSTable that had partition estimation really wrong. Fixes https://github.com/scylladb/scylladb/issues/15704. Closes scylladb/scylladb#15938 * github.com:scylladb/scylladb: streaming: Improve partition estimation with TWCS streaming: Don't adjust partition estimate if segregation is postponed (cherry picked from commit `64d1d5cf62`) Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes scylladb/scylladb#16671	2024-01-08 09:06:06 +02:00
Benny Halevy	d8586fd101	compaction_manager: perform_cleanup: ignore condition_variable_timed_out The polling loop was intended to ignore `condition_variable_timed_out` and check for progress using a longer `max_idle_duration` timeout in the loop. Fixes #15669 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes scylladb/scylladb#15671 (cherry picked from commit `68a7bbe582`)	2024-01-04 12:16:56 +02:00
Raphael S. Carvalho	7288bdfe09	sstables: Fix update of tombstone GC settings to have immediate effect After "repair: Get rid of the gc_grace_seconds", the sstable's schema (mode, gc period if applicable, etc) is used to estimate the amount of droppable data (or determine full expiration = max_deletion_time < gc_before). It could happen that the user switched from timeout to repair mode, but sstables will still use the old mode, despite the user asked for a new one. Another example is when you play with value of grace period, to prevent data resurrection if repair won't be able to run in a timely manner. The problem persists until all sstables using old GC settings are recompacted or node is restarted. To fix this, we have to feed latest schema into sstable procedures used for expiration purposes. Fixes #15643. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes scylladb/scylladb#15746 (cherry picked from commit `fded314e46`)	2023-12-18 14:14:02 +02:00
Raphael S. Carvalho	da04fea71e	compaction: Fix key estimation per sstable to produce efficient filters The estimation accounts for the size of all sstable components when estimating the number of partitions for each output sstable. However, the sstables are split according to the data file size, so the size of the other files is irrelevant for the estimation. With certain data models, like single-row partitions containing small values, the index could be even larger than the data. For example, assume the index is as large as the data: the estimation would then say that 2x more sstables will be generated, and as a result, each sstable is estimated to have 2x fewer keys than it really has. Fix it by only accounting for the size of the data file. Fixes #15726. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes scylladb/scylladb#15727	2023-10-17 11:21:11 +03:00
Aleksandra Martyniuk	f42be12f43	repair: release resources of shard_repair_task_impl Before integration with task manager the state of one shard repair was kept in repair_info. repair_info object was destroyed immediately after shard repair was finished. In an integration process repair_info's fields were moved to shard_repair_task_impl as the two served similar purposes. However, shard_repair_task_impl isn't immediately destroyed, but is kept in task manager for task_ttl seconds after it's complete. Thus, some of repair_info's fields have their lifetime prolonged, which makes the repair state change delayed. Release shard_repair_task_impl resources immediately after shard repair is finished. Fixes: #15505. Closes scylladb/scylladb#15506	2023-09-26 17:09:47 +03:00
Botond Dénes	d5f095d5a4	Merge 'Make interaction of compaction strategy with sstable runs more robust and efficient' from Raphael "Raph" Carvalho SSTable runs work hard to keep the disjointness invariant, therefore they're expensive to build from scratch. For every insertion, it keeps the elements sorted by their first key in order to reject insertion of element that would introduce overlapping. Additionally, a sstable run can grow to dozens of elements (or hundreds) therefore, we can also make interaction with compaction strategies more efficient by not copying them when building a list of candidates in compaction manager. And less fragile by filtering out any sstable runs that are not completely eligible for compaction. Previously, ICS had to give up on using runs managed by sstable set due to fragility of the interface (meaning runs are being built from scratch on every call to the strategy, which is very inefficient, but that had to be done for correctness), but now we can restore that. Closes scylladb/scylladb#15440 * github.com:scylladb/scylladb: compaction: Switch to strategy_control::candidates() for regular compaction tests: Prepare sstable_compaction_test for change in compaction_strategy interface compaction: Allow strategy to retrieve candidates either as sstables or runs compaction: Make get_candidates() work with frozen_sstable_run too sstables: add sstable_run::run_identifier() sstables: tag sstable_run::insert() with nodiscard sstables: Make all_sstable_runs() more efficient by exposing frozen shared runs sstables: Simplify sstable_set interface to retrieve runs	2023-09-26 14:56:05 +03:00
Aleksandra Martyniuk	d799adc536	tasks: change task_manager::task::impl::is_internal() Most of the time only the roots of tasks tree should be non internal. Change default implementation of is_internal and delete overrides consistent with it. Closes scylladb/scylladb#15353	2023-09-26 14:49:49 +03:00
Raphael S. Carvalho	8997fe0625	compaction: Switch to strategy_control::candidates() for regular compaction Now everything is prepared for the switch, let's do it. Now let's wait for ICS to enjoy the set of changes. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-09-25 17:18:21 -03:00
Raphael S. Carvalho	02f1f24f27	compaction: Allow strategy to retrieve candidates either as sstables or runs That's needed for upcoming changes that will allow ICS to efficiently retrieve sstable runs. Next patch will remove candidates from compaction_strategy's interface to retrieve candidates using this one instead. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-09-25 17:18:21 -03:00
Raphael S. Carvalho	ff8510445d	compaction: Make get_candidates() work with frozen_sstable_run too This is done in preparation for ICS to retrieve candidates as sstable runs. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-09-25 17:18:21 -03:00
Avi Kivity	61440d20c3	Merge 'Enable incremental compaction on off-strategy' from Raphael "Raph" Carvalho Off-strategy suffers with a 100% space overhead, as it adopted a sort of all or nothing approach. Meaning all input sstables, living in maintenance set, are kept alive until they're all reshaped according to the strategy criteria. Input sstables in off-strategy are very likely to be mostly disjoint, so it can greatly benefit from incremental compaction. The incremental compaction approach is not only good for decreasing disk usage, but also memory usage (as metadata of input and output live in memory), and file desc count, which takes memory away from OS. Turns out that this approach also greatly simplifies the off-strategy impl in compaction manager, as it no longer has to maintain new unused sstables and mark them for deletion on failure, and also unlink intermediary sstables used between reshape rounds. Fixes https://github.com/scylladb/scylladb/issues/14992. Closes scylladb/scylladb#15400 * github.com:scylladb/scylladb: test: Verify that off-strategy can do incremental compaction compaction: Clear pending_replacement list when tombstone GC is disabled compaction: Enable incremental compaction on off-strategy compaction: Extend reshape type to allow for incremental compaction compaction: Move reshape_compaction in the source compaction: Enable incremental compaction only if replacer callback is engaged	2023-09-21 20:12:19 +03:00
Raphael S. Carvalho	9d92374b20	compaction: Clear pending_replacement list when tombstone GC is disabled pending_replacement list is used by incremental compaction to communicate to other ongoing compactions about exhausted sstables that must be replaced in the sstable set they keep for tombstone GC purposes. Reshape doesn't enable tombstone GC, so that list will not be cleared, which prevents incremental compaction from releasing sstables referenced by that list. It's not a problem until now where we want reshape to do incremental compaction. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-09-21 11:15:46 -03:00
Raphael S. Carvalho	42050f13a0	compaction: Enable incremental compaction on off-strategy Off-strategy suffers with a 100% space overhead, as it adopted a sort of all or nothing approach. Meaning all input sstables, living in maintenance set, are kept alive until they're all reshaped according to the strategy criteria. Input sstables in off-strategy are very likely to be mostly disjoint, so it can greatly benefit from incremental compaction. The incremental compaction approach is not only good for decreasing disk usage, but also memory usage (as metadata of input and output live in memory), and file desc count, which takes memory away from OS. Turns out that this approach also greatly simplifies the off-strategy impl in compaction manager, as it no longer has to maintain new unused sstables and mark them for deletion on failure, and also unlink intermediary sstables used between reshape rounds. Fixes #14992. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-09-21 11:15:46 -03:00
Raphael S. Carvalho	db9ce9f35a	compaction: Extend reshape type to allow for incremental compaction That's done by inheriting regular_compaction, which implement incremental compaction. But reshape still implements its own methods for creating writer and reader. One reason is that reshape is not driven by controller, as input sstables to it live in maintenance set. Another reason is customization of things like sstable origin, etc. stop_sstable_writer() is extended because that's used by regular_compaction to check for possibility of removing exhausted sstables earlier whenever an output sstable is sealed. Also, incremental compaction will be unconditionally enabled for ICS/LCS during off-strategy. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-09-21 11:15:12 -03:00
Raphael S. Carvalho	33a0f42304	compaction: Move reshape_compaction in the source That's in preparation to next change that will make reshape inherit from regular compaction. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-09-21 11:11:13 -03:00
Botond Dénes	a56a4b6226	Merge 'compaction_backlog_tracker: do not allow moving registered trackers' from Benny Halevy Currently, the moved-object's manager pointer is moved into the constructed object, but without fixing the registration to point to the moved-to object, causing #15248. Although we could properly move the registration from the moved-from object to the moved-to one, it is simpler to just disallow moving a registered tracker, since it's not needed anywhere. This way we just don't need to mess with the trackers' registration. The move-assignment operator has a similar problem, therefore it is deleted in this series, and the function is renamed to `transfer_backlog` that just doesn't deal with the moved-from registration. This is safe since it's only used internally by the compaction manager. Fixes #15248 Closes scylladb/scylladb#15445 * github.com:scylladb/scylladb: compaction_state: store backlog_track in std::optional compaction_backlog_tracker: do not allow moving registered trackers	2023-09-20 16:41:10 +03:00
Botond Dénes	45dfce6632	Merge 'compaction: change behaviour of compaction task executors' from Aleksandra Martyniuk Compaction tasks executors serve two different purposes - as compaction manager related entity they execute compaction operation and as task manager related entity they track compaction status. When one role depends on the other, as it currently is for compaction_task_impl::done() and compaction_task_executor::compaction_done(), requirements of both roles need to be satisfied at the same time in each corner case. Such complexity leads to bugs. To prevent it, compaction_task_impl::done() of executors no longer depends on compaction_task_executor::compaction_done(). Fixes: #14912. Closes scylladb/scylladb#15140 * github.com:scylladb/scylladb: compaction: warn about compaction_done() compaction: do not run stopped compaction compaction: modify lowest compaction tasks' run method compaction: pass do_throw_if_stopping to compaction_task_executor	2023-09-19 15:15:14 +03:00
Benny Halevy	7ca91d719c	compaction_state: store backlog_track in std::optional So that replacing it will destroy the previous tracker and unregister it before assigning the new one and then registering it. This is safer than assigning it in place. With that, the move assignment operator is no longer used and can be deleted. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-09-19 13:59:54 +03:00
Benny Halevy	4ad4b632b8	compaction_backlog_tracker: do not allow moving registered trackers Currently, the moved-object's manager pointer is moved into the constructed object, but without fixing the registration to point to the moved-to object, causing #15248. Although we could properly move the registration from the moved-from object to the moved-to one, it is simpler to just disallow moving a registered tracker, since it's not needed anywhere. This way we just don't need to mess with the trackers' registration. With that in mind, when move-assigning a compaction_backlog_tracker the existing tracker can remain registered. Fixes scylladb/scylladb#15248 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-09-19 13:24:36 +03:00
Raphael S. Carvalho	6cc85068d7	compaction: Enable incremental compaction only if replacer callback is engaged That's needed for enabling incremental compaction to operate, and needed for subsequent work that enables incremental compaction for off-strategy, which in turn uses reshape compaction type. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-09-18 17:57:11 -03:00
Aleksandra Martyniuk	53ecc29cd7	compaction: unify exception messages Use fmt::format in exception messages in all methods validating compaction strategies.	2023-09-13 16:59:40 +02:00
Aleksandra Martyniuk	ac08b57555	compaction: cql3: validate options in check_restricted_table_properties Check whether valid compaction strategy options are set for the given strategy type in check_restricted_table_properties.	2023-09-13 16:59:40 +02:00
Aleksandra Martyniuk	44744d6229	compaction: validate options used in different compaction strategies For each compaction strategy, validate whether options values are valid.	2023-09-13 16:59:40 +02:00
Aleksandra Martyniuk	0ed39af221	compaction: validate common compaction strategy options Add compaction_strategy_impl::validate_options to validate common compaction strategy options.	2023-09-13 16:59:40 +02:00
Aleksandra Martyniuk	a2e6081984	compaction: split compaction_strategy_impl constructor Split compaction_strategy_impl constructor into methods that will be reused for validation. Add additional checks providing that options' values are legal.	2023-09-13 16:59:40 +02:00
Aleksandra Martyniuk	5c72bcd40e	compaction: validate size_tiered_compaction_strategy specific options	2023-09-13 16:59:40 +02:00
Aleksandra Martyniuk	7e5b6ea09a	compaction: validate time_window_compaction_strategy specific options	2023-09-13 16:59:40 +02:00
Aleksandra Martyniuk	84fd90e472	compaction: add method to validate min and max threshold Add compaction_strategy_impl::validate_min_max_threshold method that will be used to validate min and max threshold values for different compaction methods.	2023-09-13 16:59:40 +02:00
Aleksandra Martyniuk	50c1bb555b	compaction: split size_tiered_compaction_strategy_options constructor Split size_tiered_compaction_strategy_options constructor into methods that will be reused for validation. Add additional checks providing that options' values are legal.	2023-09-13 16:59:40 +02:00
Aleksandra Martyniuk	702c19f941	compaction: make compaction strategy keys static constexpr	2023-09-13 16:59:40 +02:00
Aleksandra Martyniuk	e3d8f71a88	compaction: use helpers in validate_* functions To be consistent with other compaction_strategy_options, time_window_compaction_strategy_options uses compaction_strategy_impl::get_value and cql3::statements::property_definitions::to_long helpers for parsing.	2023-09-13 16:59:40 +02:00
Aleksandra Martyniuk	c8c3c0e6a6	compaction: split time_window_compaction_strategy_options constructor Split time_window_compaction_strategy_options constructor into functions that will be reused for validation.	2023-09-13 16:59:40 +02:00
Aleksandra Martyniuk	a01dd1351e	compaction: add validate method to compaction_strategy_options Add temporarily empty validate method to compaction_strategy_options. The method will validate the options and help determining whether only the allowed options were set.	2023-09-13 16:59:40 +02:00
Benny Halevy	e5cf6f0897	time_window_compaction_strategy_options: make copy and move-able Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-09-13 16:59:40 +02:00
Benny Halevy	c9475d6fe0	size_tiered_compaction_strategy_options: make copy and move-able Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-09-13 16:59:40 +02:00
Aleksandra Martyniuk	932f39e37c	compaction: warn about compaction_done() compaction_done() returns ready future before compaction_task_executor::run_compaction() even though the compaction did not start. Make compaction_done() private and add a comment to warn against incorrect usage.	2023-09-09 11:19:11 +02:00
Aleksandra Martyniuk	59b7a45f73	compaction: do not run stopped compaction Before compaction_task_executor::do_run is called, the executor can be already aborted. Check if compaction was stopped and set _compaction_done to exceptional future.	2023-09-09 11:19:11 +02:00
Aleksandra Martyniuk	515b8d4890	compaction: modify lowest compaction tasks' run method For compaction_task_executors, unlike for all other task manager tasks, run method does not embrace operations performed in a scope of a task, but only waits until shared_future connected with the operations is resolved. Apart from breaking task manager task conventions, such a run method must consider all corner cases, not to break task manager or compaction manager functionality. To fix existing and prevent further bugs related to task manager and compaction manager coexistence, call perform_task inside run method and wait for it in a standard way. Executors that are not going to be reflected in task manager run call perform_task the old way.	2023-09-09 11:19:11 +02:00
Aleksandra Martyniuk	832df38d26	compaction: pass do_throw_if_stopping to compaction_task_executor As a preparation for further changes, keep do_throw_if_stopping flag as a member of compaction_task_executor.	2023-09-09 11:19:11 +02:00
Benny Halevy	cfecb68245	compaction_manager: stop: close compaction_state:s gates Make sure the compaction_state:s are idle before they are destroyed. Although all tasks are stopped in stop_ongoing_compactions, make sure there is fiber holding the compaction_state gate. compaction_manager::remove now needs to close the compaction_state gate and to stop_ongoing_compactions only if the gate is not closed yet. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-09-05 09:17:25 +03:00
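
The arithmetic behind commit da04fea71e (key estimation per sstable) can be illustrated with a small sketch. This is not ScyllaDB's C++ code: the function name, its parameters, and the sample sizes below are hypothetical and chosen only to reproduce the "index as large as data" example from the commit message, assuming output sstables are cut at a fixed maximum data file size.

```python
import math

GIB = 1024 ** 3


def estimate_keys_per_output(total_keys, data_size, other_components_size,
                             max_sstable_size, data_only):
    """Estimate how many keys each output sstable will hold.

    Hypothetical helper: when data_only is False, the sizes of the other
    components (for example the index) are added to the data size, which
    inflates the expected number of output sstables.
    """
    considered = data_size if data_only else data_size + other_components_size
    # Outputs are split by size, so the expected count is a ceiling division.
    expected_outputs = max(1, math.ceil(considered / max_sstable_size))
    return math.ceil(total_keys / expected_outputs)


# Index as large as data, as in the commit message's example.
before_fix = estimate_keys_per_output(1_000_000, 10 * GIB, 10 * GIB, GIB,
                                      data_only=False)
after_fix = estimate_keys_per_output(1_000_000, 10 * GIB, 10 * GIB, GIB,
                                     data_only=True)
print(before_fix, after_fix)
```

With these sample numbers, counting the index doubles the expected number of outputs, so the per-sstable key estimate is halved and a bloom filter sized from it would be too small.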


730 Commits