scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-12 19:02:12 +00:00

Author	SHA1	Message	Date
Raphael S. Carvalho	da04fea71e	compaction: Fix key estimation per sstable to produce efficient filters The estimation assumes that the size of other components is irrelevant when estimating the number of partitions for each output sstable. The sstables are split according to the data file size, so the size of other files is irrelevant to the estimation. With certain data models, like single-row partitions containing small values, the index could be even larger than the data. For example, if the index is as large as the data, the estimation would say that 2x more sstables will be generated, and as a result, each sstable is underestimated to have 2x fewer keys. Fix it by only accounting for the size of the data file. Fixes #15726. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes scylladb/scylladb#15727	2023-10-17 11:21:11 +03:00
Aleksandra Martyniuk	f42be12f43	repair: release resources of shard_repair_task_impl Before integration with task manager, the state of one shard repair was kept in repair_info. The repair_info object was destroyed immediately after the shard repair was finished. In the integration process, repair_info's fields were moved to shard_repair_task_impl, as the two served similar purposes. However, shard_repair_task_impl isn't destroyed immediately; it is kept in task manager for task_ttl seconds after it completes. Thus, some of repair_info's fields have their lifetime prolonged, which delays the repair state change. Release shard_repair_task_impl resources immediately after shard repair is finished. Fixes: #15505. Closes scylladb/scylladb#15506	2023-09-26 17:09:47 +03:00
Botond Dénes	d5f095d5a4	Merge 'Make interaction of compaction strategy with sstable runs more robust and efficient' from Raphael "Raph" Carvalho SSTable runs work hard to keep the disjointness invariant, so they're expensive to build from scratch. On every insertion, a run keeps its elements sorted by their first key in order to reject any element that would introduce an overlap. Additionally, an sstable run can grow to dozens (or hundreds) of elements, so we can also make interaction with compaction strategies more efficient by not copying runs when building a list of candidates in compaction manager, and less fragile by filtering out any sstable runs that are not completely eligible for compaction. Previously, ICS had to give up on using runs managed by sstable set due to fragility of the interface (meaning runs are being built from scratch on every call to the strategy, which is very inefficient, but that had to be done for correctness), but now we can restore that. Closes scylladb/scylladb#15440 * github.com:scylladb/scylladb: compaction: Switch to strategy_control::candidates() for regular compaction tests: Prepare sstable_compaction_test for change in compaction_strategy interface compaction: Allow strategy to retrieve candidates either as sstables or runs compaction: Make get_candidates() work with frozen_sstable_run too sstables: add sstable_run::run_identifier() sstables: tag sstable_run::insert() with nodiscard sstables: Make all_sstable_runs() more efficient by exposing frozen shared runs sstables: Simplify sstable_set interface to retrieve runs	2023-09-26 14:56:05 +03:00
Aleksandra Martyniuk	d799adc536	tasks: change task_manager::task::impl::is_internal() Most of the time, only the roots of the task tree should be non-internal. Change the default implementation of is_internal and delete the overrides that are consistent with it. Closes scylladb/scylladb#15353	2023-09-26 14:49:49 +03:00
Raphael S. Carvalho	8997fe0625	compaction: Switch to strategy_control::candidates() for regular compaction Now everything is prepared for the switch, let's do it. Now let's wait for ICS to enjoy the set of changes. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-09-25 17:18:21 -03:00
Raphael S. Carvalho	02f1f24f27	compaction: Allow strategy to retrieve candidates either as sstables or runs That's needed for upcoming changes that will allow ICS to efficiently retrieve sstable runs. Next patch will remove candidates from compaction_strategy's interface to retrieve candidates using this one instead. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-09-25 17:18:21 -03:00
Raphael S. Carvalho	ff8510445d	compaction: Make get_candidates() work with frozen_sstable_run too This is done in preparation for ICS to retrieve candidates as sstable runs. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-09-25 17:18:21 -03:00
Avi Kivity	61440d20c3	Merge 'Enable incremental compaction on off-strategy' from Raphael "Raph" Carvalho Off-strategy suffers from a 100% space overhead, as it adopted an all-or-nothing approach: all input sstables, living in the maintenance set, are kept alive until they're all reshaped according to the strategy criteria. Input sstables in off-strategy are very likely to be mostly disjoint, so it can greatly benefit from incremental compaction. The incremental compaction approach is good not only for decreasing disk usage, but also memory usage (as metadata of input and output live in memory) and file descriptor count, which takes memory away from the OS. It turns out that this approach also greatly simplifies the off-strategy impl in compaction manager, as it no longer has to maintain new unused sstables and mark them for deletion on failure, nor unlink intermediary sstables used between reshape rounds. Fixes https://github.com/scylladb/scylladb/issues/14992. Closes scylladb/scylladb#15400 * github.com:scylladb/scylladb: test: Verify that off-strategy can do incremental compaction compaction: Clear pending_replacement list when tombstone GC is disabled compaction: Enable incremental compaction on off-strategy compaction: Extend reshape type to allow for incremental compaction compaction: Move reshape_compaction in the source compaction: Enable incremental compaction only if replacer callback is engaged	2023-09-21 20:12:19 +03:00
Raphael S. Carvalho	9d92374b20	compaction: Clear pending_replacement list when tombstone GC is disabled The pending_replacement list is used by incremental compaction to communicate to other ongoing compactions about exhausted sstables that must be replaced in the sstable set they keep for tombstone GC purposes. Reshape doesn't enable tombstone GC, so that list will not be cleared, which prevents incremental compaction from releasing sstables referenced by that list. It wasn't a problem until now, when we want reshape to do incremental compaction. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-09-21 11:15:46 -03:00
Raphael S. Carvalho	42050f13a0	compaction: Enable incremental compaction on off-strategy Off-strategy suffers from a 100% space overhead, as it adopted an all-or-nothing approach: all input sstables, living in the maintenance set, are kept alive until they're all reshaped according to the strategy criteria. Input sstables in off-strategy are very likely to be mostly disjoint, so it can greatly benefit from incremental compaction. The incremental compaction approach is good not only for decreasing disk usage, but also memory usage (as metadata of input and output live in memory) and file descriptor count, which takes memory away from the OS. It turns out that this approach also greatly simplifies the off-strategy impl in compaction manager, as it no longer has to maintain new unused sstables and mark them for deletion on failure, nor unlink intermediary sstables used between reshape rounds. Fixes #14992. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-09-21 11:15:46 -03:00
Raphael S. Carvalho	db9ce9f35a	compaction: Extend reshape type to allow for incremental compaction That's done by inheriting regular_compaction, which implements incremental compaction. However, reshape still implements its own methods for creating the writer and reader. One reason is that reshape is not driven by the controller, as its input sstables live in the maintenance set. Another reason is customization of things like sstable origin, etc. stop_sstable_writer() is extended because regular_compaction uses it to check whether exhausted sstables can be removed earlier whenever an output sstable is sealed. Also, incremental compaction will be unconditionally enabled for ICS/LCS during off-strategy. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-09-21 11:15:12 -03:00
Raphael S. Carvalho	33a0f42304	compaction: Move reshape_compaction in the source That's in preparation for the next change, which will make reshape inherit from regular compaction. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-09-21 11:11:13 -03:00
Botond Dénes	a56a4b6226	Merge 'compaction_backlog_tracker: do not allow moving registered trackers' from Benny Halevy Currently, the moved-object's manager pointer is moved into the constructed object, but without fixing the registration to point to the moved-to object, causing #15248. Although we could properly move the registration from the moved-from object to the moved-to one, it is simpler to just disallow moving a registered tracker, since it's not needed anywhere. This way we just don't need to mess with the trackers' registration. The move-assignment operator has a similar problem, therefore it is deleted in this series, and the function is renamed to `transfer_backlog` that just doesn't deal with the moved-from registration. This is safe since it's only used internally by the compaction manager. Fixes #15248 Closes scylladb/scylladb#15445 * github.com:scylladb/scylladb: compaction_state: store backlog_track in std::optional compaction_backlog_tracker: do not allow moving registered trackers	2023-09-20 16:41:10 +03:00
Botond Dénes	45dfce6632	Merge 'compaction: change behaviour of compaction task executors' from Aleksandra Martyniuk Compaction tasks executors serve two different purposes - as compaction manager related entity they execute compaction operation and as task manager related entity they track compaction status. When one role depends on the other, as it currently is for compaction_task_impl::done() and compaction_task_executor::compaction_done(), requirements of both roles need to be satisfied at the same time in each corner case. Such complexity leads to bugs. To prevent it, compaction_task_impl::done() of executors no longer depends on compaction_task_executor::compaction_done(). Fixes: #14912. Closes scylladb/scylladb#15140 * github.com:scylladb/scylladb: compaction: warn about compaction_done() compaction: do not run stopped compaction compaction: modify lowest compaction tasks' run method compaction: pass do_throw_if_stopping to compaction_task_executor	2023-09-19 15:15:14 +03:00
Benny Halevy	7ca91d719c	compaction_state: store backlog_track in std::optional So that replacing it will destroy the previous tracker and unregister it before assigning the new one and then registering it. This is safer than assigning it in place. With that, the move assignment operator is no longer used and can be deleted. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-09-19 13:59:54 +03:00
Benny Halevy	4ad4b632b8	compaction_backlog_tracker: do not allow moving registered trackers Currently, the moved-object's manager pointer is moved into the constructed object, but without fixing the registration to point to the moved-to object, causing #15248. Although we could properly move the registration from the moved-from object to the moved-to one, it is simpler to just disallow moving a registered tracker, since it's not needed anywhere. This way we just don't need to mess with the trackers' registration. With that in mind, when move-assigning a compaction_backlog_tracker the existing tracker can remain registered. Fixes scylladb/scylladb#15248 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-09-19 13:24:36 +03:00
Raphael S. Carvalho	6cc85068d7	compaction: Enable incremental compaction only if replacer callback is engaged That's needed for enabling incremental compaction to operate, and needed for subsequent work that enables incremental compaction for off-strategy, which in turn uses reshape compaction type. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-09-18 17:57:11 -03:00
Aleksandra Martyniuk	53ecc29cd7	compaction: unify exception messages Use fmt::format in exception messages in all methods validating compaction strategies.	2023-09-13 16:59:40 +02:00
Aleksandra Martyniuk	ac08b57555	compaction: cql3: validate options in check_restricted_table_properties Check whether valid compaction strategy options are set for the given strategy type in check_restricted_table_properties.	2023-09-13 16:59:40 +02:00
Aleksandra Martyniuk	44744d6229	compaction: validate options used in different compaction strategies For each compaction strategy, validate whether options values are valid.	2023-09-13 16:59:40 +02:00
Aleksandra Martyniuk	0ed39af221	compaction: validate common compaction strategy options Add compaction_strategy_impl::validate_options to validate common compaction strategy options.	2023-09-13 16:59:40 +02:00
Aleksandra Martyniuk	a2e6081984	compaction: split compaction_strategy_impl constructor Split compaction_strategy_impl constructor into methods that will be reused for validation. Add additional checks ensuring that options' values are legal.	2023-09-13 16:59:40 +02:00
Aleksandra Martyniuk	5c72bcd40e	compaction: validate size_tiered_compaction_strategy specific options	2023-09-13 16:59:40 +02:00
Aleksandra Martyniuk	7e5b6ea09a	compaction: validate time_window_compaction_strategy specific options	2023-09-13 16:59:40 +02:00
Aleksandra Martyniuk	84fd90e472	compaction: add method to validate min and max threshold Add compaction_strategy_impl::validate_min_max_threshold method that will be used to validate min and max threshold values for different compaction methods.	2023-09-13 16:59:40 +02:00
Aleksandra Martyniuk	50c1bb555b	compaction: split size_tiered_compaction_strategy_options constructor Split size_tiered_compaction_strategy_options constructor into methods that will be reused for validation. Add additional checks ensuring that options' values are legal.	2023-09-13 16:59:40 +02:00
Aleksandra Martyniuk	702c19f941	compaction: make compaction strategy keys static constexpr	2023-09-13 16:59:40 +02:00
Aleksandra Martyniuk	e3d8f71a88	compaction: use helpers in validate_* functions To be consistent with other compaction_strategy_options, time_window_compaction_strategy_options uses compaction_strategy_impl::get_value and cql3::statements::property_definitions::to_long helpers for parsing.	2023-09-13 16:59:40 +02:00
Aleksandra Martyniuk	c8c3c0e6a6	compaction: split time_window_compaction_strategy_options constructor Split time_window_compaction_strategy_options constructor into functions that will be reused for validation.	2023-09-13 16:59:40 +02:00
Aleksandra Martyniuk	a01dd1351e	compaction: add validate method to compaction_strategy_options Add a temporarily empty validate method to compaction_strategy_options. The method will validate the options and help determine whether only the allowed options were set.	2023-09-13 16:59:40 +02:00
Benny Halevy	e5cf6f0897	time_window_compaction_strategy_options: make copy and move-able Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-09-13 16:59:40 +02:00
Benny Halevy	c9475d6fe0	size_tiered_compaction_strategy_options: make copy and move-able Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-09-13 16:59:40 +02:00
Aleksandra Martyniuk	932f39e37c	compaction: warn about compaction_done() compaction_done() returns a ready future before compaction_task_executor::run_compaction() is called, even though the compaction has not started. Make compaction_done() private and add a comment to warn against incorrect usage.	2023-09-09 11:19:11 +02:00
Aleksandra Martyniuk	59b7a45f73	compaction: do not run stopped compaction Before compaction_task_executor::do_run is called, the executor can be already aborted. Check if compaction was stopped and set _compaction_done to exceptional future.	2023-09-09 11:19:11 +02:00
Aleksandra Martyniuk	515b8d4890	compaction: modify lowest compaction tasks' run method For compaction_task_executors, unlike for all other task manager tasks, run method does not embrace operations performed in a scope of a task, but only waits until shared_future connected with the operations is resolved. Apart from breaking task manager task conventions, such a run method must consider all corner cases, not to break task manager or compaction manager functionality. To fix existing and prevent further bugs related to task manager and compaction manager coexistence, call perform_task inside run method and wait for it in a standard way. Executors that are not going to be reflected in task manager run call perform_task the old way.	2023-09-09 11:19:11 +02:00
Aleksandra Martyniuk	832df38d26	compaction: pass do_throw_if_stopping to compaction_task_executor As a preparation for further changes, keep do_throw_if_stopping flag as a member of compaction_task_executor.	2023-09-09 11:19:11 +02:00
Benny Halevy	cfecb68245	compaction_manager: stop: close compaction_state:s gates Make sure the compaction_state:s are idle before they are destroyed. Although all tasks are stopped in stop_ongoing_compactions, make sure there is fiber holding the compaction_state gate. compaction_manager::remove now needs to close the compaction_state gate and to stop_ongoing_compactions only if the gate is not closed yet. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-09-05 09:17:25 +03:00
Benny Halevy	96055414c7	compaction_manager: gracefully handle gate close Check if the compaction_state gate is closed along with _state != state::enabled and return early in this case. At this point entering the gate is guaranteed to succeed. So enter the gate before calling `perform_compaction` keeping the std::optional<gate_holder> throughout the compaction task. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-09-05 09:17:25 +03:00
Aleksandra Martyniuk	5e31ca7d20	tasks: api: show tasks' scopes To make manual analysis of task manager tasks easier, task_status and task_stats contain operation scope (e.g. shard, table). Closes #15172	2023-08-29 11:32:16 +03:00
Botond Dénes	1609c76d62	tools/scylla-sstable: scrub: don't quarantine sstables after validate Scylla sstable promises to never mutate its input sstables. This promise was broken by `scylla sstable scrub --scrub-mode=validate`, because validate moves invalid input sstables into quarantine. This is unexpected and caused occasional failures in the scrub tests in test_tools.py. Fix by propagating a flag down to `scrub_sstables_validate_mode()` in `compaction.cc`, specifying whether validate should quarantine invalid sstables, then set this flag to false in `scylla-sstable.cc`. The existing test for validate-mode scrub is amended to check that the sstable is not mutated. The test fails without the fix and passes with it. Fixes: #14309 Closes #15139	2023-08-23 21:53:12 +03:00
Aleksandra Martyniuk	e0ce711e4f	compaction: do not swallow compaction_stopped_exception for reshape The loop in shard_reshaping_compaction_task_impl::run relies on whether sstables::compaction_stopped_exception is thrown from run_custom_job. The exception is swallowed for each type of compaction in compaction_manager::perform_task. Rethrow the exception in perform_task for reshape compaction. Fixes: #15058. Closes #15067	2023-08-21 12:41:55 +03:00
Tomasz Grabiec	bd8bb5d4b1	Merge 'Wire tablet into compaction group' from Raphael "Raph" Carvalho Compaction group is the data plane for tablets, so this integration allows each tablet to have its own storage (memtable + sstables). A crucial step for dynamic tablets, where each tablet can be worked on independently. There are still some inefficiencies to be worked on, but as it is, it already unlocks further development. ``` INFO 2023-07-27 22:43:38,331 [shard 0] init - loading tablet metadata INFO 2023-07-27 22:43:38,333 [shard 0] init - loading non-system sstables INFO 2023-07-27 22:43:38,354 [shard 0] table - Tablet with id 0 present for ks.cf INFO 2023-07-27 22:43:38,354 [shard 0] table - Tablet with id 2 present for ks.cf INFO 2023-07-27 22:43:38,354 [shard 0] table - Tablet with id 4 present for ks.cf INFO 2023-07-27 22:43:38,354 [shard 0] table - Tablet with id 6 present for ks.cf INFO 2023-07-27 22:43:38,428 [shard 1] table - Tablet with id 1 present for ks.cf INFO 2023-07-27 22:43:38,428 [shard 1] table - Tablet with id 3 present for ks.cf INFO 2023-07-27 22:43:38,428 [shard 1] table - Tablet with id 5 present for ks.cf INFO 2023-07-27 22:43:38,428 [shard 1] table - Tablet with id 7 present for ks.cf ``` Closes #14863 * github.com:scylladb/scylladb: Kill scylla option to configure number of compaction groups replica: Wire tablet into compaction group token_metadata: Add this_host_id to topology config replica: Switch to chunked_vector for storing compaction groups replica: Generate group_id for compaction_group on demand	2023-08-18 15:17:17 +02:00
Aleksandra Martyniuk	e9d94894f1	compaction: release resources of compaction executors Before compaction task executors started inheriting from compaction_task_impl, they were destructed immediately after compaction finished. Destructors of executors and their fields performed actions that affected global structures and statistics and had an impact on the compaction process. Currently, task executors are kept in memory much longer, as they are tracked by task manager. Thus, destructors are not called right after the compaction, so compaction stats are not updated, which can cause e.g. an infinite cleanup loop. Add a release_resources() method, which is called at the end of the compaction process and does what the destructors used to do. Fixes: #14966. Fixes: #15030. Closes #15005	2023-08-16 15:51:17 +03:00
Avi Kivity	e8f3b073c3	Merge 'Maintain sstable state explicitly' from Pavel Emelyanov An sstable can be in one of several states -- normal, quarantined, staging, uploading. Right now this "state" is hard-wired into the sstable's path, e.g. a quarantined sstable would sit in the /var/lib/data/ks-cf-012345/quarantine/ directory. Accordingly, there's a bunch of directory-name constexprs in sstables.hh defining each "state". Besides being confusing, this approach doesn't work well with the S3 backend. Additionally, there's the snapshot subdir that adds to the confusion, because snapshot is not quite a state. This PR converts "state" from constexpr char* directory names into an enum class and patches the sstable creation, opening and state-changing API to use that enum instead of parsing the path. refs: #13017 refs: #12707 Closes #14152 * github.com:scylladb/scylladb: sstable/storage: Make filesystem storage with initial state sstable: Maintain state sstable: Make .change_state() accept state, not directory string sstable: Construct it with state sstables_manager: Remove state-less make_sstable() table: Make sstables with required state test: Make sstables with upload state in some cases tools: Make sstables with normal state table: Open-code sstables making streaming helpers tests: Make sstables with normal state by default sstable_directory: Make sstable with required state sstable_directory: Construct with state distributed_loader: Make sstable with desired state when populating distributed_loader: Make sstable with upload state when uploading sstable: Introduce state enum sstable_directory: Merge verify and g.c. calls distributed_loader: Merge verify and gc invocations sstable/filesystem: Put underscores to dir members sstable/s3: Mark make_s3_object_name() const sstable: Remove filename(dir, ...) method	2023-08-15 17:44:06 +03:00
Raphael S. Carvalho	2590eec352	replica: Generate group_id for compaction_group on demand There are a few good reasons for this change. 1) compaction_group doesn't have to be aware of # of groups 2) thinking forward to dynamic tablets, # of groups cannot be statically embedded in group id, otherwise it gets stale. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-08-15 09:04:05 -03:00
Benny Halevy	9f77a32805	compaction_manager: run_offstrategy_compaction: retrieve owned_ranges from compaction_state perform_offstrategy is called from try_perform_cleanup when there are sstables in the maintenance set that require cleanup. The input sstables are inserted into the compaction_state `sstables_requiring_cleanup` and `try_perform_cleanup` expects offstrategy compaction to clean them up along with reshape compaction. Otherwise, the maintenance sstables that require cleanup are not cleaned up by cleanup compaction, since the reshape output sstable(s) are not analyzed again after reshape compaction, which would insert the output sstable(s) into `sstables_requiring_cleanup` and trigger their cleanup in the subsequent cleanup compaction. The latter method is viable too, but it is less efficient since we can do reshape+cleanup in one pass, vs. reshape first and cleanup later. Fixes scylladb/scylladb#15041 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes #15043	2023-08-14 18:37:34 +03:00
Aleksandra Martyniuk	7a28cc60ec	compaction: ignore future explicitly discard_result ignores only successful futures. Thus, if perform_compaction<regular_compaction_task_executor> call fails, a failure is considered abandoned, causing tests to fail. Explicitly ignore failed future. Fixes: #14971. Closes #15000	2023-08-14 16:41:15 +03:00
Pavel Emelyanov	b06917f235	sstable: Make .change_state() accept state, not directory string Pretty cosmetic change, but it will allow S3 to finally support moving sstables between states (after this patch it still doesn't) Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-08-14 15:40:44 +03:00
Aleksandra Martyniuk	9ec43fd3a7	compaction: update comment in compaction_manager::submit Closes #15023	2023-08-14 09:34:56 +03:00
Aleksandra Martyniuk	db932c7106	compaction: hold gate immediately after task executor is created If make_task call in compaction_manager::perform_compaction yields, compaction_task_executor::_compaction_state may be gone and gate won't be held. Hold gate immediately after compaction_task_executor is created. Add comment not to call prepare_task without preparation. Refs: #14971. Fixes: #14977. Closes #14999	2023-08-11 13:56:38 +02:00
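
The arithmetic behind commit da04fea71e can be sketched as follows. This is a hypothetical model, not ScyllaDB's code: `component_sizes`, `estimated_output_count` and the `keys_per_output_*` helpers are illustrative names. It assumes output sstables are cut when their data file reaches a size limit, as the commit message states, and that partitions are spread evenly over the outputs.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-sstable component sizes (illustrative, not ScyllaDB's API).
struct component_sizes {
    std::uint64_t data_bytes;   // data file
    std::uint64_t index_bytes;  // index file
    std::uint64_t other_bytes;  // remaining components
};

// Outputs are cut when they reach max_sstable_bytes, so the number of
// outputs is the input size divided by that limit, rounded up.
std::uint64_t estimated_output_count(std::uint64_t input_bytes, std::uint64_t max_sstable_bytes) {
    assert(max_sstable_bytes > 0);
    return (input_bytes + max_sstable_bytes - 1) / max_sstable_bytes;
}

// Spread partitions evenly over the estimated outputs; this is the
// per-sstable key count an output's filter would be sized for.
std::uint64_t keys_per_output(std::uint64_t partitions, std::uint64_t input_bytes,
                              std::uint64_t max_sstable_bytes) {
    std::uint64_t outputs = estimated_output_count(input_bytes, max_sstable_bytes);
    return outputs == 0 ? partitions : partitions / outputs;
}

// Before the fix: every component counts toward the split size.
std::uint64_t keys_per_output_all_components(std::uint64_t partitions, const component_sizes& s,
                                             std::uint64_t max_sstable_bytes) {
    return keys_per_output(partitions, s.data_bytes + s.index_bytes + s.other_bytes,
                           max_sstable_bytes);
}

// After the fix: only the data file decides where outputs are split.
std::uint64_t keys_per_output_data_only(std::uint64_t partitions, const component_sizes& s,
                                        std::uint64_t max_sstable_bytes) {
    return keys_per_output(partitions, s.data_bytes, max_sstable_bytes);
}
```

With 1,000,000 partitions, an index as large as the data (1000 units each) and a 100-unit limit, counting all components predicts 20 outputs of 50,000 keys each. Counting only the data file predicts 10 outputs of 100,000 keys each. This matches the commit's "2x fewer keys" example.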

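Merge d5f095d5a4 describes sstable runs keeping their elements sorted by first key and rejecting any insertion that would introduce an overlap; a related commit tags `sstable_run::insert()` with nodiscard. The following is a minimal sketch of that invariant, assuming each fragment covers a closed key range and keys are plain 64-bit integers. `disjoint_run` is a hypothetical class, not ScyllaDB's `sstable_run`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <map>

// Hypothetical sketch: each fragment covers the closed key range [first, last];
// fragments are kept sorted by their first key.
class disjoint_run {
    std::map<std::int64_t, std::int64_t> _by_first;  // first key -> last key
public:
    // Returns false, leaving the run unchanged, if [first, last] would overlap
    // an existing fragment. [[nodiscard]] makes callers handle a rejection.
    [[nodiscard]] bool insert(std::int64_t first, std::int64_t last) {
        assert(first <= last);
        // First fragment whose first key is >= the new first key.
        auto next = _by_first.lower_bound(first);
        if (next != _by_first.end() && next->first <= last) {
            return false;  // overlaps the following fragment
        }
        if (next != _by_first.begin()) {
            auto prev = std::prev(next);
            if (prev->second >= first) {
                return false;  // overlaps the preceding fragment
            }
        }
        _by_first.emplace(first, last);
        return true;
    }
    std::size_t size() const { return _by_first.size(); }
};
```

Because the map is ordered by first key, checking the two neighbours found with `lower_bound` is enough to detect any overlap. Each insertion therefore costs O(log n). Building a large run from scratch still means paying that cost for every element, which is why the merge avoids rebuilding runs on every call to the strategy.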

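Commits 4ad4b632b8 and 7ca91d719c rest on two points. A tracker registered by address must not be moved, since the registry would keep pointing at the moved-from object. Storing it in `std::optional` means a replacement destroys (and unregisters) the old tracker before the new one is registered. The following is a hypothetical sketch of that pattern. `tracker_registry`, `tracker` and `table_state` are illustrative stand-ins, not ScyllaDB's `compaction_backlog_tracker` or `compaction_state`.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <unordered_set>

class tracker;

// Hypothetical manager that remembers registered trackers by address.
class tracker_registry {
    std::unordered_set<const tracker*> _registered;
public:
    ~tracker_registry() { assert(_registered.empty() && "trackers must not outlive the registry"); }
    void register_tracker(const tracker* t) { _registered.insert(t); }
    void unregister_tracker(const tracker* t) { _registered.erase(t); }
    bool contains(const tracker* t) const { return _registered.count(t) != 0; }
    std::size_t size() const { return _registered.size(); }
};

class tracker {
    tracker_registry& _registry;
    double _backlog;
public:
    tracker(tracker_registry& r, double backlog) : _registry(r), _backlog(backlog) {
        _registry.register_tracker(this);
    }
    ~tracker() { _registry.unregister_tracker(this); }
    // A moved-to object lives at a different address than the one the
    // registry holds, so copying and moving a registered tracker are disallowed.
    tracker(const tracker&) = delete;
    tracker& operator=(const tracker&) = delete;
    tracker(tracker&&) = delete;
    tracker& operator=(tracker&&) = delete;
    double backlog() const { return _backlog; }
};

// Owner keeps the tracker in std::optional and constructs it in place.
struct table_state {
    std::optional<tracker> backlog_tracker;
    void replace_tracker(tracker_registry& r, double backlog) {
        backlog_tracker.reset();              // old tracker unregisters here
        backlog_tracker.emplace(r, backlog);  // new tracker registers here
    }
};
```

`std::optional::emplace` already destroys any contained value before constructing the new one, so the explicit `reset()` only makes the unregister-then-register order visible. Since `tracker` is constructed in place, it never needs to be movable.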
717 Commits