simpler than the "begin, end" iterator pair. also tighten the
type constraints: the value type is now required to be
sstables::shared_sstable, which matches what the implementation
expects.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes #14678
Task manager tasks covering reshard compaction.
Reattempt of https://github.com/scylladb/scylladb/pull/14044. The bugfix for https://github.com/scylladb/scylladb/issues/14618 is squashed with 95191f4.
Regression test added.
Closes #14739
* github.com:scylladb/scylladb:
test: add test for resharding with non-empty owned_ranges_ptr
test: extend test_compaction_task.py to test resharding compaction
compaction: add shard_reshard_sstables_compaction_task_impl
compaction: invoke resharding on sharded database
compaction: move run_resharding_jobs into reshard_sstables_compaction_task_impl::run()
compaction: add reshard_sstables_compaction_task_impl
compaction: create resharding_compaction_task_impl
before this change, there was a chance that the temporary sstables
created for collecting the GC-able data produced by a certain
compaction could be picked up by another compaction job. this
wastes CPU cycles, adds write amplification, and causes
inefficiency.
in general, these GC-only SSTables are created with the same run id
as the non-GC SSTables, but when a new sstable exhausts its input
sstable(s), we proactively replace the old main set with a new one
so that we can free up the space as soon as possible. the
GC-only SSTables are thus added to the new main set along with
the non-GC SSTables, but since the former have a good chance of
overlapping the latter, the GC-only SSTables are assigned
different run ids. however, we fail to register them with the
`compaction_manager` when replacing the main sstable set.
that's why future compactions pick them up while the compaction
which created them has not yet completed.
so, in this change:
* to prevent sstables in the transient stage from being picked
up by regular compactions, a new interface class is introduced
so that an sstable is always registered before
it is added to the sstable set, and unregistered after
it is removed from the sstable set. the struct consolidates
the registration-related logic in a single place, and helps
make it more obvious that the timespan of an sstable in
the registration should cover its timespan in the sstable set.
* use a different run_id for the gc sstable run, as it can
overlap with the output sstable run. the run_id for the
gc sstable run is created only when the gc sstable writer
is created, because gc sstables are not created
for all compactions.
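The registration invariant described in the first bullet can be sketched as a small RAII-style guard. This is an illustrative model only; `tracker` and `registration_guard` are hypothetical stand-ins, not the actual interface:

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <utility>

// Hypothetical stand-in for compaction-manager registration and the main sstable set.
struct tracker {
    std::unordered_set<std::string> registered; // sstables known to the compaction manager
    std::unordered_set<std::string> main_set;   // sstables visible to other compactions
};

// RAII guard: register the sstable *before* adding it to the set, and
// unregister it only *after* it has been removed, so the registration
// lifetime always covers the sstable's lifetime in the set.
class registration_guard {
    tracker& _t;
    std::string _sst;
public:
    registration_guard(tracker& t, std::string sst) : _t(t), _sst(std::move(sst)) {
        _t.registered.insert(_sst);   // step 1: register
        _t.main_set.insert(_sst);     // step 2: add to the set
    }
    ~registration_guard() {
        _t.main_set.erase(_sst);      // step 1: remove from the set
        _t.registered.erase(_sst);    // step 2: unregister
    }
};
```

While the guard is alive, other compactions see the sstable as registered, so they cannot pick it up.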
please note, all (indirect) callers of
`compaction_task_executor::compact_sstables()` pass a non-empty
`std::function` to this function, so there is no need to check for
emptiness before calling it. hence, in this change, the check is dropped.
Fixes #14560
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes #14725
In reshard_sstables_compaction_task_impl::run() we call
sharded<sstables::sstable_directory>::invoke_on_all. In the lambda passed
to that method, we use both the sharded sstable_directory service
and its local instance.
To make it straightforward that the sharded and local instances are
dependent, we call sharded<replica::database>::invoke_on_all
instead and access the local directory through the sharded one.
Add task manager's task covering resharding compaction.
A struct and some functions are moved from replica/distributed_loader.cc
to compaction/task_manager_module.cc.
for faster build times and clear inter-module dependencies, we
should not #include headers that are not directly used. instead, we should
only #include the headers directly used by a certain compilation
unit.
in this change, the source files under the "/compaction" directory
are checked using clangd, which identifies the cases where we have
an #include which is not directly used. all the #includes identified
by clangd are removed. because some source files rely on the incorrectly
included header files, those files are updated to #include the headers
they directly use.
if a forward declaration suffices, the declaration is added instead.
see also https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
Closes #14740
* github.com:scylladb/scylladb:
treewide: remove #includes not use directly
size_tiered_backlog_tracker: do not include remove header
for faster build times and clear inter-module dependencies, we
should not #include headers that are not directly used. instead, we should
only #include the headers directly used by a certain compilation
unit.
in this change, the source files under the "/compaction" directory
are checked using clangd, which identifies the cases where we have
an #include which is not directly used. all the #includes identified
by clangd are removed. because some source files rely on the incorrectly
included header files, those files are updated to #include the headers
they directly use.
if a forward declaration suffices, the declaration is added instead.
see also https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we sorted the sstables with compaction disabled, when we
were about to perform the compaction. but the idea of guarding the
getting and registering as a transaction is to prevent other compactions
from mutating the sstables' state and causing inconsistency.
since the state is tracked on a per-sstable basis, and is not related
to the order in which the sstables are processed by a certain compaction task,
we don't need to guard the sort() with this mutually exclusive lock.
for better readability, and probably better performance, let's move the
sort out of the lock, and take this opportunity to use
`std::ranges::sort()` for more concise code.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes #14699
Compaction task executors which inherit from compaction_task_impl
may stay in memory after the compaction is finished.
Thus, state switch cannot happen in destructor.
Switch state to none in perform_task defer.
This is the first phase of providing strong exception safety guarantees in the generic `compaction_backlog_tracker::replace_sstables`.
Once all compaction strategies' backlog trackers' replace_sstables provide strong exception safety guarantees (i.e. they may throw an exception but must revert on error any intermediate changes they made, to restore the tracker to the pre-update state), and once this series is merged and ICS replace_sstables is also made strongly exception safe (using infrastructure from size_tiered_backlog_tracker introduced here), `compaction_backlog_tracker::replace_sstables` may allow exceptions to propagate back to the caller rather than disabling the backlog tracker on errors.
Closes #14104
* github.com:scylladb/scylladb:
leveled_compaction_backlog_tracker: replace_sstables: provide strong exception safety guarantees
time_window_backlog_tracker: replace_sstables: provide strong exception safety guarantees
size_tiered_backlog_tracker: replace_sstables: provide strong exception safety guarantees
size_tiered_backlog_tracker: provide static calculate_sstables_backlog_contribution
size_tiered_backlog_tracker: make log4 helper static
size_tiered_backlog_tracker: define struct sstables_backlog_contribution
size_tiered_backlog_tracker: update_sstables: update total_bytes only if set changed
compaction_backlog_tracker: replace_sstables: pass old and new sstables vectors by ref
compaction_backlog_tracker: replace_sstables: add FIXME comments about strong exception safety
This is the last step of deprecation dance of DTCS.
In Scylla 5.1, users were warned that DTCS was deprecated.
In 5.2, altering or creation of tables with DTCS was forbidden.
5.3 branch was already created, so this is targeting 5.4.
Users that refused to move away from DTCS will have Scylla
falling back to the default strategy, either STCS or ICS.
See:
WARN 2023-07-14 09:49:11,857 [shard 0] schema_tables - Falling back to size-tiered compaction strategy after the problem: Unable to find compaction strategy class 'DateTieredCompactionStrategy
The user can then switch to a supported strategy with
ALTER TABLE.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closes #14559
also, remove unnecessary forward declarations.
* compaction_manager_test_task_executor is only referenced
in the friend declaration, but this declaration does not need
a forward declaration of the friend class.
* compaction_manager_test_task_executor is not used anywhere.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes #14680
the callers of the constructor do not move a variable into this
parameter, and the constructor itself is not able to consume it,
as the parameter is a vector while `compaction_sstable_registration`
uses an `unordered_set` for tracking the sstables being compacted.
so, to avoid creating a temporary copy of the vector, let's just
pass it by reference.
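Since the callee copies the elements into its own `unordered_set` anyway, taking the vector by const reference avoids one extra vector copy at every call site. A hypothetical sketch of the shape, with stand-in names:

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for compaction_sstable_registration: the parameter
// is a vector, but the tracked state is an unordered_set, so the constructor
// cannot consume (move) the vector wholesale. Passing by const reference
// avoids creating a temporary copy of the vector.
class sstable_registration {
    std::unordered_set<std::string> _compacting;
public:
    explicit sstable_registration(const std::vector<std::string>& ssts)
        : _compacting(ssts.begin(), ssts.end()) {}
    bool contains(const std::string& sst) const {
        return _compacting.count(sst) != 0;
    }
    size_t size() const { return _compacting.size(); }
};
```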
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes #14661
Due to the wrong order of stopping compaction services, shutdown needs
to wait until all compactions are complete, which may take really long.
Moreover, the test version of the compaction manager does not abort the task manager,
which is strictly bound to it, but stops its compaction module. This results
in tests waiting for the compaction task manager's tasks to be unregistered,
which never happens.
Stopping and aborting of the compaction manager and the task manager's compaction
module are now performed in the proper order.
Closes #14461
* github.com:scylladb/scylladb:
tasks: test: abort task manager when wrapped_compaction_manager is destructed
compaction: swap compaction manager stopping order
compaction: modify compaction_manager::stop()
Today, SSTable cleanup skips to the next partition, one at a time, when it finds that the current partition is no longer owned by this node.
That's very inefficient because when a cluster grows in size, existing nodes lose multiple sequential tokens in their owned ranges. Another inefficiency comes from fetching index pages spanning all unowned tokens, which was described in https://github.com/scylladb/scylladb/issues/14317.
To solve both problems, cleanup will now use a multi-range reader, to guarantee that it will only process the owned data and, as a result, skip unowned data. This results in cleanup scanning an owned range and then fast-forwarding to the next one, until it's done with them all. This significantly reduces the amount of data in the index cache, as the index will only be consulted at each range boundary.
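The gain comes from jumping directly between owned ranges instead of testing every partition. A deliberately simplified model, with tokens as plain integers and a hypothetical `count_owned()` helper standing in for the reader:

```cpp
#include <cassert>
#include <utility>
#include <vector>

using token_range = std::pair<long, long>; // [first, last], inclusive; simplified stand-in

// Old approach: visit every partition and skip the unowned ones one at a time.
// New approach (modeled here): iterate only over the sorted owned ranges and
// fast-forward the reader to each range boundary.
long count_owned(const std::vector<long>& partition_tokens,
                 const std::vector<token_range>& owned_ranges) {
    long n = 0;
    auto it = partition_tokens.begin();
    for (auto [first, last] : owned_ranges) {
        // "fast_forward_to(range)": jump past everything before the range...
        while (it != partition_tokens.end() && *it < first) ++it;
        // ...then consume only the partitions inside the owned range.
        while (it != partition_tokens.end() && *it <= last) { ++n; ++it; }
    }
    return n;
}
```

In the real code the fast-forward touches the index only at range boundaries, which is where the reduced index caching comes from.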
Without further ado,
before:
`INFO 2023-07-01 07:10:26,281 [shard 0] compaction - [Cleanup keyspace2.standard1 701af580-17f7-11ee-8b85-a479a1a77573] Cleaned 1 sstables to [./tmp/1/keyspace2/standard1-b490ee20179f11ee9134afb16b3e10fd/me-3g7a_0s8o_06uww24drzrroaodpv-big-Data.db:level=0]. 2GB to 1GB (~50% of original) in 26248ms = 81MB/s. ~9443072 total partitions merged to 4750028.`
after:
`INFO 2023-07-01 07:07:52,354 [shard 0] compaction - [Cleanup keyspace2.standard1 199dff90-17f7-11ee-b592-b4f5d81717b9] Cleaned 1 sstables to [./tmp/1/keyspace2/standard1-b490ee20179f11ee9134afb16b3e10fd/me-3g7a_0s4m_5hehd2rejj8w15d2nt-big-Data.db:level=0]. 2GB to 1GB (~50% of original) in 17424ms = 123MB/s. ~9443072 total partitions merged to 4750028.`
Fixes #12998.
Fixes #14317.
Closes #14469
* github.com:scylladb/scylladb:
test: Extend cleanup correctness test to cover more cases
compaction: Make SSTable cleanup more efficient by fast forwarding to next owned range
sstables: Close SSTable reader if index exhaustion is detected in fast forward call
sstables: Simplify sstable reader initialization
compaction: Extend make_sstable_reader() interface to work with mutation_source
test: Extend sstable partition skipping test to cover fast forward using token
Today, SSTable cleanup skips to the next partition, one at a time, when it finds
that the current partition is no longer owned by this node.
That's very inefficient because when a cluster grows in size, existing
nodes lose multiple sequential tokens in their owned ranges. Another inefficiency
comes from fetching index pages spanning all unowned tokens, which was described
in #14317.
To solve both problems, cleanup will now use a multi-range reader, to guarantee
that it will only process the owned data and, as a result, skip unowned data.
This results in cleanup scanning an owned range and then fast forwarding to the
next one, until it's done with them all. This reduces significantly the amount
of data in the index caching, as index will only be invoked at each range
boundary instead.
Without further ado,
before:
... 2GB to 1GB (~50% of original) in 26248ms = 81MB/s. ~9443072 total partitions merged to 4750028.
after:
... 2GB to 1GB (~50% of original) in 17424ms = 123MB/s. ~9443072 total partitions merged to 4750028.
Fixes #12998.
Fixes #14317.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
This reverts commit 2a58b4a39a, reversing
changes made to dd63169077.
After patch 87c8d63b7a,
table_resharding_compaction_task_impl::run() performs the forbidden
action of copying a lw_shared_ptr (_owned_ranges_ptr) on a remote shard,
which is a data race that can cause a use-after-free, typically manifesting
as allocator corruption.
Note: before the bad patch, this was avoided by copying the _contents_ of the
lw_shared_ptr into a new, local lw_shared_ptr.
Fixes #14475
Fixes #14618
Closes #14641
Today, we base compaction throughput on the amount of data written,
but it should be based on the amount of input data compacted
instead, to show the amount of data compaction had to process
during its execution.
A good example is a compaction which expires 99% of the data;
today, throughput would be calculated on the 1% written, which
misleads the reader into thinking that the compaction was terribly
slow.
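With illustrative numbers (not taken from the source): if 2000 MB of input compacts down to 20 MB of output in 10 seconds because 99% of the data expired, the rate that reflects the work done is 200 MB/s, not 2 MB/s.

```cpp
#include <cassert>

// Throughput should reflect the input processed, not the output written.
double throughput_mb_s(double input_mb, double seconds) {
    return input_mb / seconds;
}
```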
Fixes #14533.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closes #14615
As the goal is to have compaction filter down to the next owned range,
make_sstable_reader() should be extended to create a reader with
parameters forwarded from the mutation_source interface, which will
be used when wiring cleanup to the multi-range reader.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Prevent switch case statements from falling through without annotation
([[fallthrough]]) proving that this was intended.
Existing intended cases were annotated.
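A minimal example of what the warning now enforces; the `describe()` function is purely illustrative:

```cpp
#include <cassert>

// With implicit-fallthrough warnings enabled, a case that intentionally
// falls into the next one must say so explicitly via [[fallthrough]].
int describe(int level) {
    int flags = 0;
    switch (level) {
    case 2:
        flags |= 2;      // intentional fall-through: level 2 implies level 1
        [[fallthrough]];
    case 1:
        flags |= 1;
        break;
    default:
        break;
    }
    return flags;
}
```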
Closes #14607
task_manager::module::stop() waits until all compactions are complete.
Thus, ongoing compactions should be aborted before stop() is called,
so as not to prolong the shutdown process.
Task manager's compaction module is stopped after
compaction_manager::do_stop(), which aborts ongoing compactions,
is called.
In reshard_sstables_compaction_task_impl::run() we call
sharded<sstables::sstable_directory>::invoke_on_all. In the lambda passed
to that method, we use both the sharded sstable_directory service
and its local instance.
To make it straightforward that the sharded and local instances are
dependent, we call sharded<replica::database>::invoke_on_all
instead and access the local directory through the sharded one.
As a preparation for integrating resharding compaction with the task manager,
a struct and some functions are copied from replica/distributed_loader.cc
to compaction/task_manager_module.cc.
Otherwise regular compaction can sneak in and
see !cs.sstables_requiring_cleanup.empty() with
cs.owned_ranges_ptr == nullptr and trigger
the internal error in `compaction_task_executor::compact_sstables`.
Fixes scylladb/scylladb#14296
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closes #14297
By making all changes on temporary variables
and eventually moving them back into the tracker members
in a noexcept block, the function can safely throw
until the changes are committed.
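The pattern reads roughly like this hedged sketch, where `tracker`, its members, and `replace()` are simplified stand-ins for the real backlog tracker:

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <unordered_set>
#include <utility>

struct tracker {
    std::unordered_set<std::string> all; // placeholder for the tracked sstable set
    long total_bytes = 0;

    // Strong exception safety: build the new state in temporaries (which may
    // throw), then commit with nothrow moves, so a failure leaves *this intact.
    void replace(const std::string& old_sst, const std::string& new_sst, long delta) {
        auto new_all = all;              // copy may throw: members still untouched
        if (new_all.erase(old_sst) == 0) {
            throw std::runtime_error("unknown sstable");
        }
        new_all.insert(new_sst);         // may throw; still only the temporary
        // Commit: noexcept moves only, nothing can fail past this point.
        all = std::move(new_all);
        total_bytes += delta;
    }
};
```

If any step before the commit throws, the tracker is observably in its pre-update state, which is exactly the guarantee the series works toward.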
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Instead of providing refresh_sstables_backlog_contribution
that updates the tracker in place, provide a static function
calculate_sstables_backlog_contribution that doesn't change
the tracker state to facilitate exception safety in the next patch.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Encapsulate the contribution-related members in
struct contribution, to be used for strong exception safety.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Although replace_sstables is supposed to be called
only once per {old_ssts, new_ssts}, it is safer
to update `_total_bytes` with `sst->data_size()`
only if the sst was inserted/erased successfully.
Otherwise _total_bytes may go out of sync with the
contents of _all.
That said, the next step should be to refer to the
compaction_group's main sstable set directly rather
than maintaining a "shadow" set in the tracker.
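The guarded update can be sketched like this; `shadow_set` and its members are illustrative stand-ins for the tracker's `_all` and `_total_bytes`:

```cpp
#include <cassert>
#include <string>
#include <unordered_set>

struct shadow_set {
    std::unordered_set<std::string> all;
    long total_bytes = 0;

    // Update total_bytes only when the set actually changed, so a duplicate
    // insert or an erase of a missing element cannot drive the byte count
    // out of sync with the contents of `all`.
    void add(const std::string& sst, long size) {
        if (all.insert(sst).second) {   // .second is false for duplicates
            total_bytes += size;
        }
    }
    void remove(const std::string& sst, long size) {
        if (all.erase(sst)) {           // erase() returns the number removed
            total_bytes -= size;
        }
    }
};
```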
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Task manager's tasks covering reshaping compaction
at the top and shard levels.
Closes #14112
* github.com:scylladb/scylladb:
test: extend test_compaction_task.py to test reshaping compaction
compaction: move reshape function to shard_reshaping_table_compaction_task_impl::run()
compaction: add shard_reshaping_compaction_task_impl
replica: delete unused function
compaction: add table_reshaping_compaction_task_impl
compaction: copy reshape to task_manager_module.cc
compaction: add reshaping_compaction_task_impl
Compaction tasks covering table major, cleanup, offstrategy,
and upgrade sstables compaction inherit the sequence number from their
parents. Thus, they do not need a new sequence number
generated, as it would be overwritten anyway.
Closes #14379