scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-05 14:33:08 +00:00

Author	SHA1	Message	Date
Kefu Chai	057701299c	compaction_manager: remove unnecessary include also, remove unnecessary forward declarations. * compaction_manager_test_task_executor is only referenced in the friend declaration. but this declaration does not need a forward declaration of the friend class * compaction_manager_test_task_executor is not used anywhere. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #14680	2023-07-13 14:59:39 +03:00
Kefu Chai	3a67c31df0	compaction_manager: pass const reference to ctor The callers of the constructor do not move a variable into this parameter, and the constructor itself is not able to consume it, as the parameter is a vector while `compaction_sstable_registration` uses an `unordered_set` for tracking the sstables being compacted. So, to avoid creating a temporary copy of the vector, let's just pass by reference. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #14661	2023-07-13 11:19:44 +03:00
Botond Dénes	968421a3e0	Merge 'Stop task manager compaction module properly' from Aleksandra Martyniuk Due to wrong order of stopping of compaction services, shutdown needs to wait until all compactions are complete, which may take really long. Moreover, test version of compaction manager does not abort task manager, which is strictly bounded to it, but stops its compaction module. This results in tests waiting for compaction task manager's tasks to be unregistered, which never happens. Stopping and aborting of compaction manager and task manager's compaction module are performed in a proper order. Closes #14461 * github.com:scylladb/scylladb: tasks: test: abort task manager when wrapped_compaction_manager is destructed compaction: swap compaction manager stopping order compaction: modify compaction_manager::stop()	2023-07-12 09:54:00 +03:00
Avi Kivity	1545ae2d3b	Merge 'Make SSTable cleanup more efficient by fast forwarding to next owned range' from Raphael "Raph" Carvalho Today, SSTable cleanup skips to the next partition, one at a time, when it finds that the current partition is no longer owned by this node. That's very inefficient because when a cluster is growing in size, existing nodes lose multiple sequential tokens in its owned ranges. Another inefficiency comes from fetching index pages spanning all unowned tokens, which was described in https://github.com/scylladb/scylladb/issues/14317. To solve both problems, cleanup will now use multi range reader, to guarantee that it will only process the owned data and as a result skip unowned data. This results in cleanup scanning an owned range and then fast forwarding to the next one, until it's done with them all. This reduces significantly the amount of data in the index caching, as index will only be invoked at each range boundary instead. Without further ado, before: `INFO 2023-07-01 07:10:26,281 [shard 0] compaction - [Cleanup keyspace2.standard1 701af580-17f7-11ee-8b85-a479a1a77573] Cleaned 1 sstables to [./tmp/1/keyspace2/standard1-b490ee20179f11ee9134afb16b3e10fd/me-3g7a_0s8o_06uww24drzrroaodpv-big-Data.db:level=0]. 2GB to 1GB (~50% of original) in 26248ms = 81MB/s. ~9443072 total partitions merged to 4750028.` after: `INFO 2023-07-01 07:07:52,354 [shard 0] compaction - [Cleanup keyspace2.standard1 199dff90-17f7-11ee-b592-b4f5d81717b9] Cleaned 1 sstables to [./tmp/1/keyspace2/standard1-b490ee20179f11ee9134afb16b3e10fd/me-3g7a_0s4m_5hehd2rejj8w15d2nt-big-Data.db:level=0]. 2GB to 1GB (~50% of original) in 17424ms = 123MB/s. ~9443072 total partitions merged to 4750028.` Fixes #12998. Fixes #14317. Closes #14469 * github.com:scylladb/scylladb: test: Extend cleanup correctness test to cover more cases compaction: Make SSTable cleanup more efficient by fast forwarding to next owned range sstables: Close SSTable reader if index exhaustion is detected in fast forward call sstables: Simplify sstable reader initialization compaction: Extend make_sstable_reader() interface to work with mutation_source test: Extend sstable partition skipping test to cover fast forward using token	2023-07-11 23:28:15 +03:00
Raphael S. Carvalho	8d58ff1be6	compaction: Make SSTable cleanup more efficient by fast forwarding to next owned range Today, SSTable cleanup skips to the next partition, one at a time, when it finds that the current partition is no longer owned by this node. That's very inefficient because when a cluster is growing in size, existing nodes lose multiple sequential tokens in its owned ranges. Another inefficiency comes from fetching index pages spanning all unowned tokens, which was described in #14317. To solve both problems, cleanup will now use multi range reader, to guarantee that it will only process the owned data and as a result skip unowned data. This results in cleanup scanning an owned range and then fast forwarding to the next one, until it's done with them all. This reduces significantly the amount of data in the index caching, as index will only be invoked at each range boundary instead. Without further ado, before: ... 2GB to 1GB (~50% of original) in 26248ms = 81MB/s. ~9443072 total partitions merged to 4750028. after: ... 2GB to 1GB (~50% of original) in 17424ms = 123MB/s. ~9443072 total partitions merged to 4750028. Fixes #12998. Fixes #14317. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-07-11 13:56:24 -03:00
Michał Chojnowski	b511d57fc8	Revert "Merge 'Compaction resharding tasks' from Aleksandra Martyniuk" This reverts commit `2a58b4a39a`, reversing changes made to `dd63169077`. After patch `87c8d63b7a`, table_resharding_compaction_task_impl::run() performs the forbidden action of copying a lw_shared_ptr (_owned_ranges_ptr) on a remote shard, which is a data race that can cause a use-after-free, typically manifesting as allocator corruption. Note: before the bad patch, this was avoided by copying the _contents_ of the lw_shared_ptr into a new, local lw_shared_ptr. Fixes #14475 Fixes #14618 Closes #14641	2023-07-11 19:11:37 +03:00
Raphael S. Carvalho	3b1829f0d8	compaction: base compaction throughput on amount of data read Today, we base compaction throughput on the amount of data written, but it should be based on the amount of input data compacted instead, to show the amount of data compaction had to process during its execution. A good example is a compaction which expires 99% of data; today, throughput would be calculated on the 1% written, which misleads the reader into thinking that compaction was terribly slow. Fixes #14533. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes #14615	2023-07-11 15:48:05 +03:00
Raphael S. Carvalho	bd50943270	compaction: Extend make_sstable_reader() interface to work with mutation_source As the goal is to make compaction filter to the next owned range, make_sstable_reader() should be extended to create a reader with parameters forwarded from mutation_source interface, which will be used when wiring cleanup with multi range reader. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-07-10 17:19:30 -03:00
Avi Kivity	0cabf4eeb9	build: disable implicit fallthrough Prevent switch case statements from falling through without annotation ([[fallthrough]]) proving that this was intended. Existing intended cases were annotated. Closes #14607	2023-07-10 19:36:06 +02:00
Aleksandra Martyniuk	529c703143	compaction: swap compaction manager stopping order task_manager::module::stop() waits till all compactions are complete. Thus, ongoing compactions should be aborted before stop() is called not to prolong shutdown process. Task manager's compaction module is stopped after compaction_manager::do_stop(), which aborts ongoing compactions, is called.	2023-07-09 12:05:49 +02:00
Aleksandra Martyniuk	a59485b6da	compaction: modify compaction_manager::stop() In compaction_manager::stop(), do_stop() is called unconditionally. It relies on do_stop to return immediately when _state == none.	2023-07-09 12:04:14 +02:00
Aleksandra Martyniuk	87c8d63b7a	compaction: add shard_reshard_sstables_compaction_task_impl Add task manager's task covering resharding compaction on one shard.	2023-06-28 11:43:12 +02:00
Aleksandra Martyniuk	db6e4a356b	compaction: invoke resharding on sharded database In reshard_sstables_compaction_task_impl::run() we call sharded<sstables::sstable_directory>::invoke_on_all. In the lambda passed to that method, we use both the sharded sstable_directory service and its local instance. To make it straightforward that sharded and local instances are dependent, we call sharded<replica::database>::invoke_on_all instead and access local directory through the sharded one.	2023-06-28 11:43:12 +02:00
Aleksandra Martyniuk	1acaed026a	compaction: move run_resharding_jobs into reshard_sstables_compaction_task_impl::run()	2023-06-28 11:43:11 +02:00
Aleksandra Martyniuk	837d77ba8c	compaction: add reshard_sstables_compaction_task_impl Add task manager's task covering resharding compaction.	2023-06-28 11:41:43 +02:00
Aleksandra Martyniuk	0d6dd3eeda	compaction: replica: copy struct and functions from distributed_loader.cc As a preparation for integrating resharding compaction with task manager a struct and some functions are copied from replica/distributed_loader.cc to compaction/task_manager_module.cc.	2023-06-28 11:41:42 +02:00
Aleksandra Martyniuk	2b4874bbf7	compaction: create resharding_compaction_task_impl resharding_compaction_task_impl serves as a base class of all concrete resharding compaction task classes.	2023-06-28 11:36:53 +02:00
Benny Halevy	3ca0c6c0a5	compaction_manager: try_perform_cleanup: set owned_ranges_ptr with compaction disabled Otherwise regular compaction can sneak in and see !cs.sstables_requiring_cleanup.empty() with cs.owned_ranges_ptr == nullptr and trigger the internal error in `compaction_task_executor::compact_sstables`. Fixes scylladb/scylladb#14296 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes #14297	2023-06-27 08:47:13 +03:00
Raphael S. Carvalho	83c70ac04f	utils: Extract pretty printers into a header Can be easily reused elsewhere. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-06-26 21:58:20 -03:00
Botond Dénes	b23361977b	Merge 'Compaction reshape tasks' from Aleksandra Martyniuk Task manager's tasks covering resharding compaction on top and shard level. Closes #14112 * github.com:scylladb/scylladb: test: extend test_compaction_task.py to test reshaping compaction compaction: move reshape function to shard_reshaping_table_compaction_task_impl::run() compaction: add shard_reshaping_compaction_task_impl replica: delete unused function compaction: add table_reshaping_compaction_task_impl compaction: copy reshape to task_manager_module.cc compaction: add reshaping_compaction_task_impl	2023-06-26 11:56:07 +03:00
Aleksandra Martyniuk	197635b44b	compaction: delete generation of new sequence number for table tasks Compaction tasks covering table major, cleanup, offstrategy, and upgrade sstables compaction inherit sequence number from their parents. Thus they do not need to have a new sequence number generated as it will be overwritten anyway. Closes #14379	2023-06-26 10:36:10 +03:00
Aleksandra Martyniuk	f9a527b06d	compaction: move reshape function to shard_reshaping_table_compaction_task_impl::run()	2023-06-23 16:22:53 +02:00
Aleksandra Martyniuk	1960904a72	compaction: add shard_reshaping_compaction_task_impl shard_reshaping_compaction_task_impl covers reshaping compaction on one shard.	2023-06-23 16:22:38 +02:00
Aleksandra Martyniuk	e3e2d6b886	compaction: add table_reshaping_compaction_task_impl	2023-06-23 15:57:37 +02:00
Aleksandra Martyniuk	dace5fb004	compaction: copy reshape to task_manager_module.cc distributed_loader::reshape is copied to compaction/task_manager_module.cc as it will be used in reshape compaction tasks.	2023-06-23 12:53:16 +02:00
Aleksandra Martyniuk	981a50e490	compaction: add reshaping_compaction_task_impl reshaping_compaction_task_impl serves as a base class of all concrete reshaping compaction task classes.	2023-06-23 12:53:15 +02:00
Botond Dénes	320159c409	Merge 'Compaction group major compaction task' from Aleksandra Martyniuk Task manager task covering compaction group major compaction. Uses multiple inheritance on already existing major_compaction_task_executor to keep track of the operation with task manager. Closes #14271 * github.com:scylladb/scylladb: test: extend test_compaction_task.py test: use named variable for task tree depth compaction: turn major_compaction_task_executor into major_compaction_task_impl compaction: take gate holder out of task executor compaction: extend signature of some methods tasks: keep shared_ptr to impl in task compaction: rename compaction_task_executor methods	2023-06-22 08:15:17 +03:00
Tomasz Grabiec	36da062bcb	db: Use table sharder in compaction	2023-06-21 00:58:24 +02:00
Aleksandra Martyniuk	74e5b4ebfc	compaction: turn major_compaction_task_executor into major_compaction_task_impl major_compaction_task_executor inherits both from compaction_task_executor and major_compaction_task_impl. Thanks to that an executed operation is represented in task manager.	2023-06-20 12:12:49 +02:00
Aleksandra Martyniuk	4922f4cf80	compaction: take gate holder out of task executor In the following commits, classes deriving from compaction_task_executor will be alive longer than they are kept in compaction_manager::_tasks. Thus, the compaction_task_executor::_gate_holder would be held, blocking other compactions. compaction_task_executor::_gate_holder is moved outside of compaction_task_executor object.	2023-06-20 12:12:45 +02:00
Aleksandra Martyniuk	e317ffe23a	compaction: extend signature of some methods Extend a signature of table::compact_all_sstables and compaction_manager::perform_major_compaction so that they get the info of a covering task. This allows to easily create child tasks that cover compaction group compaction.	2023-06-20 10:45:34 +02:00
Aleksandra Martyniuk	3007fbeee3	compaction: rename compaction_task_executor methods compaction_task_executor methods are renamed to prevent name collisions between compaction_task_executor and tasks::task_manager::task::impl.	2023-06-20 10:45:34 +02:00
Pavel Emelyanov	5412c7947a	backlog_controller: Unwrap scheduling_group Some time ago (`997a34bf8c`) the backlog controller was generalized to maintain some scheduling group. Back then the group was the pair of seastar::scheduling_group and seastar::io_priority_class. Now the latter is gone, so the controller's notion of what sched group is can be relaxed. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #14266	2023-06-16 12:02:14 +03:00
Pavel Emelyanov	66e43912d6	code: Switch to seastar API level 7 In that level no io_priority_class-es exist. Instead, all the IO happens in the context of current sched-group. File API no longer accepts prio class argument (and makes io_intent arg mandatory to impls). So the change consists of - removing all usage of io_priority_class - patching file_impl's inheritors to updated API - priority manager goes away altogether - IO bandwidth update is performed on respective sched group - tune-up scylla-gdb.py io_queues command The first change is huge and was made semi-automatically by: - grep io_priority_class \| default_priority_class - remove all calls, found methods' args and class' fields Patching file_impl-s is smaller, but also mechanical: - replace io_priority_class& argument with io_intent* one - pass intent to lower file (if applicable) Dropping the priority manager is: - git-rm .cc and .hh - sed out all the #include-s - fix configure.py and cmakefile The scylla-gdb.py update is a bit hairy -- it needs to use task queues list for IO classes names and shares, but to detect whether it should, it checks that the "commitlog" group is present. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #13963	2023-06-06 13:29:16 +03:00
Raphael S. Carvalho	156d771101	compaction: Fix sstable cleanup after resharding on refresh Problem can be reproduced easily: 1) wrote some sstables with smp 1 2) shut down scylla 3) moved sstables to upload 4) restarted scylla with smp 2 5) ran refresh (resharding happens, adds sstable to cleanup set and never removes it) 6) cleanup (tries to cleanup resharded sstables which were leaked in the cleanup set) Bumps into assert "Assertion `!sst->is_shared()' failed", as cleanup picks a shared sstable that was leaked and already processed by resharding. Fix is about not inserting shared sstables into cleanup set, as shared sstables are restricted to resharding and cannot be processed later by cleanup (nor it should because resharding itself cleaned up its input files). Dtest: https://github.com/scylladb/scylla-dtest/pull/3206 Fixes #14001. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes #14147	2023-06-06 12:14:03 +03:00
Benny Halevy	17795757d3	compaction_manager: compact_sstables: fix typo in log message about cleanup Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes #14151	2023-06-06 11:17:02 +03:00
Botond Dénes	80b944a9b8	Merge 'Table compaction tasks' from Aleksandra Martyniuk Implementation of task_manager's tasks that cover major, cleanup, offstrategy, and upgrade sstables compaction of one table. Closes #13619 * github.com:scylladb/scylladb: test: extend compaction tasks test compaction: fix indentation compaction: create table_upgrade_sstables_compaction_task_impl compaction: create table_offstrategy_keyspace_compaction_task_impl compaction: create table_cleanup_keyspace_compaction_task_impl compaction: create table_major_keyspace_compaction_task_impl compaction: add helpers for table tasks scheduling compaction: add run_on_table compaction: pass std::string to run_on_existing_tables	2023-06-06 10:51:53 +03:00
Aleksandra Martyniuk	fecdd75cd6	compaction: fix indentation	2023-05-31 14:59:24 +02:00
Aleksandra Martyniuk	53c24c0f7d	compaction: create table_upgrade_sstables_compaction_task_impl Implementation of task_manager's task that covers upgrade sstables compaction of one table.	2023-05-31 14:59:24 +02:00
Aleksandra Martyniuk	143919cfa7	compaction: create table_offstrategy_keyspace_compaction_task_impl Implementation of task_manager's task that covers offstrategy keyspace compaction of one table.	2023-05-31 14:59:24 +02:00
Aleksandra Martyniuk	55ef1c24e1	compaction: create table_cleanup_keyspace_compaction_task_impl Implementation of task_manager's task that covers cleanup keyspace compaction of one table.	2023-05-31 14:59:24 +02:00
Aleksandra Martyniuk	5c7832ab59	compaction: create table_major_keyspace_compaction_task_impl Implementation of task_manager's task that covers major keyspace compaction of one table.	2023-05-31 14:59:24 +02:00
Aleksandra Martyniuk	d0c4028d64	compaction: add helpers for table tasks scheduling In shard compaction tasks per table tasks will be created all at once and then they will wait for their turn to run. A function that allows waking up tasks one after another and a function that makes the task wait for its turn are added.	2023-05-31 14:59:24 +02:00
Aleksandra Martyniuk	6dacc45c70	compaction: add run_on_table Extract code which runs a function on a particular table from run_on_existing_tables to run_on_table.	2023-05-31 14:59:24 +02:00
Aleksandra Martyniuk	5c65ac00ef	compaction: pass std::string to run_on_existing_tables Keyspace argument passed to run_on_existing_tables has its type changed from std::string_view to std::string.	2023-05-31 14:59:24 +02:00
Raphael S. Carvalho	23443e0574	compaction: Fix incremental compaction for sstable cleanup After `c7826aa910`, sstable runs are cleaned up together. The procedure which executes cleanup was holding reference to all input sstables, such that it could later retry the same cleanup job on failure. Turns out it was not taking into account that incremental compaction will exhaust the input set incrementally. Therefore cleanup is affected by the 100% space overhead. To fix it, cleanup will now have the input set updated, by removing the sstables that were already cleaned up. On failure, cleanup will retry the same job with the remaining sstables that weren't exhausted by incremental compaction. New unit test reproduces the failure, and passes with the fix. Fixes #14035. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes #14038	2023-05-31 06:46:12 +03:00
Aleksandra Martyniuk	f48b57e7b9	compaction: use table_info in compaction tasks Task manager compaction tasks need table names for logs. Thus, compaction tasks store table infos instead of table ids. get_table_ids function is deleted as it isn't used anywhere.	2023-05-30 09:58:55 +02:00
Aleksandra Martyniuk	24864e39dd	compaction: delete unnecessary sequence number incrementations Task manager's tasks that have parent task inherit sequence number from their parents. Thus they do not need to have a new sequence number generated as it will be overwritten anyway. Closes #14045	2023-05-29 23:03:25 +03:00
Botond Dénes	3b424e391b	Merge 'perform_cleanup: wait until all candidates are cleaned up' from Benny Halevy cleanup_compaction should resolve only after all sstables that require cleanup are cleaned up. Since it is possible that some of them are in staging and therefore cannot be cleaned up, retry once a second until they become eligible. Timeout if there is no progress within 5 minutes to prevent hanging due to view building bug. Fixes #9559 Closes #13812 * github.com:scylladb/scylladb: table: signal compaction_manager when staging sstables become eligible for cleanup compaction_manager: perform_cleanup: wait until all candidates are cleaned up compaction_manager: perform_cleanup: perform_offstrategy if needed compaction_manager: perform_cleanup: update_sstables_cleanup_state in advance sstable_set: add for_each_sstable_gently* helpers	2023-05-19 12:35:59 +03:00
Raphael S. Carvalho	38b226f997	Resurrect optimization to avoid bloom filter checks during compaction Commit `8c4b5e4283` introduced an optimization which only calculates max purgeable timestamp when a tombstone satisfies the grace period. Commit 'repair: Get rid of the gc_grace_seconds' inverted the order, probably under the assumption that getting grace period can be more expensive than calculating max purgeable, as repair-mode GC will look up into history data in order to calculate gc_before. This caused a significant regression on tombstone heavy compactions, where most of tombstones are still newer than grace period. A compaction which used to take 5s, now takes 35s. 7x slower. The reason is simple, now calculation of max purgeable happens for every single tombstone (once for each key), even the ones that cannot be GC'ed yet. And each calculation has to iterate through (i.e. check the bloom filter of) every single sstable that doesn't participate in compaction. Flame graph makes it very clear that bloom filter is a heavy path without the optimization: 45.64% 45.64% sstable_compact sstable_compaction_test_g [.] utils::filter::bloom_filter::is_present With its resurrection, the problem is gone. This scenario can easily happen, e.g. after a deletion burst, and tombstones becoming only GC'able after they reach upper tiers in the LSM tree. Before this patch, a compaction can be estimated to have this # of filter checks: (# of keys containing any tombstone) * (# of uncompacting sstable runs[1]) [1] It's # of runs, as each key tends to overlap with only one fragment of each run. After this patch, the estimation becomes: (# of keys containing a GC'able tombstone) * (# of uncompacting runs). With repair mode for tombstone GC, the assumption, that retrieval of gc_before is more expensive than calculating max purgeable, is kept. We can revisit it later. But in the default mode, which is the "timeout" (i.e. gc_grace_seconds) one, we still benefit from the optimization of deferring the calculation until needed. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes #13908	2023-05-18 09:01:50 +03:00
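
The ordering described in commit `38b226f997` (cheap grace-period check first, expensive max purgeable lookup only when needed) can be illustrated with a minimal sketch. This is not ScyllaDB code: `tombstone`, `can_purge`, `purge_all`, and `purge_stats` are hypothetical names, and the callback merely stands in for the lookup that checks the bloom filter of every sstable run outside the compaction.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical, simplified model; none of these names are ScyllaDB APIs.
struct tombstone {
    std::int64_t timestamp;      // write timestamp of the deletion
    std::int64_t deletion_time;  // wall-clock second at which the deletion happened
};

// Returns true if the tombstone may be dropped.
// `max_purgeable` stands in for the expensive lookup (assumed to consult the
// bloom filters of all sstable runs that are not part of the compaction).
bool can_purge(const tombstone& t, std::int64_t gc_before,
               const std::function<std::int64_t()>& max_purgeable) {
    // Cheap check first: a tombstone still within the grace period can never
    // be purged, so the expensive lookup is skipped entirely.
    if (t.deletion_time >= gc_before) {
        return false;
    }
    // Only tombstones past the grace period pay for the expensive lookup.
    return t.timestamp < max_purgeable();
}

struct purge_stats {
    std::size_t purged = 0;
    std::size_t expensive_lookups = 0;
};

// Applies can_purge() to every tombstone and counts how often the expensive
// lookup actually ran. `max_purgeable_ts` is a fixed stand-in value.
purge_stats purge_all(const std::vector<tombstone>& tombstones,
                      std::int64_t gc_before, std::int64_t max_purgeable_ts) {
    purge_stats stats;
    auto lookup = [&] {
        ++stats.expensive_lookups;
        return max_purgeable_ts;
    };
    for (const auto& t : tombstones) {
        if (can_purge(t, gc_before, lookup)) {
            ++stats.purged;
        }
    }
    // The lookup runs at most once per tombstone.
    assert(stats.expensive_lookups <= tombstones.size());
    return stats;
}
```

In this model the number of expensive lookups equals the number of tombstones already past the grace period, which mirrors the estimate in the commit message: filter checks scale with keys containing a GC'able tombstone rather than with keys containing any tombstone.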


620 Commits