scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-24 10:30:38 +00:00

Author	SHA1	Message	Date
Botond Dénes	fb898d214c	Merge 'Shard major compaction task' from Aleksandra Martyniuk Implementation of task_manager's task that covers major keyspace compaction on one shard. Closes #12662 * github.com:scylladb/scylladb: test: extend major keyspace compaction tasks test compaction: create task manager's task for major keyspace compaction on one shard	2023-03-02 15:06:31 +02:00
Botond Dénes	91d64372db	Merge 'cmake: sync with `configure.py` (8/n)' from Kefu Chai - build: cmake: extract more subsystem out into its own CMakeLists.txt - build: cmake: remove swagger_gen_files - build: cmake: remove stale TODO comments - build: cmake: expose scylla_gen_build_dir - build: cmake: link against cryptopp - build: cmake: add missing source to utils - build: cmake: move lib sources into test-lib - build: cmake: add test/perf Closes #13059 * github.com:scylladb/scylladb: build: cmake: add expr_test test build: cmake: allow test to specify the sources build: cmake: add test/perf build: cmake: move lib sources into test-lib build: cmake: add missing source to utils build: cmake: link against cryptopp build: cmake: expose scylla_gen_build_dir build: cmake: remove stale TODO comments build: cmake: remove swagger_gen_files build: cmake: extract more subsystem out into its own CMakeLists.txt	2023-03-02 14:22:35 +02:00
Botond Dénes	1b5f8916d6	Merge 'Generalize sstable::move_to_new_dir() method' from Pavel Emelyanov This method requires callers to remember that the sstable is the collection of files on a filesystem and to know what exact directory they are all in. That's not going to work for object storage, instead, sstable should be moved between more abstract states. This PR replaces move_to_new_dir() call with the change_state() one that accepts target sub-directory string and moves files around. Currently supported state changes: * staging -> normal * upload -> normal \| staging * any -> quarantine All are pretty straightforward and move files between table basedir subdirectories with the exception that upload -> quarantine should move into upload/quarantine subdirectory. Another thing to keep in mind, that normal state doesn't have its subdir but maps directory to table's base directory. Closes #12648 * github.com:scylladb/scylladb: sstable: Remove explicit quarantization call test: Move move_to_new_dir() method from sstable class sstable, dist.-loader: Introduce and use pick_up_from_upload() method sstables, code: Introduce and use change_state() call distributed_loader: Let make_sstables_available choose target directory	2023-03-02 09:22:14 +02:00
Kefu Chai	563fbb2d11	build: cmake: extract more subsystem out into its own CMakeLists.txt namely, cdc, compaction, dht, gms, lang, locator, mutation_writer, raft, readers, replica, service, tools, tracing and transport. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-03-02 10:15:25 +08:00
Aleksandra Martyniuk	b188060535	compaction: create task manager's task for major keyspace compaction on one shard Implementation of task_manager's task that covers major keyspace compaction on one shard.	2023-03-01 18:56:26 +01:00
Aleksandra Martyniuk	159e603ac4	compaction: create task manager's task for major keyspace compaction Implementation of task_manager's task covering major keyspace compaction that can be started through storage_service api.	2023-02-23 15:48:05 +01:00
Aleksandra Martyniuk	6b1d7f5979	compaction: copy run_on_existing_tables to task_manager_module.cc Copy run_on_existing_tables from api/storage_service.cc to compaction/task_manager_module.cc	2023-02-23 15:31:59 +01:00
Aleksandra Martyniuk	b908369e85	compaction: add major_compaction_task_impl All major compaction tasks will share some methods like type or abort. The common part of the tasks should be inherited from major_compaction_task_impl.	2023-02-22 09:52:04 +01:00
Aleksandra Martyniuk	be101078a0	compaction: add pure virtual compaction_task_impl Add compaction_task_impl, a pure virtual class from which all compaction task implementations will inherit.	2023-02-22 09:51:57 +01:00
Pavel Emelyanov	8a061bd862	sstables, code: Introduce and use change_state() call The call moves the sstable to the specified state. The state change is translated into the storage driver state change, which for today's filesystem storage means moving between directories. The "normal" state maps to the base dir of the table; there's no dedicated subdir for this state, and this brings some trouble into play. The thing is that in order to check if an sstable is in "normal" state already, it's impossible to compare the filename of its path to any pre-defined values, as tables' basedirs are dynamic. To overcome this, the change-state call checks that the sstable is in one of the "known" sub-states, and assumes that it's in normal state otherwise. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-02-21 17:39:34 +03:00
Aleksandra Martyniuk	4f67c0c36a	compaction: add compaction module getter to compaction manager	2023-02-20 11:19:29 +01:00
Botond Dénes	dc3d47b1e4	Merge 'Get compaction history without using qctx' from Pavel Emelyanov There are two methods to mess with compaction history -- update and get. The former had been patched to use local system-keyspace instance by `907fd2d3` (system_keyspace: De-static compaction history update) now it's time for the latter (spoiler: it's only used by the API handler) Closes #12889 * github.com:scylladb/scylladb: system_keyspace; Make get_compaction_history non static and drop qctx api, compaction_manager: Get compaction history via manager system_keyspace: Move compaction_history_entry to namespace scope	2023-02-16 19:05:48 +02:00
Pavel Emelyanov	52f69643b6	api, compaction_manager: Get compaction history via manager Right now the API handler directly calls static method from system keyspace. Patching it to call compaction manager instead will let the latter use on-board plugged system keyspace for that. If the system keyspace is not plugged, it means early boot or late shutdown, not a good time to get compaction history anyway. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-02-16 11:27:38 +03:00
Kefu Chai	0cb842797a	treewide: do not define/capture unused variables these warnings are found by Clang-17 after removing `-Wno-unused-lambda-capture` and '-Wno-unused-variable' from the list of disabled warnings in `configure.py`. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-02-15 22:57:18 +02:00
Avi Kivity	69a385fd9d	Introduce schema/ module Schema related files are moved there. This excludes schema files that also interact with mutations, because the mutation module depends on the schema. Those files will have to go into a separate module. Closes #12858	2023-02-15 11:01:50 +02:00
Avi Kivity	c5e4bf51bd	Introduce mutation/ module Move mutation-related files to a new mutation/ directory. The names are kept in the global namespace to reduce churn; the names are unambiguous in any case. mutation_reader remains in the readers/ module. mutation_partition_v2.cc was missing from CMakeLists.txt; it's added in this patch. This is a step forward towards librarization or modularization of the source base. Closes #12788	2023-02-14 11:19:03 +02:00
Kefu Chai	d4315245a1	main: use defer_verbose_shutdown() to shutdown compaction manager * use `defer_verbose_shutdown()` to shutdown compaction manager `EDQUOT` is quite similar as `ENOSPC`, in the sense that both of them are caused by environmental issues. before this change, `compaction_manager` filters the ENOSPC exceptions thrown by `compaction_manager::really_do_stop()`, so they are not propagated to caller when calling `compaction_manager::stop()` -- only a warning message is printed in the log. but `EDQUOT` is not handled. after this change, the exception raised by compaction manager's stop process is not filtered anymore and is handled by `defer_verbose_shutdown()` instead, which is able to check the type of exception, and print out error message in the log. so the `ENOSPC` and `EDQUOT` errors are taken care of, and more visible from user's perspective as they are printed as errors instead of warning. but they are not printed using the `compaction_manager` logger anymore. so if our testing or user's workflow depends on this behavior, the related setting should be updated accordingly. Fixes #12626 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-02-07 16:00:40 +08:00
Botond Dénes	511c0123a2	Merge 'Add compaction module to task manager' from Aleksandra Martyniuk Introduces task manager's compaction module. That's an initial part of integration of compaction with task manager. When fully integrated, task manager will allow user to track compaction operations, check status and progress of each individual one. It will help with creating an asynchronous version of rest api that forces any compaction. Currently, users can see with /task_manager/list_modules api call that compaction is one of the modules accessible through task manager. They won't get any additional information though, since compaction tasks are not created yet. A shared_ptr to compaction module is kept in compaction manager. Closes #12635 * github.com:scylladb/scylladb: compaction: test: pass task_manager to compaction_manager in test environment compaction: create and register task manager's module for compaction tasks: add task_manager constructor without arguments	2023-02-06 09:25:05 +02:00
Aleksandra Martyniuk	12789adb95	compaction: test: pass task_manager to compaction_manager in test environment Each instance of compaction manager should have compaction module pointer initialized. All constructors get a task_manager reference with which the module is created.	2023-02-03 15:15:11 +01:00
Raphael S. Carvalho	5a784c3c6d	treewide: Use new sstable_set::size() wherever possible That's the preferred alternative because it's zero copy. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-02-03 10:38:04 -03:00
Aleksandra Martyniuk	47ef689077	compaction: create and register task manager's module for compaction As an initial part of integration of compaction with task manager, compaction module is added. Compaction module inherits from tasks::task_manager::module and shared_ptr to it is kept in compaction manager. No compaction tasks are created yet.	2023-02-03 13:52:30 +01:00
Raphael S. Carvalho	1b2140e416	compaction: Fix inefficiency when updating LCS backlog tracker LCS backlog tracker uses STCS tracker for L0. Turns out LCS tracker is calling STCS tracker's replace_sstables() with empty arguments even when higher levels (> 0) only had sstables replaced. This unnecessary call to STCS tracker will cause it to recompute the L0 backlog, yielding the same value as before. As LCS has a fragment size of 0.16G on higher levels, we may be updating the tracker multiple times during incremental compaction, which operates on SSTables on higher levels. Inefficiency is fixed by only updating the STCS tracker if any L0 sstable is being added or removed from the table. This may be fixing a quadratic behavior during boot or refresh, as new sstables are loaded one by one. Higher levels have a substantial higher number of sstables, therefore updating STCS tracker only when level 0 changes, reduces significantly the number of times L0 backlog is recomputed. Refs #12499. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes #12676	2023-02-01 15:19:07 +02:00
Benny Halevy	82011fc489	dht: incremental_owned_ranges_checker: belongs_to_current_node: mark as const Its _it member keeps state about the current range. Although it's modified by the method, this is an implementation detail that is irrelevant to the caller, hence mark the belongs_to_current_node method as const (and noexcept while at it). This allows the caller, cleanup_compaction, to use it from inside a const method, without having to mark its respective member as mutable too. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes #12634	2023-01-25 14:52:21 +02:00
Raphael S. Carvalho	f2f839b9cc	compaction: LCS: don't reshape all levels if only a single breaks disjointness LCS reshape is compacting all levels if a single one breaks disjointness. That's unnecessary work because rewriting that single level is enough to restore disjointness. If multiple levels break disjointness, they'll each be reshaped in its own iteration, so reducing operation time for each step and disk space requirement, as input files can be released incrementally. Incremental compaction is not applied to reshape yet, so we need to avoid "major compaction", to avoid the space overhead. But space overhead is not the only problem, the inefficiency, when deciding what to reshape when overlapping is detected, motivated this patch. Fixes #12495. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes #12496	2023-01-17 09:55:15 +02:00
Raphael S. Carvalho	67ebd70e6e	compaction_manager: Fix reactor stalls during periodic submissions Every 1 hour, compaction manager will submit all registered table_state for a regular compaction attempt, all without yielding. This can potentially cause a reactor stall if there are 1000s of table states, as compaction strategy heuristics will run on behalf of each, and processing all buckets and picking the best one is not cheap. This problem can be magnified with compaction groups, as each group is represented by a table state. This might appear in dashboard as periodic stalls, every 1h, misleading the investigator into believing that the problem is caused by a chronological job. This is fixed by piggybacking on compaction reevaluation loop which can yield between each submission attempt if needed. Fixes #12390. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes #12391	2022-12-24 13:43:16 +02:00
Raphael S. Carvalho	e6fb3b3a75	compaction: Delete atomically off-strategy input sstables After commit `a57724e711`, off-strategy no longer races with view building, therefore deletion code can be simplified and piggyback on mechanism for deleting all sstables atomically, meaning a crash midway won't result in some of the files coming back to life, which leads to unnecessary work on restart. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes #12245	2022-12-16 08:15:49 +02:00
Botond Dénes	8f8284783a	Merge 'Fix handling of non-full clustering keys in the read path' from Tomasz Grabiec This PR fixes several bugs related to handling of non-full clustering keys. One is in trim_clustering_row_ranges_to(), which is broken for non-full keys in reverse mode. It will trim the range to position_in_partition_view::after_key(full_key) instead of position_in_partition_view::before_key(key), hence it will include the key in the resulting range rather than exclude it. Fixes #12180 after_key() was creating a position which is after all keys prefixed by a non-full key, rather than a position which is right after that key. This issue will be caught by cql_query_test::test_compact_storage in debug mode when mutation_partition_v2 merging starts inserting sentinels at position after_key() on preemption. It probably already causes problems for such keys as after_key() is used in various parts in the read path. Refs #1446 Closes #12234 * github.com:scylladb/scylladb: position_in_partition: Make after_key() work with non-full keys position_in_partition: Introduce before_key(position_in_partition_view) db: Fix trim_clustering_row_ranges_to() for non-full keys and reverse order types: Fix comparison of frozen sets with empty values	2022-12-15 10:47:12 +02:00
Tomasz Grabiec	23e4c83155	position_in_partition: Make after_key() work with non-full keys This fixes a long-standing bug related to handling of non-full clustering keys, issue #1446. after_key() was creating a position which is after all keys prefixed by a non-full key, rather than a position which is right after that key. This issue will be caught by cql_query_test::test_compact_storage in debug mode when mutation_partition_v2 merging starts inserting sentinels at position after_key() on preemption. It probably already causes problems for such keys.	2022-12-14 14:47:33 +01:00
Pavel Emelyanov	9bdea110a6	code: Reduce fanout of sstables(_manager)?.hh over headers This change removes sstables.hh from some other headers, replacing it with version.hh and shared_sstable.hh. It also drops sstables_manager.hh from some more headers, because this header propagates sstables.hh via itself. The change is pretty straightforward, but has a ricochet in database.hh, which needs disk-error-handler.hh. Without the patch, touching sstables/sstable.hh results in recompilation of 409 targets; with the patch, 299 targets. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #12222	2022-12-07 14:34:19 +02:00
Avi Kivity	d2b1d2f695	compaction_manager: reindent postponed_compactions_reevaluation()	2022-12-05 22:02:27 +02:00
Avi Kivity	1669025736	compaction_manager: coroutinize postponed_compactions_reevaluation() So much nicer.	2022-12-05 22:01:41 +02:00
Avi Kivity	d2c44cba77	compaction_manager: make postponed_compactions_reevaluation() return a future postponed_compactions_reevaluation() runs until compaction_manager is stopped, checking if it needs to launch new compactions. Make it return a future instead of stashing its completion somewhere. This makes it easier to convert it to a coroutine.	2022-12-05 21:58:48 +02:00
Raphael S. Carvalho	d61b4f9dfb	compaction_manager: Delete compaction_state's move constructor compaction_state shouldn't be moved once emplaced. moving it could theoretically cause task's gate holder to have a dangling pointer to compaction_state's gate, but turns out gate's move ctor will actually fail under this assertion: assert(!_count && "gate reassigned with outstanding requests"); Cannot happen today, but let's make it more future proof. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes #12167	2022-12-02 20:56:57 +03:00
Avi Kivity	f565db75ce	compaction: don't compare signed and unsigned compaction counts gcc warns as this can lead to incorrect results. Cast the threshold to an unsigned type (we know it's positive at this point) to avoid the warning.	2022-11-28 21:41:56 +02:00
Benny Halevy	8b81635d95	compaction: refactor dht::subtract_ranges out of get_ranges_for_invalidation The algorithm is generic and can be used elsewhere. Add a unit test for the function before it gets optimized in the following patch. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-21 15:48:26 +02:00
Benny Halevy	7c6f60ae72	compaction_manager: needs_cleanup: get first/last tokens from sstable decorated keys Currently, the function is inefficient in two ways: 1. unnecessary copy of first/last keys to automatic variables 2. redecorating the partition keys with the schema passed to needs_cleanup. We can just use the tokens from the sstable first/last decorated keys. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-21 15:44:32 +02:00
Avi Kivity	994603171b	Merge 'Add validator to the mutation compactor' from Botond Dénes Fragment reordering and fragment dropping bugs have been plaguing us since forever. To fight them we added a validator to the sstable write path to prevent really messed up sstables from being written. This series adds validation to the mutation compactor. This will cover reads and compaction among others, hopefully ridding us of such bugs on the read path too. This series fixes some benign looking issues found by unit tests after the validator was added -- although how benign a producer emitting two partition-ends depends entirely on how the consumer reacts to it, so no such bug is actually benign. Fixes: https://github.com/scylladb/scylladb/issues/11174 Closes #11532 * github.com:scylladb/scylladb: mutation_compactor: add validator mutation_fragment_stream_validator: add a 'none' validation level test/boost/mutation_query_test: test_partition_limit: sort input data querier: consume_page(): use partition_start as the sentinel value treewide: use ::for_partition_end() instead of ::end_of_partition_tag_t{} treewide: use ::for_partition_start() instead of ::partition_start_tag_t{} position_in_partition: add for_partition_{start,end}()	2022-11-20 20:33:26 +02:00
Aleksandra Martyniuk	7ead1a7857	compaction: request abort only once in compaction_data::stop compaction_manager::task (and thus compaction_data) can be stopped because of many different reasons. Thus, abort can be requested more than once on compaction_data abort source causing a crash. To prevent this before each request_abort() we check whether an abort was requested before. Closes #12004	2022-11-17 12:44:59 +02:00
Raphael S. Carvalho	b88acffd66	replica: Allow one compaction_backlog_tracker for each compaction_group Today, compaction_backlog_tracker is managed in each compaction_strategy implementation. So every compaction strategy is managing its own tracker and providing a reference to it through get_backlog_tracker(). But this prevents each group from having its own tracker, because there's only a single compaction_strategy instance per table. To remove this limitation, compaction_strategy impl will no longer manage trackers but will instead provide an interface for trackers to be created, such that each compaction group will be allowed to have its own tracker, which will be managed by compaction manager. On compaction strategy change, table will update each group with the new tracker, which is created using the previously introduced ompaction_group_sstable_set_updater. Now table's backlog will be the sum of all compaction_group backlogs. The normalization factor is applied on the sum, so we don't have to adjust each individual backlog to any factor. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-11-11 09:22:51 -03:00
Raphael S. Carvalho	d862dd815c	compaction: Make compaction_state available for compaction tasks being stopped compaction_backlog_tracker will be managed by compaction_manager, in the per table state. As compaction tasks can access the tracker throughout its lifetime, remove() can only deregister the state once we're done stopping all tasks which map to that state. remove() extracted the state upfront, then performed the stop, to prevent new tasks from being registered and left behind. But we can avoid the leak of new tasks by only closing the gate, which waits for all tasks (which are stopped a step earlier) and once closed, prevents new tasks from being registered. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-11-11 09:22:51 -03:00
Raphael S. Carvalho	0a152a2670	compaction: Implement move assignment for compaction_backlog_tracker That's needed for std::optional to work on its behalf. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-11-11 09:22:49 -03:00
Raphael S. Carvalho	fe305cefd0	compaction: Fix compaction_backlog_tracker move ctor Luckily it's not used anywhere. Default move ctor was picked but it won't clear _manager of old object, meaning that its destructor will incorrectly deregister the tracker from compaction_backlog_manager. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-11-11 09:17:37 -03:00
Raphael S. Carvalho	8e1e30842d	compaction: Use table_state's backlog tracker in compaction_read_monitor_generator A step closer towards a separate backlog tracker for each compaction group. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-11-11 09:17:37 -03:00
Raphael S. Carvalho	fedafd76eb	compaction: kill undefined get_unimplemented_backlog_tracker() Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-11-11 09:17:37 -03:00
Raphael S. Carvalho	244efddb22	Fix exception safety when transferring ongoing charges to new backlog tracker When setting a new strategy, the charges of the old tracker are transferred to the new one. The problem is that we're not reverting changes if an exception is triggered before the new strategy is successfully set. To fix this exception safety issue, let's copy the charges instead of moving them. If an exception is triggered, the old tracker is still the one used and remains intact. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-11-11 09:17:37 -03:00
Raphael S. Carvalho	1ec0ef18a5	compaction/table_state: Introduce get_backlog_tracker() This interface will be helpful for allowing replica::table, unit tests and sstables::compaction to access the compaction group's tracker which will be managed by the compaction manager, once we complete the decoupling work. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-11-11 09:17:36 -03:00
Botond Dénes	f1a039fc2b	treewide: use ::for_partition_start() instead of ::partition_start_tag_t{} We just added a convenience static factory method for partition start, change the present users of the clunky constructor+tag to use it instead.	2022-11-11 09:58:18 +02:00
Botond Dénes	3aff59f189	Merge 'staging sstables: filter tokens for view update generation' from Benny Halevy This mini-series introduces dht::tokens_filter and uses it for consuming staging sstable in the view_update_generator. The tokens_filter uses the token ranges owned by the current node, as retrieved by get_keyspace_local_ranges. Refs #9559 Closes #11932 * github.com:scylladb/scylladb: db: view_update_generator: always clean up staging sstables compaction: extract incremental_owned_ranges_checker out to dht	2022-11-10 07:00:51 +02:00
Benny Halevy	fd3e66b0cc	compaction: extract incremental_owned_ranges_checker out to dht It is currently used by cleanup_compaction partition filter. Factor it out so it can be used to filter staging sstables in the next patch. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-11-09 07:32:56 +02:00
Raphael S. Carvalho	a57724e711	Make off-strategy compaction wait for view building completion Prior to off-strategy compaction, streaming / repair would place staging files into main sstable set, and wait for view building completion before they could be selected for regular compaction. The reason for that is that view building relies on table providing a mutation source without data in staging files. Had regular compaction mixed staging data with non-staging one, table would have a hard time providing the required mutation source. After off-strategy compaction, staging files can be compacted in parallel to view building. If off-strategy completes first, it will place the output into the main sstable set. So a parallel view building (on sstables used for off-strategy) may potentially get a mutation source containing staging data from the off-strategy output. That will mislead view builder as it won't be able to detect changes to data in main directory. To fix it, we'll do what we did before. Filter out staging files from compaction, and trigger the operation only after we're done with view building. We're piggybacking on off-strategy timer for still allowing the off-strategy to only run at the end of the node operation, to reduce the amount of compaction rounds on the data introduced by repair / streaming. Fixes #11882. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes #11919	2022-11-08 08:53:58 +02:00

490 Commits
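
Commit fe305cefd0 above describes a move-constructor pitfall: a defaulted move constructor copies the `_manager` pointer but does not clear it in the moved-from object, so that object's destructor deregisters the tracker by mistake. The following is a minimal sketch of that pattern and one possible fix. The `tracker` and `manager` classes and all of their members are hypothetical stand-ins written for illustration; they are not ScyllaDB's `compaction_backlog_tracker` or `compaction_backlog_manager`, and the actual patch may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>
#include <utility>

// Hypothetical stand-ins for illustration only; these are NOT ScyllaDB's
// compaction_backlog_tracker / compaction_backlog_manager classes.
class tracker;

class manager {
    std::unordered_set<tracker*> _trackers;
public:
    void register_tracker(tracker& t);
    void deregister(tracker& t);
    // Re-point a registration from a moved-from object to its replacement.
    void rebind(tracker* from, tracker* to) {
        _trackers.erase(from);
        _trackers.insert(to);
    }
    std::size_t tracker_count() const { return _trackers.size(); }
};

class tracker {
    manager* _manager = nullptr;
    friend class manager;
public:
    tracker() = default;
    tracker(const tracker&) = delete;
    tracker& operator=(const tracker&) = delete;

    // A defaulted move constructor would copy _manager and leave the source
    // pointing at the manager as well, so the source's destructor would
    // deregister. std::exchange takes the pointer and nulls the source.
    tracker(tracker&& other) noexcept
        : _manager(std::exchange(other._manager, nullptr)) {
        if (_manager) {
            _manager->rebind(&other, this);
        }
    }

    // Only an object that still owns a registration deregisters itself.
    ~tracker() {
        if (_manager) {
            _manager->deregister(*this);
        }
    }

    bool registered() const { return _manager != nullptr; }
};

inline void manager::register_tracker(tracker& t) {
    t._manager = this;
    _trackers.insert(&t);
}

inline void manager::deregister(tracker& t) {
    _trackers.erase(&t);
    t._manager = nullptr;
}
```

In this sketch, after `tracker b(std::move(a))` only `b` reports `registered()`, and the manager's count drops to zero when `b` is destroyed rather than when the moved-from `a` goes out of scope.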