scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-05 14:33:08 +00:00

Author	SHA1	Message	Date
Raphael S. Carvalho	5bf51ced14	compaction: split compaction info and data for control compaction_info must only contain info data to be exported to the outside world, whereas compaction_data will contain data for controlling compaction behavior and stats which change as compaction progresses. This separation makes the interface clearer, also allowing for future improvements like removing direct references to table in compaction. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-09-23 10:56:18 -03:00
Raphael S. Carvalho	6e7729fa21	compaction_manager: use task when stopping a given compaction type compaction_info will eventually only be used for exporting data about ongoing compactions, so task must be used instead. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-09-23 10:53:53 -03:00
Raphael S. Carvalho	6d1170ac94	compaction: remove start_size and end_size from compaction_info those stats aren't used in compaction stats API and therefore they can be removed. end_size is added to compaction_result (needed for updating history) and start_size can be calculated in advance. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-09-23 10:41:13 -03:00
Raphael S. Carvalho	2353f40f63	compaction_manager: introduce helpers for task Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-09-23 10:38:39 -03:00
Raphael S. Carvalho	6820fbf460	compaction_manager: introduce explicit ctor for task Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-09-23 10:38:36 -03:00
Raphael S. Carvalho	d73a241a4e	compaction: kill sstables field in compaction_info Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-09-23 10:38:32 -03:00
Raphael S. Carvalho	b6b4042faf	compaction: kill table pointer in compaction_info Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-09-23 10:38:11 -03:00
Raphael S. Carvalho	98f8673d4e	compaction: simplify procedure to stop ongoing compactions Today, compactions are tracked by both _compactions and _tasks, where _compactions refer to actual ongoing compaction tasks, whereas _tasks refer to manager tasks which is responsible for spawning new compactions, retry them on failure, etc. As each task can only have one ongoing compaction at a time, let's move compaction into task, such that manager won't have to look at both when deciding to do something like stopping a task. So stopping a task becomes simpler, and duplication is naturally gone. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-09-23 10:25:51 -03:00
Raphael S. Carvalho	0885376a85	compaction: move management of compaction_info to compaction_manager Today, compaction is calling compaction manager to register / deregister the compaction_info created by it. This is a layer violation because manager sits one layer above compaction, so manager should be responsible for managing compaction info. From now on, compaction_info will be created and managed by compaction_manager. compaction will only have a reference to info, which it can use to update the world about compaction progress. This will allow compaction_manager to be simplified as info can be coupled with its respective task, allowing duplication to be removed and layer violation to be fixed. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-09-23 10:00:49 -03:00
Raphael S. Carvalho	7688d0432c	compaction: move output run id from compaction_info into task this run id is used to track partial runs that are being written to. let's move it from info into task, as this is not an external info, but rather one that belongs to compaction_manager. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-09-23 09:56:01 -03:00
Raphael S. Carvalho	ff38f59f67	compaction: Update backlog tracker correctly when schema is updated Currently the following can happen: 1) there's ongoing compaction with input sstable A, so sstable set and backlog tracker both contains A. 2) ongoing compaction replaces input sstable A by B, so sstable set contains only B now. 3) schema is updated, so a new backlog tracker is built without A because sstable set now contains only B. 4) ongoing compaction tries to remove A from tracker, but it was excluded in step 3. 5) tracker can now have a negative value if table is decreasing in size, which leads to log(<negative number>) == -NaN This problem happens because backlog tracker updates are decoupled from sstable set updates. Given that the essential content of backlog tracker should be the same as one of sstable set, let's move tracker management to table. Whenever sstable set is updated, backlog tracker will be updated with the same changes, making their management less error prone. Fixes #9157 Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-09-20 15:54:41 -03:00
Raphael S. Carvalho	0a3049908c	compaction: Don't leak backlog of input sstable when compaction strategy is changed The generic back formula is: ALL + PARTIAL - COMPACTING With transfer_ongoing_charges() we already ignore the effect of ongoing compactions on COMPACTING as we judge them to be pointless. But ongoing compactions will run to completion, meaning that output sstables will be added to ALL anyway, in the formula above. With stop_tracking_ongoing_compactions(), input sstables are never removed from the tracker, but output sstables are added, which means we end up with duplicate backlog in the tracker. By removing this tracking mechanism, pointless ongoing compaction will be ignored as expected and the leaks will be fixed. Later, the intention is to force a stop on ongoing compactions if strategy has changed as they're pointless anyway. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-09-20 15:36:05 -03:00
Raphael S. Carvalho	3dc1821287	compaction: introduce compaction_read_monitor_generator::remove_exhausted_sstables() This new function makes it easier to remove monitor of exhausted sstables. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-09-20 15:16:41 -03:00
Raphael S. Carvalho	28ba8bde80	compaction: simplify removal of monitors by switching to unordered_map, removal of generated monitors is made easier. this is a preparatory change for patch which will remove monitor for all exhausted sstables Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-09-20 15:06:37 -03:00
Avi Kivity	daf028210b	build: enable -Winconsistent-missing-override warning This warning can catch a virtual function that thinks it overrides another, but doesn't, because the two functions have different signatures. This isn't very likely since most of our virtual functions override pure virtuals, but it's still worth having. Enable the warning and fix numerous violations. Closes #9347	2021-09-15 12:55:54 +03:00
Raphael S. Carvalho	acba3bd3c4	sstables: give a more descriptive name to compaction_options the name compaction_options is confusing as it overlaps in meaning with compaction_descriptor. hard to reason what are the exact difference between them, without digging into the implementation. compaction_options is intended to only carry options specific to a give compaction type, like a mode for scrub, so let's rename it to compaction_type_options to make it clearer for the readers. [avi: adjust for scrub changes] Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20210908003934.152054-1-raphaelsc@scylladb.com>	2021-09-12 11:21:33 +03:00
Benny Halevy	389ef9316f	compaction: scrub/validate: prevent printing non-utf8 partition keys Corrupt keys might be printed as non-utf8 strings to the log, and that, in turn, may break applications reading the logs, such as Python (3.7) For example: ``` Traceback (most recent call last): File "/home/bhalevy/dev/scylla-dtest/dtest.py", line 1148, in tearDown self.cleanUpCluster() File "/home/bhalevy/dev/scylla-dtest/dtest.py", line 1184, in cleanUpCluster matches = node.grep_log(expr) File "/home/bhalevy/dev/scylla-ccm/ccmlib/node.py", line 367, in grep_log for line in f: File "/usr/lib64/python3.7/codecs.py", line 322, in decode (result, consumed) = self._buffer_decode(data, self.errors, final) UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb3 in position 5577: invalid start byte ``` Test: unit(dev) DTest: scrub_with_one_node_expect_data_loss_test Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20210730105428.2844668-1-bhalevy@scylladb.com>	2021-09-12 10:52:18 +03:00
Benny Halevy	40a6049ac2	compaction_manager: rewrite_sstables: acquire _compaction_locks Take write lock for cf to serialize cleanup/upgrade sstables/scrub with major compaction/reshape/reshard. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-09-09 14:13:45 +03:00
Benny Halevy	44348b3080	compaction_manager: perform_sstable_scrub: run_with_compaction_disabled since we might potentially have ongoing compactions, and we must ensure that all sstables created before we run are scrubbed, we need to barrier out any previously running compaction. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-09-09 14:13:40 +03:00
Raphael S. Carvalho	a145ffcf52	compaction: don't rule out compacting sstables in validate-mode scrub even sstables being compacted must be validated. otherwise scrub validate may return false negative. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-09-09 14:06:50 +03:00
Raphael S. Carvalho	6849ec46b8	compaction: Don't purge tombstones in scrub Scrub is supposed to not remove anything from input, write it as is while fixing any corruption it might have. It shouldn't have any assumption on the input. Additionally, a data shadowed by a tombstone might be in another corrupted sstable, so expired tombstones should not be purged in order to prevent data ressurection from occurring. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20210904165908.135044-1-raphaelsc@scylladb.com>	2021-09-05 17:10:34 +03:00
Avi Kivity	a81057b2e1	Merge "sstables: introduce crawling reader" from Botond " A special-purpose reader which doesn't use the index at all, designed to be used in circumstances where the index is not reliable. The use-case is scrub and validate which often have to work with corrupt indexes and it is especially important that they don't further any existing corruption. Tests: unit(dev) " * 'crawling-sstable-reader/v2' of https://github.com/denesb/scylla: compaction: scrub/validate: use the crawling sstable reader sstables: wire in crawling reader sstables: mx/reader: add crawling reader sstables: kl/reader: add crawling reader	2021-09-02 16:26:35 +03:00
Raphael S. Carvalho	3263c1d5f1	Make shutdown clean when stopping sstable reshard After `aa7cdc0392`, run_custom_job() propagates stop exception. The problem is that we fail to handle stop exception in the procedure which stops ongoing compactions, so the exception will be propagated all the way to init, which causes scylla to abort. to fix this, let's swallow stop_exception in stop_ongoing_compactions(), which is correct because compactions are stopped by triggering that exception if signalled to stop. when reshard is stopped, scylla init will fail as follow instead: ERROR 2021-08-16 20:13:13,770 [shard 0] init - Startup failed: std::runtime_error (Exception while populating keyspace 'keyspace5' with column family 'standard1' from file ... Fixes #9158. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20210816232434.78375-1-raphaelsc@scylladb.com>	2021-09-02 13:50:24 +03:00
Botond Dénes	7a78601b5d	compaction: scrub/validate: use the crawling sstable reader Sstables that are scrubbed or validated are typically problematic ones that potentially have corrupt indexes. To avoid using the index altogether use the recently added crawling reader. Scrub and validate never skips in the sstable anyway.	2021-09-01 16:21:49 +03:00
Benny Halevy	4476800493	flat_mutation_reader: get rid of timeout parameter Now that the timeout is taken from the reader_permit. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-08-24 16:30:51 +03:00
Benny Halevy	fe479aca1d	reader_permit: add timeout member To replace the timeout parameter passed to flat_mutation_reader methods. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-08-24 14:29:44 +03:00
Benny Halevy	e9aff2426e	everywhere: make deferred actions noexcept Prepare for updating seastar submodule to a change that requires deferred actions to be noexcept (and return void). Test: unit(dev, debug) Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-08-22 21:11:52 +03:00
Avi Kivity	a7ef826c2b	Merge "Fold validation compaction into scrub" from Botond " Validation compaction -- although I still maintain that it is a good descriptive name -- was an unfortunate choice for the underlying functionality because Origin has burned the name already as it uses it for a compaction type used during repair. This opens the door for confusion for users coming from Cassandra who will associate Validation compaction with the purpose it is used for in Origin. Additionally, since Origin's validation compaction was not user initiated, it didn't have a corresponding `nodetool` command to start it. Adding such a command would create an operational difference between us and Origin. To avoid all this we fold validation compaction into scrub compaction, under a new "validation" mode. I decided against using the also suggested `--dry-mode` flag as I feel that a new mode is a more natural choice, we don't have to define how it interacts with all the other modes, unlike with a `--dry-mode` flag. Fixes: #7736 Tests: unit(dev), manual(REST API) " * 'scrub-validation-mode/v2' of https://github.com/denesb/scylla: compaction/compaction_descriptor: add comment to Validation compaction type compaction/compaction_descriptor: compaction_options: remove validate api: storage_service: validate_keyspace -> scrub_keyspace (validate mode) compaction/compaction_manager: hide perform_sstable_validation() compaction: validation compaction -> scrub compaction (validate mode) compaction/compaction_descriptor: compaction_options: add options() accessor compaction/compaction_descriptor: compaction_options::scrub::mode: add validate	2021-08-10 12:18:35 +03:00
Asias He	4ae6eae00a	table: Get rid of table::run_compaction helper The table::run_compaction is a trivial wrapper for table::compact_sstables. We have lots of similar {start, trigger, run}_compaction functions. Dropping the run_compaction wrapper to reduce confusion. Closes #9161	2021-08-09 14:02:54 +03:00
Nadav Har'El	61bcc0ad29	Merge 'compaction: Move compaction_strategy.hh and compaction_garbage_collector.hh to compaction directory ' from Asias He This trial patch set moves compaction_strategy.hh and compaction_garbage_collector.hh to compaction directory and drops two unused compact_for_mutation_query_state and compact_for_data_query_state. Closes #9156 * github.com:scylladb/scylla: compaction: Move compaction_garbage_collector.hh to compaction dir compaction: Move compaction_strategy.hh to compaction dir mutation_compactor: Drop compact_for_mutation_query_state and compact_for_data_query_state	2021-08-08 11:58:41 +03:00
Asias He	4c1f8c2f83	compaction: Move compaction_garbage_collector.hh to compaction dir The top dir is a mess. Move compaction_garbage_collector.hh to the new home.	2021-08-07 08:07:09 +08:00
Asias He	6350a19f73	compaction: Move compaction_strategy.hh to compaction dir The top dir is a mess. Move compaction_strategy.hh and compaction_strategy_type.hh to the new home.	2021-08-07 08:06:37 +08:00
Botond Dénes	76f2790c24	compaction/compaction_descriptor: add comment to Validation compaction type Add a note explaining what Origin uses this for, to deter future attempts at reusing this for something else.	2021-08-05 07:36:45 +03:00
Botond Dénes	ab7a2cabb3	compaction/compaction_descriptor: compaction_options: remove validate It is unused now.	2021-08-05 07:36:45 +03:00
Botond Dénes	5f6468d7d7	compaction/compaction_manager: hide perform_sstable_validation() We are folding validation compaction into scrub (at least on the interface level), so remove the validation entry point accordingly and have users go through `perform_sstable_scrub()` instead.	2021-08-05 07:36:44 +03:00
Botond Dénes	a258f5639b	compaction: validation compaction -> scrub compaction (validate mode) Fold validation compaction into scrub compaction (validate mode). Only on the interface level though: to initiate validation compaction one now has to use `compaction_options::make_scrub(compaction_options::scrub::mode::validate)`. The implementation code stays as-is -- separate.	2021-08-05 07:32:05 +03:00
Raphael S. Carvalho	154e8959f9	compaction: Optimize partition filtering for cleanup compaction Realized that the overall complexity of partition filtering in cleanup is O(N * log(M)), where N is # of tokens M is # of ranges owned by the node Assuming N=10,000,000 for a table and M=257, Nlog(M) ~= 80,056,245 checks performed during the whole cleanup. This can be optimized by taking advantage that owned ranges are both sorted and non wrapping, so an incremental iterator-oriented checker is introduced to reduce complexity from O(N log(M)) to O(N + M) or just O(N). BEFORE 240MB to 237MB (~98% of original) in 3239ms = 73MB/s. ~950016 total partitions merged to 949943. 719MB to 719MB (~99% of original) in 9649ms = 74MB/s. ~2900608 total partitions merged to 2900576. 1GB to 1GB (~100% of original) in 15231ms = 74MB/s. ~4536960 total partitions merged to 4536852. 1GB to 1GB (~100% of original) in 15244ms = 74MB/s. ~4536960 total partitions merged to 4536840. 1GB to 1GB (~100% of original) in 15263ms = 74MB/s. ~4536832 total partitions merged to 4536783. 1GB to 1GB (~100% of original) in 15216ms = 74MB/s. ~4536832 total partitions merged to 4536812. AFTER 240MB to 237MB (~98% of original) in 3169ms = 74MB/s. ~950016 total partitions merged to 949943. 719MB to 719MB (~99% of original) in 9444ms = 76MB/s. ~2900608 total partitions merged to 2900576. 1GB to 1GB (~100% of original) in 14882ms = 76MB/s. ~4536960 total partitions merged to 4536852. 1GB to 1GB (~100% of original) in 14918ms = 76MB/s. ~4536960 total partitions merged to 4536840. 1GB to 1GB (~100% of original) in 14919ms = 76MB/s. ~4536832 total partitions merged to 4536783. 1GB to 1GB (~100% of original) in 14894ms = 76MB/s. ~4536832 total partitions merged to 4536812. Fixes #6807. test: mode(dev). Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20210802213159.182393-1-raphaelsc@scylladb.com>	2021-08-04 20:35:44 +03:00
Benny Halevy	3ad0067272	date_tiered_manifest: get_now: fix use after free of sstable_list The sstable_list is destroyed right after the temporary lw_shared_ptr<sstable_list> returned from `cf.get_sstables()` is dereferenced. Fixes #9138 Test: unit(dev) DTest: resharding_test.py:ReshardingTombstones_with_DateTieredCompactionStrategy.disable_tombstone_removal_during_reshard_test (debug) Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20210804075813.42526-1-bhalevy@scylladb.com>	2021-08-04 15:24:47 +03:00
Botond Dénes	8b64a6caa7	compaction/compaction_descriptor: compaction_options: add options() accessor	2021-08-03 09:34:17 +03:00
Botond Dénes	f01b799a30	compaction/compaction_descriptor: compaction_options::scrub::mode: add validate To replace compaction_type::Validation.	2021-08-03 09:34:15 +03:00
Raphael S. Carvalho	aa7cdc0392	compaction: Allow reshape to be aborted Now reshape can be aborted on either boot or refresh. The workflow is: 1) reshape starts 2) user notices it's taking too long 3) nodetool stop RESHAPE the good thing is that completed reshape work isn't lost, allowing table to enjoy the benefits of all reshaping done up to the abortion point. Fixes #7738. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-08-02 13:54:51 -03:00
Nadav Har'El	6c27000b98	Merge 'Propagate exceptions without throwing' from Piotr Sarna NOTE: this series depends on a Seastar submodule update, currently queued in next: 0ed35c6af052ab291a69af98b5c13e023470cba3 In order to avoid needless throwing, exceptions are passed directly wherever possible. Two mechanisms which help with that are: 1. `make_exception_future<>` for futures 2. `co_return coroutine::exception(...)` for coroutines which return `future<T>` (the mechanism does not work for `future<>` without parameters, unfortunately) Tests: unit(release) Closes #9079 * github.com:scylladb/scylla: system_keyspace: pass exceptions without throwing sstables: pass exceptions without throwing storage_proxy: pass exceptions without throwing multishard_mutation_query: pass exceptions without throwing client_state: pass exceptions without throwing flat_mutation_reader: pass exceptions without throwing table: pass exceptions without throwing commitlog: pass exceptions without throwing compaction: pass exceptions without throwing database: pass exceptions without throwing	2021-08-01 16:47:47 +03:00
Benny Halevy	3a4e4f9914	compaction: to_string: handle invalid values as internal error Although the switch in `to_string(compaction_options::scrub::mode)` covers all possible cases, gcc 10.3.1 warns about: ``` sstables/compaction.cc: In function ‘std::string_view sstables::to_string(sstables::compaction_options::scrub::mode)’: sstables/compaction.cc:95:1: error: control reaches end of non-void function [-Werror=return-type] ``` Adding __builtin_unreachable(), as in `to_string(compaction_type)` does calm the compiler down, but it might cause undefined behavior in the future in case the switch won't cover all cases, or the passed value is corrupt somehow. Instead, call on_internal_error_noexcept to report the error and abort if configure to do so, otherwise, just return an "(invalid)" string. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20210727130251.2283068-1-bhalevy@scylladb.com>	2021-07-27 16:04:09 +03:00
Piotr Sarna	6e994ce7c2	compaction: pass exceptions without throwing In order to avoid needless throwing, exceptions are passed directly wherever possible. Two mechanisms which help with that are: 1. make_exception_future<> for futures 2. co_return coroutine::exception(...) for coroutines which return future<T> (the mechanism does not work for future<> without parameters, unfortunately)	2021-07-26 17:03:06 +02:00
Pavel Emelyanov	c39f04fa6f	code: Remove storage-service header from irrelevant places Some .cc files over the code include the storage service for no real need. Drop the header and include (in some) what's really needed. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-07-22 18:50:19 +03:00
Botond Dénes	a819f013f6	compaction/compaction: create_compaction_info(): take const compaction_descriptor& Don't copy the descriptor. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20210721120219.326972-1-bdenes@scylladb.com>	2021-07-21 16:19:03 +03:00
Botond Dénes	7bfa40a2f1	treewide: use make_tracking_only_permit() For all those reads that don't (won't or can't) pass through admission currently.	2021-07-14 17:19:02 +03:00
Botond Dénes	2bbfb76cc5	compaction/leveled_compaction_strategy.cc: remove unused <ranges> include Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20210713063506.419658-1-bdenes@scylladb.com>	2021-07-13 10:34:22 +03:00
Botond Dénes	47283ed151	sstables/compaction_manager: add perform_sstable_validation() Exposing validation compaction on the compaction manager level. To keep things simple, validation compaction uses the custom job infrastructure.	2021-07-12 10:25:15 +03:00
Botond Dénes	4c05e5f966	sstables/compaction_manager: rewrite_sstables(): resolve maintenance group FIXME Run this compaction in the maintenance group which is now available, resolving the FIXME asking for this.	2021-07-12 10:25:15 +03:00

1 2

61 Commits