"
TWCS performs STCS on a window as long as it's the most recent one.
From there on, TWCS compacts all files in a past window into a
single file. Under a moderate write load, there may still be some
compaction activity in that past window, so the per-window major may
miss files that are currently being compacted. As a result, a past
window may contain more than one file after all compaction activity
on its behalf is done, which may increase read amplification. To
avoid that, TWCS now serializes the per-window major, guaranteeing
that no files are missed.
Fixes #9553.
tests: unit(dev).
"
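The serialization idea can be sketched in Python (the real implementation is C++ inside Scylla's TWCS; all names here are hypothetical): a per-window lock guarantees that a major never runs concurrently with other compaction on that window, so it cannot miss files.

```python
import threading

class TimeWindow:
    """Hypothetical stand-in for a TWCS past window."""
    def __init__(self, sstables):
        self.sstables = set(sstables)
        self._major_lock = threading.Lock()  # serializes per-window major

    def major_compact(self):
        # While the lock is held, no other compaction can run on this
        # window, so the major sees every sstable and the window ends up
        # with exactly one output file.
        with self._major_lock:
            merged = "+".join(sorted(self.sstables))
            self.sstables = {merged}
            return merged

w = TimeWindow({"sst1", "sst2", "sst3"})
w.major_compact()
print(len(w.sstables))  # → 1
```

Without the lock, a file picked up by a concurrent compaction would be absent from `self.sstables` during the major and survive as a second file in the window.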
* 'fix_twcs_per_window_major_v3' of https://github.com/raphaelsc/scylla:
TWCS: Make sure major on past window is done on all its sstables
TWCS: remove needless param for STCS options
TWCS: kill unused param in newest_bucket()
compaction: Implement strategy control and wire it
compaction: Add interface to control strategy behavior.
Allow stopping compaction by type on a given keyspace and list of tables.
Also add an api unit test suite that tests the existing `stop_compaction` api
and the new `stop_keyspace_compaction` api.
Fixes #9700
Closes #9746
* github.com:scylladb/scylla:
api: storage_service: validate_keyspace: improve exception error message
api: compaction_manager: add stop_keyspace_compaction
api: storage_service: expose validate_keyspace and parse_tables
api: compaction_manager: stop_compaction: fix type description
compaction_manager: stop_compaction: expose optional table*
test: api: add basic compaction_manager test
Currently when scrub/validate is stopped (e.g. via the api),
scrub_validate_mode_validate_reader co_return:s without
closing the reader passed to it, causing a crash due to an
internal error check, see #9766.
By throwing a compaction_stopped_exception rather than
co_return:ing, the exception will be handled like any other
exception, including closing the reader.
Fixes #9766
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Message-Id: <20211213125528.2422745-1-bhalevy@scylladb.com>
Once the current window is sealed, TWCS is supposed to compact all its
sstables into one. If there's ongoing compaction, sstables can be
missed, leaving past windows with more than one sstable. Additionally,
under heavy load the major may not happen at all. All these problems
are fixed by serializing the major on a past window and postponing it
if the manager refuses to run the job right away.
Fixes #9553.
Reviewed-by: Benny Halevy <bhalevy@scylladb.com>
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
The STCS options can be retrieved from a class member now that
newest_bucket() is no longer a static function, so let's get rid of
the parameter.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
This interface is akin to table_state, but represents the compaction
manager instead.
It will allow the compaction manager to set goals and constraints on
compaction strategies. To start, it lets a strategy know whether
there's ongoing compaction, which is useful for virtually all
strategies. For example, LCS may want to compact L0 in parallel
with higher levels, to avoid L0 falling behind.
This interface can easily be extended, e.g. to allow the manager to
switch to a reclaim mode when running out of space.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
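A minimal sketch of such an interface, in Python with hypothetical names (the real one is a C++ abstract class in Scylla):

```python
from abc import ABC, abstractmethod

class StrategyControl(ABC):
    """The compaction manager's counterpart to table_state: a narrow
    interface compaction strategies can query for manager-level state."""
    @abstractmethod
    def has_ongoing_compaction(self) -> bool: ...

class ManagerControl(StrategyControl):
    """Hypothetical concrete implementation owned by the manager."""
    def __init__(self):
        self.running_compactions = 0
    def has_ongoing_compaction(self):
        return self.running_compactions > 0

def lcs_wants_l0_parallel(control):
    # e.g. LCS might compact L0 in parallel with higher levels only
    # when other compaction is already running, so L0 doesn't fall behind
    return control.has_ongoing_compaction()

ctrl = ManagerControl()
print(lcs_wants_l0_parallel(ctrl))  # → False
ctrl.running_compactions = 1
print(lcs_wants_l0_parallel(ctrl))  # → True
```

Because strategies only see the abstract interface, the manager can later add methods (e.g. a reclaim-mode flag) without touching strategy code.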
"
Today, data from different buckets (e.g. windows) cannot be compacted together because
mutation compaction happens inside each consumer, where each consumer runs on behalf
of a particular bucket. To solve this problem, the mutation compaction process is
moved from the consumer into the producer, so that the interposer consumer, which is
responsible for segregation, is fed already-compacted data and forwards it to the
owning bucket.
Fixes #9662.
tests: unit(debug).
"
* 'compact_across_buckets_v2' of github.com:raphaelsc/scylla:
tests: sstable_compaction_test: add test_twcs_compaction_across_buckets
compaction: Move mutation compaction into producer for TWCS
compaction: make enable_garbage_collected_sstable_writer() more precise
If an interposer is enabled, like the timestamp-based one for TWCS, data
from different buckets (e.g. windows) cannot be compacted together because
mutation compaction happens inside each consumer, where each consumer
belongs to a different bucket.
To remove this limitation, let's move the mutation compactor from
the consumer into the producer, so that compacted data is fed into
the interposer before it segregates the data.
We're short-circuiting this logic if TWCS isn't in use as
compacting reader adds overhead to compaction, given that this reader
will pop fragments from combined sstable reader, compact them using
mutation_compactor and finally push them out to the underlying
reader.
without compacting reader (e.g. STCS + no interposer):
228255.92 +- 1519.53 partitions / sec (50 runs, 1 concurrent ops)
224636.13 +- 1165.05 partitions / sec (100 runs, 1 concurrent ops)
224582.38 +- 1050.71 partitions / sec (100 runs, 1 concurrent ops)
with compacting reader (e.g. TWCS + interposer):
221376.19 +- 1282.11 partitions / sec (50 runs, 1 concurrent ops)
216611.65 +- 985.44 partitions / sec (100 runs, 1 concurrent ops)
215975.51 +- 930.79 partitions / sec (100 runs, 1 concurrent ops)
So the cost of compacting data across buckets is ~3.5%, which happens
only with interposer enabled and GC writer disabled.
Fixes #9662.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
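The pipeline change can be sketched in Python (hypothetical names; Scylla's implementation is C++): compacting in the producer, before the interposer segregates fragments into window buckets, lets versions of the same key from different windows merge.

```python
def compact(fragments):
    # producer-side "mutation compaction": keep only the newest version
    # of each key, regardless of which window it came from
    latest = {}
    for key, ts, value in fragments:
        if key not in latest or ts > latest[key][0]:
            latest[key] = (ts, value)
    return [(k, ts, v) for k, (ts, v) in sorted(latest.items())]

def segregate(fragments, window_size):
    # interposer consumer: route each already-compacted fragment to the
    # bucket that owns its time window
    buckets = {}
    for key, ts, value in fragments:
        buckets.setdefault(ts // window_size, []).append((key, ts, value))
    return buckets

# two versions of k1 live in different windows; compacting first lets
# the older one be dropped before segregation splits them apart
frags = [("k1", 5, "old"), ("k1", 15, "new"), ("k2", 7, "x")]
out = segregate(compact(frags), window_size=10)
print(out)  # → {1: [('k1', 15, 'new')], 0: [('k2', 7, 'x')]}
```

Had `compact` run inside each bucket's consumer instead, both versions of `k1` would land in different buckets and both would survive.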
We only want to enable the GC writer if incremental compaction is
required. Let's make the check more precise by verifying that the
sstable size limit isn't disabled, so the GC writer is enabled only
for compaction strategies that really need it, and strategies that
don't need it won't pay the penalty.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Mutations are not guaranteed to arrive in the order of their timestamps.
If there is an expired tombstone in an sstable and a repair inserts older
data into the memtable, compaction, unaware of the memtable data, could
purge the tombstone, leading to data resurrection. The solution is to
disallow purging tombstones newer than the minimum memtable timestamp.
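In rough Python terms (hypothetical names; the real check lives in Scylla's C++ compaction code), the purge rule adds one condition on top of expiry:

```python
def can_purge_tombstone(tombstone_ts, is_expired, min_memtable_ts):
    # A tombstone may only be purged if, besides being expired (gc grace
    # period passed), it is older than everything in the memtables, so it
    # cannot still shadow data that hasn't been flushed yet.
    return is_expired and tombstone_ts < min_memtable_ts

# repair wrote old data (ts=5) into the memtable; an expired tombstone
# with ts=10 still covers it and must NOT be purged before they merge
print(can_purge_tombstone(10, True, min_memtable_ts=5))  # → False
print(can_purge_tombstone(3, True, min_memtable_ts=5))   # → True
```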
Since the sstable reader was already converted to flat_mutation_reader_v2, the
compaction layer can naturally be converted too.
There are many dependencies that still use v1. Those strictly needed, like the
readers in the sstable set which link compaction to the sstable reader, were
converted to v2 in this series. For those that aren't essential we rely on the
V1<->V2 adaptors, and their conversion is postponed. The postponed ones are: the
scrub specialized reader (which needs a validator for mutation_fragment_v2), the
interposer consumer, and the combined reader used by the incremental selector;
the incremental selector itself was converted to v2.
tests: unit(debug).
Closes #9725
* github.com:scylladb/scylla:
compaction: update compaction::make_sstable_reader() to flat_mutation_reader_v2
sstable_set: update make_crawling_reader() to flat_mutation_reader_v2
sstable_set: update make_range_sstable_reader() to flat_mutation_reader_v2
sstable_set: update make_local_shard_sstable_reader() to flat_mutation_reader_v2
sstable_set: update incremental_reader_selector to flat_mutation_reader_v2
As part of the drive to move over to flat_mutation_reader_v2, update
make_filtering_reader(). Since it doesn't examine range tombstones
(only the partition_start, to filter the key) the entire patch
is just glue code upgrading and downgrading users in the pipeline
(or removing a conversion, in one case).
Test: unit (dev)
Closes #9723
We have three semaphores for serialization of maintenance ops.
1) _rewrite_sstables_sem: for scrub, cleanup and upgrade.
2) _major_compaction_sem: for major
3) _custom_job_sem: for reshape, resharding and offstrategy
scrub, cleanup and upgrade should be serialized with major,
so the rewrite semaphore should be merged into the major one.
offstrategy is also a maintenance op that should be serialized
with the others, to reduce compaction aggressiveness and space
requirements.
resharding is a one-off operation, so it can be merged there too.
The same applies to reshape, which can take long; not serializing
it with other maintenance activity can lead to exhaustion of
resources and high space requirements.
Let's have a single semaphore to guarantee their serialization.
Deadlock isn't an issue because locks are always taken in the
same order.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20211201182046.100942-1-raphaelsc@scylladb.com>
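The merged scheme boils down to this Python sketch (hypothetical names; the real code uses seastar semaphores in C++): every maintenance op, whatever its kind, acquires the same single semaphore.

```python
import threading

# one semaphore for all maintenance ops (scrub, cleanup, upgrade, major,
# reshape, resharding, offstrategy), replacing the three previous ones
_maintenance_sem = threading.Semaphore(1)

def run_maintenance_op(op):
    # all ops contend on the same single semaphore, so they are fully
    # serialized and locks are always taken in the same order (no
    # lock-ordering deadlock is possible)
    with _maintenance_sem:
        return op()

order = []
run_maintenance_op(lambda: order.append("major"))
run_maintenance_op(lambda: order.append("scrub"))
print(order)  # → ['major', 'scrub']
```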
When invalid sstables are detected, move them
to the quarantine subdirectory so they won't be
selected for regular compaction.
Refs #7658
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Quarantined sstables will reside in a "quarantine" subdirectory
and are also not eligible for regular compaction.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
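A minimal Python sketch of the quarantine move (the helper name is hypothetical; the "quarantine" directory name matches the commit, and the real code is C++ in Scylla):

```python
import os
import shutil
import tempfile

def quarantine_sstable(sstable_path):
    # Move an invalid sstable into a "quarantine" subdirectory of its
    # table directory so regular compaction no longer selects it.
    qdir = os.path.join(os.path.dirname(sstable_path), "quarantine")
    os.makedirs(qdir, exist_ok=True)
    dest = os.path.join(qdir, os.path.basename(sstable_path))
    shutil.move(sstable_path, dest)
    return dest

tabledir = tempfile.mkdtemp()
sst = os.path.join(tabledir, "md-1-big-Data.db")
open(sst, "w").close()
moved = quarantine_sstable(sst)
print(os.path.basename(os.path.dirname(moved)))  # → quarantine
```

Compaction candidate selection then only has to skip anything living under `quarantine/`.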
Currently compaction_manager tracks sstables
based on !requires_view_building() and similarly,
table::in_strategy_sstables picks up only sstables
that are not in staging.
is_eligible_for_compaction() generalizes this condition
in preparation for adding a quarantine subdirectory for
invalid sstables that should not be compacted as well.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
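The generalized predicate can be sketched like this (Python with hypothetical flag names; the real condition is C++ in compaction_manager):

```python
from dataclasses import dataclass

@dataclass
class SSTableState:
    # hypothetical flags mirroring the conditions in the commit message
    requires_view_building: bool = False  # i.e. sstable is in staging
    quarantined: bool = False

def is_eligible_for_compaction(sst):
    # generalizes the old !requires_view_building() / "not in staging"
    # check and extends it to quarantined (invalid) sstables
    return not sst.requires_view_building and not sst.quarantined

print(is_eligible_for_compaction(SSTableState()))                  # → True
print(is_eligible_for_compaction(SSTableState(quarantined=True)))  # → False
```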
This strategy method was introduced unnecessarily. We assumed it
would be needed, but it turned out never to be, not even for ICS.
It's also built on a wrong assumption: an output sstable run being
generated can never be compacted in parallel, as the non-overlapping
requirement would easily be broken.
LCS, for example, can allow parallel compaction on different runs
(levels), but correctness cannot be guaranteed when the same runs
are compacted in parallel.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
It was introduced by commit 5206a97915 because a fully expired sstable
wouldn't be registered and therefore could never be removed from the
backlog tracker. This is no longer possible, as the table is now
responsible for removing all input sstables. So let's kill
on_skipped_expired_sstable(), as it's now only boilerplate we don't
need.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
When reshaping a TWCS table in relaxed mode, which is the case for
offstrategy and boot, the disjoint tolerance is too strict, which can
cause those processes to do more work than needed.
Let's increase the tolerance to the max threshold, which will limit
the number of sstables opened in compaction to a reasonable amount.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20211130132538.56285-1-raphaelsc@scylladb.com>
New code in the manager adopted the name and type table, whereas
historical code still uses the name and type column_family. Let's
make it consistent so newcomers don't get confused.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Now that rewrite_sstables() switched to a coroutine, it can be
simplified by not using smart pointers to handle lifetime issues.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
After commit 1f5b17f, overlapping can be introduced in level 1 because
the procedure that filters out sstables of partial runs considers
inactive tasks, so L1 sstables can be incorrectly filtered out of the
next compaction attempt. When L0 is then merged into L1, overlapping
is introduced in L1 because the old L1 sstables weren't considered in
the L0 -> L1 compaction.
From now on, compaction_manager::get_candidates() will only consider
active tasks, to make sure only actual partial runs are filtered out.
Fixes #9693.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20211129180459.125847-1-raphaelsc@scylladb.com>
To satisfy the backlog controller, commit 28382cb25c changed LCS to
incrementally push sstables to the highest level *when* there's
nothing else to be done.
That's overkill, because the controller is satisfied as soon as level
L is fanout times larger than L-1. There's no need to push everything
to the last level; it's even worse than a major, because any file
being promoted will overlap with ~10 files in the next level. The
cost is at least amortized over multiple iterations, but the terrible
write amplification is still there, reducing overall efficiency.
For example, it might happen that LCS in table A starts pushing
everything to the highest level while table B needs resources for
compaction to reduce its backlog. The increased write amplification
in A may prevent other tables from reducing their backlogs in a
timely manner.
It's clear that LCS should stop promoting as soon as level L is 10x
larger than L-1, so the strategy will still be satisfied while fixing
the inefficiency.
Now the layout will look as follows:
SSTables in each level: [0, 2, 15, 121]
Previously, once the table stopped being written to, it looked like:
SSTables in each level: [0, 0, 0, 138]
It's always good to have everything in a single run, but that comes
with a high write amplification cost which we cannot afford in steady
state. With this change, the layout will still be good enough to make
everybody happy.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20211129143606.71257-1-raphaelsc@scylladb.com>
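In rough Python terms (a hypothetical helper; the real logic sits in Scylla's LCS code, and the exact comparison there may differ), the stop condition reduces to a size-ratio check between adjacent levels:

```python
def should_keep_promoting(size_lower, size_higher, fanout=10):
    # keep pushing sstables up only while level L is not yet fanout
    # times larger than L-1; beyond that ratio the backlog controller
    # is already satisfied and further promotion only burns write
    # bandwidth other tables could use
    return size_higher < fanout * size_lower

print(should_keep_promoting(2, 15))  # 15 < 2*10 → True, still promoting
print(should_keep_promoting(2, 25))  # 25 >= 2*10 → False, controller happy
```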
This series extends `compaction_manager::stop_ongoing_compaction` so it can be used from the api layer for:
- table::disable_auto_compaction
- compaction_manager::stop_compaction
Fixes #9313
Fixes #9695
Test: unit(dev)
Closes #9699
* github.com:scylladb/scylla:
compaction_manager: stop_compaction: wait for ongoing compactions to stop
compaction_manager: stop_ongoing_compactions: log Stopping 0 tasks at debug level
compaction_manager: unify stop_ongoing_compactions implementations
compaction_manager: stop_ongoing_compactions: add compaction_type option
compaction_manager: get_compactions: get a table* parameter
table: disable_auto_compaction: stop ongoing compactions
compaction_manager: make stop_ongoing_compactions public
table: futurize disable_auto_compactions
Similar to #9313, stop_compaction should also reuse the
stop_ongoing_compactions() infrastructure and wait for ongoing
compactions of the given type to stop.
Fixes #9695
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Normally, "Stopping 0 tasks for 0 ongoing compactions for table ..."
is not very interesting so demote its log_level to debug.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Now stop_ongoing_compactions(reason) is equivalent to
stop_ongoing_compactions(reason, nullptr, std::nullopt),
so share the code of the latter for the former entry point.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
And make the table optional as well, so it can be used by
stop_compaction() to stop a particular compaction type on all tables.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Optionally get the running compactions on the provided table.
This is required by stop_ongoing_compactions on a given table.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
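The unified entry point from this series can be sketched in Python (hypothetical shape; the real signature is C++ with `table*` and `std::optional<compaction_type>`): `None` for either filter matches everything, so the unfiltered call shares the filtered code path.

```python
def stop_ongoing_compactions(tasks, table=None, compaction_type=None):
    # table=None and compaction_type=None match all tasks, so
    # stop_ongoing_compactions(tasks) reuses the same code as the
    # per-table / per-type variants instead of duplicating it
    stopped = [t for t in tasks
               if (table is None or t["table"] == table)
               and (compaction_type is None or t["type"] == compaction_type)]
    for t in stopped:
        t["stopped"] = True
    return len(stopped)

tasks = [{"table": "t1", "type": "major", "stopped": False},
         {"table": "t2", "type": "scrub", "stopped": False}]
print(stop_ongoing_compactions(tasks, compaction_type="scrub"))  # → 1
print(stop_ongoing_compactions(tasks))                           # → 2
```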