scylladb

Author	SHA1	Message	Date
Raphael S. Carvalho	f52ad722f3	compaction_manager: rename table_state's get_sstable_set to main_sstable_set With compaction_manager switching to table_state, we'll need to introduce a method in table_state to return maintenance set. So better to have a descriptive name for main set. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-07-13 11:12:33 -03:00
Raphael S. Carvalho	aa667e590e	sstable_set: Fix partitioned_sstable_set constructor The sstable set param isn't being used anywhere, and it's also buggy as sstable run list isn't being updated accordingly, so it could happen that set contains sstables but run list is empty, introducing inconsistency. We're fortunate that the bug wasn't activated as it would've been a hard one to catch. Found this while auditing the code. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20220617203438.74336-1-raphaelsc@scylladb.com>	2022-06-21 11:58:13 +03:00
Avi Kivity	8edb79ea80	Merge 'Reduce compaction serialization' from Mikołaj Sielużycki update_history can take a long time compared to compaction, as a call issued on shard S1 can be handled on shard S2. If the other shard is under heavy load, we may unnecessarily block kicking off a new compaction. Normally it isn't a problem, as compactions aren't super frequent, but there were edge cases where the described behaviour caused compaction to fail to keep up with excessive flushing, leading to too many sstables on disk and OOM during a read. There is no need to wait with next compaction until history is updated, so release the weight earlier to remove unnecessary serialization. Changelog: v3: - explicitly call deregister instead of moving the weight RAII object to release weight - mark compaction as finished when sstables are compacted, without waiting for history to update v2: - Split the patches differently for easier review - Rebased against newer master, which contains fixes that failed the debug version of the test - Removed the test, as it will be provided by [PR#10717](https://github.com/scylladb/scylla/pull/10717) Closes #10507 * github.com:scylladb/scylla: compaction: Release compaction weight before updating history. compaction: Inline compact_sstables_and_update_history call. compaction: Extract compact_sstables function compaction: Rename compact_sstables to compact_sstables_and_update_history compaction: Extract update_history function compaction: Extract should_update_history function. compaction: Fetch start_size from compaction_result compaction: Add tracking start_size in compaction_result.	2022-06-13 16:04:20 +03:00
Benny Halevy	593a192664	compaction: setup: reserve space for _input_sstable_generations We know in advance the maximum number of sstable generations to track, so reserve space for it to prevent vector reallocation for large number of sstables. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-06-08 10:18:24 +03:00
Benny Halevy	4fac6e0b27	compaction: coroutinize setup and maybe yield To prevent reactor stalls with large number of sstables. Fixes #10738 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-06-08 10:12:41 +03:00
Mikołaj Sielużycki	2edf137f61	compaction: Add tracking start_size in compaction_result.	2022-06-07 12:55:28 +02:00
Raphael S. Carvalho	2a7eb16c02	sstables: Use generation_type for compaction ancestors Let's also use generation_type for compaction ancestors, so once we support something other than integer for SSTable generation, we won't have discrepancy about what the generation type is. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-05-31 15:28:02 -03:00
Raphael S. Carvalho	0307cdd2bf	compaction: Fix incremental compaction logging The message only dumps the last sealed fragment, but it should dump all the output fragments replacing the exhausted input ones. Let's print origin of output fragments, so we can differentiate between files with compaction and garbage-collection origin. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20220524232232.119520-1-raphaelsc@scylladb.com>	2022-05-30 15:58:14 +03:00
Piotr Sarna	209c2f5d99	sstables: define generation_type for sstables No functional changes intended - this series is quite verbose, but after it's in, it should be considerably easier to change the type of SSTable generations to something else - e.g. a string or timeUUID. Closes #10533	2022-05-11 14:46:30 +02:00
Benny Halevy	78d6f6a519	compaction: sanitize headers from flat_mutation_reader v1 flat_mutation_reader make_scrubbing_reader no longer exists and there is no need to include flat_mutation_reader.hh nor forward declare the class. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-04-28 17:23:04 +03:00
Benny Halevy	f634b6d3be	compaction: cleanup_compaction: make_partition_filter: return flat_mutation_reader_v2::filter We filter only on the partition key, so it doesn't matter, but we want to get rid of flat_mutation_reader v1. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-04-28 17:16:47 +03:00
Raphael S. Carvalho	f05ae92849	compaction: move compaction::enable_garbage_collected_sstable_writer() into protected namespace Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20220411181322.192830-2-raphaelsc@scylladb.com>	2022-04-12 11:21:18 +03:00
Botond Dénes	b029bd3db7	tree: remove mutation_reader.hh include In most files it was unused. We should move these to the patch which moved out the last interesting reader from mutation_reader.hh (and added the corresponding new header include) but it's probably not worth the effort. Some other files still relied on mutation_reader.hh to provide reader concurrency semaphore and some other misc reader related definitions.	2022-03-30 15:42:51 +03:00
Botond Dénes	b7954138ac	mutation_reader: move compacting reader into readers/	2022-03-30 15:42:51 +03:00
Botond Dénes	f24f2f726a	mutation_reader: move filtering reader into readers/	2022-03-30 15:42:51 +03:00
Botond Dénes	2ae0e0093e	compaction/compaction: abort scrub when attempting to rectify stream with active tombstone	2022-03-29 13:19:05 +03:00
Raphael S. Carvalho	25be958ab9	compaction: Introduce compaction_descriptor::sstables_size This method can be reused in manager, and will be useful for upcoming cleanup task. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2022-03-21 12:55:10 -03:00
Raphael S. Carvalho	67a7b7a3f4	compaction: rename interrupt() to a descriptive name interrupt() makes it sound like it's interrupting the compaction, but it's actually called on interrupt, to handle the interrupt scenario. Let's rename it to on_interrupt(). Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20220311000128.189840-1-raphaelsc@scylladb.com>	2022-03-11 10:16:34 +02:00
Botond Dénes	7e0b51ff23	Merge 'Overhaul compaction_manager::task' from Benny Halevy The series overhauls the compaction_manager::task design and implementation by properly layering the functionality between the compaction_manager that deals with generic task execution, and the per-task business logic that is defined in a set of classes derived from the generic task class. While at it, the series introduces `task::state` and a set of helper functions to manage it to prevent leaks in the statistics, fixing #9974. Two more stats counters were exposed: `completed_tasks` and a new `postponed_tasks`. Test: sstable_compaction_test Dtest: compaction_test.py compaction_additional_test.py Fixes #9974 Closes #10122 * github.com:scylladb/scylla: compaction_manager: use coroutine::switch_to compaction_manager::task: drop _compaction_running compaction_manager: move per-type logic to derived task compaction_manager: task: add state enum compaction_manager: task: add maybe_retry compaction_manager: reevaluate_postponed_compactions: mark as noexcept compaction_manager: define derived task types compaction_manager: register_metrics: expose postponed_compactions compaction_manager: register_metrics: expose failed_compactions compaction_manager: register_metrics: expose _stats.completed_tasks compaction: add documentation for compaction_type to string conversions compaction: expose to_string(compaction_type) compaction_manager: task: standardize task description in log messages compaction_manager: refactor can_proceed compaction_manager: pass compaction_manager& to task ctor compaction_manager: use shared_ptr<task> rather than lw_shared_ptr compaction_manager: rewrite_sstables: acquire _maintenance_ops_sem once compaction_manager: use compaction_state::lock only to synchronize major and regular compaction	2022-03-10 13:33:56 +02:00
Benny Halevy	28a74a2e90	compaction: expose to_string(compaction_type) To be used in the next patch to generate a string description from the compaction_type. In theory, we could use compaction_name() but the latter returns the compaction type in all-upper case and that is very different from what we print to the log today. The all-upper strings are used for the api layer, e.g. to stop tasks of a particular compaction type. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-03-10 08:39:18 +02:00
Botond Dénes	105bf8888a	sstables: convert mx writer to v2 The sstables::sstable class has two methods for writing sstables: 1) sstable_writer get_writer(...); 2) future<> write_components(flat_mutation_reader, ...); (1) directly exposes the writer type, so we have to update all users of it (there are not that many) in this same patch. We defer updating users of (2) to follow-up commits.	2022-03-10 07:03:49 +02:00
Botond Dénes	7a37e30310	mutation_reader: convert compacting reader v2 Its input was already a v2 reader, now itself is also a v2 reader. With this commit, compaction.cc is finally v2 all-the-way.	2022-03-10 07:03:46 +02:00
Benny Halevy	c7de2e0682	compaction: log info message when interrupting compaction Info messages are logged when compaction jobs start and finish but there is no message logged when the job is interrupted, e.g. when stopped by the compaction_manager. Refs scylladb/scylla-dtest#2468 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2022-03-07 11:43:58 +02:00
Botond Dénes	fb0e0ec7c1	mutation_reader: compacting_reader: require a v2 input reader Before we add a v2 output option to the compactor, we want to get rid of all the v1 inputs to make it simpler. This means that for a while the compacting reader will be in a strange place of having a v2 input and a v1 output. Hopefully, not for long.	2022-02-21 12:27:55 +02:00
Raphael S. Carvalho	5d654a6b9a	compaction: don't copy owned ranges in cleanup ctor Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20220119142322.39791-1-raphaelsc@scylladb.com>	2022-01-20 14:05:58 +02:00
Avi Kivity	fcb8d040e8	treewide: use Software Package Data Exchange (SPDX) license identifiers Instead of lengthy blurbs, switch to single-line, machine-readable standardized (https://spdx.dev) license identifiers. The Linux kernel switched long ago, so there is strong precedent. Three cases are handled: AGPL-only, Apache-only, and dual licensed. For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0), reasoning that our changes are extensive enough to apply our license. The changes we applied mechanically with a script, except to licenses/README.md. Closes #9937	2022-01-18 12:15:18 +01:00
Botond Dénes	a7f4ab6b14	compaction/compaction: remove v1 version of validate and scrub reader factory methods	2022-01-14 10:19:56 +02:00
Botond Dénes	d57634ad46	compaction: use v2 version of mutation_writer::segregate_by_partition()	2022-01-14 08:54:26 +02:00
Botond Dénes	b315d17c2a	compaction: migrate scrub and validate to v2 We add v2 version of external API but leave the old v1 in place to help incremental migration. The implementation is migrated to v2.	2022-01-14 08:54:26 +02:00
Botond Dénes	15d8ea983e	compaction: upgrade compaction::make_interposer_consumer() to v2 Almost all (except the scrub one) actual interposer consumers are v2.	2022-01-07 13:52:14 +02:00
Botond Dénes	aa3c943f4c	mutation_reader: remove unnecessary stable_flattened_mutations_consumer Said wrapper was conceived to make unmovable `compact_mutation` because readers wanted movable consumers. But `compact_mutation` has been movable for years now, as all its unmovable bits were moved into an `lw_shared_ptr<>` member. So drop this unnecessary wrapper and its unnecessary usages.	2022-01-07 13:52:07 +02:00
Botond Dénes	1ba19c2aa4	compaction/compaction_strategy: convert make_interposer_consumer() to v2 The underlying timestamp-based splitter is v2 already.	2022-01-07 13:51:59 +02:00
Botond Dénes	0601a465a2	mutation_writer: migrate shard_based_splitting_writer to v2	2022-01-07 13:48:53 +02:00
Asias He	a8ad385ecd	repair: Get rid of the gc_grace_seconds The gc_grace_seconds is a very fragile and broken design inherited from Cassandra. Deleted data can be resurrected if cluster wide repair is not performed within gc_grace_seconds. This design pushes the job of keeping the database consistent to the user. In practice, it is very hard to guarantee repair is performed within gc_grace_seconds all the time. For example, repair workload has the lowest priority in the system which can be slowed down by the higher priority workload, so that there is no guarantee when a repair can finish. A gc_grace_seconds value that used to work might not work after data volume grows in a cluster. Users might want to avoid running repair during a specific period where latency is the top priority for their business. To solve this problem, an automatic mechanism to protect data resurrection is proposed and implemented. The main idea is to remove the tombstone only after the range that covers the tombstone is repaired. In this patch, a new table option tombstone_gc is added. The option is used to configure tombstone gc mode. For example: 1) GC a tombstone after gc_grace_seconds cqlsh> ALTER TABLE ks.cf WITH tombstone_gc = {'mode':'timeout'} ; This is the default mode: if no tombstone_gc option is specified by the user, the old gc_grace_seconds based gc will be used. 2) Never GC a tombstone cqlsh> ALTER TABLE ks.cf WITH tombstone_gc = {'mode':'disabled'}; 3) GC a tombstone immediately cqlsh> ALTER TABLE ks.cf WITH tombstone_gc = {'mode':'immediate'}; 4) GC a tombstone after repair cqlsh> ALTER TABLE ks.cf WITH tombstone_gc = {'mode':'repair'}; In addition to the 'mode' option, another option 'propagation_delay_in_seconds' is added. It defines the max time a write could possibly delay before it eventually arrives at a node. A new gossip feature TOMBSTONE_GC_OPTIONS is added. The new tombstone_gc option can only be used after the whole cluster supports the new feature. A mixed cluster works with no problem. Tests: compaction_test.py, ninja test Fixes #3560 [avi: resolve conflicts vs data_dictionary]	2022-01-04 19:48:14 +02:00
Raphael S. Carvalho	e05859c3f9	compaction: kill unused code for resharding_compaction Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20211217162728.114936-2-raphaelsc@scylladb.com>	2021-12-20 18:21:31 +02:00
Raphael S. Carvalho	d1f2fd7f03	compaction: rename compacting_sstable_writer to compacted_fragments_writer the name compacting_sstable_writer is misleading as it doesn't perform any compaction. let's rename it to a name that reflects more what it does. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Message-Id: <20211217162728.114936-1-raphaelsc@scylladb.com>	2021-12-20 18:21:31 +02:00
Benny Halevy	c89876c975	compaction: scrub_validate_mode_validate_reader: throw compaction_stopped_exception if stop is requested Currently when scrub/validate is stopped (e.g. via the api), scrub_validate_mode_validate_reader co_return:s without closing the reader passed to it - causing a crash due to internal error check, see #9766. Throwing a compaction_stopped_exception rather than co_return:ing an exception will be handled as any other exception, including closing the reader. Fixes #9766 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Message-Id: <20211213125528.2422745-1-bhalevy@scylladb.com>	2021-12-14 11:15:23 +02:00
Raphael S. Carvalho	9b8aa1e9ae	compaction: Move mutation compaction into producer for TWCS If interposer is enabled, like the timestamp-based one for TWCS, data from different buckets (e.g. windows) cannot be compacted together because mutation compaction happens inside each consumer, where each consumer will belong to a different bucket. To remove this limitation, let's move the mutation compactor from consumer into producer, such that compacted data will be fed into the interposer, before it segregates data. We're short-circuiting this logic if TWCS isn't in use as compacting reader adds overhead to compaction, given that this reader will pop fragments from combined sstable reader, compact them using mutation_compactor and finally push them out to the underlying reader. without compacting reader (e.g. STCS + no interposer): 228255.92 +- 1519.53 partitions / sec (50 runs, 1 concurrent ops) 224636.13 +- 1165.05 partitions / sec (100 runs, 1 concurrent ops) 224582.38 +- 1050.71 partitions / sec (100 runs, 1 concurrent ops) with compacting reader (e.g. TWCS + interposer): 221376.19 +- 1282.11 partitions / sec (50 runs, 1 concurrent ops) 216611.65 +- 985.44 partitions / sec (100 runs, 1 concurrent ops) 215975.51 +- 930.79 partitions / sec (100 runs, 1 concurrent ops) So the cost of compacting data across buckets is ~3.5%, which happens only with interposer enabled and GC writer disabled. Fixes #9662. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-12-10 17:14:44 -03:00
Raphael S. Carvalho	484269cd8f	compaction: make enable_garbage_collected_sstable_writer() more precise we only want to enable GC writer if incremental compaction is required. let's make it more precise by checking that size limit for sstable isn't disabled, so GC writer will only be enabled for compaction strategies that really need it. So strategies that don't need it won't pay the penalty. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-12-10 15:22:08 -03:00
Mikołaj Sielużycki	504efe0607	table: Prevent resurrecting data from memtable on compaction Mutations are not guaranteed to come in the order of their timestamps. If there is an expired tombstone in the sstable and a repair inserts old data into memtable, the compaction would not consider memtable data and purge the tombstone leading to data resurrection. The solution is to disallow purging tombstones newer than min memtable timestamp.	2021-12-09 13:22:14 +01:00
Botond Dénes	2e5440bdf2	Merge 'Convert compaction to flat_mutation_reader_v2' from Raphael Carvalho Since sstable reader was already converted to flat_mutation_reader_v2, compaction layer can naturally be converted too. There are many dependencies that use v1. Those strictly needed like readers in sstable set, which links compaction to sstable reader, were converted to v2 in this series. For those that aren't essential we're relying on V1<-->V2 adaptors, and conversion work on them will be postponed. Those being postponed are: scrub specialized reader (needs a validator for mutation_fragment_v2), interposer consumer, combined reader which is used by incremental selector. incremental selector itself was converted to v2. tests: unit(debug). Closes #9725 * github.com:scylladb/scylla: compaction: update compaction::make_sstable_reader() to flat_mutation_reader_v2 sstable_set: update make_crawling_reader() to flat_mutation_reader_v2 sstable_set: update make_range_sstable_reader() to flat_mutation_reader_v2 sstable_set: update make_local_shard_sstable_reader() to flat_mutation_reader_v2 sstable_set: update incremental_reader_selector to flat_mutation_reader_v2	2021-12-07 15:17:38 +02:00
Raphael S. Carvalho	2435bd14c6	compaction: update compaction::make_sstable_reader() to flat_mutation_reader_v2 Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-12-07 09:37:57 -03:00
Raphael S. Carvalho	c6399005a3	sstable_set: update make_crawling_reader() to flat_mutation_reader_v2 Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-12-07 09:37:55 -03:00
Raphael S. Carvalho	aebbe68239	sstable_set: update make_range_sstable_reader() to flat_mutation_reader_v2 Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-12-07 09:37:53 -03:00
Raphael S. Carvalho	c3c070a5ca	sstable_set: update make_local_shard_sstable_reader() to flat_mutation_reader_v2 Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2021-12-07 09:37:51 -03:00
Avi Kivity	395b30bca8	mutation_reader: update make_filtering_reader() to flat_mutation_reader_v2 As part of the drive to move over to flat_mutation_reader_v2, update make_filtering_reader(). Since it doesn't examine range tombstones (only the partition_start, to filter the key) the entire patch is just glue code upgrading and downgrading users in the pipeline (or removing a conversion, in one case). Test: unit (dev) Closes #9723	2021-12-07 12:18:07 +02:00
Benny Halevy	cc122984d6	compaction: scrub: add quarantine_mode option Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-12-05 18:29:04 +02:00
Benny Halevy	bbe275f37d	compaction: scrub_sstables_validate_mode: quarantine invalid sstables When invalid sstables are detected, move them to the quarantine subdirectory so they won't be selected for regular compaction. Refs #7658 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-12-05 18:14:16 +02:00
Benny Halevy	13e7b00f2e	sstables: add is_quarantined Quarantined sstables will reside in a "quarantine" subdirectory and are also not eligible for regular compaction. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-12-05 18:00:44 +02:00
Benny Halevy	07c5ddf182	sstables: add is_eligible_for_compaction Currently compaction_manager tracks sstables based on !requires_view_building() and similarly, table::in_strategy_sstables picks up only sstables that are not in staging. is_eligible_for_compaction() generalizes this condition in preparation for adding a quarantine subdirectory for invalid sstables that should not be compacted as well. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-12-05 18:00:44 +02:00
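
Note on commit a8ad385ecd (tombstone_gc modes): the four modes listed in that message can be read as a single purge-eligibility decision. The following is a minimal Python sketch of that reading, not ScyllaDB's implementation. The function name `can_purge`, its parameters, the exact comparison operators, and the way `propagation_delay_in_seconds` is subtracted from the repair time are all assumptions made for illustration; only the mode names and their one-line descriptions come from the commit message.

```python
from typing import Optional


def can_purge(
    deletion_time: int,
    now: int,
    mode: str = "timeout",
    gc_grace_seconds: int = 864000,
    last_repair_time: Optional[int] = None,
    propagation_delay_in_seconds: int = 0,
) -> bool:
    """Hypothetical helper: decide whether a tombstone may be garbage-collected.

    All times are seconds since the epoch. The comparisons are assumptions;
    the commit message only describes the modes informally.
    """
    if mode == "disabled":
        # "Never GC a tombstone"
        return False
    if mode == "immediate":
        # "GC a tombstone immediately"
        return True
    if mode == "timeout":
        # "GC a tombstone after gc_grace_seconds" (described as the default mode)
        return deletion_time + gc_grace_seconds <= now
    if mode == "repair":
        # "GC a tombstone after repair": the covering range must have been
        # repaired after the tombstone was written. Subtracting the
        # propagation delay here is an assumed interpretation.
        if last_repair_time is None:
            return False
        return deletion_time < last_repair_time - propagation_delay_in_seconds
    raise ValueError(f"unknown tombstone_gc mode: {mode!r}")


if __name__ == "__main__":
    # A tombstone written at t=1000 with a range repaired at t=5000.
    print(can_purge(1000, now=6000, mode="repair", last_repair_time=5000))
    # The same tombstone under the timeout mode with a 10-day grace period.
    print(can_purge(1000, now=6000, mode="timeout"))
```

In this sketch an unrepaired range never allows a purge in 'repair' mode, which matches the stated idea of removing the tombstone only after the range that covers it is repaired.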


120 Commits