The sstable's replay_position in stats_metadata is
valid only on the originating node and shard.
Therefore, validate the originating host and shard
before using it in compaction or table truncate.
Fixes #10080
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closes scylladb/scylladb#16550
Fixes some typos as found by codespell run on the code.
In this commit, I was hoping to fix only comments, not user-visible alerts, output, etc.
Follow-up commits will take care of them.
Refs: https://github.com/scylladb/scylladb/issues/16255
Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>
When off-strategy is disabled, data segregation is not postponed,
meaning that getting the partition estimate right is important to
decrease the filter's false positives. With streaming, we don't
have min and max timestamps at the destination; we could have
extended the RPC verb to send them, but it turns out we can easily
deduce the number of windows using the default TTL. Given the
partitioner's random nature, it's not absurd to assume that a given
range being streamed may overlap with all windows, meaning that each
range will yield one sstable for each window when segregating incoming
data. Today, we assume the worst case of 100 windows (which is the
max number of sstables the input data can be segregated into)
due to the lack of metadata for estimating the window count.
But given that users are recommended to target a max of ~20
windows, the partition estimate is being downsized 5x more
than needed. Let's improve it by using the default TTL when
estimating the window count, so even in the absence of timestamp
metadata, the partition estimate won't be way off.
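The estimate described above can be sketched as follows (the function name, parameters, and the 100-window cap are illustrative stand-ins, not Scylla's actual code):

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical sketch: with no timestamp metadata at the streaming
// destination, data written over one default-TTL span can cover at most
// ceil(ttl / window_size) TWCS windows, capped at the old worst case.
uint64_t estimate_window_count(uint64_t default_ttl_s, uint64_t window_size_s,
                               uint64_t worst_case_cap = 100) {
    if (default_ttl_s == 0 || window_size_s == 0) {
        return worst_case_cap; // no TTL configured: keep the old worst case
    }
    uint64_t windows = (default_ttl_s + window_size_s - 1) / window_size_s;
    return std::min(std::max<uint64_t>(windows, 1), worst_case_cap);
}
```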
Fixes #15704.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
default_compaction_progress_monitor returns a reference to a static
object, so it should be read-only, but its users need to modify it.
Delete default_compaction_progress_monitor and have each user create
its own compaction_progress_monitor instance where needed.
Closes scylladb/scylladb#15800
After "repair: Get rid of the gc_grace_seconds", the sstable's schema (mode,
gc period if applicable, etc.) is used to estimate the amount of droppable
data (or determine full expiration = max_deletion_time < gc_before).
It could happen that the user switched from timeout to repair mode, but
sstables will still use the old mode, even though the user asked for a new one.
Another example is when you tune the value of the grace period, to prevent
data resurrection if repair can't run in a timely manner.
The problem persists until all sstables using the old GC settings are
recompacted or the node is restarted.
To fix this, we have to feed the latest schema into the sstable procedures
used for expiration purposes.
Fixes #15643.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closes scylladb/scylladb#15746
compaction_read_monitor_generator is an existing mechanism
for monitoring progress of sstables reading during compaction.
In this change information gathered by compaction_read_monitor_generator
is utilized by task manager compaction tasks of the lowest level,
i.e. compaction executors, to calculate task progress.
compaction_read_monitor_generator has a flag, which decides whether
monitored changes will be registered by compaction_backlog_tracker.
This allows us to pass the generator to all compaction readers without
impacting the backlog.
Task executors have access to compaction_read_monitor_generator_wrapper,
which protects the internals of compaction_read_monitor_generator
and provides only the necessary functionality.
Closes scylladb/scylladb#14878
* github.com:scylladb/scylladb:
compaction: add get_progress method to compaction_task_impl
compaction: find total compaction size
compaction: sstables: monitor validation scrub with compaction_read_generator
compaction: keep compaction_progress_monitor in compaction_task_executor
compaction: use read monitor generator for all compactions
compaction: add compaction_progress_monitor
compaction: add flag to compaction_read_monitor_generator
The estimation assumes that the sizes of other components are irrelevant
when estimating the number of partitions for each output sstable.
The sstables are split according to the data file size, therefore the
sizes of other files are irrelevant for the estimation.
With certain data models, like single-row partitions containing small
values, the index can be even larger than the data.
For example, if the index is as large as the data, then the estimation
would say that 2x more sstables will be generated, and as a result
each sstable is underestimated to have 2x fewer keys.
Fix it by only accounting for the size of the data file.
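A minimal sketch of the arithmetic (all names are hypothetical; only the accounted-size distinction mirrors the text):

```cpp
#include <algorithm>
#include <cstdint>

// Sketch: estimate partitions per output sstable. Splitting is driven
// by the data file size alone; wrongly counting other components (e.g.
// an index as large as the data) doubles the apparent output count and
// halves the per-sstable key estimate.
uint64_t partitions_per_output(uint64_t total_partitions,
                               uint64_t accounted_bytes,
                               uint64_t target_sstable_bytes) {
    uint64_t outputs = std::max<uint64_t>(
        1, (accounted_bytes + target_sstable_bytes - 1) / target_sstable_bytes);
    return std::max<uint64_t>(1, total_partitions / outputs);
}
```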
Fixes #15726.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closes scylladb/scylladb#15727
Validation scrub bypasses the usual compaction machinery, though it
still needs to be tracked with compaction_progress_monitor so that
its progress can be reached from the compaction task executor.
Track sstable scrub in validate mode with read monitors.
Keep compaction_progress_monitor in compaction_task_executor and pass a
reference to it further, so that the compaction progress can be retrieved from it.
Compaction read monitor generators are used in all compaction types.
Classes which did not use _monitor_generator so far now create it with
_use_backlog_tracker set to no, so as not to impact the backlog tracker.
In the following patches compaction_read_monitor_generator will be used
to find the progress of compaction_task_executors. To avoid unnecessary
lifetime extension and exposing the internals of the class outside
compaction.cc, compaction_progress_monitor is created.
The compaction class keeps a reference to the compaction_progress_monitor.
Inheriting classes which actually use compaction_read_monitor_generator
need to set it with the set_generator method.
Following patches will use compaction_read_monitor_generator
to track progress of all types of compaction. Some of them should
not be registered in compaction_backlog_tracker.
The _use_backlog_tracker flag, which defaults to true, is added
to compaction_read_monitor_generator and passed to all
compaction_read_monitors created by this generator.
The pending_replacement list is used by incremental compaction to
inform other ongoing compactions about exhausted sstables
that must be replaced in the sstable set they keep for tombstone
GC purposes.
Reshape doesn't enable tombstone GC, so that list will not
be cleared, which prevents incremental compaction from releasing
sstables referenced by that list. This wasn't a problem until now,
when we want reshape to do incremental compaction.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
That's done by inheriting from regular_compaction, which implements
incremental compaction. But reshape still implements its own
methods for creating the writer and reader. One reason is that
reshape is not driven by the controller, as its input sstables
live in the maintenance set. Another reason is the customization
of things like sstable origin, etc.
stop_sstable_writer() is extended because that's used by
regular_compaction to check for possibility of removing
exhausted sstables earlier whenever an output sstable
is sealed.
Also, incremental compaction will be unconditionally
enabled for ICS/LCS during off-strategy.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
That's in preparation to next change that will make reshape
inherit from regular compaction.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
That's needed for enabling incremental compaction to operate, and
needed for subsequent work that enables incremental compaction
for off-strategy, which in turn uses reshape compaction type.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Scylla sstable promises to *never* mutate its input sstables. This
promise was broken by `scylla sstable scrub --scrub-mode=validate`,
because validate moves invalid input sstables into quarantine. This is
unexpected and caused occasional failures in the scrub tests in
test_tools.py. Fix by propagating a flag down to
`scrub_sstables_validate_mode()` in `compaction.cc`, specifying whether
validate should quarantine invalid sstables, then set this flag to false
in `scylla-sstable.cc`. The existing test for validate-mode scrub is
amended to check that the sstable is not mutated. The test now fails
before the fix and passes afterwards.
Fixes: #14309
Closes #15139
Pretty cosmetic change, but it will allow S3 to finally support moving
sstables between states (after this patch it still doesn't)
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
quite a few member variables serve as the configuration for
a given compaction; they are immutable over its life cycle,
so for better readability, let's mark them `const`.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes #14981
get_compacted_fragments_writer() returns an instance of
`compacted_fragments_writer`; there is no need to cast it again.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes #14919
before this change, there are chances that the temporary sstables
created for collecting the GC-able data created by a certain
compaction can be picked up by another compaction job. this
wastes CPU cycles, adds write amplification, and causes
inefficiency.
in general, these GC-only SSTables are created with the same run id
as the non-GC SSTables, but when a new sstable exhausts its input
sstable(s), we proactively replace the old main set with a new one
so that we can free up the space as soon as possible. so the
GC-only SSTables are added to the new main set along with
the non-GC SSTables, but since the former have a good chance of
overlapping with the latter, these GC-only SSTables are assigned
different run ids. but we failed to register them with the
`compaction_manager` when replacing the main sstable set.
that's why future compactions pick them up while the compaction
which created them is not yet completed.
so, in this change:
* to prevent sstables in the transient stage from being picked
up by regular compactions, a new interface class is introduced
so that the sstable is always added to the registration before
it is added to the sstable set, and removed from the registration
after it is removed from the sstable set. the struct helps to
consolidate the registration-related logic in a single place, and
makes it more obvious that the timespan of an sstable in
the registration should cover its timespan in the sstable set.
* use a different run_id for the gc sstable run, as it can
overlap with the output sstable run. the run_id for the
gc sstable run is created only when the gc sstable writer
is created, because the gc sstables are not always created
for all compactions.
please note, all (indirect) callers of
`compaction_task_executor::compact_sstables()` pass a non-empty
`std::function` to this function, so there is no need to check for
emptiness before calling it; in this change, the check is dropped.
Fixes #14560
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes #14725
for faster build times and clear inter-module dependencies, we
should not #include headers that are not directly used. instead, we
should only #include the headers directly used by a certain compilation
unit.
in this change, the source files under the "/compaction" directories
are checked using clangd, which identifies the cases where we have
an #include which is not directly used. all the #includes identified
by clangd are removed. because some source files relied on the incorrectly
included header files, those ones are updated to #include the header
files they directly use.
if a forward declaration suffices, the declaration is added instead.
see also https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Today, SSTable cleanup skips to the next partition, one at a time, when it finds that the current partition is no longer owned by this node.
That's very inefficient because when a cluster is growing in size, existing nodes lose multiple sequential tokens in their owned ranges. Another inefficiency comes from fetching index pages spanning all unowned tokens, which was described in https://github.com/scylladb/scylladb/issues/14317.
To solve both problems, cleanup will now use a multi-range reader, to guarantee that it will only process the owned data and as a result skip unowned data. This results in cleanup scanning an owned range and then fast-forwarding to the next one until it's done with them all. This significantly reduces the amount of data in the index cache, as the index will only be consulted at each range boundary.
Without further ado,
before:
`INFO 2023-07-01 07:10:26,281 [shard 0] compaction - [Cleanup keyspace2.standard1 701af580-17f7-11ee-8b85-a479a1a77573] Cleaned 1 sstables to [./tmp/1/keyspace2/standard1-b490ee20179f11ee9134afb16b3e10fd/me-3g7a_0s8o_06uww24drzrroaodpv-big-Data.db:level=0]. 2GB to 1GB (~50% of original) in 26248ms = 81MB/s. ~9443072 total partitions merged to 4750028.`
after:
`INFO 2023-07-01 07:07:52,354 [shard 0] compaction - [Cleanup keyspace2.standard1 199dff90-17f7-11ee-b592-b4f5d81717b9] Cleaned 1 sstables to [./tmp/1/keyspace2/standard1-b490ee20179f11ee9134afb16b3e10fd/me-3g7a_0s4m_5hehd2rejj8w15d2nt-big-Data.db:level=0]. 2GB to 1GB (~50% of original) in 17424ms = 123MB/s. ~9443072 total partitions merged to 4750028.`
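The range-by-range strategy can be illustrated with a small self-contained sketch (the types, and the binary search standing in for an index lookup, are assumptions, not Scylla's reader API):

```cpp
#include <algorithm>
#include <utility>
#include <vector>

// Sketch: rather than testing each partition's token one at a time,
// walk the sorted owned ranges and fast-forward (here a binary search,
// standing in for an index lookup) to each range start, so the "index"
// is consulted only at range boundaries.
using token = long;
using token_range = std::pair<token, token>; // inclusive [first, second]

std::vector<token> owned_tokens(const std::vector<token>& sorted_tokens,
                                const std::vector<token_range>& owned_ranges) {
    std::vector<token> out;
    auto it = sorted_tokens.begin();
    for (const auto& r : owned_ranges) {
        it = std::lower_bound(it, sorted_tokens.end(), r.first); // fast forward
        while (it != sorted_tokens.end() && *it <= r.second) {
            out.push_back(*it++); // scan the owned range
        }
    }
    return out;
}
```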
Fixes #12998.
Fixes #14317.
Closes #14469
* github.com:scylladb/scylladb:
test: Extend cleanup correctness test to cover more cases
compaction: Make SSTable cleanup more efficient by fast forwarding to next owned range
sstables: Close SSTable reader if index exhaustion is detected in fast forward call
sstables: Simplify sstable reader initialization
compaction: Extend make_sstable_reader() interface to work with mutation_source
test: Extend sstable partition skipping test to cover fast forward using token
Today, SSTable cleanup skips to the next partition, one at a time, when it finds
that the current partition is no longer owned by this node.
That's very inefficient because when a cluster is growing in size, existing
nodes lose multiple sequential tokens in their owned ranges. Another inefficiency
comes from fetching index pages spanning all unowned tokens, which was described
in #14317.
To solve both problems, cleanup will now use a multi-range reader, to guarantee
that it will only process the owned data and as a result skip unowned data.
This results in cleanup scanning an owned range and then fast-forwarding to the
next one until it's done with them all. This significantly reduces the amount
of data in the index cache, as the index will only be consulted at each range
boundary.
Without further ado,
before:
... 2GB to 1GB (~50% of original) in 26248ms = 81MB/s. ~9443072 total partitions merged to 4750028.
after:
... 2GB to 1GB (~50% of original) in 17424ms = 123MB/s. ~9443072 total partitions merged to 4750028.
Fixes #12998.
Fixes #14317.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Today, we base compaction throughput on the amount of data written,
but it should be based on the amount of input data compacted
instead, to show the amount of data compaction had to process
during its execution.
A good example is a compaction which expires 99% of the data: today
the throughput would be calculated on the 1% written, misleading
the reader into thinking that the compaction was terribly
slow.
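A toy illustration of the changed metric (hypothetical helper, not the actual code):

```cpp
// Sketch: base throughput on input bytes processed, not output bytes
// written, so a compaction that expires most of its input still shows
// how much data it had to process.
double compaction_throughput_mb_s(double input_mb, double duration_s) {
    return duration_s > 0 ? input_mb / duration_s : 0.0;
}
```

A compaction reading 2000MB in 20s that expires 99% of the data reports 100MB/s, not 1MB/s computed from the 20MB it wrote.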
Fixes #14533.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closes #14615
As the goal is to make compaction skip to the next owned range,
make_sstable_reader() should be extended to create a reader with
parameters forwarded from the mutation_source interface, which will
be used when wiring cleanup up with the multi-range reader.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
At that level no io_priority_class-es exist. Instead, all the IO happens
in the context of the current sched group. The file API no longer accepts a
prio class argument (and makes the io_intent arg mandatory for impls).
So the change consists of:
- removing all usage of io_priority_class
- patching file_impl's descendants to the updated API
- the priority manager goes away altogether
- the IO bandwidth update is performed on the respective sched group
- tuning up the scylla-gdb.py io_queues command
The first change is huge and was made semi-automatically by:
- grep io_priority_class | default_priority_class
- remove all calls, found methods' args and class' fields
Patching the file_impl-s is smaller, but also mechanical:
- replace the io_priority_class& argument with an io_intent* one
- pass the intent to the lower file (if applicable)
Dropping the priority manager is:
- git-rm the .cc and .hh
- sed out all the #include-s
- fix configure.py and the cmakefile
The scylla-gdb.py update is a bit hairy -- it needs to use the task queue
list for IO class names and shares, but to detect whether it should, it
checks whether the "commitlog" group is present.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes #13963
Commit 8c4b5e4283 introduced an optimization which only
calculates the max purgeable timestamp when a tombstone satisfies the
grace period.
Commit 'repair: Get rid of the gc_grace_seconds' inverted the order,
probably under the assumption that getting the grace period can be
more expensive than calculating max purgeable, as repair-mode GC
will look up history data in order to calculate gc_before.
This caused a significant regression on tombstone heavy compactions,
where most of tombstones are still newer than grace period.
A compaction which used to take 5s now takes 35s, 7x slower.
The reason is simple: now the calculation of max purgeable happens
for every single tombstone (once for each key), even the ones that
cannot be GC'ed yet. And each calculation has to iterate through
(i.e. check the bloom filter of) every single sstable that doesn't
participate in the compaction.
A flame graph makes it very clear that the bloom filter is a heavy
path without the optimization:
45.64% 45.64% sstable_compact sstable_compaction_test_g
[.] utils::filter::bloom_filter::is_present
With its resurrection, the problem is gone.
This scenario can easily happen, e.g. after a deletion burst, and
tombstones becoming only GC'able after they reach upper tiers in
the LSM tree.
Before this patch, a compaction can be estimated to perform this number
of filter checks:
(# of keys containing *any* tombstone) * (# of uncompacting sstable runs[1])
[1] It's the # of *runs*, as each key tends to overlap with only one
fragment of each run.
After this patch, the estimation becomes:
(# of keys containing a GC'able tombstone) * (# of uncompacting runs)
With repair mode for tombstone GC, the assumption that retrieval
of gc_before is more expensive than calculating max purgeable is
kept; we can revisit it later. But in the default mode, which
is "timeout" (i.e. gc_grace_seconds), we still benefit
from the optimization of deferring the calculation until
needed.
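The restored ordering can be sketched as follows (all names here are hypothetical stand-ins for Scylla internals):

```cpp
#include <functional>

// Sketch: the cheap grace-period check runs first, and the expensive
// max_purgeable lookup (bloom-filter probes of every uncompacting
// sstable run) is deferred to GC-able tombstones only.
struct tombstone_info {
    long deletion_time;
};

bool can_purge(const tombstone_info& t, long gc_before,
               const std::function<long()>& get_max_purgeable) {
    if (t.deletion_time >= gc_before) {
        return false; // still within the grace period: skip the lookup
    }
    return t.deletion_time < get_max_purgeable();
}
```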
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closes #13908
with off-strategy, the input list size can be close to 1k, which will
lead to unneeded reallocations when formatting the list for
logging.
in the past, we faced stalls in this area, and excessive reallocations
(log2 ~1k = ~10) may have contributed to that.
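A sketch of the mitigation, assuming the fix preallocates the output buffer (illustrative code, not the actual Scylla helper):

```cpp
#include <string>
#include <vector>

// Sketch: size the output buffer up front so that formatting a
// ~1k-element sstable list does one allocation instead of ~log2(n)
// geometric growths.
std::string format_sstable_list(const std::vector<std::string>& names) {
    size_t total = 2; // "[" and "]"
    for (const auto& n : names) {
        total += n.size() + 2; // element plus ", " separator
    }
    std::string out;
    out.reserve(total);
    out += '[';
    for (size_t i = 0; i < names.size(); ++i) {
        if (i) {
            out += ", ";
        }
        out += names[i];
    }
    out += ']';
    return out;
}
```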
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closes #13907
If tombstone GC is disabled, compaction will ensure that fully expired
sstables aren't bypassed and that no expired tombstones are
purged. Changing the value takes immediate effect even on ongoing
compactions.
Not wired into an API yet.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Currently said method creates a combined reader from all the sstables
passed to it, then validates this combined reader.
Change it to validate each sstable (reader) individually, in preparation
for the new validate method which can handle a single sstable at a time.
Note that this is not going to make much impact in practice: all callers
already pass a single sstable to this method.
Now, with f1bbf705f9
(Cleanup sstables in resharding and other compaction types),
we may filter sstables as part of resharding compaction,
and the assertion that all tokens are owned by the current
shard when filtering no longer holds.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Refactor the printing logic in compaction::formatted_sstables_list
out to sstables::to_string(const shared_sstable&, bool include_origin)
and operator<<(const shared_sstable) on top of it,
so that we can easily print std::vector<shared_sstable>
from compaction_manager in the next patch.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Move the token filtering logic down from cleanup_compaction
to regular_compaction and class compaction so it can be
reused by other compaction types.
Create an _owned_ranges_checker in class compaction
when _owned_ranges is engaged, and use it in
compaction::setup to filter partitions based on the owned ranges.
Ref scylladb/scylladb#12998
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Move the owned_ranges_ptr, currently used only by
cleanup and upgrade compactions, to the generic
compaction descriptor so we can apply cleanup in other
compaction types.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
The call moves the sstable to the specified state.
The state change is translated into a storage-driver state change,
which for today's filesystem storage means moving between directories.
The "normal" state maps to the base dir of the table; there's no
dedicated subdir for this state, and this brings some trouble into
play.
The thing is that in order to check if an sstable is already in the
"normal" state, it's impossible to compare the filename of its path to any
pre-defined values, as tables' base dirs are dynamic. To overcome this,
the change-state call checks whether the sstable is in one of the "known"
sub-states, and assumes that it's in the normal state otherwise.
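The described check might look roughly like this (the sub-state names follow the description, but the helper itself is a hypothetical sketch):

```cpp
#include <string>

// Sketch: infer an sstable's state from its directory name. Only the
// known sub-states have dedicated directories; anything else is the
// table's dynamic base dir, i.e. the "normal" state.
enum class sstable_state { normal, staging, quarantine, upload };

sstable_state state_from_dir(const std::string& dir_name) {
    if (dir_name == "staging")    return sstable_state::staging;
    if (dir_name == "quarantine") return sstable_state::quarantine;
    if (dir_name == "upload")     return sstable_state::upload;
    return sstable_state::normal; // base dir has no dedicated subdir
}
```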
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
these warnings are found by Clang-17 after removing
`-Wno-unused-lambda-capture` and `-Wno-unused-variable` from
the list of disabled warnings in `configure.py`.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Schema related files are moved there. This excludes schema files that
also interact with mutations, because the mutation module depends on
the schema. Those files will have to go into a separate module.
Closes #12858
Move mutation-related files to a new mutation/ directory. The names
are kept in the global namespace to reduce churn; the names are
unambiguous in any case.
mutation_reader remains in the readers/ module.
mutation_partition_v2.cc was missing from CMakeLists.txt; it's added in this
patch.
This is a step forward towards librarization or modularization of the
source base.
Closes #12788
Its _it member keeps state about the current range.
Although it's modified by the method, this is an implementation
detail that is irrelevant to the caller; hence mark the
belongs_to_current_node method as const (and noexcept while
at it).
This allows the caller, cleanup_compaction, to use it from
inside a const method, without having to mark
its respective member as mutable too.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closes #12634
This fixes a long-standing bug related to the handling of non-full
clustering keys, issue #1446.
after_key() was creating a position which is after all keys prefixed
by a non-full key, rather than a position which is right after that
key.
This issue will be caught by cql_query_test::test_compact_storage
in debug mode when mutation_partition_v2 merging starts inserting
sentinels at position after_key() on preemption.
It probably already causes problems for such keys.