scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-29 11:10:40 +00:00

Author	SHA1	Message	Date
Avi Kivity	a34edb0a93	frozen_mutation: add unfreeze_gently(span<frozen_mutation>) While we have unfreeze(vector<frozen_mutation>), a gentle version is preferred.	2024-03-17 17:45:30 +02:00
Kefu Chai	e1a9340cc1	partition_version: add fmt::formatter for partition_entry::printer before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter. in this change, we define formatters for `parition_entry::printer`, and drop its operator<< . Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#17812	2024-03-15 09:52:27 +02:00
Botond Dénes	53e3325845	Merge 'mutation: add fmt::formatter for mutation types' from Kefu Chai before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter. in this change, we define formatters for * mutation_partition_v2::printer * frozen_mutation::printer * mutation their operator<<:s are dropped. Refs #13245 Closes scylladb/scylladb#17769 * github.com:scylladb/scylladb: mutation: add fmt::formatter for mutation mutation: add fmt::formatter for frozen_mutation::printer mutation: add fmt::formatter for mutation_partition_v2::printer	2024-03-13 10:13:09 +02:00
Pavel Emelyanov	d90db016bf	treewide: Use partition_slice::is_reversed() Continuation of `cc56a971e8`, more noisy places detected Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#17763	2024-03-13 08:52:46 +02:00
Kefu Chai	2d319fa789	mutation: add fmt::formatter for mutation before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter. in this change, we define formatters for mutation. but its operator<< is preserved, as we are still using our homebrew generic formatter for printing `std::vector<mutation>`, and this formatter is using operator<< for printing the elements in vector. Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-03-13 11:07:42 +08:00
Kefu Chai	acd14f12f0	mutation: add fmt::formatter for frozen_mutation::printer before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter. in this change, we define formatters for frozen_mutation::printer, and drop its operator. Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-03-13 10:47:22 +08:00
Kefu Chai	94d25e02ad	mutation: add fmt::formatter for mutation_partition_v2::printer before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter. in this change, we define formatters for mutation_partition_v2::printer, and drop its operator<< Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-03-13 10:47:09 +08:00
Kefu Chai	f5f5ff1d51	clustering_interval_set: add fmt::formatter for clustering_interval_set before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter. in this change, we define formatters for `clustering_interval_set` their operator<<:s are dropped Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#17593	2024-03-08 15:13:14 +02:00
Kefu Chai	0fd85a98a9	mutation: add fmt::formatter for position_range before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter. in this change, we define formatters for `position_range`, and the helpers for printing related types are dropped. Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-02-26 20:15:57 +08:00
Kefu Chai	2f532b9ebc	mutation: add fmt::formatter for mutation_fragment and range_tombstone_stream before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter. in this change, we define formatters for * mutation_fragment * range_tombstone_stream their operator<<:s are dropped Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-02-26 20:15:57 +08:00
Kefu Chai	1fe7a467e7	mutation: add fmt::formatter for mutation_fragment_v2::printer before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter. in this change, we define formatters for mutation_fragment_v2::printer Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-02-26 17:47:05 +08:00
Nadav Har'El	b4cef638ef	Merge 'mutation: add fmt::formatter for mutation types' from Kefu Chai before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter. in this change, we define formatters for * canonical_mutation * atomic_cell_view * atomic_cell * atomic_cell_or_collection::printer Refs #13245 Closes scylladb/scylladb#17506 * github.com:scylladb/scylladb: mutation: add fmt::formatter for canonical_mutation mutation: add fmt::formatter for atomic_cell_view and atomic_cell mutation: add fmt::formatter for atomic_cell_or_collection::printer	2024-02-25 09:48:56 +02:00
Kefu Chai	84ba624415	mutation: add fmt::formatter for canonical_mutation before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter. in this change, we define formatters for canonical_mutation Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-02-25 12:48:13 +08:00
Kefu Chai	3625796222	mutation: add fmt::formatter for atomic_cell_view and atomic_cell before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter. in this change, we define formatters for * atomic_cell_view * atomic_cell and drop their operator<<:s. Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-02-25 12:19:11 +08:00
Kefu Chai	b4fa32ec17	mutation: add fmt::formatter for atomic_cell_or_collection::printer before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter. in this change, we define formatters for `atomic_cell_or_collection::printer`, and drop its operator<<. Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-02-25 12:18:41 +08:00
Nadav Har'El	0aaa6b1a08	fmt: add formatter for mutation_fragment_v2::kind Unfortunately, fmt v10 dropped support for operator<< formatters, forcing us to replace the huge number of operator<< implementations in our code by uglier and templated fmt::formatter implementations to get Scylla to compile on modern distros (such as Fedora 39) :-( Kefu has already started doing this migration, here is my small contribution - the formatter for mutation_fragment_v2::kind. This patch is need to compile, for example, build/dev/mutation/mutation_fragment_stream_validator.o. I can't remove the old operator<< because it's still used by the implementation of other operator<< functions. We can remove all of them when we're done with this coversion. In the meantime, I replaced the original implementation of operator<< by a trivial implementation just passing the work to the new fmt::print support. Refs #13245 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#17432	2024-02-23 10:25:39 +02:00
Kefu Chai	37c6073fd5	mutation: add fmt::formatter for clustering_row and friends before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter. in this change, we define formatters for * clustering_row::printer * static_row::printer * partition_start * partition_end * mutation_fragment::printer and drop their operator<<:s Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-02-22 17:53:34 +08:00
Kefu Chai	b61b5a8b5d	mutation: add fmt::formatter for row_tombstone and friends before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter. in this change, we define formatters for * row_tombstone * row_marker * deletable_row::printer * row::printer Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-02-22 12:44:33 +08:00
Kefu Chai	acefde0735	mutation: add fmt::formatter for mutation_partition::printer before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter. in this change, we define formatters for `mutation_partition::printer`, and drop its operator<<. Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#17419	2024-02-20 09:01:22 +02:00
Kefu Chai	b309e42195	collection_mutation: add formatter for collection_mutation_view::printer before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter. in this change, we define formatters for `collection_mutation_view::printer`, and drop its operator<<. Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#17300	2024-02-13 17:42:25 +02:00
Botond Dénes	120442231f	Merge 'row_cache: test cache consistency during multi-partition cache updates' from Michał Chojnowski Adds a test reproducing https://github.com/scylladb/scylladb/issues/16759, and the instrumentation needed for it. Closes scylladb/scylladb#17208 * github.com:scylladb/scylladb: row_cache_test: test cache consistency during memtable-to-cache merge row_cache: use preemption_source in update() utils: preempt: add preemption_source	2024-02-13 17:37:06 +02:00
Kefu Chai	54ed65bb50	mutation: s/statics/static content/ codespell reports that "statics" could be the misspelling of "statistics". but "static" here means the static column(s). so replace "static" with more specific wording. Refs #589 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#17216	2024-02-13 17:33:21 +02:00
Michał Chojnowski	bed20a2e37	row_cache: use preemption_source in update() To facilitate testing the state of cache after the update is preempted at various points, pass a preemption_source& to update() instead of calling the reactor directly. In release builds, the calls to preemption_source methods should compile to the same direct reactor calls as today. Only in dev mode they should add an extra branch. (However, the `preemption_source&` argument has to be shoveled in any mode).	2024-02-07 18:31:36 +01:00
Kefu Chai	b931d93668	treewide: fix misspellings in code comments these misspellings are identified by codespell. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#17004	2024-01-31 09:16:10 +02:00
Patryk Wrobel	ba488b10ec	mutation/mutation.hh: remove anonymous namespace from header Anonymous namespace implies internal linkage for its members. When it is defined in a header, then each translation unit, which includes such header defines its own unique instance of members of the unnamed namespace that are ODR-used within that translation unit. This can lead to unexpected results including code bloat or undefined behavior due to ODR violations. This change aligns the code with the following guidelines: - CppCoreGuidelines: "SF.21: Don’t use an unnamed (anonymous) namespace in a header" - SEI CERT C++: "DCL59-CPP. Do not define an unnamed namespace in a header file" Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>	2024-01-26 08:38:39 +01:00
Kefu Chai	d1dd71fbd7	mutation: do not include unused headers these unused includes were identified by clangd. see https://clangd.llvm.org/guides/include-cleaner#unused-include-warning for more details on the "Unused include" warning. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16889	2024-01-21 16:58:26 +02:00
Kefu Chai	dd496afff3	mutation: add formatter for {atomic_cell_view,atomic_cell}::printer before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter. in this change, we define a formatter for `atomic_cell_view::printer` and `atomic_cell::printer` respectively, and remove their operator<<(). Refs #13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16602	2024-01-02 16:14:42 +02:00
Yaniv Kaul	ae2ab6000a	Typos: fix typos in code Fixes some more typos as found by codespell run on the code. In this commit, there are more user-visible errors. Refs: https://github.com/scylladb/scylladb/issues/16255	2023-12-05 15:18:11 +02:00
Yaniv Kaul	c658bdb150	Typos: fix typos in comments Fixes some typos as found by codespell run on the code. In this commit, I was hoping to fix only comments, not user-visible alerts, output, etc. Follow-up commits will take care of them. Refs: https://github.com/scylladb/scylladb/issues/16255 Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>	2023-12-02 22:37:22 +02:00
Botond Dénes	65e42e4166	Merge 'mutation_query: properly send range tombstones in reverse queries' from Michał Chojnowski reconcilable_result_builder passes range tombstone changes to _rt_assembler using table schema, not query schema. This means that a tombstone with bounds (a; b), where a < b in query schema but a > b in table schema, will not be emitted from mutation_query. This is a very serious bug, because it means that such tombstones in reverse queries are not reconciled with data from other replicas. If any queried replica has a row, but not the range tombstone which deleted the row, the reconciled result will contain the deleted row. In particular, range deletes performed while a replica is down will not later be visible to reverse queries which select this replica, regardless of the consistency level. As far as I can see, this doesn't result in any persistent data loss. Only in that some data might appear resurrected to reverse queries, until the relevant range tombstone is fully repaired. This series fixes the bug and adds a minimal reproducer test. Fixes #10598 Closes scylladb/scylladb#16003 * github.com:scylladb/scylladb: mutation_query_test: test that range tombstones are sent in reverse queries mutation_query: properly send range tombstones in reverse queries	2023-11-21 09:19:14 +02:00
Michał Chojnowski	9ccd4ea416	partition_version: fix violation of "older versions are evicted first" during schema upgrades A schema upgrade appends a MVCC version B after an existing version A. The last dummy in B is added to the front of LRU, so it will be evicted after the entries in A. This alone doesn't quite violate the "older versions are evicted first" rule, because the new last dummy carries no information. But apply_monotonically generally assumes that entries on the same position have the obvious eviction order, even if they carry no information. Thus, after the merge, the rule can become broken. The proposed fix is as follows: - In the case where A is merged into B, the merged last dummy inherits the link of A. - The merging of B into anything is prevented until its merge with A is finished. This is relatively hacky, because it still involves a state that goes against some natural expectations granted by the "older versions..." rule. A less hacky fix would be to ensure that the new dummy is inserted into a proper place in the eviction order to begin with. Or, better yet, we could eliminate the rule altogether. Aside from being very hard to maintain, it also prevents the introduction of any eviction algorithm other than LRU.	2023-11-16 19:01:18 +01:00
Kefu Chai	efd65aebb2	build: cmake: add check-header target to have feature parity with `configure.py`. we won't need this once we migrate to C++20 modules. but before that day comes, we need to stick with C++ headers. we generate a rule for each .hh files to create a corresponding .cc and then compile it, in order to verify the self-containness of that header. so the number of rule is quite large, to avoid the unnecessary overhead. the check-header target is enabled only if `Scylla_CHECK_HEADERS` option is enabled. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#15913	2023-11-13 10:27:06 +02:00
Michał Chojnowski	002357e238	mutation_query: properly send range tombstones in reverse queries reconcilable_result_builder passes range tombstone changes to _rt_assembler using table schema, not query schema. This means that a tombstone with bounds (a; b), where a < b in query schema but a > b in table schema, are not be emitted from mutation_query. This is a very serious bug, because it means that such tombstones in reverse queries are not reconciled with data from other replicas. If any queried replica has a row, but not the range tombstone which deleted the row, the reconciled result will contain the deleted row. In particular, range deletes performed while a replica is down, will not later be visible to reverse queries which select this replica, regardless of the consistency level. As far as I can see, this doesn't result in any persistent data loss. Only in that some data might appear resurrected to reverse queries, until the relevant range tombstone is fully repaired.	2023-11-08 14:54:48 +01:00
Benny Halevy	a1acf6854b	everywhere: reduce dependencies on i_partitioner.hh Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2023-11-05 20:47:44 +02:00
Michał Chojnowski	93ea3d41d8	position_in_partition: make operator= exception-safe The copy assignment operator of _ck can throw after _type and _bound_weight have already been changed. This leaves position_in_partition in an inconsistent state, potentially leading to various weird symptoms. The problem was witnessed by test_exception_safety_of_reads. Specifically: in cache_flat_mutation_reader::add_to_buffer, which requires the assignment to _lower_bound to be exception-safe. The easy fix is to perform the only potentially-throwing step first. Fixes #15822 Closes scylladb/scylladb#15864	2023-10-29 18:30:32 +02:00
Botond Dénes	9231454acd	mutation/json: extract generic streaming writer into utils/rjson.hh This writer is generally useful, not just for writing mutations as json. Make it generally available as well.	2023-10-20 10:04:56 -04:00
Kefu Chai	9bc0a9f95e	mutation: do not include unused header the `utils::UUID` class is not used by the implementation of `canonical_mutation`, so let's remove the include from this source file. the `#include` was originally added in `5a353486c6`, but that commit did add any code using UUID to this file. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#15731	2023-10-17 20:38:07 +03:00
Tomasz Grabiec	8b7623f49e	database: Fix accounting of small partitions in mutation query The partition key size was ignored by the accounter, as well as the partition tombstone. As a result, a sequence of partitions with just tombstones would be accounted as taking no memory and page size limitter to not kick in. Fix by accounting the real size of accumulated frozen_mutation. Also, break pages across partitions even if there are no live rows. The coordinator can handle it now. Refs #7933	2023-09-22 02:53:14 -04:00
Tomasz Grabiec	17c1cad4b4	database, storage_proxy: Reconcile pages with no live rows incrementally Currently, mutation query on replica side will not respond with a result which doesn't have at least one live row. This causes problems if there is a lot of dead rows or partitions before we reach a live row, which stems from the fact that resulting reconcilable_result will be large: * Large allocations. Serialization of reconcilable_result causes large allocations for storing result rows in std::deque * Reactor stalls. Serialization of reconcilable_result on the replica side and on the coordinator side causes reactor stalls. This impacts not only the query at hand. For 1M dead rows, freezing takes 130ms, unfreezing takes 500ms. Coordinator does multiple freezes and unfreezes. The reactor stall on the coordinator side is >5s. * Large repair mutations. If reconciliation works on large pages, repair may fail due to too large mutation size. 1M dead rows is already too much: Refs #9111. This patch fixes all of the above by making mutation reads respect the memory accounter's limit for the page size, even for dead rows. This patch also addresses the problem of client-side timeouts during paging. Reconciling queries processing long strings of tombstones will now properly page tombstones,like regular queries do. My testing shows that this solution even increases efficiency. I tested with a cluster of 2 nodes, and a table of RF=2. The data layout was as follows (1 partition): Node1: 1 live row, 1M dead rows Node2: 1M dead rows, 1 live row This was designed to trigger reconciliation right from the very start of the query. Before: Running query (node2, CL=ONE, cold cache) Query done, duration: 140.0633503ms, pages: 101, result: [Row(pk=0, ck=3000000, v=0)] Running query (node2, CL=ONE, hot cache) Query done, duration: 66.7195275ms, pages: 101, result: [Row(pk=0, ck=3000000, v=0)] Running query (all-nodes, CL=ALL, reconcile, cold-cache) Query done, duration: 873.5400742ms, pages: 2, result: [Row(pk=0, ck=0, v=0), Row(pk=0, ck=3000000, v=0)] After: Running query (node2, CL=ONE, cold cache) Query done, duration: 136.9035122ms, pages: 101, result: [Row(pk=0, ck=3000000, v=0)] Running query (node2, CL=ONE, hot cache) Query done, duration: 69.5286021ms, pages: 101, result: [Row(pk=0, ck=3000000, v=0)] Running query (all-nodes, CL=ALL, reconcile, cold-cache) Query done, duration: 162.6239498ms, pages: 100, result: [Row(pk=0, ck=0, v=0), Row(pk=0, ck=3000000, v=0)] Non-reconciling queries have almost identical duration (1 few ms changes can be observed between runs). Note how in the after case, the reconciling read also produces 100 pages, vs. just 2 pages in the before case, leading to a much lower duration (less than 1/4 of the before). Refs #7929 Refs #3672 Refs #7933 Fixes #9111	2023-09-22 02:53:14 -04:00
Botond Dénes	7e7101c180	Revert "Merge 'database, storage_proxy: Reconcile pages with dead rows and partitions incrementally' from Botond Dénes" This reverts commit `628e6ffd33`, reversing changes made to `45ec76cfbf`. The test included with this PR is flaky and often breaks CI. Revert while a fix is found. Fixes: #15371	2023-09-13 10:45:37 +03:00
Tomasz Grabiec	0d773c9f9f	database: Fix accounting of small partitions in mutation query The partition key size was ignored by the accounter, as well as the partition tombstone. As a result, a sequence of partitions with just tombstones would be accounted as taking no memory and page size limitter to not kick in. Fix by accounting the real size of accumulated frozen_mutation. Also, break pages across partitions even if there are no live rows. The coordinator can handle it now. Refs #7933	2023-09-11 06:56:13 -04:00
Tomasz Grabiec	2c8a0e4175	database, storage_proxy: Reconcile pages with no live rows incrementally Currently, mutation query on replica side will not respond with a result which doesn't have at least one live row. This causes problems if there is a lot of dead rows or partitions before we reach a live row, which stems from the fact that resulting reconcilable_result will be large: * Large allocations. Serialization of reconcilable_result causes large allocations for storing result rows in std::deque * Reactor stalls. Serialization of reconcilable_result on the replica side and on the coordinator side causes reactor stalls. This impacts not only the query at hand. For 1M dead rows, freezing takes 130ms, unfreezing takes 500ms. Coordinator does multiple freezes and unfreezes. The reactor stall on the coordinator side is >5s. * Large repair mutations. If reconciliation works on large pages, repair may fail due to too large mutation size. 1M dead rows is already too much: Refs #9111. This patch fixes all of the above by making mutation reads respect the memory accounter's limit for the page size, even for dead rows. This patch also addresses the problem of client-side timeouts during paging. Reconciling queries processing long strings of tombstones will now properly page tombstones,like regular queries do. My testing shows that this solution even increases efficiency. I tested with a cluster of 2 nodes, and a table of RF=2. The data layout was as follows (1 partition): Node1: 1 live row, 1M dead rows Node2: 1M dead rows, 1 live row This was designed to trigger reconciliation right from the very start of the query. Before: Running query (node2, CL=ONE, cold cache) Query done, duration: 140.0633503ms, pages: 101, result: [Row(pk=0, ck=3000000, v=0)] Running query (node2, CL=ONE, hot cache) Query done, duration: 66.7195275ms, pages: 101, result: [Row(pk=0, ck=3000000, v=0)] Running query (all-nodes, CL=ALL, reconcile, cold-cache) Query done, duration: 873.5400742ms, pages: 2, result: [Row(pk=0, ck=0, v=0), Row(pk=0, ck=3000000, v=0)] After: Running query (node2, CL=ONE, cold cache) Query done, duration: 136.9035122ms, pages: 101, result: [Row(pk=0, ck=3000000, v=0)] Running query (node2, CL=ONE, hot cache) Query done, duration: 69.5286021ms, pages: 101, result: [Row(pk=0, ck=3000000, v=0)] Running query (all-nodes, CL=ALL, reconcile, cold-cache) Query done, duration: 162.6239498ms, pages: 100, result: [Row(pk=0, ck=0, v=0), Row(pk=0, ck=3000000, v=0)] Non-reconciling queries have almost identical duration (1 few ms changes can be observed between runs). Note how in the after case, the reconciling read also produces 100 pages, vs. just 2 pages in the before case, leading to a much lower duration (less than 1/4 of the before). Refs #7929 Refs #3672 Refs #7933 Fixes #9111	2023-09-11 06:56:13 -04:00
Tomasz Grabiec	4e9d95d78c	Merge 'Compact data before streaming' from Botond Dénes Currently, streaming and repair processes and sends data as-is. This is wasteful: streaming might be sending data which is expired or covered by tombstones, taking up valuable bandwidth and processing time. Repair additionally could be exposed to artificial differences, due to different nodes being in different states of compactness. This PR adds opt-in compaction to `make_streaming_reader()`, then opts in all users. The main difference being in how these choose the current compaction time to use: * Load'n'stream and streaming uses the current time on the local node. * Repair uses a centrally chosen compaction time, generated on the repair master and propagated to al repair followers. This is to ensure all repair participants work with the exact state of compactness. Importantly, this compaction does not purge tombstones (tombstone GC is disabled completely). Fixes: https://github.com/scylladb/scylladb/issues/3561 Closes #14756 * github.com:scylladb/scylladb: replica: make_[multishard_]streaming_reader(): make compaction_time mandatory repair/row_level: opt in to compacting the stream streaming: opt-in to compacting the stream sstables_loader: opt-in for compacting the stream replica/table: add optional compacting to make_multishard_streaming_reader() replica/table: add optional compacting to make_streaming_reader() db/config: add config item for enabling compaction for streaming and repair repair: log the error which caused the repair to fail readers: compacting_reader: use compact_mutation_state::abandon_current_partition() mutation/mutation_compactor: allow user to abandon current partition	2023-07-28 16:42:13 +02:00
Botond Dénes	7351c8424d	mutation/mutation_rebuilder: add comment about validity of returned mutation reference Closes #14853	2023-07-27 12:13:46 +03:00
Botond Dénes	326c3b92e5	mutation/mutation_compactor: allow user to abandon current partition Currently, the compactor requires a valid stream and thus abandoning a partition in the middle was not possible. This causes some complications for the compacting reader, which implements methods such as `next_partition()` which is possibly called in the middle of a partition. In this case the compacting reader attempts to close the partition properly by inserting a synthetic partition-end fragment into the stream. This is not enough however as it doesn't close any range tombstone changes that might be active. Instead of piling on more complexity, add an API to the compactor which allows abandoning the current partition.	2023-07-27 02:50:44 -04:00
Botond Dénes	fda4168300	mutation/mutation_rebuilder*: return const mutation& from consume_new_partition() To allow const access to the mutation under construction, e.g. so the user can query its size.	2023-07-25 10:34:31 -04:00
Botond Dénes	e6fa21d1b3	mutation/mutation: add memory_usage()	2023-07-25 10:34:30 -04:00
Nadav Har'El	5860820934	Merge 'mutation/mutation_compactor: validate the input stream' from Botond Dénes The mutation compactor has a validator which it uses to validate the stream of mutation fragments that passes through it. This validator is supposed to validate the stream as it enters the compactor, as opposed to its compacted form (output). This was true for most fragment kinds except range tombstones, as purged range tombstones were not visible to the validator for the most part. This mistake was introduced by https://github.com/scylladb/scylladb/commit `e2c9cdb576`, which itself was a flawed attempt at fixing an error seen because purged tombstones were not terminated by the compactor. This patch corrects this mistake by fixing the above problem properly: on page-cut, if the validator has an active tombstone, a closing tombstone is generated for it, to avoid the false-positive error. With this, range tombstones can be validated again as they come in. The existing unit test checking the validation in the compactor is greatly expanded to check all (I hope) different validation scenarios. Closes #13817 * github.com:scylladb/scylladb: test/mutation_test: test_compactor_validator_sanity_test mutation/mutation_compactor: fix indentation mutation/mutation_compactor: validate the input stream mutation: mutation_fragment_stream_validating_filter: add accessor to underlying validator readers: reader-from-fragment: don't modify stream when created without range	2023-07-21 00:26:46 +03:00
Botond Dénes	18ed94e60b	mutation/mutation_compactor: fix indentation Left broken by the previous patch.	2023-07-20 08:48:50 -04:00
Botond Dénes	3d5b70e0d7	mutation/mutation_compactor: validate the input stream The mutation compactor has a validator which it uses to validate the stream of mutation fragments that passes through it. This validator is supposed to validate the stream as it enters the compactor, as opposed to its compacted form (output). This was true for most fragment kinds except range tombstones, as purged range tombstones were not visible to the validator for the most part. This mistake was introduced by `e2c9cdb576`, which itself was a flawed attempt at fixing an error seen because purged tombstones were not terminated by the compactor. This patch corrects this mistake by fixing the above problem properly: on page-cut, if the validator has an active tombstone, a closing tombstone is generated for it, to avoid the false-positive error. With this, range tombstones can be validated again as they come in.	2023-07-20 08:48:50 -04:00

1 2 3

127 Commits