scylladb

Author	SHA1	Message	Date
Botond Dénes	104a47699c	mutation_fragment_stream_validator: add reset methods Allow resetting the validator to a given partition or mutation fragment. This allows a user which is able to fix corrupt streams to reset the validator to a partition or row which the validator normally wouldn't accept and hence it wouldn't advance its internal state to it.	2021-05-05 12:03:42 +03:00
Benny Halevy	b134640829	flat_mutation_reader: abort if not closed before destroyed The motivation to abort if the reader is not closed before it's destroyed is mainly driven by: 1. Aborting will force us to find and fix missing closes. Otherwise, log warnings can easily be lost in the noise. (ERRORs however are caught by dtests but won't be necessarily caught in SCT / production environments) 2. Following patches remove existing cleanup code in destructors that is not needed once close() is mandated. If we don't abort on missing close we'll have to keep maintaining both cleanup paths forever. 3. Not enforcing close exposes us to leaks and potential use-after-free from background tasks that are left behind. We want to stop guaranteeing the safety of the background tasks post close(). Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-04-25 11:35:07 +03:00
Benny Halevy	5b22731f9a	flat_mutation_reader: require close Make flat_mutation_reader::impl::close pure virtual so that all implementations are required to implement it. With that, provide a trivial implementation to all implementations that currently use the default, trivial close implementation. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-04-25 11:35:07 +03:00
Benny Halevy	0da2eea211	flat_mutation_reader: flat_multi_range_mutation_reader: close underlying reader Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-04-25 11:16:10 +03:00
Benny Halevy	18268ab474	flat_mutation_reader: forwardable_empty_mutation_reader: close optional underlying reader Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-04-25 11:16:10 +03:00
Benny Halevy	e2e642b1b1	flat_mutation_reader: make_forwardable, make_nonforwardable: close underlying reader Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-04-25 11:16:10 +03:00
Benny Halevy	978501c336	flat_mutation_reader: partition_reversing_mutation_reader: implement no-op close We don't own _source therefore do not close it. That said, we still need to make sure that the reversing reader itself is closed to calm down the check when it's destroyed. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-04-25 11:16:10 +03:00
Benny Halevy	f4dfaaa6c9	flat_mutation_reader: delegating_reader: close reader when moved to it The underlying reader is owned by the caller if it is moved to it, but not if it was constructed with a reference to the underlying reader. Close the underlying reader on close() only in the former case. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-04-25 11:16:10 +03:00
Benny Halevy	ca06d3c92a	flat_mutation_reader: log a warning if destroyed without closing We cannot close in the background since there are use cases that require the impl to be destroyed synchronously. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-04-25 11:16:10 +03:00
Benny Halevy	a471579bd7	flat_mutation_reader: introduce close Allow closing readers before destroying them. This way, outstanding background operations such as read-aheads can be gently canceled and be waited upon. Note that similar to destructors, close must not fail. There is nothing to do about errors after the f_m_r is done. Enforce that in flat_mutation_reader::close() so if the f_m_r implementation did return a failure, report it and abort as internal error. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-04-25 11:16:10 +03:00
Botond Dénes	727bc0f5d4	mutation_fragment_stream_validator: add token validation level In some cases the full-blown partition key validation and especially the associated key copy per partition might be deemed too costly. As a next best thing this patch adds a token only validation, which should cover 99% (number pulled out of my sleeve) of the cases. Let's hope no one gets unlucky.	2021-03-01 07:49:23 +02:00
Botond Dénes	694f8a4ec6	mutation_fragment_stream_validating_filter: make validation levels more fine-grained Currently key order validation for the mutation fragment stream validating filter is all or nothing. Either no keys (partition or clustering) are validated or all of them. As we suspect that clustering key order validation would add a significant overhead, this discourages turning key validation on, which means we miss out on partition key monotonicity validation which has a much more moderate cost. This patch makes this configurable in a more fine-grained fashion, providing separate levels for partition and clustering key monotonicity validation. As the choice for the default validation level is not as clear-cut as before, the default value for the validation level is removed in the validating filter's constructor.	2021-03-01 07:49:23 +02:00
Pavel Emelyanov	bfcd6a4bb7	flat_mutation_reader: Use clear() in destroy_current_mutation() Currently the code uses a loop of unlink_leftmost_without_rebalance calls. B-tree does have it, but plain clearing of the tree is a bit faster with clear(). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-02-02 09:30:30 +03:00
Benny Halevy	29002e3b48	flat_mutation_reader: return future from next_partition To allow it to asynchronously close underlying readers on next_partition(). Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-01-13 17:35:07 +02:00
Botond Dénes	8dae6152bf	mutation_fragment_stream_validator: make it easier to validate concrete fragment types The current API is tailored to the `mutation_fragment` type. In the next patch we will want to use the validator from a context where the mutation fragments are already decomposed into their respective concrete types, e.g. static_row, clustering_row, etc. To avoid having to reconstruct a mutation fragment type just to use the validator, add an API which allows validating these concrete types conveniently too.	2021-01-11 08:07:42 +02:00
Botond Dénes	495f9d54ba	flat_mutation_reader: extract fragment stream validator into its own header To allow using it without pulling in the huge `flat_mutation_reader.hh`.	2021-01-11 08:07:42 +02:00
Botond Dénes	dd372c8457	flat_mutation_reader: de-virtualize buffer_size() The main user of this method, the one which required this method to return the collective buffer size of the entire reader tree, is now gone. The remaining two users just use it to check the size of the reader instance they are working with. So de-virtualize this method and reduce its responsibility to just returning the buffer size of the current reader instance.	2020-10-06 08:22:56 +03:00
Botond Dénes	256140a033	mutation_fragment: memory_usage(): remove unused schema parameter The memory usage is now maintained and updated on each change to the mutation fragment, so it needs not be recalculated on a call to `memory_usage()`, hence the schema parameter is unused and can be removed.	2020-09-28 11:27:47 +03:00
Botond Dénes	6ca0464af5	mutation_fragment: add schema and permit We want to start tracking the memory consumption of mutation fragments. For this we need schema and permit during construction, and on each modification, so the memory consumption can be recalculated and passed to the permit. In this patch we just add the new parameters and go through the insane churn of updating all call sites. They will be used in the next patch.	2020-09-28 11:27:23 +03:00
Botond Dénes	0518571e56	flat_mutation_reader: make _buffer a tracked buffer Via a tracked_allocator. Although the memory allocations made by the _buffer shouldn't dominate the memory consumption of the read itself, they can still be a significant portion that scales with the number of readers in the read.	2020-09-28 10:53:56 +03:00
Botond Dénes	3fab83b3a1	flat_mutation_reader: impl: add reader_permit parameter Not used yet, this patch does all the churn of propagating a permit to each impl. In the next patch we will use it to track the memory consumption of `_buffer`.	2020-09-28 10:53:48 +03:00
Pavel Emelyanov	f19b85b61d	range_tombstone_list: Introduce and use pop-and-lock helper There's an optimization in flat_mutation_reader_from_mutations that folds the list from left-to-right in linear time. In case of currently used boost::set the .unlink_leftmost_without_rebalance helper is used, so wrap this exception with a method of the range_tombstone_list. This is the last place where callers need to mess with the exact internal collection. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-09-07 23:17:41 +03:00
Pavel Emelyanov	a89c7198c2	range_tombstone_list: Introduce and use pop_as<>() The method extracts an element from the list, constructs a desired object from it and frees. This is common usage of range_tombstone_list. Having a helper helps encapsulating the exact collection inside the class. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-09-07 23:17:41 +03:00
Pavel Emelyanov	27912375b2	flat_mutation_reader: Use range_tombstone_list begin/end API The goal is to stop revealing the exact collection from the range_tombstone_list, so make use of existing begin/end methods and extend with rbegin() where needed. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-09-07 23:17:41 +03:00
Botond Dénes	92ce39f014	query: query_class_config: use max_result_size for the max_memory_for_unlimited_query field We want to switch from using a single limit to a dual soft/hard limit. As a first step we switch the limit field of `query_class_config` to use the recently introduced type for this. As this field has a single user at the moment -- reverse queries (and not a lot of propagation) -- we update it in this same patch to use the soft/hard limit: warn on reaching the soft limit and abort on the hard limit (the previous behaviour).	2020-07-28 18:00:29 +03:00
Piotr Sarna	4cb79f04b0	treewide: replace libjsoncpp usage with rjson In order to eventually switch to a single JSON library, most of the libjsoncpp usage is dropped in favor of rjson. Unfortunately, one usage still remains: test/utils/test_repl utility heavily depends on the exact textual format of its output JSON files, so replacing a library results in all tests failing because of differences in formatting. It is possible to force rjson to print its documents in the exact matching format, but that's left for later, since the issue is not critical. It would be nice though if our test suite compared JSON documents with a real JSON parser, since there are more differences - e.g. libjsoncpp keeps children of the object sorted, while rapidjson uses an unordered data structure. This change should cause no change in semantics, it strives just to replace all usage of libjsoncpp with rjson.	2020-07-03 10:27:23 +02:00
Botond Dénes	0b4ec62332	flat_mutation_reader: flat_multi_range_reader: add reader_permit parameter Mutation sources will soon require a valid permit so make sure we have one and pass it to the mutation sources when creating the underlying readers. For now, pass no_reader_permit() on call sites, deferring the obtaining of a valid permit to later patches.	2020-05-28 11:34:35 +03:00
Botond Dénes	196dd5fa9b	treewide: throw std::bad_function_call with backtraces We typically use `std::bad_function_call` to throw from mandatory-to-implement virtual functions, that cannot have a meaningful implementation in the derived class. The problem with `std::bad_function_call` is that it carries absolutely no information w.r.t. where it was thrown from. I originally wanted to replace `std::bad_function_call` in our codebase with a custom exception type that would allow passing in the name of the function it is thrown from to be included in the exception message. However after I ended up also including a backtrace, Benny Halevy pointed out that I might as well just throw `std::bad_function_call` with a backtrace instead. So this is what this patch does. All users are various unimplemented methods of the `flat_mutation_reader::impl` interface. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200408075801.701416-1-bdenes@scylladb.com>	2020-04-08 13:54:06 +02:00
Botond Dénes	7bdeec4b00	flat_mutation_reader: make_reversing_reader(): add memory limit If the reversing requires more memory than the limit, the read is aborted. All users are updated to get a meaningful limit, from the respective table object, with the exception of tests of course.	2020-02-27 18:11:54 +02:00
Botond Dénes	091d80e8c3	flat_mutation_reader: expose reverse reader as a standalone reader Currently reverse reads just pass a flag to `flat_mutation_reader::consume()` to make the read happen in reverse. This is deceptively simple and streamlined -- while in fact behind the scenes a reversing reader is created to wrap the reader in question to reverse partitions, one-by-one. This patch makes this apparent by exposing the reversing reader via `make_reversing_reader()`. This now makes how reversing works more apparent. It also allows for more configuration to be passed to the reversing reader (in the next patches). This change is forward compatible, as in time we plan to add reversing support to the sstable layer, in which case the reversing reader will go.	2020-02-27 18:11:54 +02:00
Botond Dénes	1b7725af4b	mutation_fragment_stream_validator: split into low-level and high-level API The low-level validator allows fine-grained validation of different aspects of monotonicity of a fragment stream. It doesn't do any error handling. Since different aspects can be validated with different functions, this allows callers to understand what exactly is invalid. The high-level API is the previous fragment filter one. This is now built on the low-level API. This division allows for advanced use cases where the user of the validator wants to do all error handling and wants to decide exactly what monotonicity to validate. The motivating use-case is scrubbing compaction, added in the next patches.	2020-02-13 15:02:32 +02:00
Botond Dénes	dfc8b2fc45	treewide: replace reader_resource_tracker with reader_permit The former was never really more than a reader_permit with one additional method. Currently using it doesn't even save one from any includes. Now that readers will be using reader_permit we would have to pass down both to mutation_source. Instead get rid of reader_resource_tracker and just use reader_permit. Instead of making it a last and optional parameter that is easy to ignore, make it a first class parameter, right after schema, to signify that permits are now a prominent part of the reader API. This -- mostly mechanical -- patch essentially refactors mutation_source to ask for the reader_permit instead of reader_resource_tracker and updates all usage sites.	2020-01-28 08:13:16 +02:00
Botond Dénes	a74a82d4d2	flat_mutation_reader: mutation_fragment_stream_validator: add name Add a name parameter to the validator, so that the validator can be identified in log messages. Schema identity information is added to the name automatically. This should help pinpoint the problematic place where validation failed. Although at the moment we have a single validator, it still benefits from having a name, as we can now include in it the name of the sstable being written and hence trace the source of the bad data. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200117150616.895878-1-bdenes@scylladb.com>	2020-01-20 11:06:30 +01:00
Botond Dénes	08bb0bd6aa	mutation_fragment_stream_validator: wrap exceptions into own exception type So a higher level component using the validator to validate a stream can catch only validation errors, and let any other incidental exception through. This allows building data correctors on top of the `mutation_fragment_stream_validator`, by filtering a fragment stream through a validator, catching invalid fragment stream exceptions and dropping the respective fragments from the stream. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20191220073443.530750-1-bdenes@scylladb.com>	2019-12-20 12:05:00 +01:00
Benny Halevy	79d5fed40b	mutation_fragment_stream_validator: validate end of stream in partition_key filter Currently end of stream validation is done in the destructor, but the validator may be destructed prematurely, e.g. on exception, as seen in https://github.com/scylladb/scylla/issues/5215 This patch adds an on_end_of_stream() method explicitly called by consume_pausable_in_thread. Also, the respective concepts for PartitionFilter, MutationFragmentFilter and a new one for the on_end_of_stream method were unified as FlattenedConsumerFilter. Refs #5215 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> (cherry picked from commit 506ff40bd447f00158c24859819d4bb06436c996)	2019-10-29 12:35:33 +01:00
Benny Halevy	d5f53bc307	mutation_fragment_stream_validator: validate partition key monotonicity Fixes #4804 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> (cherry picked from commit 736360f823621f7994964fee77f37378ca934c56)	2019-10-29 12:35:33 +01:00
Nadav Har'El	51fc6c7a8e	make static_row optional to reduce memory footprint Merged patch series from Avi Kivity: The static row can be rare: many tables don't have them, and tables that do will often have mutations without them (if the static row is rarely updated, it may be present in the cache and in readers, but absent in memtable mutations). However, it always consumes ~100 bytes of memory, even if it is not present, due to row's overhead. Change it to be optional by allocating it as an external object rather than inlined into mutation_partition. This adds overhead when the static row is present (17 bytes for the reference, back reference, and lsa allocator overhead). perf_simple_query appears to be marginally (2%) faster. Footprint is reduced by ~9% for a cache entry, 12% in memtables. More details are provided in the patch commitlog. Tests: unit (debug) Avi Kivity (4): managed_ref: add get() accessor managed_ref: add external_memory_usage() mutation_partition: introduce lazy_row mutation_partition: make static_row optional to reduce memory footprint cell_locking.hh \| 2 +- converting_mutation_partition_applier.hh \| 4 +- mutation_partition.hh \| 284 ++++++++++++++++++++++- partition_builder.hh \| 4 +- utils/managed_ref.hh \| 12 + flat_mutation_reader.cc \| 2 +- memtable.cc \| 2 +- mutation_partition.cc \| 45 +++- mutation_partition_serializer.cc \| 2 +- partition_version.cc \| 4 +- tests/multishard_mutation_query_test.cc \| 2 +- tests/mutation_source_test.cc \| 2 +- tests/mutation_test.cc \| 12 +- tests/sstable_mutation_test.cc \| 10 +- 14 files changed, 355 insertions(+), 32 deletions(-)	2019-10-22 12:25:15 +03:00
Avi Kivity	acc433b286	mutation_partition: make static_row optional to reduce memory footprint The static row can be rare: many tables don't have them, and tables that do will often have mutations without them (if the static row is rarely updated, it may be present in the cache and in readers, but absent in memtable mutations). However, it always consumes ~100 bytes of memory, even if it is not present, due to row's overhead. Change it to be optional by using lazy_row instead of row. Some call sites treewide were adjusted to deal with the extra indirection. perf_simple_query appears to improve by 2%, from 163krps to 165 krps, though it's hard to be sure due to noisy measurements. memory_footprint comparisons (before/after): mutation footprint: mutation footprint: - in cache: 1096 - in cache: 992 - in memtable: 854 - in memtable: 750 - in sstable: 351 - in sstable: 351 - frozen: 540 - frozen: 540 - canonical: 827 - canonical: 827 - query result: 342 - query result: 342 sizeof(cache_entry) = 112 sizeof(cache_entry) = 112 -- sizeof(decorated_key) = 36 -- sizeof(decorated_key) = 36 -- sizeof(cache_link_type) = 32 -- sizeof(cache_link_type) = 32 -- sizeof(mutation_partition) = 200 -- sizeof(mutation_partition) = 96 -- -- sizeof(_static_row) = 112 -- -- sizeof(_static_row) = 8 -- -- sizeof(_rows) = 24 -- -- sizeof(_rows) = 24 -- -- sizeof(_row_tombstones) = 40 -- -- sizeof(_row_tombstones) = 40 sizeof(rows_entry) = 232 sizeof(rows_entry) = 232 sizeof(lru_link_type) = 16 sizeof(lru_link_type) = 16 sizeof(deletable_row) = 168 sizeof(deletable_row) = 168 sizeof(row) = 112 sizeof(row) = 112 sizeof(atomic_cell_or_collection) = 8 sizeof(atomic_cell_or_collection) = 8 Tests: unit (dev)	2019-10-15 15:42:05 +03:00
Tomasz Grabiec	3177732b35	flat_mutation_reader: Introduce upgrade_schema()	2019-10-03 13:28:33 +02:00
Benny Halevy	507c99c011	mutation_fragment_stream_validator: add compare_keys flag Storing and comparing keys is expensive. Add a flag to enable/disable this feature (disabled by default). Without the flag, only the partition region monotonicity is validated, allowing repeated clustering rows, regardless of clustering keys. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-09-09 15:30:59 +03:00
Benny Halevy	496467d0a2	sstables: writer: Validate input mutation fragment stream Fixes #4803 Refs #4804 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-09-09 15:30:59 +03:00
Botond Dénes	23cc6d6fb2	make_flat_mutation_reader_from_fragments: reader: silence discarded future warning The fragment reader calls `fast_forward_to()` from its constructor to discard fragments that fall outside the query range. Move the fast-forward code into an internal void returning method, and call that from both the constructor and `fast_forward_to()`, to avoid a warning on a discarded future<>. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20190801133942.10744-1-bdenes@scylladb.com>	2019-08-05 16:21:50 +03:00
Botond Dénes	51e81cf027	flat_mutation_reader: add make_flat_mutation_reader_from_fragments() overload with range and slice To be able to support this new overload, the reader is made partition-range aware. It will now correctly only return fragments that fall into the partition-range it was created with. For completeness' sake and to be able to test it, also implement `fast_forward_to(const dht::partition_range)`. Slicing is done by filtering out non-overlapping fragments from the initial list of fragments. Also add a unit test that runs it through the mutation_source test suite.	2019-04-29 10:24:14 +03:00
Botond Dénes	bc08f8fd07	flat_mutation_reader: add flat_mutation_reader_from_mutations() overload with range and slice To be able to run the mutation-source test suite with this reader. In the next patch, this reader will be used in testing another reader, so it is important to make sure it works correctly first.	2019-04-26 12:43:45 +03:00
Duarte Nunes	fa2b0384d2	Replace std::experimental types with C++17 std version. Replace stdx::optional and stdx::string_view with the C++ std counterparts. Some instances of boost::variant were also replaced with std::variant, namely those that called seastar::visit. Scylla now requires GCC 8 to compile. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20190108111141.5369-1-duarte@scylladb.com>	2019-01-08 13:16:36 +02:00
Avi Kivity	c96fc1d585	Merge "Introduce row level repair" from Asias " === How the partition level repair works - The repair master decides which ranges to work on. - The repair master splits the ranges to sub ranges which contains around 100 partitions. - The repair master computes the checksum of the 100 partitions and asks the related peers to compute the checksum of the 100 partitions. - If the checksum matches, the data in this sub range is synced. - If the checksum mismatches, repair master fetches the data from all the peers and sends back the merged data to peers. === Major problems with partition level repair - A mismatch of a single row in any of the 100 partitions causes 100 partitions to be transferred. A single partition can be very large. Not to mention the size of 100 partitions. - Checksum (find the mismatch) and streaming (fix the mismatch) will read the same data twice === Row level repair Row level checksum and synchronization: detect row level mismatch and transfer only the mismatch === How the row level repair works - To solve the problem of reading data twice Read the data only once for both checksum and synchronization between nodes. We work on a small range which contains only a few mega bytes of rows, We read all the rows within the small range into memory. Find the mismatch and send the mismatch rows between peers. We need to find a sync boundary among the nodes which contains only N bytes of rows. - To solve the problem of sending unnecessary data. We need to find the mismatched rows between nodes and only send the delta. The problem is called set reconciliation problem which is a common problem in distributed systems. For example: Node1 has set1 = {row1, row2, row3} Node2 has set2 = { row2, row3} Node3 has set3 = {row1, row2, row4} To repair: Node1 fetches nothing from Node2 (set2 - set1), fetches row4 (set3 - set1) from Node3. 
Node1 sends row1 and row4 (set1 + set2 + set3 - set2) to Node2 Node1 sends row3 (set1 + set2 + set3 - set3) to Node3. === How to implement repair with set reconciliation - Step A: Negotiate sync boundary class repair_sync_boundary { dht::decorated_key pk; position_in_partition position } Reads rows from disk into row buffers until the size is larger than N bytes. Return the repair_sync_boundary of the last mutation_fragment we read from disk. The smallest repair_sync_boundary of all nodes is set as the current_sync_boundary. - Step B: Get missing rows from peer nodes so that repair master contains all the rows Request combined hashes from all nodes between last_sync_boundary and current_sync_boundary. If the combined hashes from all nodes are identical, data is synced, goto Step A. If not, request the full hashes from peers. At this point, the repair master knows exactly what rows are missing. Request the missing rows from peer nodes. Now, local node contains all the rows. - Step C: Send missing rows to the peer nodes Since local node also knows what peer nodes own, it sends the missing rows to the peer nodes. === How the RPC API looks like - repair_range_start() Step A: - request_sync_boundary() Step B: - request_combined_row_hashes() - reqeust_full_row_hashes() - request_row_diff() Step C: - send_row_diff() - repair_range_stop() === Performance evaluation We created a cluster of 3 Scylla nodes on AWS using i3.xlarge instance. We created a keyspace with a replication factor of 3 and inserted 1 billion rows to each of the 3 nodes. Each node has 241 GiB of data. We tested 3 cases below. 1) 0% synced: one of the nodes has zero data. The other two nodes have 1 billion identical rows. Time to repair: old = 87 min new = 70 min (rebuild took 50 minutes) improvement = 19.54% 2) 100% synced: all of the 3 nodes have 1 billion identical rows. 
Time to repair: old = 43 min new = 24 min improvement = 44.18% 3) 99.9% synced: each node has 1 billion identical rows and 1 billion * 0.1% distinct rows. Time to repair: old: 211 min new: 44 min improvement: 79.15% Bytes sent on wire for repair: old: tx= 162 GiB, rx = 90 GiB new: tx= 1.15 GiB, rx = 0.57 GiB improvement: tx = 99.29%, rx = 99.36% It is worth noting that row level repair sends and receives exactly the number of rows needed in theory. In this test case, repair master needs to receive 2 million rows and send 4 million rows. Here are the details: Each node has 1 billion * 0.1% distinct rows, that is 1 million rows. So repair master receives 1 million rows from repair slave 1 and 1 million rows from repair slave 2. Repair master sends 1 million rows from repair master and 1 million rows received from repair slave 1 to repair slave 2. Repair master sends 1 million rows from repair master and 1 million rows received from repair slave 2 to repair slave 1. In the result, we saw the rows on wire were as expected. 
tx_row_nr = 1000505 + 999619 + 1001257 + 998619 (4 shards, the numbers are for each shard) = 4'000'000 rx_row_nr = 500233 + 500235 + 499559 + 499973 (4 shards, the numbers are for each shard) = 2'000'000 Fixes: #3033 Tests: dtests/repair_additional_test.py " * 'asias/row_level_repair_v7' of github.com:cloudius-systems/seastar-dev: (51 commits) repair: Enable row level repair repair: Add row_level_repair repair: Add docs for row level repair repair: Add repair_init_messaging_service_handler repair: Add repair_meta repair: Add repair_writer repair: Add repair_reader repair: Add repair_row repair: Add fragment_hasher repair: Add decorated_key_with_hash repair: Add get_random_seed repair: Add get_common_diff_detect_algorithm repair: Add shard_config repair: Add suportted_diff_detect_algorithms repair: Add repair_stats to repair_info repair: Introduce repair_stats flat_mutation_reader: Add make_generating_reader storage_service: Introduce ROW_LEVEL_REPAIR feature messaging_service: Add RPC verbs for row level repair repair: Export the repair logger ...	2018-12-25 13:13:00 +02:00
Paweł Dziepak	048ed2e3d3	flat_mutation_reader_from_mutations: destroy all remaining mutations If the reader is fast-forwarded to another partition range mutation_ may be left with some partial mutations. Make sure that those are properly destroyed.	2018-12-20 13:27:25 +00:00
Paweł Dziepak	d50cd31eee	flat_mutation_reader_from_mutations: fix empty range case An iterator shall not be dereferenced until it is verified that it is dereferencable.	2018-12-20 13:27:25 +00:00
Asias He	0067d32b47	flat_mutation_reader: Add make_generating_reader Move generating_reader from stream_session.cc to flat_mutation_reader.cc. It will be used by repair code soon. Also introduce a helper make_generating_reader to hide the implementation of generating_reader.	2018-12-12 16:49:01 +08:00
Botond Dénes	39bfd5d1df	make_flat_multi_range_reader: add generator overload Allows creating a multi range reader from an arbitrary callable that returns std::optional<dht::partition_range>. The callable is expected to return a new range on each call, such that passing each successive range to `flat_mutation_reader::fast_forward_to` is valid. When exhausted the callable is expected to return std::nullopt.	2018-09-28 14:27:55 +03:00
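
The set reconciliation example from the row level repair merge (c96fc1d585) can be sketched as below. This is only an illustration under stated assumptions, not Scylla's implementation: `row_set`, `repair_plan`, `difference()` and `reconcile()` are hypothetical names, and rows are modelled as plain strings rather than hashed mutation fragments exchanged over RPC.

```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for the set of rows a node holds in the current sync range.
using row_set = std::set<std::string>;

// Rows present in `a` but missing from `b` (the "a - b" of the commit message).
row_set difference(const row_set& a, const row_set& b) {
    row_set out;
    std::set_difference(a.begin(), a.end(), b.begin(), b.end(),
                        std::inserter(out, out.end()));
    return out;
}

// Hypothetical result type: what the repair master pulls from and pushes to each peer.
struct repair_plan {
    std::vector<row_set> fetched;  // fetched[i]: rows the master is missing and pulls from peer i
    std::vector<row_set> sent;     // sent[i]: rows peer i is missing and receives from the master
    row_set merged;                // union of all rows, held by the master after fetching
};

repair_plan reconcile(const row_set& master, const std::vector<row_set>& peers) {
    repair_plan plan;
    plan.merged = master;
    // Step B of the commit message: pull what the master lacks from every peer.
    for (const auto& peer : peers) {
        plan.fetched.push_back(difference(peer, master));
        plan.merged.insert(peer.begin(), peer.end());
    }
    // Step C: push to each peer whatever it lacks relative to the merged set.
    for (const auto& peer : peers) {
        plan.sent.push_back(difference(plan.merged, peer));
    }
    return plan;
}
```

With set1 = {row1, row2, row3} as the master and set2 = {row2, row3}, set3 = {row1, row2, row4} as peers, `reconcile()` fetches nothing from Node2 and row4 from Node3, then sends row1 and row4 to Node2 and row3 to Node3, matching the walk-through in the commit message.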


96 Commits