scylladb

Author	SHA1	Message	Date
Botond Dénes	727bc0f5d4	mutation_fragment_stream_validator: add token validation level In some cases the full-blown partition key validation and especially the associated key copy per partition might be deemed too costly. As a next best thing this patch adds a token only validation, which should cover 99% (number pulled out of my sleeve) of the cases. Let's hope no one gets unlucky.	2021-03-01 07:49:23 +02:00
Botond Dénes	694f8a4ec6	mutation_fragment_stream_validating_filter: make validation levels more fine-grained Currently key order validation for the mutation fragment stream validating filter is all or nothing. Either no keys (partition or clustering) are validated or all of them. As we suspect that clustering key order validation would add a significant overhead, this discourages turning key validation on, which means we miss out on partition key monotonicity validation which has a much more moderate cost. This patch makes this configurable in a more fine-grained fashion, providing separate levels for partition and clustering key monotonicity validation. As the choice for the default validation level is not as clear-cut as before, the default value for the validation level is removed in the validating filter's constructor.	2021-03-01 07:49:23 +02:00
Pavel Emelyanov	bfcd6a4bb7	flat_mutation_reader: Use clear() in destroy_current_mutation() Currently the code uses a loop of unlink_leftmost_without_rebalance calls. B-tree does have it, but plain clearing of the tree is a bit faster with clear(). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2021-02-02 09:30:30 +03:00
Benny Halevy	29002e3b48	flat_mutation_reader: return future from next_partition To allow it to asynchronously close underlying readers on next_partition(). Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2021-01-13 17:35:07 +02:00
Botond Dénes	8dae6152bf	mutation_fragment_stream_validator: make it easier to validate concrete fragment types The current API is tailored to the `mutation_fragment` type. In the next patch we will want to use the validator from a context where the mutation fragments are already decomposed into their respective concrete types, e.g. static_row, clustering_row, etc. To avoid having to reconstruct a mutation fragment type just to use the validator, add an API which allows validating these concrete types conveniently too.	2021-01-11 08:07:42 +02:00
Botond Dénes	495f9d54ba	flat_mutation_reader: extract fragment stream validator into its own header To allow using it without pulling in the huge `flat_mutation_reader.hh`.	2021-01-11 08:07:42 +02:00
Botond Dénes	dd372c8457	flat_mutation_reader: de-virtualize buffer_size() The main user of this method, the one which required this method to return the collective buffer size of the entire reader tree, is now gone. The remaining two users just use it to check the size of the reader instance they are working with. So de-virtualize this method and reduce its responsibility to just returning the buffer size of the current reader instance.	2020-10-06 08:22:56 +03:00
Botond Dénes	256140a033	mutation_fragment: memory_usage(): remove unused schema parameter The memory usage is now maintained and updated on each change to the mutation fragment, so it need not be recalculated on a call to `memory_usage()`, hence the schema parameter is unused and can be removed.	2020-09-28 11:27:47 +03:00
Botond Dénes	6ca0464af5	mutation_fragment: add schema and permit We want to start tracking the memory consumption of mutation fragments. For this we need schema and permit during construction, and on each modification, so the memory consumption can be recalculated and passed to the permit. In this patch we just add the new parameters and go through the insane churn of updating all call sites. They will be used in the next patch.	2020-09-28 11:27:23 +03:00
Botond Dénes	0518571e56	flat_mutation_reader: make _buffer a tracked buffer Via a tracked_allocator. Although the memory allocations made by the _buffer shouldn't dominate the memory consumption of the read itself, they can still be a significant portion that scales with the number of readers in the read.	2020-09-28 10:53:56 +03:00
Botond Dénes	3fab83b3a1	flat_mutation_reader: impl: add reader_permit parameter Not used yet, this patch does all the churn of propagating a permit to each impl. In the next patch we will use it to track the memory consumption of `_buffer`.	2020-09-28 10:53:48 +03:00
Pavel Emelyanov	f19b85b61d	range_tombstone_list: Introduce and use pop-and-lock helper There's an optimization in flat_mutation_reader_from_mutations that folds the list from left to right in linear time. In case of the currently used boost::set the .unlink_leftmost_without_rebalance helper is used, so wrap this special case in a method of the range_tombstone_list. This is the last place where callers need to mess with the exact internal collection. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-09-07 23:17:41 +03:00
Pavel Emelyanov	a89c7198c2	range_tombstone_list: Introduce and use pop_as<>() The method extracts an element from the list, constructs the desired object from it and frees the element. This is a common usage of range_tombstone_list. Having a helper helps encapsulate the exact collection inside the class. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-09-07 23:17:41 +03:00
Pavel Emelyanov	27912375b2	flat_mutation_reader: Use range_tombstone_list begin/end API The goal is to stop revealing the exact collection from the range_tombstone_list, so make use of existing begin/end methods and extend with rbegin() where needed. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2020-09-07 23:17:41 +03:00
Botond Dénes	92ce39f014	query: query_class_config: use max_result_size for the max_memory_for_unlimited_query field We want to switch from using a single limit to a dual soft/hard limit. As a first step we switch the limit field of `query_class_config` to use the recently introduced type for this. As this field has a single user at the moment -- reverse queries (and not a lot of propagation) -- we update it in this same patch to use the soft/hard limit: warn on reaching the soft limit and abort on the hard limit (the previous behaviour).	2020-07-28 18:00:29 +03:00
Piotr Sarna	4cb79f04b0	treewide: replace libjsoncpp usage with rjson In order to eventually switch to a single JSON library, most of the libjsoncpp usage is dropped in favor of rjson. Unfortunately, one usage still remains: test/utils/test_repl utility heavily depends on the exact textual format of its output JSON files, so replacing a library results in all tests failing because of differences in formatting. It is possible to force rjson to print its documents in the exact matching format, but that's left for later, since the issue is not critical. It would be nice though if our test suite compared JSON documents with a real JSON parser, since there are more differences - e.g. libjsoncpp keeps children of the object sorted, while rapidjson uses an unordered data structure. This change should cause no change in semantics, it strives just to replace all usage of libjsoncpp with rjson.	2020-07-03 10:27:23 +02:00
Botond Dénes	0b4ec62332	flat_mutation_reader: flat_multi_range_reader: add reader_permit parameter Mutation sources will soon require a valid permit so make sure we have one and pass it to the mutation sources when creating the underlying readers. For now, pass no_reader_permit() on call sites, deferring the obtaining of a valid permit to later patches.	2020-05-28 11:34:35 +03:00
Botond Dénes	196dd5fa9b	treewide: throw std::bad_function_call with backtraces We typically use `std::bad_function_call` to throw from mandatory-to-implement virtual functions that cannot have a meaningful implementation in the derived class. The problem with `std::bad_function_call` is that it carries absolutely no information about where it was thrown from. I originally wanted to replace `std::bad_function_call` in our codebase with a custom exception type that would allow passing in the name of the function it is thrown from to be included in the exception message. However, after I ended up also including a backtrace, Benny Halevy pointed out that I might as well just throw `std::bad_function_call` with a backtrace instead. So this is what this patch does. All users are various unimplemented methods of the `flat_mutation_reader::impl` interface. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200408075801.701416-1-bdenes@scylladb.com>	2020-04-08 13:54:06 +02:00
Botond Dénes	7bdeec4b00	flat_mutation_reader: make_reversing_reader(): add memory limit If the reversing requires more memory than the limit, the read is aborted. All users are updated to get a meaningful limit, from the respective table object, with the exception of tests of course.	2020-02-27 18:11:54 +02:00
Botond Dénes	091d80e8c3	flat_mutation_reader: expose reverse reader as a standalone reader Currently reverse reads just pass a flag to `flat_mutation_reader::consume()` to make the read happen in reverse. This is deceptively simple and streamlined -- while in fact behind the scenes a reversing reader is created to wrap the reader in question to reverse partitions, one-by-one. This patch makes this apparent by exposing the reversing reader via `make_reversing_reader()`. This now makes how reversing works more apparent. It also allows for more configuration to be passed to the reversing reader (in the next patches). This change is forward compatible, as in time we plan to add reversing support to the sstable layer, in which case the reversing reader will go.	2020-02-27 18:11:54 +02:00
Botond Dénes	1b7725af4b	mutation_fragment_stream_validator: split into low-level and high-level API The low-level validator allows fine-grained validation of different aspects of monotonicity of a fragment stream. It doesn't do any error handling. Since different aspects can be validated with different functions, this allows callers to understand what exactly is invalid. The high-level API is the previous fragment filter one. This is now built on the low-level API. This division allows for advanced use cases where the user of the validator wants to do all error handling and wants to decide exactly what monotonicity to validate. The motivating use-case is scrubbing compaction, added in the next patches.	2020-02-13 15:02:32 +02:00
Botond Dénes	dfc8b2fc45	treewide: replace reader_resource_tracer with reader_permit The former was never really more than a reader_permit with one additional method. Currently using it doesn't even save one from any includes. Now that readers will be using reader_permit we would have to pass down both to mutation_source. Instead get rid of reader_resource_tracker and just use reader_permit. Instead of making it a last and optional parameter that is easy to ignore, make it a first class parameter, right after schema, to signify that permits are now a prominent part of the reader API. This -- mostly mechanical -- patch essentially refactors mutation_source to ask for the reader_permit instead of reader_resource_tracking and updates all usage sites.	2020-01-28 08:13:16 +02:00
Botond Dénes	a74a82d4d2	flat_mutation_reader: mutation_fragment_stream_validator: add name Add a name parameter to the validator, so that the validator can be identified in log messages. Schema identity information is added to the name automatically. This should help pinpoint the problematic place where validation failed. Although at the moment we have a single validator, it still benefits from having a name, as we can now include in it the name of the sstable being written and hence trace the source of the bad data. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20200117150616.895878-1-bdenes@scylladb.com>	2020-01-20 11:06:30 +01:00
Botond Dénes	08bb0bd6aa	mutation_fragment_stream_validator: wrap exceptions into own exception type So a higher level component using the validator to validate a stream can catch only validation errors, and let any other incidental exception through. This allows building data correctors on top of the `mutation_fragment_stream_validator`, by filtering a fragment stream through a validator, catching invalid fragment stream exceptions and dropping the respective fragments from the stream. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20191220073443.530750-1-bdenes@scylladb.com>	2019-12-20 12:05:00 +01:00
Benny Halevy	79d5fed40b	mutation_fragment_stream_validator: validate end of stream in partition_key filter Currently end of stream validation is done in the destructor, but the validator may be destructed prematurely, e.g. on exception, as seen in https://github.com/scylladb/scylla/issues/5215. This patch adds an on_end_of_stream() method explicitly called by consume_pausable_in_thread. Also, the respective concepts for PartitionFilter, MutationFragmentFilter and a new one for the on_end_of_stream method were unified as FlattenedConsumerFilter. Refs #5215 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> (cherry picked from commit 506ff40bd447f00158c24859819d4bb06436c996)	2019-10-29 12:35:33 +01:00
Benny Halevy	d5f53bc307	mutation_fragment_stream_validator: validate partition key monotonicity Fixes #4804 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> (cherry picked from commit 736360f823621f7994964fee77f37378ca934c56)	2019-10-29 12:35:33 +01:00
Nadav Har'El	51fc6c7a8e	make static_row optional to reduce memory footprint Merged patch series from Avi Kivity: The static row can be rare: many tables don't have them, and tables that do will often have mutations without them (if the static row is rarely updated, it may be present in the cache and in readers, but absent in memtable mutations). However, it always consumes ~100 bytes of memory, even if it is not present, due to row's overhead. Change it to be optional by allocating it as an external object rather than inlined into mutation_partition. This adds overhead when the static row is present (17 bytes for the reference, back reference, and lsa allocator overhead). perf_simple_query appears to be marginally (2%) faster. Footprint is reduced by ~9% for a cache entry, 12% in memtables. More details are provided in the patch commitlog. Tests: unit (debug)
Avi Kivity (4):
  managed_ref: add get() accessor
  managed_ref: add external_memory_usage()
  mutation_partition: introduce lazy_row
  mutation_partition: make static_row optional to reduce memory footprint
 cell_locking.hh | 2 +-
 converting_mutation_partition_applier.hh | 4 +-
 mutation_partition.hh | 284 ++++++++++++++++++++++-
 partition_builder.hh | 4 +-
 utils/managed_ref.hh | 12 +
 flat_mutation_reader.cc | 2 +-
 memtable.cc | 2 +-
 mutation_partition.cc | 45 +++-
 mutation_partition_serializer.cc | 2 +-
 partition_version.cc | 4 +-
 tests/multishard_mutation_query_test.cc | 2 +-
 tests/mutation_source_test.cc | 2 +-
 tests/mutation_test.cc | 12 +-
 tests/sstable_mutation_test.cc | 10 +-
 14 files changed, 355 insertions(+), 32 deletions(-)	2019-10-22 12:25:15 +03:00
Avi Kivity	acc433b286	mutation_partition: make static_row optional to reduce memory footprint The static row can be rare: many tables don't have them, and tables that do will often have mutations without them (if the static row is rarely updated, it may be present in the cache and in readers, but absent in memtable mutations). However, it always consumes ~100 bytes of memory, even if it is not present, due to row's overhead. Change it to be optional by using lazy_row instead of row. Some call sites treewide were adjusted to deal with the extra indirection. perf_simple_query appears to improve by 2%, from 163krps to 165 krps, though it's hard to be sure due to noisy measurements.
memory_footprint comparisons (before → after):
mutation footprint:
- in cache: 1096 → 992
- in memtable: 854 → 750
- in sstable: 351 → 351
- frozen: 540 → 540
- canonical: 827 → 827
- query result: 342 → 342
sizeof(cache_entry) = 112 → 112
-- sizeof(decorated_key) = 36 → 36
-- sizeof(cache_link_type) = 32 → 32
-- sizeof(mutation_partition) = 200 → 96
-- -- sizeof(_static_row) = 112 → 8
-- -- sizeof(_rows) = 24 → 24
-- -- sizeof(_row_tombstones) = 40 → 40
sizeof(rows_entry) = 232 → 232
sizeof(lru_link_type) = 16 → 16
sizeof(deletable_row) = 168 → 168
sizeof(row) = 112 → 112
sizeof(atomic_cell_or_collection) = 8 → 8
Tests: unit (dev)	2019-10-15 15:42:05 +03:00
Tomasz Grabiec	3177732b35	flat_mutation_reader: Introduce upgrade_schema()	2019-10-03 13:28:33 +02:00
Benny Halevy	507c99c011	mutation_fragment_stream_validator: add compare_keys flag Storing and comparing keys is expensive. Add a flag to enable/disable this feature (disabled by default). Without the flag, only the partition region monotonicity is validated, allowing repeated clustering rows, regardless of clustering keys. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-09-09 15:30:59 +03:00
Benny Halevy	496467d0a2	sstables: writer: Validate input mutation fragment stream Fixes #4803 Refs #4804 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2019-09-09 15:30:59 +03:00
Botond Dénes	23cc6d6fb2	make_flat_mutation_reader_from_fragments: reader: silence discarded future warning The fragment reader calls `fast_forward_to()` from its constructor to discard fragments that fall outside the query range. Move the fast-forward code into an internal void-returning method, and call that from both the constructor and `fast_forward_to()`, to avoid a warning on a discarded future<>. Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <20190801133942.10744-1-bdenes@scylladb.com>	2019-08-05 16:21:50 +03:00
Botond Dénes	51e81cf027	flat_mutation_reader: add make_flat_mutation_reader_from_fragments() overload with range and slice To be able to support this new overload, the reader is made partition-range aware. It will now correctly only return fragments that fall into the partition-range it was created with. For completeness' sake and to be able to test it, also implement `fast_forward_to(const dht::partition_range)`. Slicing is done by filtering out non-overlapping fragments from the initial list of fragments. Also add a unit test that runs it through the mutation_source test suite.	2019-04-29 10:24:14 +03:00
Botond Dénes	bc08f8fd07	flat_mutation_reader: add flat_mutation_reader_from_mutations() overload with range and slice To be able to run the mutation-source test suite with this reader. In the next patch, this reader will be used in testing another reader, so it is important to make sure it works correctly first.	2019-04-26 12:43:45 +03:00
Duarte Nunes	fa2b0384d2	Replace std::experimental types with C++17 std version. Replace stdx::optional and stdx::string_view with the C++ std counterparts. Some instances of boost::variant were also replaced with std::variant, namely those that called seastar::visit. Scylla now requires GCC 8 to compile. Signed-off-by: Duarte Nunes <duarte@scylladb.com> Message-Id: <20190108111141.5369-1-duarte@scylladb.com>	2019-01-08 13:16:36 +02:00
Avi Kivity	c96fc1d585	Merge "Introduce row level repair" from Asias "
=== How the partition level repair works
- The repair master decides which ranges to work on.
- The repair master splits the ranges into sub-ranges, each containing around 100 partitions.
- The repair master computes the checksum of the 100 partitions and asks the related peers to compute the checksum of the same 100 partitions.
- If the checksums match, the data in this sub-range is synced.
- If the checksums mismatch, the repair master fetches the data from all the peers and sends the merged data back to them.
=== Major problems with partition level repair
- A mismatch of a single row in any of the 100 partitions causes all 100 partitions to be transferred. A single partition can be very large, not to mention 100 of them.
- Checksumming (finding the mismatch) and streaming (fixing the mismatch) read the same data twice.
=== Row level repair
Row level checksum and synchronization: detect row level mismatches and transfer only the mismatched rows.
=== How the row level repair works
- To solve the problem of reading data twice: read the data only once for both checksumming and synchronization between nodes. We work on a small range which contains only a few megabytes of rows. We read all the rows within the small range into memory, find the mismatches and send the mismatched rows between peers. We need to find a sync boundary among the nodes which contains only N bytes of rows.
- To solve the problem of sending unnecessary data: we need to find the mismatched rows between nodes and send only the delta. This is known as the set reconciliation problem, a common problem in distributed systems. For example: Node1 has set1 = {row1, row2, row3}, Node2 has set2 = {row2, row3}, Node3 has set3 = {row1, row2, row4}. To repair, Node1 fetches nothing from Node2 (set2 - set1) and fetches row4 (set3 - set1) from Node3. Node1 sends row1 and row4 (set1 + set2 + set3 - set2) to Node2 and sends row3 (set1 + set2 + set3 - set3) to Node3.
=== How to implement repair with set reconciliation
- Step A: Negotiate the sync boundary.
  class repair_sync_boundary { dht::decorated_key pk; position_in_partition position; }
  Read rows from disk into row buffers until their size is larger than N bytes, and return the repair_sync_boundary of the last mutation_fragment read from disk. The smallest repair_sync_boundary of all nodes is set as the current_sync_boundary.
- Step B: Get the missing rows from peer nodes so that the repair master contains all the rows.
  Request combined hashes from all nodes between last_sync_boundary and current_sync_boundary. If the combined hashes from all nodes are identical, the data is synced; go to Step A. If not, request the full hashes from peers. At this point, the repair master knows exactly which rows are missing. Request the missing rows from the peer nodes. Now the local node contains all the rows.
- Step C: Send the missing rows to the peer nodes.
  Since the local node also knows what the peer nodes own, it sends the missing rows to them.
=== What the RPC API looks like
- repair_range_start()
Step A:
- request_sync_boundary()
Step B:
- request_combined_row_hashes()
- request_full_row_hashes()
- request_row_diff()
Step C:
- send_row_diff()
- repair_range_stop()
=== Performance evaluation
We created a cluster of 3 Scylla nodes on AWS using i3.xlarge instances. We created a keyspace with a replication factor of 3 and inserted 1 billion rows into each of the 3 nodes. Each node has 241 GiB of data. We tested the 3 cases below.
1) 0% synced: one of the nodes has zero data. The other two nodes have 1 billion identical rows.
Time to repair: old = 87 min, new = 70 min (rebuild took 50 minutes), improvement = 19.54%
2) 100% synced: all of the 3 nodes have 1 billion identical rows.
Time to repair: old = 43 min, new = 24 min, improvement = 44.18%
3) 99.9% synced: each node has 1 billion identical rows and 1 billion * 0.1% distinct rows.
Time to repair: old = 211 min, new = 44 min, improvement = 79.15%
Bytes sent on wire for repair: old: tx = 162 GiB, rx = 90 GiB; new: tx = 1.15 GiB, rx = 0.57 GiB; improvement: tx = 99.29%, rx = 99.36%
It is worth noting that row level repair sends and receives exactly the number of rows needed in theory. In this test case, the repair master needs to receive 2 million rows and send 4 million rows. Here are the details: each node has 1 billion * 0.1% distinct rows, that is 1 million rows. So the repair master receives 1 million rows from repair slave 1 and 1 million rows from repair slave 2. The repair master sends its own 1 million rows and the 1 million rows received from repair slave 1 to repair slave 2. The repair master sends its own 1 million rows and the 1 million rows received from repair slave 2 to repair slave 1. In the results, we saw that the rows on the wire were as expected.
tx_row_nr = 1000505 + 999619 + 1001257 + 998619 (4 shards, the numbers are for each shard) = 4'000'000
rx_row_nr = 500233 + 500235 + 499559 + 499973 (4 shards, the numbers are for each shard) = 2'000'000
Fixes: #3033
Tests: dtests/repair_additional_test.py
" * 'asias/row_level_repair_v7' of github.com:cloudius-systems/seastar-dev: (51 commits)
  repair: Enable row level repair
  repair: Add row_level_repair
  repair: Add docs for row level repair
  repair: Add repair_init_messaging_service_handler
  repair: Add repair_meta
  repair: Add repair_writer
  repair: Add repair_reader
  repair: Add repair_row
  repair: Add fragment_hasher
  repair: Add decorated_key_with_hash
  repair: Add get_random_seed
  repair: Add get_common_diff_detect_algorithm
  repair: Add shard_config
  repair: Add suportted_diff_detect_algorithms
  repair: Add repair_stats to repair_info
  repair: Introduce repair_stats
  flat_mutation_reader: Add make_generating_reader
  storage_service: Introduce ROW_LEVEL_REPAIR feature
  messaging_service: Add RPC verbs for row level repair
  repair: Export the repair logger
  ...	2018-12-25 13:13:00 +02:00
Paweł Dziepak	048ed2e3d3	flat_mutation_reader_from_mutations: destroy all remaining mutations If the reader is fast-forwarded to another partition range mutation_ may be left with some partial mutations. Make sure that those are properly destroyed.	2018-12-20 13:27:25 +00:00
Paweł Dziepak	d50cd31eee	flat_mutation_reader_from_mutations: fix empty range case An iterator shall not be dereferenced until it is verified that it is dereferenceable.	2018-12-20 13:27:25 +00:00
Asias He	0067d32b47	flat_mutation_reader: Add make_generating_reader Move generating_reader from stream_session.cc to flat_mutation_reader.cc. It will be used by repair code soon. Also introduce a helper make_generating_reader to hide the implementation of generating_reader.	2018-12-12 16:49:01 +08:00
Botond Dénes	39bfd5d1df	make_flat_multi_range_reader: add generator overload Allows creating a multi range reader from an arbitrary callable that returns std::optional<dht::partition_range>. The callable is expected to return a new range on each call, such that passing each successive range to `flat_mutation_reader::fast_forward_to` is valid. When exhausted the callable is expected to return std::nullopt.	2018-09-28 14:27:55 +03:00
Botond Dénes	8c5387890d	flat_multi_range_reader: refactor to work in terms of generator Instead of working with a dht::partition_range_vector directly, work with an abstract generator that returns a pointer to the next range on each invocation. When exhausted it returns nullptr. This opens up the possibility to create multi range readers from a generator functor that creates ranges lazily. This is indeed what the next patch does.	2018-09-28 14:27:55 +03:00
Botond Dénes	f3bf2e83dd	make_flat_multi_range_reader(): better handle the 0 range case Previously, when the passed in range of partition ranges contained 0 ranges, an empty reader was returned. This means that the returned reader was forwardable or not depending on the number of passed in ranges. This is inconsistent and can lead to nasty surprises. To solve this problem add `forwardable_empty_mutation_reader`, a specialized reader that delays creating the underlying reader until fast_forward_to() is called on it, and thus a range is available. When `make_flat_multi_range_mutation_reader()` is called with `mutation_reader::forwarding::no` a simple empty reader is created, like before.	2018-09-28 14:27:55 +03:00
Botond Dénes	68b6c83ee8	flat_multi_range_mutation_reader: drop fwd_mr ctor parameter The factory function creating this reader ensures that the passed-in ranges vector has more than one range, which effectively makes the `fwd_mr` constructor parameter have no effect. The underlying reader will always be created with `mutation_reader::forwarding::yes` as it has to be able to fast-forward between the ranges.	2018-09-28 14:25:03 +03:00
Botond Dénes	eb357a385d	flat_mutation_reader: make timeout opt-out rather than opt-in Currently timeout is opt-in, that is, all methods that even have it default it to `db::no_timeout`. This means that ensuring timeout is used where it should be is completely up to the author and the reviewers of the code. As humans are notoriously prone to mistakes this has resulted in a very inconsistent usage of timeout, many clients of `flat_mutation_reader` passing the timeout only to some members and only on certain call sites. This is small wonder considering that some core operations like `operator()()` only recently received a timeout parameter and others like `peek()` didn't even have one until this patch. Both of these methods call `fill_buffer()` which potentially talks to the lower layers and is supposed to propagate the timeout. All this makes the `flat_mutation_reader`'s timeout effectively useless. To bring order to this chaos, make the timeout parameter a mandatory one on all `flat_mutation_reader` methods that need it. This ensures that humans now get a reminder from the compiler when they forget to pass the timeout. Clients can still opt out of passing a timeout by passing `db::no_timeout` (the previous default value) but this will now be explicit and developers should think before typing it. There were surprisingly few core call sites to fix up. Where a timeout was available nearby I propagated it to be able to pass it to the reader; where I couldn't, I passed `db::no_timeout`. Authors of the latter kind of code (view, streaming and repair are some of the notable examples) should maybe consider propagating down a timeout if needed. In the test code (the vast majority of the changes) I just used `db::no_timeout` everywhere. Tests: unit(release, debug) Signed-off-by: Botond Dénes <bdenes@scylladb.com> Message-Id: <1edc10802d5eb23de8af28c9f48b8d3be0f1a468.1536744563.git.bdenes@scylladb.com>	2018-09-20 11:31:24 +02:00
Tomasz Grabiec	6d6b93d1e7	flat_mutation_reader: Move field initialization to initializer list This works around a problem of std::terminate() being called in debug mode build if initialization of _current throws. Backtrace: Thread 2 "row_cache_test_" received signal SIGABRT, Aborted. 0x00007ffff17ce9fb in raise () from /lib64/libc.so.6 (gdb) bt #0 0x00007ffff17ce9fb in raise () from /lib64/libc.so.6 #1 0x00007ffff17d077d in abort () from /lib64/libc.so.6 #2 0x00007ffff5773025 in __gnu_cxx::__verbose_terminate_handler() () from /lib64/libstdc++.so.6 #3 0x00007ffff5770c16 in ?? () from /lib64/libstdc++.so.6 #4 0x00007ffff576fb19 in ?? () from /lib64/libstdc++.so.6 #5 0x00007ffff5770508 in __gxx_personality_v0 () from /lib64/libstdc++.so.6 #6 0x00007ffff3ce4ee3 in ?? () from /lib64/libgcc_s.so.1 #7 0x00007ffff3ce570e in _Unwind_Resume () from /lib64/libgcc_s.so.1 #8 0x0000000003633602 in reader::reader (this=0x60e0001160c0, r=...) at flat_mutation_reader.cc:214 #9 0x0000000003655864 in std::make_unique<make_forwardable(flat_mutation_reader)::reader, flat_mutation_reader>(flat_mutation_reader &&) (__args#0=...) at /usr/include/c++/7/bits/unique_ptr.h:825 #10 0x0000000003649a63 in make_flat_mutation_reader<make_forwardable(flat_mutation_reader)::reader, flat_mutation_reader>(flat_mutation_reader &&) (args#0=...) at flat_mutation_reader.hh:440 #11 0x000000000363565d in make_forwardable (m=...) at flat_mutation_reader.cc:270 #12 0x000000000303f962 in memtable::make_flat_reader (this=0x61300001d540, s=..., range=..., slice=..., pc=..., trace_state_ptr=..., fwd=..., fwd_mr=...) at memtable.cc:592 Message-Id: <1528792447-13336-1-git-send-email-tgrabiec@scylladb.com>	2018-06-25 20:03:23 +03:00
Paweł Dziepak	ec9d166a4f	treewide: require type to compute cell memory usage	2018-05-31 15:51:11 +01:00
Paweł Dziepak	27014a23d7	treewide: require type info for copying atomic_cell_or_collection	2018-05-31 15:51:11 +01:00
Botond Dénes	f5b012c952	flat_multi_range_mutation_reader: optimize for non-plural range vectors Don't create a flat_multi_range_mutation_reader when the range vector has 0 or 1 element. In the former case create an empty reader and in the latter just create a reader with the mutation-source with the only range in the vector.	2018-05-10 06:22:39 +03:00
Duarte Nunes	1f3e3d3813	flat_mutation_reader: Make reader from mutation fragments Builds a reader from a set of ordered mutation fragments. This is useful for building a reader out of a subset of segments returned by a different reader. It is equivalent to building a mutation out of the set of mutation fragments, and calling make_flat_mutation_reader_from_mutations, except that it does not yet support fast-forwarding. Signed-off-by: Duarte Nunes <duarte@scylladb.com>	2018-03-27 01:20:11 +01:00
Botond Dénes	f488ae3917	Add buffer_size() to flat_mutation_reader buffer_size() exposes the collective size of the external memory consumed by the mutation fragments in the flat reader's buffer. This provides a basis to build basic memory accounting on. Although this is not the entire memory consumption of any given reader, it is the most volatile component and usually by far the largest one too.	2018-03-13 10:34:34 +02:00
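
The validation levels introduced across commits 507c99c011, 694f8a4ec6 and 727bc0f5d4 trade coverage for cost: partition region order needs no key storage, a token-only check avoids copying each partition key, and full partition and clustering key checks cost the most. A minimal C++ sketch of that layering follows, assuming simplified types; `validation_level`, `fragment_kind`, `fragment` and `toy_stream_validator` are hypothetical names rather than Scylla's API, tokens are plain `uint64_t` values, and `std::string`/`int` stand in for real partition and clustering keys.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <tuple>

// Hypothetical sketch: validation levels ordered from cheapest to most expensive.
enum class validation_level { none, partition_region, token, partition_key, clustering_key };

// Hypothetical fragment kinds, in the order they may appear within a partition.
enum class fragment_kind { partition_start, static_row, clustering_row, partition_end };

// Simplified fragment: only the fields relevant to its kind are meaningful.
struct fragment {
    fragment_kind kind;
    std::uint64_t token = 0;    // partition_start: token of the partition
    std::string partition_key;  // partition_start: stand-in for the partition key
    int clustering_key = 0;     // clustering_row: stand-in for the clustering key
};

class toy_stream_validator {
    validation_level _level;
    std::optional<fragment_kind> _prev_kind;
    std::optional<std::uint64_t> _prev_token;
    std::optional<std::string> _prev_key;  // copied once per partition
    std::optional<int> _prev_ckey;

    bool at_least(validation_level l) const { return _level >= l; }

    // Partition region order: start, optional static row, clustering rows, end.
    bool region_ok(fragment_kind k) const {
        if (!_prev_kind || *_prev_kind == fragment_kind::partition_end) {
            return k == fragment_kind::partition_start;
        }
        if (k == fragment_kind::clustering_row && *_prev_kind == fragment_kind::clustering_row) {
            return true;
        }
        return k != fragment_kind::partition_start && k > *_prev_kind;
    }

    // Partitions are ordered by token, then by key.
    bool partition_ok(const fragment& f) const {
        if (!_prev_token) {
            return true;  // first partition of the stream
        }
        if (at_least(validation_level::partition_key)) {
            return std::tie(*_prev_token, *_prev_key) < std::tie(f.token, f.partition_key);
        }
        // Token only: equal tokens are accepted because the keys are not kept.
        return *_prev_token <= f.token;
    }

public:
    explicit toy_stream_validator(validation_level level) : _level(level) {}

    // Returns false for the first fragment that violates the configured level.
    bool operator()(const fragment& f) {
        if (_level == validation_level::none) {
            return true;
        }
        if (!region_ok(f.kind)) {
            return false;
        }
        if (f.kind == fragment_kind::partition_start) {
            if (at_least(validation_level::token)) {
                if (!partition_ok(f)) {
                    return false;
                }
                _prev_token = f.token;
                if (at_least(validation_level::partition_key)) {
                    _prev_key = f.partition_key;  // the copy that token-only mode avoids
                }
            }
            _prev_ckey.reset();  // clustering order restarts in every partition
        } else if (f.kind == fragment_kind::clustering_row && at_least(validation_level::clustering_key)) {
            if (_prev_ckey && !(*_prev_ckey < f.clustering_key)) {
                return false;
            }
            _prev_ckey = f.clustering_key;
        }
        _prev_kind = f.kind;
        return true;
    }

    // Explicit end-of-stream check: the last partition must have been closed.
    bool on_end_of_stream() const {
        return _level == validation_level::none || !_prev_kind ||
               *_prev_kind == fragment_kind::partition_end;
    }
};
```

At the `token` level two partitions with equal tokens are always accepted, so a repeated partition key goes unnoticed; that is the rare case 727bc0f5d4 accepts missing in exchange for skipping the per-partition key copy.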

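The set reconciliation example in the row level repair merge (c96fc1d585) can be reproduced with `std::set_difference`. This is a sketch under the assumption that rows are compared directly as strings; the described protocol exchanges combined and full row hashes instead, and `row_set`, `difference`, `repair_plan` and `plan_repair` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <string>
#include <vector>

// Hypothetical sketch: a node's rows in the current sync range, identified by name.
using row_set = std::set<std::string>;

// Rows present in `a` but missing from `b`. std::set keeps its elements
// sorted, which std::set_difference requires.
row_set difference(const row_set& a, const row_set& b) {
    row_set out;
    std::set_difference(a.begin(), a.end(), b.begin(), b.end(),
                        std::inserter(out, out.end()));
    return out;
}

struct repair_plan {
    std::vector<row_set> fetch_from_peer;  // peer_i - master
    std::vector<row_set> send_to_peer;     // (master + all peers) - peer_i
};

// Computes what the repair master fetches from and sends to each peer.
repair_plan plan_repair(const row_set& master, const std::vector<row_set>& peers) {
    repair_plan plan;
    row_set all = master;
    for (const row_set& peer : peers) {
        plan.fetch_from_peer.push_back(difference(peer, master));
        all.insert(peer.begin(), peer.end());
    }
    // After the fetches the master holds the union, so it can fill every peer's gaps.
    for (const row_set& peer : peers) {
        plan.send_to_peer.push_back(difference(all, peer));
    }
    return plan;
}
```

With Node1 as the master and Node2 and Node3 as peers, the plan matches the example: nothing is fetched from Node2, row4 is fetched from Node3, row1 and row4 are sent to Node2, and row3 is sent to Node3.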

86 Commits
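
Commit 39bfd5d1df (building on 8c5387890d) describes a multi range reader driven by a generator that returns the next range on each call and `std::nullopt` once exhausted, where each range must be valid for `fast_forward_to()` after the previous one. The C++ sketch below models that contract over a sorted vector of integer keys; `key_range`, `range_generator`, `toy_multi_range_reader` and `from_vector` are hypothetical stand-ins for `dht::partition_range` and the real reader, and the reader keeps only a reference to the key vector.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical stand-in for dht::partition_range: the half-open key interval [start, end).
struct key_range {
    int start;
    int end;
};

// Returns the next range on each call, std::nullopt once exhausted.
using range_generator = std::function<std::optional<key_range>()>;

// Hypothetical toy reader over sorted keys. Each range is assumed to start at
// or after the end of the previous one, mirroring fast_forward_to() validity.
class toy_multi_range_reader {
    const std::vector<int>& _keys;  // sorted; must outlive the reader
    range_generator _next_range;
    std::optional<key_range> _current;
    std::size_t _pos = 0;

    // Analogue of fast_forward_to(): skip keys that precede the new range.
    void fast_forward_to(key_range r) {
        _current = r;
        while (_pos < _keys.size() && _keys[_pos] < r.start) {
            ++_pos;
        }
    }

    // Pulls the next range lazily; an empty optional ends the read.
    void advance_range() {
        if (auto r = _next_range()) {
            fast_forward_to(*r);
        } else {
            _current.reset();
        }
    }

public:
    toy_multi_range_reader(const std::vector<int>& keys, range_generator gen)
        : _keys(keys), _next_range(std::move(gen)) {
        advance_range();
    }

    // Returns the next key in the current range, moving on to later ranges as needed.
    std::optional<int> next() {
        while (_current) {
            if (_pos < _keys.size() && _keys[_pos] < _current->end) {
                return _keys[_pos++];
            }
            advance_range();
        }
        return std::nullopt;
    }
};

// Generator over a precomputed list of ranges, like the original vector-based reader.
range_generator from_vector(std::vector<key_range> ranges) {
    return [ranges = std::move(ranges), i = std::size_t{0}]() mutable -> std::optional<key_range> {
        if (i == ranges.size()) {
            return std::nullopt;
        }
        return ranges[i++];
    };
}
```

Because the next range is requested only when the current one is exhausted, a generator is free to compute ranges lazily, and a generator that returns `std::nullopt` on its first call simply yields an empty read.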