We use `partition_reversing_data_source` and the new `index_reader` methods to implement reversed single-partition reads in `mx_sstable_mutation_reader`. The parsing logic does not need to change: the buffers returned by the source already contain rows in reversed clustering order.

Some changes were required in `mp_row_consumer_m`, which processes the parsed rows and emits the appropriate mutation fragments. Among other things, the consumer uses `mutation_fragment_filter` underneath to decide whether a fragment should be ignored (e.g. the parsed fragment may come from outside the requested clustering range).

Previously `mutation_fragment_filter` was given a `partition_slice`. If the slice was reversed, the filter used `clustering_key_filter_ranges::get_ranges` to obtain the clustering ranges from the slice in unreversed order (they were reversed in the slice), since we didn't perform any reversing in the reader. Now the reader provides the ranges directly instead of the slice; furthermore, the ranges are provided in native-reversed format (the order of ranges is reversed and the ranges themselves are also reversed), and the schema provided to the filter is also reversed. Thus, to the filter everything appears as if it were used during a non-reversed query on a table with a reversed schema, which works correctly because the reader feeds parsed rows into the consumer in reversed order.

During reversed queries the reader uses alternative logic for skipping to a later range (or, in non-reversed terms, to an earlier range), which happens in `advance_context`. It asks the index to advance its upper bound in reverse, so that the reversing data source notices the change of the index end position and returns subsequent buffers with rows from the new range.

There is a slight difference in the reader's behavior from `mp_row_consumer_m`'s point of view. For non-reversed reads, after the consumer obtains the beginning of a row (`consume_row_start`) - which contains the row's position but not the columns - and tells the reader that the row won't be emitted because we need to skip to a later range, the reader immediately tells the data source (the 'context') to skip to a later range by calling `skip_to`. As a result the source does not return the rest of the row, and the rest of the row is never fed to the consumer (`consume_row_end`). For reversed reads, however, the data source performs skipping 'on its own', after it notices that the index end position has changed. This may happen 'too late', causing the rest of the row to be returned anyway. We are prepared for this situation inside `mp_row_consumer_m`: it consults the mutation fragment filter again when the rest of the row arrives.

Fast forwarding is not supported at this point, which is fine given that the cache is disabled for reversed queries for now (and the cache is the only user of fast forwarding).

For reversed queries, callers provide the `partition_slice` in 'half-reversed' format, where the order of clustering ranges is reversed but the ranges themselves are not. This means we sometimes need to modify the slice: for non-single-partition queries the mx reader must use a non-reversed slice, and for single-partition queries the mx reader must use a native-reversed slice (where the clustering ranges themselves are reversed as well). The modified slice must be stored somewhere; we store it inside the mx reader itself so that we don't need to allocate more intermediate readers at the call sites.

This makes the interface of `mx::make_reader` a bit weird: for non-single-partition queries where the provided slice is reversed, the reader actually returns a non-reversed stream of fragments, leaving it to the user to reverse the stream on their own. The interface has been documented in detail with appropriate comments.
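To make the three range formats concrete, here is a minimal sketch that uses plain integers in place of clustering keys. Everything in it (`key_range`, `to_half_reversed`, `half_to_native_reversed`, `filter_rows`) is a hypothetical stand-in invented for illustration and is not Scylla's API. It only shows why a filter written for forward reads keeps working when it is handed native-reversed ranges, a reversed comparator, and rows in reversed order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a clustering range; both bounds are inclusive.
struct key_range {
    int start;
    int end;
};

// Half-reversed format: the order of ranges is reversed,
// but each range still has its bounds in table (non-reversed) order.
std::vector<key_range> to_half_reversed(std::vector<key_range> ranges) {
    std::reverse(ranges.begin(), ranges.end());
    return ranges;
}

// Native-reversed format: starting from half-reversed ranges,
// additionally swap the bounds of every range.
std::vector<key_range> half_to_native_reversed(std::vector<key_range> ranges) {
    for (auto& r : ranges) {
        std::swap(r.start, r.end);
    }
    return ranges;
}

// A filter that knows nothing about reversing. It assumes that rows and
// ranges are both sorted according to `less` and keeps the rows that fall
// inside some range.
template <typename Less>
std::vector<int> filter_rows(const std::vector<int>& rows,
                             const std::vector<key_range>& ranges,
                             Less less) {
    std::vector<int> out;
    std::size_t i = 0;
    for (int row : rows) {
        // Drop ranges that end before the current row.
        while (i < ranges.size() && less(ranges[i].end, row)) {
            ++i;
        }
        if (i == ranges.size()) {
            break; // no ranges left, nothing more to emit
        }
        // The row is inside the range unless it lies before the range start.
        if (!less(row, ranges[i].start)) {
            out.push_back(row);
        }
    }
    return out;
}
```

In this sketch, calling `filter_rows` with rows in descending order, native-reversed ranges and `std::greater<int>` (the 'reversed schema') selects the same rows as the forward call with `std::less<int>`, just in the opposite order. Passing half-reversed ranges to the same call would not work, because each range would then end 'before' it starts under the reversed comparator.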
72 lines
2.5 KiB
C++
/*
 * Copyright (C) 2021-present ScyllaDB
 */

/*
 * This file is part of Scylla.
 *
 * Scylla is free software: you can redistribute it and/or modify
 * it under the terms of the GNU Affero General Public License as published by
 * the Free Software Foundation, either version 3 of the License, or
 * (at your option) any later version.
 *
 * Scylla is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with Scylla. If not, see <http://www.gnu.org/licenses/>.
 */

#pragma once

#include "flat_mutation_reader_v2.hh"
#include "sstables/progress_monitor.hh"

namespace sstables {
namespace mx {

// Precondition: if the slice is reversed, the schema must be reversed as well
// and the range must be singular (`range.is_singular()`).
// Reversed slices must be provided in the 'half-reversed' format (the order of ranges
// being reversed, but the ranges themselves are not).
// Fast-forwarding is not supported in reversed queries (FIXME).
flat_mutation_reader_v2 make_reader(
        shared_sstable sstable,
        schema_ptr schema,
        reader_permit permit,
        const dht::partition_range& range,
        const query::partition_slice& slice,
        const io_priority_class& pc,
        tracing::trace_state_ptr trace_state,
        streamed_mutation::forwarding fwd,
        mutation_reader::forwarding fwd_mr,
        read_monitor& monitor);

// Same as above but the slice is moved and stored inside the reader.
flat_mutation_reader_v2 make_reader(
        shared_sstable sstable,
        schema_ptr schema,
        reader_permit permit,
        const dht::partition_range& range,
        query::partition_slice&& slice,
        const io_priority_class& pc,
        tracing::trace_state_ptr trace_state,
        streamed_mutation::forwarding fwd,
        mutation_reader::forwarding fwd_mr,
        read_monitor& monitor);

// A reader which doesn't use the index at all. It reads everything from the
// sstable and it doesn't support skipping.
flat_mutation_reader_v2 make_crawling_reader(
        shared_sstable sstable,
        schema_ptr schema,
        reader_permit permit,
        const io_priority_class& pc,
        tracing::trace_state_ptr trace_state,
        read_monitor& monitor);

} // namespace mx
} // namespace sstables