Files
scylladb/sstables/mx
Kamil Braun 27238eaa0f sstables: mx: implement reversed single-partition reads
We use partition_reversing_data_source and the new `index_reader` methods
to implement single-partition reads in `mx_sstable_mutation_reader`.

The parsing logic does not need to change: the buffers returned by the
source already contain rows in reversed clustering order.

Some changes were required in `mp_row_consumer_m` which processes the
parsed rows and emits appropriate mutation fragments. The consumer uses
`mutation_fragment_filter` underneath to decide whether a fragment
should be ignored or not (e.g. the parsed fragment may come from outside
the requested clustering range), among other things. Previously
`mutation_fragment_filter` was provided a `partition_slice`. If the
slice was reversed, the filter would use
`clustering_key_filter_ranges::get_ranges` to obtain the clustering
ranges from the slice in unreversed order (they were reversed in the
slice) since we didn't perform any reversing in the reader. Now the
reader provides the ranges directly instead of the slice; furthermore,
the ranges are provided in native-reversed format (the order of ranges
is reversed and the ranges themselves are also reversed), and the schema
provided to the filter is also reversed. Thus to the filter everything
appears as if it was used during a non-reversed query but on a table
with reversed schema, which works correctly given the fact that the
reader is feeding parsed rows into the consumer in reversed order.

During reversed queries the reader uses alternative logic for skipping
to a later range (or, speaking in non-reversed terms, to an earlier range),
which happens in `advance_context`. It asks the index to advance its
upper bound in reverse so that the reversing_data_source notices the
change of the index end position and returns following buffers with rows
from the new range.

There is a slight difference in behavior of the reader from
`mp_row_consumer_m`'s point of view. For non-reversed reads, after
the consumer obtains the beginning of a row (`consume_row_start`)
- which contains the row's position but not the columns - and tells the
reader that the row won't be emitted because we need to skip to a later
range, the reader would tell the data source (the 'context') immediately
to skip to a later range by calling `skip_to`. This caused the source
not to return the rest of the row, and the rest of the row would not
be fed to the consumer (`consume_row_end`). However, for reversed reads,
the data source performs skipping 'on its own', after it notices that
the index end position has changed. This may happen 'too late', causing
the rest of the row to be returned anyway. We are prepared for this
situation inside `mp_row_consumer` by consulting the mutation fragment
filter again when the rest of the row arrives.

Fast forwarding is not supported at this point, which is fine given that
the cache is disabled for reversed queries for now (and the cache is the
only user of fast forwarding).

The `partition_slice` provided by callers is provided in 'half-reversed'
format for reversed queries, where the order of clustering ranges is
reversed, but the ranges themselves are not. This means we need to modify
the slice sometimes: for non-single-partition queries the mx reader must
use a non-reversed slice, and for single-partition queries the mx reader
must use a native-reversed slice (where the clustering ranges themselves
are reversed as well). The modified slice must be stored somewhere; we
store it inside the mx reader itself so we don't need to allocate more
intermediate readers at the call sites.  This causes the interface of
`mx::make_reader` to be a bit weird: for non-single-partition queries
where the provided slice is reversed the reader will actually return a
non-reversed stream of fragments, telling the user to reverse the stream
on their own. The interface has been documented in detail with
appropriate comments.
2021-10-04 15:24:12 +02:00
..