Files
scylladb/sstables/kl
Avi Kivity 1930f3e67f Merge 'sstables/mx/reader: accommodate inexact partition indexes' from Michał Chojnowski
Unlike the currently-used sstable index files, BTI indexes don't store the entire partition keys. They only store prefixes of decorated keys, up to the minimum length needed to differentiate a key from its neighbours in the sstable. This saves space.

However, it means that a BTI index query might be off by one partition (on each end of the queried partition range) with respect to the optimal Data position.

For example, if the index stores prefixes `a`, `b`, `c`,
the index has no way to know if the first index entry after key `bb`
is `b` (which might correspond to `ba` as well as `bc`), or `c`.
So the index reader conservatively has to pick the wider Data range, and the Data reader must ignore the superfluous partitions. (And there's no way around that.)

Before this patch, the sstable reader expects the index query to return an exact (optimal) Data range. This patch adjusts the logic of the sstable reader to allow for inexact ranges.

Note: the patch is more complicated that it looks. The logic of the sstable reader was already fairly hard to follow and this adds even more flags, more weird special states and more edge cases. I think I managed to write a decent test and it did find three or four edge cases I wouldn't have noticed otherwise. I think it should cover all the added logic, but I didn't verify code coverage. (Do our scripts for that even work nowadays)? Simplification ideas are welcome.

Preparation for new functionality, no backporting needed.

Closes scylladb/scylladb#25093

* github.com:scylladb/scylladb:
  sstables/index_reader: weaken some exactness guarantees in abstract_index_reader
  test/boost: add a test for inexact index lookups
  sstables/mx/reader: allow passing a custom index reader to the constructor
  sstables/index_reader: remove advance_to
  sstables/mx/reader: handle inexact lookups in `advance_context()`
  sstables/mx/reader: handle inexact lookups in `advance_to_next_partition()`
  sstables/index_reader: make the return value of `get_partition_key` optional
  sstables/mx/reader: handle "backward jumps" in forward_to
  sstables/mx/reader: filter out partitions outside the queried range
  sstables/mx/reader: update _pr after `fast_forward_to`
2025-07-27 19:39:36 +03:00
..