mirror of
https://github.com/scylladb/scylladb.git
synced 2026-06-01 12:36:56 +00:00
When we get two range tombstones with the same lower bound from different data sources (e.g. two sstable), which need to be combined into a single stream, they need to be de-overlapped, because each mutation fragment in the stream must have a different position. If we have range tombstones [1, 10) and [1, 20), the result of that de-overlapping will be [1, 10) and [10, 20]. The problem is that if the stream corresponds to a clustering slice with upper bound greater than 1, but lower than 10, the second range tombstone would appear as being out of the query range. This is currently violating assumptions made by some consumers, like cache populator. One effect of this may be that a reader will miss rows which are in the range (1, 10) (after the start of the first range tombstone, and before the start of the second range tombstone), if the second range tombstone happens to be the last fragment which was read for a discontinuous range in cache and we stopped reading at that point because of a full buffer and cache was evicted before we resumed reading, so we went to reading from the sstable reader again. There could be more cases in which this violation may resurface. There is also a related bug in mutation_fragment_merger. If the reader is in forwarding mode, and the current range is [1, 5], the reader would still emit range_tombstone([10, 20]). If that reader is later fast forwarded to another range, say [6, 8], it may produce fragments with smaller positions which were emitted before, violating monotonicity of fragment positions in the stream. A similar bug was also present in partition_snapshot_flat_reader. Possible solutions: 1) relax the assumption (in cache) that streams contain only relevant range tombstones, and only require that they contain at least all relevant tombstones 2) allow subsequent range tombstones in a stream to share the same starting position (position is weakly monotonic), then we don't need to de-overlap the tombstones in readers. 3) teach combining readers about query restrictions so that they can drop fragments which fall outside the range 4) force leaf readers to trim all range tombstones to query restrictions This patch implements solution no 2. It simplifies combining readers, which don't need to accumulate and trim range tombstones. I don't like solution 3, because it makes combining readers more complicated, slower, and harder to properly construct (currently combining readers don't need to know restrictions of the leaf streams). Solution 4 is confined to implementations of leaf readers, but also has disadvantage of making those more complicated and slower. Fixes #3093.
26 KiB
26 KiB