In partition_snapshot_row_cursor::maybe_refresh(), the !is_in_latest_version() path calls lower_bound(_position) on the latest version's rows to find the cursor's position in that version. When lower_bound returns null (the cursor is positioned above all entries in the latest version in table order), the code unconditionally sets _background_continuity = true and allows the subsequent if (!it) block to erase the latest version's entry from the heap.

This is correct for forward traversal: null means there are no more entries ahead, so removing the version from the heap is safe. However, in reversed mode, null from lower_bound means the cursor is above all entries in table order -- those entries are BELOW the cursor in query order and will be visited LATER during reversed traversal. Erasing the heap entry permanently loses them, causing live rows to be skipped.

The fix mirrors what prepare_heap() already does correctly: when lower_bound returns null in reversed mode, use std::prev(rows.end()) to keep the last entry in the heap instead of erasing it.

Add test_reversed_maybe_refresh_keeps_latest_version_entry to mvcc_test, alongside the existing reversed cursor tests. The test creates a two-version partition snapshot (v0 with range tombstones, v1 with a live row positioned below all v0 entries in table order), and traverses in reverse calling maybe_refresh() at each step -- directly exercising the buggy code path. The test fails without the fix.

The bug was introduced by 6b7473be53 ("Handle non-evictable snapshots", 2022-11-21), which added null-iterator handling for non-evictable snapshots (memtable snapshots lack the trailing dummy entry that evictable snapshots have). prepare_heap() got correct reversed-mode handling at that time, but maybe_refresh() received only forward-mode logic.

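The null-iterator case can be illustrated with a toy model. This is only a sketch, not the actual cursor code: it assumes a version's rows are plain ints in a std::set kept in table order, and the helper name entry_for_heap is hypothetical. It models only the case where lower_bound finds nothing; whatever else the real cursor does when lower_bound does find an entry is not modeled here.

```cpp
#include <iterator>
#include <optional>
#include <set>

// Toy model (hypothetical helper, not ScyllaDB code).
// `rows` stands in for the latest version's rows, sorted in table order.
// Returns the entry that should stay in the heap, or std::nullopt when the
// version's heap entry may be erased.
std::optional<int> entry_for_heap(const std::set<int>& rows, int position, bool reversed) {
    auto it = rows.lower_bound(position);
    if (it != rows.end()) {
        // An entry at or after the cursor exists in table order.
        return *it;
    }
    // lower_bound found nothing: the cursor is above every entry in table order.
    if (reversed && !rows.empty()) {
        // In reversed traversal those entries are still ahead in query order,
        // so keep the last one instead of erasing the version from the heap.
        return *std::prev(rows.end());
    }
    // Forward traversal (or an empty version): nothing left to visit.
    return std::nullopt;
}
```

With rows {1, 2, 3} and the cursor at position 10, the forward call yields no entry (erasing is safe), while the reversed call yields 3, so the version remains reachable for the rest of the reversed scan.
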
The bug is intermittent because multiple mechanisms cause iterators_valid() to return false, forcing maybe_refresh() to take the full rebuild path via prepare_heap() (which handles reversed mode correctly):

- Mutation cleaner merging versions in the background (changes change_mark)
- LSA segment compaction during reserve() (invalidates references)
- B-tree rebalancing on partition insertion (invalidates references)
- Debug mode's always-true need_preempt() creating many multi-version partitions via preempted apply_monotonically()

A dtest reproducer confirmed the same root cause: with 100K overlapping range tombstones creating a massively multi-version memtable partition (287K preemption events), the reversed scan's latest_iterator was observed jumping discontinuously during a version transition -- the latest version's heap entry was erased -- causing the query to walk the entire partition without finding the live row.

Fixes: SCYLLADB-1253

Closes scylladb/scylladb#29368

(cherry picked from commit 21d9f54a9a)

Closes scylladb/scylladb#29480