When streaming SSTables by tablet range, the original implementation of tablet_sstable_streamer::stream may break out of the loop too early when encountering a non-overlaping SSTable. As a result, subsequent SSTables that should be classified as partially contained are skipped entirely.
Tablet range: [4, 5]
SSTable ranges:
[0,5]
[0, 3] <--- is considered exhausted, and causes skip to next tablet
[2, 5] <--- is missed for range [4, 5]
The loop uses if (!overlaps) break; semantics, which conflated “no overlap” with “done scanning.” This caused premature termination when an SSTable did not overlapped but the following one did.
Correct logic should be:
before(sst_last) → skip and continue.
after(sst_first) → break (no further SSTables can overlap).
Otherwise → `contains` to classify as full or partial.
Missing SSTables in streaming and potential data loss or incomplete streaming in repair/streaming operations.
1. Correct the loop termination logic that previously caused certain SSTables to be prematurely excluded, resulting in lost mutations. This change ensures all relevant SSTables are properly streamed and their mutations preserved.
2. Refactor the loop to use before() and after() checks explicitly, and only break when the SSTable is entirely after the tablet range
3. Add pytest to cover this case, full streaming flow by means of `restore`
4. Add boost tests to test the new refactored function
This data corruption fix should be ported back to 2024.2, 2025.1, 2025.2, 2025.3 and 2025.4
Fixes: https://github.com/scylladb/scylladb/issues/26979Closesscylladb/scylladb#26980
* github.com:scylladb/scylladb:
streaming: fix loop break condition in tablet_sstable_streamer::stream
streaming: add pytest case to reproduce mutation loss issue