Files
scylladb/cache_flat_mutation_reader.hh
Tomasz Grabiec 6863a5e43b row_cache: Avoid generating overlapping range tombstones
Row cache reader can produce overlapping range tombstones in the
mutation fragment stream even if there is only a single range
tombstone in sstables, due to #2581. For every range between two rows,
the row cache reader queries for tombstones relevant for that
range. The result of the query is trimmed to the current position of
the reader (=position of the previous row) to satisfy key
monotonicity. The end position of range tombstones is left
unchanged. So cache reader will split a single range tombstone around
rows. Those range tombstones are transient, they will be only
materialized in the reader's stream, they are not persisted anywhere.

That is not a problem in itself, but it interacts badly with mutation
compactor due to #8625. The range_tombstone_accumulator which is used
to compact the mutation fragment stream needs to accumulate all
tombstones which are relevant for the current clustering position in
the stream. Adding a new range tombstone is O(N) in the number of
currently active tombstones. This means that producing N rows will be
O(N^2).

In a unit test, I saw reading 137'248 rows which overlap with a range
tombstone take 245 seconds. Almost all of CPU time is in
drop_unneeded_tombstones().

The solution is to make the cache reader trim range tombstone end to
the currently emited sub-range, so that it emits non-overlapping range
tombstones.

Fixes #8626.
2021-05-12 00:10:24 +02:00

35 KiB