This change inserts preemption points between removals of partitions.

The main complication is maintaining consistency in the face of
concurrent population or eviction. We use the same mechanism that is
used by memtable updates. _prev_snapshot_pos is the ring position which
divides the ring into the part which is already updated in cache and
the part which is yet to be updated. That position must be set
accordingly on preemption. In the case of invalidation, updating means
removing all entries in the range and marking the range as
discontinuous. When resuming invalidation of a range, we continue from
_prev_snapshot_pos as the lower bound.

This affects high-level operations like nodetool refresh, table
truncation, repair and streaming.

Fixes #2683

The improvement on stalls was measured using
tests/perf_row_cache_update:

Before:

  Small partitions, no overwrites:
    invalidation: 339.420624 [ms], preemption: {count: 2, 99%: 0.008239 [ms], max: 339.422144 [ms]}
  Small partition with a few rows:
    invalidation: 191.855331 [ms], preemption: {count: 2, 99%: 0.008239 [ms], max: 191.856816 [ms]}
  Large partition, lots of small rows:
    invalidation: 0.959328 [ms], preemption: {count: 2, 99%: 0.008239 [ms], max: 0.961453 [ms]}

After:

  Small partitions, no overwrites:
    invalidation: 400.505554 [ms], preemption: {count: 843, 99%: 0.545791 [ms], max: 0.502340 [ms]}
  Small partition with a few rows:
    invalidation: 306.352600 [ms], preemption: {count: 644, 99%: 0.545791 [ms], max: 0.506464 [ms]}
  Large partition, lots of small rows:
    invalidation: 0.963660 [ms], preemption: {count: 2, 99%: 0.009887 [ms], max: 0.963264 [ms]}

The maximum scheduling latency went down from 339 ms to 0.5 ms (task
quota).
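
The resume logic described above can be illustrated with a small, self-contained sketch. It is not the code from this change: toy_cache, invalidate_some and the integer keys are hypothetical stand-ins, a std::map plays the role of the cache, and the need_preempt callback is an assumed substitute for the scheduler's preemption check. Only the idea is carried over: on preemption, record the boundary between the already invalidated part and the pending part, and resume from that boundary as the lower bound.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <optional>
#include <string>

// Hypothetical stand-in for a cache ordered by ring position (here: int keys).
struct toy_cache {
    std::map<int, std::string> entries;
    // Boundary between the part of the range that is already invalidated and
    // the part that is still pending. Plays the role that _prev_snapshot_pos
    // plays in the description; the real type is not an int.
    std::optional<int> prev_snapshot_pos;
};

// Removes entries with keys in [start, end), one at a time.
// After each removal, if more entries remain and need_preempt() returns true,
// the resume position is recorded and the function returns false.
// Returns true once the whole range has been invalidated.
bool invalidate_some(toy_cache& c, int start, int end,
                     const std::function<bool()>& need_preempt) {
    // When resuming, continue from the recorded position as the lower bound.
    int lower = c.prev_snapshot_pos.value_or(start);
    auto it = c.entries.lower_bound(lower);
    while (it != c.entries.end() && it->first < end) {
        int removed = it->first;
        it = c.entries.erase(it);
        bool more = it != c.entries.end() && it->first < end;
        if (more && need_preempt()) {
            // Everything below removed + 1 is considered up to date.
            c.prev_snapshot_pos = removed + 1;
            return false;
        }
    }
    c.prev_snapshot_pos.reset();
    return true;
}

// Calls invalidate_some() until the range is done; returns the number of calls.
// In a real scheduler, control would be yielded between the calls.
int invalidate_all(toy_cache& c, int start, int end,
                   const std::function<bool()>& need_preempt) {
    int rounds = 1;
    while (!invalidate_some(c, start, end, need_preempt)) {
        ++rounds;
    }
    return rounds;
}
```

In this sketch, an entry populated below the recorded position while invalidation is suspended is left alone on resume, because that part of the range counts as already updated. An entry populated at or above the position is still removed.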