Files
scylladb/dht
Raphael S. Carvalho 8d58ff1be6 compaction: Make SSTable cleanup more efficient by fast forwarding to next owned range
Today, SSTable cleanup skips to the next partition, one at a time, when it finds
that the current partition is no longer owned by this node.

That's very inefficient because when a cluster is growing in size, existing
nodes lose multiple sequential tokens in its owned ranges. Another inefficiency
comes from fetching index pages spanning all unowned tokens, which was described
in #14317.

To solve both problems, cleanup will now use multi range reader, to guarantee
that it will only process the owned data and as a result skip unowned data.
This results in cleanup scanning an owned range and then fast forwarding to the
next one, until it's done with them all. This reduces significantly the amount
of data in the index caching, as index will only be invoked at each range
boundary instead.

Without further ado,

before:

... 2GB to 1GB (~50% of original) in 26248ms = 81MB/s. ~9443072 total partitions merged to 4750028.

after:

... 2GB to 1GB (~50% of original) in 17424ms = 123MB/s. ~9443072 total partitions merged to 4750028.

Fixes #12998.
Fixes #14317.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-07-11 13:56:24 -03:00
..