scylladb/sstables
Raphael S. Carvalho 99b75d1f63 compaction: Improve compaction efficiency by killing the procedure that trims jobs
This procedure trims SSTables off a compaction job until its weight[1] is smaller
than one already taken by a running compaction. The min threshold is still respected:
a job is only trimmed while its size is > min threshold.

[1]: this value is a logarithmic function of the total size of the SSTables in a
given job, and it's used to control the compaction parallelism.
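
The procedure being removed can be sketched roughly as follows. This is a minimal
illustration, not the code from the tree: the names job_weight and trim_job, the
base-4 logarithm, the choice to drop the last SSTable, and the reading of "smaller
than one already taken" as "smaller than the smallest running weight" are all
assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <set>
#include <vector>

// Hypothetical weight: floor(log4(total bytes in the job)).
// The source only says "a logarithmic function"; base 4 is an assumption.
int job_weight(const std::vector<std::uint64_t>& sstable_sizes) {
    std::uint64_t total = std::accumulate(sstable_sizes.begin(),
                                          sstable_sizes.end(),
                                          std::uint64_t{0});
    int weight = 0;
    while (total >= 4) {   // integer log avoids floating-point rounding
        total /= 4;
        ++weight;
    }
    return weight;
}

// Hypothetical trimming loop: drop SSTables from the back of the job until its
// weight falls below the smallest weight held by a running compaction, but
// never shrink the job to min_threshold SSTables or fewer.
std::vector<std::uint64_t> trim_job(std::vector<std::uint64_t> job,
                                    const std::set<int>& running_weights,
                                    std::size_t min_threshold) {
    assert(min_threshold > 0);
    if (running_weights.empty()) {
        return job;        // nothing is running, so nothing to avoid
    }
    const int smallest_running = *running_weights.begin();
    while (job.size() > min_threshold && job_weight(job) >= smallest_running) {
        job.pop_back();
    }
    return job;
}
```

With four 1024-byte SSTables, a running compaction of weight 6, and a min threshold
of 2, this sketch drops one SSTable and submits a three-SSTable job; the SSTable
left behind has to be picked up by a later compaction.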

The trimming is intended to improve compaction efficiency by allowing more jobs to
run in parallel, but it turns out that it can have the opposite effect because write
amplification can increase significantly.

Take STCS, for example: the more similar-sized SSTables you compact together, the
higher the compaction efficiency will be. With the trimming procedure, we aim at
running smaller jobs on the assumption that more parallel compactions will give us
better performance, but that's not true. Most of the efficiency comes from
making informed decisions when selecting candidates for compaction.
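
A back-of-the-envelope model shows how smaller jobs raise write amplification. The
helper below, bytes_written, is hypothetical and deliberately simplified: it assumes
no data is dropped during compaction (output size equals the sum of the inputs) and
that jobs merge adjacent SSTables, at most fan_in at a time, until one SSTable
remains.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Total bytes written while merging `sizes` down to a single SSTable when each
// job takes at most `fan_in` inputs. Simplified model: nothing is purged, so
// every job writes exactly as many bytes as it reads.
std::uint64_t bytes_written(std::vector<std::uint64_t> sizes, std::size_t fan_in) {
    assert(fan_in >= 2);
    std::uint64_t written = 0;
    while (sizes.size() > 1) {
        std::vector<std::uint64_t> next;
        for (std::size_t i = 0; i < sizes.size(); i += fan_in) {
            const std::size_t end = std::min(i + fan_in, sizes.size());
            if (end - i == 1) {
                next.push_back(sizes[i]);   // lone leftover is carried over as-is
                continue;
            }
            std::uint64_t out = 0;
            for (std::size_t j = i; j < end; ++j) {
                out += sizes[j];
            }
            written += out;                 // the job rewrites all of its input
            next.push_back(out);
        }
        sizes = std::move(next);
    }
    return written;
}
```

Under these assumptions, merging four 100-byte SSTables in one job writes 400 bytes,
while trimming each job to two SSTables writes 800 bytes for the same end state:
the data is rewritten once per round instead of once overall.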

Similarly, this will also hurt TWCS, which does STCS in the current window and a sort
of major compaction when the current window closes. If the TWCS jobs are trimmed,
we'll likely need another compaction to get to the desired state, recompacting
the same data again.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20200728143648.31349-1-raphaelsc@scylladb.com>
2020-07-28 17:44:00 +03:00