mirror of
https://github.com/scylladb/scylladb.git
synced 2026-04-26 11:30:36 +00:00
Compaction manager allows compaction of different weights to proceed in
parallel. For example, a small-sized compaction job can happen in parallel to a
large-sized one, but similar-sized jobs are serialized.
The problem is the current definition of weight, which is the log (base 4) of
total size (size of all sstables) of a job.
This is what we get with the current weight definition:
weight=5 for sizes=[1K, 3K]
weight=6 for sizes=[4K, 15K]
weight=7 for sizes=[16K, 63K]
weight=8 for sizes=[64K, 255K]
weight=9 for sizes=[258K, 1019K]
weight=10 for sizes=[1M, 3M]
weight=11 for sizes=[4M, 15M]
weight=12 for sizes=[16M, 63M]
weight=13 for sizes=[64M, 254M]
weight=14 for sizes=[256M, 1022M]
weight=15 for sizes=[1033M, 4078M]
weight=16 for sizes=[4119M, 10188M]
total weights: 12
Note that for jobs smaller than 1MB, we have 5 different weights, meaning 5
jobs smaller than 1MB could proceed in parallel. High number of parallel
compactions can be observed after repair, which potentially produces tons of
small sstables of varying sizes. That causes compaction to use a significant
amount of resources.
To fix this problem, let's add a fixed tax to the size before taking the log,
so that jobs smaller than 1M will all have the same weight.
Look at what we get with the new weight definition:
weight=10 for sizes=[1K, 2M]
weight=11 for sizes=[3M, 14M]
weight=12 for sizes=[15M, 62M]
weight=13 for sizes=[63M, 254M]
weight=14 for sizes=[256M, 1022M]
weight=15 for sizes=[1033M, 4078M]
weight=16 for sizes=[4119M, 10188M]
total weights: 7
Fixes #8124.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20210217123022.241724-1-raphaelsc@scylladb.com>