scylladb/sstables
Raphael S. Carvalho 29c93ae592 compaction: Reduce backlog of compacting SSTable properly
It was observed that, as compaction progresses, the backlog of a compacting
SSTable is reduced very slowly. This keeps shares higher than needed, and
consequently compaction acts much more aggressively than it has to.

https://user-images.githubusercontent.com/1409139/120237819-93dfc080-c232-11eb-9042-68114e285ea0.png

The graph above shows the amount of backlog that is reduced from an SSTable
being compacted. The red line denotes the total backlog of the SSTable before
it is selected for compaction. The expectation is that the more an SSTable is
compacted, the more backlog is reduced from it. However, in the current
implementation, the backlog reduced from the SSTable being compacted becomes
inversely proportional to the amount of data already compacted.

It turns out that this problem happens because the implementation of the
backlog formula becomes incorrect when the SSTable is being compacted.

Backlog for an SSTable is currently defined as:
    Bi = Ei * log (T / Ei)

    where Ei = Si - Ci (bytes left to be compacted)
        and Si = size of SSTable
        and Ci = total bytes compacted
        and T = total size of table

The formula above can also be rewritten as follows:
    Bi = Ei * log (T) - Ei * log (Ei)

The second term `Ei * log (Ei)` can be rewritten as:
    = (Si - Ci) * log (Ei)
    = Si * log (Ei) - Ci * log (Ei)
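
The rewriting above can be checked numerically. The following is a minimal
sketch, not the actual implementation: the function names are hypothetical,
the sizes are made-up values, and the logarithm base (4) is an assumption that
does not affect the identity.

```python
import math


def log4(x: float) -> float:
    """Logarithm base 4 (the base is an assumption of this sketch)."""
    return math.log(x, 4)


def backlog_ratio_form(s_i: float, c_i: float, t: float) -> float:
    """Bi = Ei * log(T / Ei), with Ei = Si - Ci."""
    e_i = s_i - c_i
    return e_i * log4(t / e_i)


def backlog_expanded_form(s_i: float, c_i: float, t: float) -> float:
    """Bi = Ei * log(T) - (Si * log(Ei) - Ci * log(Ei))."""
    e_i = s_i - c_i
    second_term = s_i * log4(e_i) - c_i * log4(e_i)
    return e_i * log4(t) - second_term


# Illustrative values only: a 100-byte SSTable with 40 bytes compacted,
# in a table of 1000 bytes.
ratio = backlog_ratio_form(100.0, 40.0, 1000.0)
expanded = backlog_expanded_form(100.0, 40.0, 1000.0)
print(math.isclose(ratio, expanded))
```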

However, digging into the backlog implementation, it turns out that we're
incorrectly implementing that second term as:
    = Si * log (Si) - Ci * log (Ei)

Given that Si > Ei for an SSTable being compacted, the backlog will be higher
than it should be.

The following table shows how the backlog of an SSTable being compacted
behaves now versus how it is supposed to behave:
https://gist.github.com/raphaelsc/42e14be0d7d4ed264e538c2d217c8f95

It turns out that this is not the only problem. It was a mistake to change the
formula from `Ei * log(T / Si)` to `Ei * log(T / Ei)` when fixing the
shrinking table issue, because that also causes the backlog of a compacting
SSTable to be incorrectly reduced.

With the formula rewritten as follows:
    Bi = Ei * log (T) - Ei * log (Ei)

It becomes clear that the more an SSTable is compacted, the more slowly its
backlog is reduced, as T / Ei can increase considerably over time.

So we're reverting the formula back to `Ei * log(T / Si)`.
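
To illustrate the difference between the two formulas, here is a small sketch
that evaluates both as a single SSTable is compacted. It is not the actual
implementation: the function names and sizes are hypothetical, the logarithm
base (4) is an assumption, and T is held constant for simplicity.

```python
import math


def log4(x: float) -> float:
    """Logarithm base 4 (the base is an assumption of this sketch)."""
    return math.log(x, 4)


def backlog_over_remaining(s_i: float, c_i: float, t: float) -> float:
    """Ei * log(T / Ei): the formula being reverted."""
    e_i = s_i - c_i
    return e_i * log4(t / e_i) if e_i > 0 else 0.0


def backlog_over_size(s_i: float, c_i: float, t: float) -> float:
    """Ei * log(T / Si): the formula being restored."""
    e_i = s_i - c_i
    return e_i * log4(t / s_i)


S_I = 100.0  # size of the SSTable being compacted (illustrative)
T = 400.0    # table size, held constant here as a simplification

for pct in (0, 25, 50, 75, 90):
    c_i = S_I * pct / 100
    print(
        f"{pct:3d}% compacted: "
        f"Ei*log(T/Ei) = {backlog_over_remaining(S_I, c_i, T):6.1f}  "
        f"Ei*log(T/Si) = {backlog_over_size(S_I, c_i, T):6.1f}"
    )
```

Both formulas agree before compaction starts (Ei = Si). With T fixed,
`Ei * log(T / Si)` shrinks linearly with the bytes left to compact, while
`Ei * log(T / Ei)` stays at or above it for the rest of the compaction.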

The graph below shows a better backlog behavior when the table is shrinking:
https://user-images.githubusercontent.com/1409139/123495186-06a54700-d5f9-11eb-9386-3fcf4dd8e4d3.png

While analyzing the problem that occurs when the table is shrinking, I realized
that it is caused by T in the formula being implemented as the effective size
(total + partial - compacted).

With the new formula rewritten as follows:
    Bi = Ei * log (T) - Ei * log (Si)

It becomes clearer that T must never be lower than Si, otherwise the backlog
becomes negative. Also, while the table is shrinking, the backlog can become so
low that compaction barely makes any progress.
To fix both issues, let's implement T as total size (sum of all Si) rather than
effective size (sum of all Ei).
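
The effect of the choice of T can be sketched as follows. This is only an
illustration under stated assumptions: `table_backlog` and its parameters are
hypothetical names, the sizes are made up, the logarithm base (4) is assumed,
and partially written output SSTables are ignored.

```python
import math


def log4(x: float) -> float:
    """Logarithm base 4 (the base is an assumption of this sketch)."""
    return math.log(x, 4)


def table_backlog(sstables, use_total_size: bool) -> float:
    """Sum of Ei * log(T / Si) over all SSTables.

    sstables is a list of (Si, Ci) pairs. T is either the total size
    (sum of all Si) or the effective size (sum of all Ei).
    """
    total_size = sum(s_i for s_i, _ in sstables)
    effective_size = sum(s_i - c_i for s_i, c_i in sstables)
    t = total_size if use_total_size else effective_size
    return sum((s_i - c_i) * log4(t / s_i) for s_i, c_i in sstables)


# A shrinking table: two SSTables of 100 bytes, each 80% compacted.
shrinking = [(100.0, 80.0), (100.0, 80.0)]

with_effective = table_backlog(shrinking, use_total_size=False)
with_total = table_backlog(shrinking, use_total_size=True)

print("T = effective size:", round(with_effective, 2))
print("T = total size:    ", round(with_total, 2))
```

With the effective size, T (40) drops below each Si (100), so every term
`Ei * log(T / Si)` is negative. With the total size, T is always at least as
large as any Si, so no term can go below zero.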

The graph below shows that this change prevents the backlog from going negative
while still providing the expected behavior, similar to before:
https://user-images.githubusercontent.com/1409139/123495185-060cb080-d5f9-11eb-89f7-ed445729702a.png

Fixes #8768.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20210626003133.3011007-1-raphaelsc@scylladb.com>
2021-06-27 11:43:48 +03:00