It was observed that, as compaction progresses, the backlog of the SSTable being compacted is reduced very slowly. This keeps shares higher than needed, and consequently compaction acts much more aggressively than it has to.

https://user-images.githubusercontent.com/1409139/120237819-93dfc080-c232-11eb-9042-68114e285ea0.png

The graph above shows the amount of backlog that is reduced from an SSTable being compacted. The red line denotes the total backlog of the SSTable before it is selected for compaction.

The expectation is that the more an SSTable is compacted, the more backlog is reduced from it. However, in the current implementation, the backlog to be reduced from the SSTable being compacted starts being inversely proportional to the amount of data already compacted.

This happens because the implementation of the backlog formula becomes incorrect while the SSTable is being compacted.

Backlog for an SSTable is currently defined as:

    Bi = Ei * log (T / Ei)

    where Ei = Si - Ci (bytes left to be compacted)
          Si = size of the SSTable
          Ci = total bytes compacted
          T  = total size of the table

The formula above can also be rewritten as follows:

    Bi = Ei * log (T) - Ei * log (Ei)

The second term, `Ei * log (Ei)`, can be rewritten as:

    = (Si - Ci) * log (Ei)
    = Si * log (Ei) - Ci * log (Ei)

However, digging into the backlog implementation shows that the second term is incorrectly implemented as:

    = Si * log (Si) - Ci * log (Ei)

Given that Si > Ei for an SSTable being compacted, the backlog will be higher than it should be.

The following table shows how the backlog of an SSTable being compacted behaves now versus how it is supposed to behave:
https://gist.github.com/raphaelsc/42e14be0d7d4ed264e538c2d217c8f95

It turns out that this is not the only problem.
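The algebra above can be checked numerically. The following is only a minimal sketch, not the Scylla implementation: the function names and the sample sizes are hypothetical, and the natural logarithm is an assumption, since the text does not specify a base for `log`.

```python
import math


def backlog_compact_form(s_i, c_i, t):
    """Bi = Ei * log(T / Ei), with Ei = Si - Ci (bytes left to compact)."""
    e_i = s_i - c_i
    return e_i * math.log(t / e_i)


def backlog_expanded_form(s_i, c_i, t):
    """Same quantity, expanded: Ei*log(T) - (Si*log(Ei) - Ci*log(Ei))."""
    e_i = s_i - c_i
    second_term = s_i * math.log(e_i) - c_i * math.log(e_i)
    return e_i * math.log(t) - second_term


# Hypothetical sizes in bytes: one SSTable of 1000 bytes in a table of
# 10000 bytes, with 250 bytes already compacted.
compact = backlog_compact_form(1000, 250, 10000)
expanded = backlog_expanded_form(1000, 250, 10000)
print(compact, expanded)
```

Both forms agree up to floating point rounding, which is what the correct expansion of the second term requires.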
It was a mistake to change the formula from `Ei * log(T / Si)` to `Ei * log(T / Ei)` when fixing the shrinking table issue, because that also causes the backlog of a compacting SSTable to be incorrectly reduced.

With the formula rewritten as follows:

    Bi = Ei * log (T) - Ei * log (Ei)

it becomes clear that the more an SSTable is compacted, the more slowly its backlog is reduced, as T / Ei can increase considerably over time.

So we're reverting the formula back to `Ei * log(T / Si)`.

The graph below shows a better backlog behavior when the table is shrinking:
https://user-images.githubusercontent.com/1409139/123495186-06a54700-d5f9-11eb-9386-3fcf4dd8e4d3.png

While analyzing the problem that occurs when the table is shrinking, I realized that it happens because T in the formula is implemented as the effective size (total + partial - compacted).

With the new formula rewritten as follows:

    Bi = Ei * log (T) - Ei * log (Si)

it becomes clearer that T must never be lower than Si, otherwise the backlog becomes negative. Also, while the table is shrinking, the backlog can become so low that compaction barely makes any progress.

To fix both issues, let's implement T as the total size (sum of all Si) rather than the effective size (sum of all Ei).

The graph below shows that this change prevents the backlog from going negative while still providing behavior similar to the expected one:
https://user-images.githubusercontent.com/1409139/123495185-060cb080-d5f9-11eb-89f7-ed445729702a.png

Fixes #8768.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20210626003133.3011007-1-raphaelsc@scylladb.com>
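As an illustration of why T has to be the total size, here is a rough sketch of `Bi = Ei * log(T / Si)` summed over a table, with T taken either as the sum of all Si or as the sum of all Ei. The function name, the sample sizes, and the natural logarithm are assumptions made for the example. Partially written output is ignored for simplicity, and this is not the actual backlog tracker code.

```python
import math


def table_backlog(sizes, compacted, use_total_size):
    """Sum Bi = Ei * log(T / Si) over all SSTables of a table.

    sizes[i] is Si and compacted[i] is Ci, so Ei = Si - Ci.
    T is sum(Si) when use_total_size is True, else sum(Ei).
    """
    remaining = [s - c for s, c in zip(sizes, compacted)]
    t = sum(sizes) if use_total_size else sum(remaining)
    return sum(
        e * math.log(t / s)
        for e, s in zip(remaining, sizes)
        if e > 0  # a fully compacted SSTable contributes no backlog
    )


# Hypothetical shrinking table: two SSTables of 1000 bytes each,
# with 800 bytes of each already compacted.
sizes = [1000, 1000]
compacted = [800, 800]

# Effective size: T = 400 < Si = 1000, so every term is negative.
print(table_backlog(sizes, compacted, use_total_size=False))

# Total size: T = 2000 >= Si, so no term can be negative.
print(table_backlog(sizes, compacted, use_total_size=True))
```

Because the sum of all Si is never smaller than any single Si, `log(T / Si)` stays non-negative with the total size, which matches the argument above.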