Files
scylladb/sstables
Raphael S. Carvalho 91260cf91b sstables: Fix Incremental Compaction Efficiency
Compaction prevents data resurrection from happening by checking that there's
no way a data shadowed by a GC'able tombstone will survive alone, after
a failure for example.

Consider the following scenario:
We have two runs A and B, each divided to 5 fragments, A1..A5, B1..B5.

They have the following token ranges:

 A:  A1=[0, 3]   A2=[4, 7]  A3=[8, 11]   A4=[12, 15]    A5=[16,18]
B is the same as A's ranges, offset by 1:

 B:  B1=[1,4]    B2=[5,8]  B3=[9,12]    B4=[13,16]    B5=[17,19]

Let's say we are finished flushing output until position 10 in the compaction.
We are currently working on A3 and B3, so obviously those cannot be deleted.
Because B2 overlaps with A3, we cannot delete B2 either.
Otherwise, B2 could have a GC'able tombstone that shadows data in A3, and after
B2 is gone, dead data in A3 could be resurrected *on failure*.
Now, A2 overlaps with B2 which we couldn't delete yet, so we can't delete A2.
Now A2 overlaps with B1 so we can't delete B1. And B1 overlaps with A1 so
we can't delete A1. So we can't delete any fragment.

The problem with this approach is obvious, fragments can potentially not be
released due to data dependency, so incremental compaction efficiency is
severely reduced.
To fix it, let's not purge GC'able tombstones right away in the mutation
compactor step. Instead, let's have compaction writing them to a separate
sstable run that would be deleted in the end of compaction.
By making sure that tombstone information from all compacting sstables is not
lost, we no longer need to have incremental compaction imposing lots of
restriction on which fragments could be released. Now, any sstable which data
is safe in a new sstable can be released right away. In addition, incremental
compaction will only take place if compaction procedure is working with one
multi-fragment sstable run at least.

Fixes #4531.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2019-10-12 21:36:03 -03:00
..
2018-11-21 00:01:44 +02:00
2018-11-21 00:01:44 +02:00
2019-01-22 15:34:32 +02:00
2019-04-12 09:33:40 +02:00