Files
scylladb/db/commitlog
Glauber Costa abe7358767 prevent commitlog replay position reordering during reserve refill
When requests hit the commitlog, each of them will be assigned a replay
position, which we expect to be ordered. If reorders happen, the request
will be discarded and re-applied. Although this is supposed to be rare,
it does increase our latencies, specially when big requests are
involved. Processing big requests is expensive and if we have to do it
twice that adds to the cost.

The commitlog is supposed to issue replay positions in order, and it
coudl be that the code that adds them to the memtables will reorder
them. However, there is one instance in which the commitlog will not
keep its side of the bargain.

That happens when the reserve is exhausted, and we are allocating a
segment directly at the same time the reserve is being replenished.  The
following sequence of events with its deferring points will ilustrate
it:

on_timer:

    return this->allocate_segment(false). // defer here // then([this](sseg_ptr s) {

At this point, the segment id is already allocated.

new_segment():

    if (_reserve_segments.empty()) {
	[ ... ]
        return allocate_segment(true).then ...

At this point, we have a new segment that has an id that is higher than
the previous id allocated.

Then we resume the execution from the deferring point in on_timer():

    i = _reserve_segments.emplace(i, std::move(s));

The next time we need to allocate a segment, we'll pick it from the
reserve. But the segment in the reserve has an id that is lower than the
id that we have already used.

Reorders are bad, but this one is particularly bad: because the reorder
happens with the segment id side of the replay position, that means that
every request that falls into that segment will have to be reinserted.

This bug can be a bit tricky to reproduce. To make it more common, we
can artificially add a sleep() fiber after the allocate_segment(false)
in on_timer(). If we do that, we'll see a sea of reinsertions going on
in the logs (if dblog is set to debug).

Applying this patch (keeping the sleep) will make them all disappear.
We do this by rewriting the reserve logic, so that the segments always
come from the reserve. If we draw from a single pool all the time, there
is no chance of reordering happening. To make that more amenable, we'll
have the reserve filler always running in the background and take it out
of the timer code.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <49eb7edfcafaef7f1fdceb270639a9a8b50cfce7.1480531446.git.glauber@scylladb.com>
(cherry picked from commit 99a5a77234)
2016-12-01 13:33:35 +01:00
..
2016-09-28 17:34:16 +03:00
2016-10-19 13:56:36 -04:00