The segment heap is a max-heap, with sparser segments on top. When
we free from a segment, its occupancy decreases, and its position in
the heap increases.
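As a hedged illustration of the invariant described above, here is a minimal, hypothetical model of such a heap (not the real logalloc code): a binary max-heap keyed on free space, where freeing from a segment must re-sift it toward the top. The names `segment`, `segment_heap`, and `on_free` are illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Illustrative model: a max-heap of segments keyed by free space,
// so the sparsest segments sit on top (v[0]).
struct segment {
    size_t free_space;
};

struct segment_heap {
    std::vector<segment*> v; // v[0] is the top (sparsest segment)

    static size_t parent(size_t i) { return (i - 1) / 2; }

    void sift_up(size_t i) {
        while (i > 0 && v[parent(i)]->free_space < v[i]->free_space) {
            std::swap(v[parent(i)], v[i]);
            i = parent(i);
        }
    }

    void push(segment* s) {
        v.push_back(s);
        sift_up(v.size() - 1);
    }

    // Called when `size` bytes are freed from the segment at index i.
    // Occupancy drops, so the segment must move *up* the heap; failing
    // to restore heap order here makes compaction pick segments in the
    // wrong order, which is the class of bug the commit describes.
    void on_free(size_t i, size_t size) {
        v[i]->free_space += size;
        sift_up(i);
    }

    segment* top() const { return v.front(); }
};
```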
This bug caused segments to be picked for compaction in the wrong
order. In extreme cases this can lead to a livelock; in other cases it
may just increase compaction latency.
A region being merged can still be in use, but after merging, its
compaction_lock and reclaim counter will no longer work. This can lead
to use-after-compact-without-re-lookup errors.
Fix by making the source region the same as the target region; they
will share compaction locks and reclaim counters, so lookup avoidance
will continue to work correctly.
Fixes #286.
In some cases a region may be in a state where it is not empty, yet
nothing can be evicted from it. For example, when creating the first
entry, the reclaimer may get invoked during creation, before the entry
gets linked. We therefore can't rely on emptiness as a stop condition
for reclamation; the eviction function shall signal whether it made
forward progress.
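The stop condition above can be sketched as follows. This is a hypothetical simplification, not the real reclaimer loop; `evict_one` and `have_enough` stand in for the real callbacks:

```cpp
#include <cassert>
#include <functional>

// Sketch: reclaim until we have enough memory, but stop when the
// eviction function reports no forward progress -- never on "region
// is empty", since a non-empty region may have nothing evictable
// (e.g. an entry still under construction, not yet linked).
inline void reclaim(std::function<bool()> evict_one,
                    std::function<bool()> have_enough) {
    while (!have_enough()) {
        if (!evict_one()) {
            break; // no forward progress; give up instead of spinning
        }
    }
}
```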
Related to #259. In some cases we need to allocate memory while
holding the reclaim lock. If that region holds most of the reclaimable
memory, allocations inside that code section may fail.
allocating_section is a work-around for the problem: it learns how big
the reserves should be from past executions of the critical section
and tries to ensure proper reserves before entering the section.
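A minimal sketch of that learning idea, assuming a reserve API exists (the class shape and the by-reference `used` reporting are illustrative assumptions, not the real allocating_section interface):

```cpp
#include <cassert>
#include <cstddef>

// Sketch: remember the peak memory past executions of the critical
// section needed, and make sure at least that much is reserved before
// entering it again.
class allocating_section {
    size_t _reserve = 1024; // learned estimate; starts at a guess
public:
    template <typename Func>
    auto operator()(Func&& func) {
        // In the real allocator, something like ensure_reserve(_reserve)
        // would run here, before entering the critical section.
        size_t used = 0;
        auto r = func(used); // critical section reports what it used
        if (used > _reserve) {
            _reserve = used; // learn a bigger reserve for next time
        }
        return r;
    }
    size_t reserve() const { return _reserve; }
};
```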
Sometimes we may close an empty active segment, if all data in it was
evicted. Normally a segment is removed as soon as the last object in
it is freed, but if the segment is already empty when closed, no one
will ever call free on it. Such segments would be quickly reclaimed
during compaction, but it's possible that we will destroy the region
before compaction reclaims them. Currently we fail on an assertion
which checks that there are no segments left. This change fixes the
problem by handling empty closed segments when the region is
destroyed.
Memory releasing is invoked from destructors, so it must not throw. As
a consequence it must not allocate memory, so the emergency segment
pool was switched from std::deque<> to an allocation-free intrusive
stack.
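An intrusive stack of the kind described above can be sketched like this, assuming each pooled segment can spare one pointer's worth of space for the link (names are illustrative, not the real pool code):

```cpp
#include <cassert>

// The link lives inside the node itself, so push() and pop() never
// allocate -- unlike std::deque<>, this is safe to call from a
// non-throwing memory-release path.
struct stack_node {
    stack_node* next = nullptr;
};

class intrusive_stack {
    stack_node* _top = nullptr;
public:
    void push(stack_node* n) noexcept {
        n->next = _top;
        _top = n;
    }
    stack_node* pop() noexcept {
        stack_node* n = _top;
        if (n) {
            _top = n->next;
        }
        return n;
    }
    bool empty() const noexcept { return _top == nullptr; }
};
```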
Disabling compaction of a region is currently done in order to keep
references valid. But disabling only compaction is not enough; we also
need to disable eviction, as it invalidates references
too. Rather than introducing another type of lock, compaction
and eviction are controlled together, generalized as "reclaiming"
(hence the reclaim_lock).
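The generalized lock can be sketched as a counting RAII guard; this is a hypothetical shape, not the real reclaim_lock, and `may_reclaim` is an assumed helper:

```cpp
#include <cassert>

// While any reclaim_lock is held, neither compaction nor eviction may
// run, so references into the region stay valid. A counter (rather
// than a flag) lets locks nest.
struct region {
    int reclaim_disable_count = 0;
    bool may_reclaim() const { return reclaim_disable_count == 0; }
};

class reclaim_lock {
    region& _r;
public:
    explicit reclaim_lock(region& r) : _r(r) { ++_r.reclaim_disable_count; }
    ~reclaim_lock() { --_r.reclaim_disable_count; }
    reclaim_lock(const reclaim_lock&) = delete;
    reclaim_lock& operator=(const reclaim_lock&) = delete;
};
```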
The goal is to make allocation less likely to fail. With the async
reclaimer there is an implicit bound on the amount of memory that can
be allocated between deferring points. This bound is difficult to
enforce, though. The sync reclaimer lifts this limitation.
Also, allocations which previously could not be satisfied because of
fragmentation now have a higher chance of succeeding, although
depending on how much memory is fragmented, that could involve
evicting a lot of segments from cache, so we should still avoid them.
The downside of sync reclaiming is that references into regions may
now be invalidated not only across deferring points but at any
allocation site. compaction_lock can be used to pin data, preferably
just temporarily.
* seastar 5176352...68fee6c (1):
> Merge "Memory reclamation infrastructure follow-up" from Tomasz
Adjusted logalloc::tracker's reclaimer to fit new API
Instead of failing normal allocations when the seastar allocator cannot
allocate a segment, provide a generous reserve. An allocation failure
will now be satisfied from the reserve, but it will still trigger a
reclaim. This allows hiding low-memory conditions from the user.
This heavily used function shows up in many places in the profile (as part
of other functions), so it's worth optimizing by eliminating the special
case for the standard allocator. Use a statically allocated object
instead (a non-thread-local object is fine, since it has no data
members).
While #152 is still open, we need to allow moderately sized
allocations to succeed. Extend the segment size to 256k, which allows
threads to be allocated.
Fixes #151.
To free memory, we need to allocate memory. In LSA compaction, we
convert N segments with an average occupancy of (N-1)/N into N-1 new
segments. However, to do that, we need to allocate segments, which we
may not be able to do due to the same low-memory condition which
caused us to compact in the first place.
Fix by introducing a segment reserve, which we normally try to ensure is
full. During low memory conditions, we temporarily allow allocating from
the emergency reserve.
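The arithmetic behind the compaction step above, as a small worked example (function names are illustrative):

```cpp
#include <cassert>
#include <cstddef>

// N segments at average occupancy (N-1)/N hold N * (N-1)/N = N-1
// segments' worth of live data, so compaction rewrites them into N-1
// target segments and nets one free segment -- but only after those
// N-1 targets could themselves be allocated, hence the reserve.
constexpr size_t segments_after_compaction(size_t n) {
    return n * (n - 1) / n; // live data measured in whole segments
}

constexpr size_t segments_freed(size_t n) {
    return n - segments_after_compaction(n);
}
```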
When the LSA reclaimer cannot reclaim more space by compaction, it
will reclaim data by evicting from evictable regions.
Currently the only evictable region is the one owned by the row cache.
Requiring alignment means that there must be 64K of contiguous space
to allocate each 32K segment. When memory is fragmented, we may fail
to allocate such a segment, even though there's plenty of free space.
This especially hurts the forward progress of compaction, which frees
segments randomly and relies on the fact that freeing a segment will
make it available to the next segment request.
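The 64K figure follows from standard alignment arithmetic; a small illustration (the helper names are ours, not the allocator's):

```cpp
#include <cassert>
#include <cstdint>

// Round p up to the next multiple of a (a must be a power of two).
constexpr uintptr_t align_up(uintptr_t p, uintptr_t a) {
    return (p + a - 1) & ~(a - 1);
}

// To guarantee a segment aligned to its own size inside an arbitrary
// contiguous block, the block must cover the worst case where it
// starts one byte past an alignment boundary, wasting almost a full
// segment before the aligned segment begins: size + size - 1 bytes,
// i.e. just under 64K for a 32K segment.
constexpr uintptr_t worst_case_contiguous(uintptr_t segment_size) {
    return segment_size + segment_size - 1;
}
```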