database: Make soft-pressure memtable flusher not consider already flushed memtables

The flusher picks the memtable list that contains the largest region,
according to region_impl::evictable_occupancy().total_space(), which
follows region::occupancy().total_space(). However, only the latest
memtable in the list can start flushing. The memtable corresponding to
the largest region may already have been flushed to an sstable (flush
permit released) but not yet fsynced or moved to cache, so it is still
in the memtable list.

The latest memtable in the winning list may be small or even empty, in
which case the soft-pressure flusher cannot make much progress, even
though other memtable lists may have non-empty (flushable) latest
memtables. As a result, writes can block on dirty memory unnecessarily.
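A minimal, hypothetical C++ model of the selection problem described above may help. It assumes each memtable list is just a vector of memtable sizes (oldest first), that the flusher picks the list owning the single largest memtable, and that only the list's latest memtable is flushable. The names memtable_list_model, pick_list() and flushable_size() are illustrative, not ScyllaDB APIs, and sizes are in arbitrary units.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified model (not ScyllaDB's real types): a memtable list
// is a sequence of memtable sizes, oldest first. Only the last (latest)
// memtable can start flushing; older entries may already be written to an
// sstable but still be waiting to be moved to cache.
struct memtable_list_model {
    std::string name;
    std::vector<std::size_t> memtables;
};

// Returns the index of the list owning the single largest memtable,
// mirroring the "largest region wins" selection described above.
std::size_t pick_list(const std::vector<memtable_list_model>& lists) {
    assert(!lists.empty());
    std::size_t best = 0;
    std::size_t best_size = 0;
    bool found = false;
    for (std::size_t i = 0; i < lists.size(); ++i) {
        for (std::size_t sz : lists[i].memtables) {
            if (!found || sz > best_size) {
                best = i;
                best_size = sz;
                found = true;
            }
        }
    }
    return best;
}

// What a soft-pressure flush of the chosen list can actually release:
// only the latest memtable is flushable.
std::size_t flushable_size(const memtable_list_model& l) {
    return l.memtables.empty() ? 0 : l.memtables.back();
}
```

With lists {"system", {64, 0}} and {"user", {32}}, pick_list() returns 0 (the system list, because of its already-flushed 64-unit memtable), yet flushable_size() of that list is 0 while the user list has 32 units it could flush.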

I observed this in the system memtable group, where memtables easily
overshoot the small soft-pressure limits. The flusher kept trying to
flush empty memtables while the previous, non-empty memtable was still
in the group.

The CPU scheduler makes this worse: it runs memtable_to_cache in a
separate scheduling group, which further delays removal of the flushed
memtable from the memtable list.

This patch fixes the problem by making the regions of memtables that
have started flushing report an evictable_occupancy() of 0, so that the
flusher considers them last.
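The same scenario with the fix applied can be sketched as follows. This is again a hypothetical model, not the real ScyllaDB code: region_model stands in for a memtable region, seal_active() loosely mirrors table::seal_active_memtable() by grounding the old region and appending an empty active memtable, and pick_list() chooses the list whose region reports the largest evictable_occupancy().

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a memtable region (sketch only).
struct region_model {
    std::size_t evictable_space = 0;
    bool grounded = false; // set once the memtable has started flushing

    // Mirrors the patched behaviour: grounded regions report zero,
    // so they sort last when the flusher picks a victim.
    std::size_t evictable_occupancy() const {
        return grounded ? 0 : evictable_space;
    }
};

// Memtables in a list, oldest first; only back() can start flushing.
using list_model = std::vector<region_model>;

// Loosely models seal_active_memtable(): ground the old memtable's
// region and append a fresh, empty active memtable.
void seal_active(list_model& list) {
    assert(!list.empty());
    list.back().grounded = true;
    list.push_back(region_model{});
}

// Index of the list owning the region with the largest evictable
// occupancy; returns 0 if every region reports zero.
std::size_t pick_list(const std::vector<list_model>& lists) {
    std::size_t best = 0;
    std::size_t best_occ = 0;
    for (std::size_t i = 0; i < lists.size(); ++i) {
        for (const region_model& r : lists[i]) {
            if (r.evictable_occupancy() > best_occ) {
                best = i;
                best_occ = r.evictable_occupancy();
            }
        }
    }
    return best;
}
```

Starting from a system list holding one 64-unit memtable and a user list holding 32 units, pick_list() first returns the system list. After seal_active() on the system list, its old region reports 0 and its new active memtable is empty, so pick_list() returns the user list instead.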

Fixes #3716.
Message-Id: <1535040132-11153-2-git-send-email-tgrabiec@scylladb.com>
Author: Tomasz Grabiec
Date: 2018-08-23 18:02:12 +02:00
Committed by: Avi Kivity
Parent: 364418b5c5
Commit: 1e50f85288
3 changed files with 29 additions and 2 deletions

@@ -969,6 +969,11 @@ table::seal_active_memtable(flush_permit&& permit) {
     }
     _memtables->add_memtable();
     _stats.memtable_switch_count++;
+    // This will set evictable occupancy of the old memtable region to zero, so that
+    // this region is considered last for flushing by dirty_memory_manager::flush_when_needed().
+    // If we don't do that, the flusher may keep picking up this memtable list for flushing after
+    // the permit is released even though there is not much to flush in the active memtable of this list.
+    old->region().ground_evictable_occupancy();
     auto previous_flush = _flush_barrier.advance_and_await();
     auto op = _flush_barrier.start();

@@ -1140,6 +1140,9 @@ private:
     // occupancy. We could actually just present this as a scalar as well and never use occupancies,
     // but consistency is good.
     size_t _evictable_space = 0;
+    // This is a mask applied to _evictable_space with bitwise-and before it's returned from evictable_space().
+    // Used for forcing the result to zero without using conditionals.
+    size_t _evictable_space_mask = std::numeric_limits<size_t>::max();
     bool _evictable = false;
     region_sanitizer _sanitizer;
     uint64_t _id;
@@ -1349,8 +1352,16 @@ public:
     }
     occupancy_stats evictable_occupancy() const {
-        return occupancy_stats(0, _evictable_space);
+        return occupancy_stats(0, _evictable_space & _evictable_space_mask);
     }
+    void ground_evictable_occupancy() {
+        _evictable_space_mask = 0;
+        if (_group) {
+            _group->decrease_evictable_usage(_heap_handle);
+        }
+    }
     //
     // Returns true if this region can be compacted and compact() will make forward progress,
     // so that this will eventually stop:
@@ -1739,6 +1750,10 @@ void region::make_evictable(eviction_fn fn) {
     get_impl().make_evictable(std::move(fn));
 }
+void region::ground_evictable_occupancy() {
+    get_impl().ground_evictable_occupancy();
+}
 const eviction_fn& region::evictor() const {
     return get_impl().evictor();
 }

@@ -295,8 +295,12 @@ public:
         update(delta);
     }
-    void decrease_usage(region_heap::handle_type& r_handle, ssize_t delta) {
+    void decrease_evictable_usage(region_heap::handle_type& r_handle) {
         _regions.decrease(r_handle);
+    }
+    void decrease_usage(region_heap::handle_type& r_handle, ssize_t delta) {
+        decrease_evictable_usage(r_handle);
         update(delta);
     }
@@ -621,6 +625,9 @@ public:
         return allocator().invalidate_counter();
     }
+    // Will cause subsequent calls to evictable_occupancy() to report empty occupancy.
+    void ground_evictable_occupancy();
     // Makes this region an evictable region. Supplied function will be called
     // when data from this region needs to be evicted in order to reclaim space.
     // The function should free some space from this region.
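The two mechanisms in the diff, the bitwise-and mask in evictable_occupancy() and re-ordering the region group's heap after grounding, can be sketched in isolation. region_sketch and group_sketch are hypothetical stand-ins for region_impl and region_group. Where the patch calls _regions.decrease(r_handle) on a handle-based heap, this sketch simply rebuilds a std::vector heap with std::make_heap, which is simpler but O(n) per grounding.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical stand-in for region_impl (not the real class).
class region_sketch {
    std::size_t _evictable_space = 0;
    // All ones keeps the value unchanged; zero forces the result to 0
    // without a conditional, as in the patched evictable_occupancy().
    std::size_t _evictable_space_mask = std::numeric_limits<std::size_t>::max();
public:
    explicit region_sketch(std::size_t space) : _evictable_space(space) {}
    std::size_t evictable_occupancy() const {
        return _evictable_space & _evictable_space_mask;
    }
    void ground() { _evictable_space_mask = 0; }
};

// Hypothetical stand-in for region_group: a max-heap of regions keyed by
// evictable occupancy, so front() is the flusher's preferred victim.
class group_sketch {
    std::vector<region_sketch*> _heap;
    static bool by_occupancy(const region_sketch* a, const region_sketch* b) {
        return a->evictable_occupancy() < b->evictable_occupancy();
    }
public:
    void add(region_sketch* r) {
        _heap.push_back(r);
        std::push_heap(_heap.begin(), _heap.end(), by_occupancy);
    }
    // Ground a region, then restore heap order now that its key shrank.
    void ground(region_sketch* r) {
        r->ground();
        std::make_heap(_heap.begin(), _heap.end(), by_occupancy);
    }
    region_sketch* largest() const {
        assert(!_heap.empty());
        return _heap.front();
    }
};
```

Grounding a region zeroes its mask, so evictable_occupancy() returns 0 regardless of _evictable_space; once the heap is rebuilt, largest() returns the next-largest region, which is the ordering the flusher relies on.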