Commit Graph

38 Commits

Author SHA1 Message Date
Glauber Costa
2bffa8af74 logalloc: make sure allocations in release_requests don't recurse back into the allocator
Calls like later() and with_gate() may allocate memory, although that is not
very common. This can create a problem in the sense that it will potentially
recurse and bring us back to the allocator during free - which is the very thing
we are trying to avoid with the call to later().

This patch wraps the relevant calls in the reclaimer lock. This do mean that the
allocation may fail if we are under severe pressure - which includes having
exhausted all reserved space - but at least we won't recurse back to the
allocator.

To make sure we do this as early as possible, we just fold both release_requests
and do_release_requests into a single function

Thanks Tomek for the suggestion.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <980245ccc17960cf4fcbbfedb29d1878a98d85d8.1470254846.git.glauber@scylladb.com>
(cherry picked from commit fe6a0d97d1)
2016-08-04 11:17:54 +02:00
Glauber Costa
4a6d0d503f logalloc: make sure blocked requests memory allocations are served from the standar allocator
Issue 1510 describes a scenario in which, under load, we allocate memory within
release_requests() leading to a reentry into an invalid state in our
blocked requests' shared_promise.

This is not easy to trigger since not all allocations will actually get to the
point in which they need a new segment, let alone have that happening during
another allocator call.

Having those kinds of reentry is something we have always sought to avoid with
release_requests(): this is the reason why most of the actual routine is
deferred after a call to later().

However, that is a trick we cannot use for updating the state of the blocked
requests' shared_promise: we can't guarantee when is that going to run, and we
always need a valid shared_promise, in a valid state, waiting for new requests
to hook into.

The solution employed by this patch is to make sure that no allocation
operations whatsoever happen during the initial part of release_requests on
behalf of the shared promise.  Allocation is now deferred to first use, which
relieves release_requests() from all allocation duties. All it needs to do is
free the old object and signal to the its user that an allocation is needed (by
storing {} into the shared_promise).

Fixes #1510

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <49771e51426f972ddbd4f3eeea3cdeef9cc3b3c6.1470238168.git.glauber@scylladb.com>
(cherry picked from commit ad58691afb)
2016-08-04 11:17:49 +02:00
Tomasz Grabiec
e783b58e3b Merge branch 'glommer/LSA-throttler-v6' from git@github.com:glommer/scylla.gi
From Glauber:

This is my new take at the "Move throttler to the LSA" series, except
this one don't actually move anything anywhere: I am leaving all
memtable conversion out, and instead I am sending just the LSA bits +
LSA active reclaim. This should help us see where we are going, and
then we can discuss all memtable changes in a series on its own,
logically separated (and hopefully already integrated with virtual
dirty).

[tgrabiec: trivial merge conflicts in logalloc.cc]
2016-06-21 10:22:26 +02:00
Glauber Costa
579d121db8 LSA: export largest region
We now keep the regions sorted by size, and the children region groups as well.
Internally, the LSA has all information it needs to make size-based reclaim
decisions. However, we don't do reclaim internally, but rather warn our user
that a pressure situation is mounted.

The user of a region_group doesn't need to evict the largest region in case of
pressure and is free to do whatever it chooses - including nothing. But more
likely than not, taking into account which region is the largest makes sense.

This patch puts together this last missing piece of the puzzle, and exports the
information we have internally to the user.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-06-20 18:51:00 -04:00
Glauber Costa
38a402307d LSA: enhance region_group reclaimer
We are currently just allowing the region_group to specify a throttle_threshold,
that triggers throttling when a certain amount of memory is reached. We would
like to notify the callers that such condition is reached, so that the callers
can do something to alleviate it - like triggering flushes of their structures.

The approach we are taking here is to pass a reclaimer instance. Any user of a
region_group can specialize its methods start_reclaiming and stop_reclaiming
that will be called when the region_group becomes under pressure or ceases to
be, respectively.

Now that we have such facility, it makes more sense to move the
throttle_threshold here than having it separately.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-06-20 18:50:59 -04:00
Glauber Costa
6404028c6a LSA: move subgroups to a heap as well
When we decide to evict from a specific region_group due to excessive memory
usage, we must also consider looking at each of their children (subgroups). It
could very well be that most of memory is used by one of the subgroups, and
we'll have to evict from there.

We also want to make sure we are evicting from the biggest region of all, and
not the biggest region in the biggest region_group. To understand why this is
important, consider the case in which the regions are memtables associated with
dirty region groups. It could be that a very big memtable was recently flushed,
and a fairly small one took its place. That region group is still quite large
because the memtable hasn't finished flushing yet, but that doesn't mean we
should evict from it.

To allow us to efficiently pick which region is the largest, each root of each
subtree will keep track of its maximal score, defined as the maximum between our
largest region total_space and the maximum maximal score of subtrees.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-06-20 18:50:13 -04:00
Glauber Costa
e1eab5c845 LSA: store regions in a heap for regions_group
Currently, the regions in a region group are organized in a simple vector.
We can do better by using a binomial heap, as we do for segments, and then
updating when there is change. Internally to the LSA, we are in good position
to always know when change happens, so that's really the best way to do it.

The end game here, is to easily call for the reclaim of the largest offending
region (potentially asynchronously). Because of that, we aren't really interested
in the region occupancy, but in the region reclaimable occuppancy instead: that's
simply equal to the occupancy if the region is reclaimable, and 0 otherwise. Doing
that effectively lists all non reclaimable regions in the end of the heap, in no
particular order.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-06-20 18:50:13 -04:00
Glauber Costa
54d4d46cf7 LSA: move throttling code to LSA.
The database code uses a throttling function to make sure that memory
used for the dirty region never is over the limit. We track that with
a region group, so it makes sense to move this as generic functionality
into LSA.

This patch implements the LSA-side functionality and a later patch will
convert the current memtable throttler to use it.

Unlike the current throttling mechanism, we'll not use a timer-based
mechanism here. Aside from being more generic and friendlier towards
other users, this is a good change for current memtable by itself.

The constants - 10ms and 1MB chosen by the current throttler are arbitrary, and we
would be better off without them. Let's discuss the merits of each separately:

1) 10ms timer: If we are throttling, we expect somebody to flush the memtables
for memory to be released. Since we are in position to know exactly when a memtable
was written, thus releasing memory, we can just call unthrottle at that point, instead
of using a timer.

2) 1MB release threshold: we do that because we have no idea how much memory a request
will use, so we put the cut somehow. However, because of 1) we don't call unthrottle
through a timer anymore, and do it directly instead. This means that we can just execute
the request and see how much memory it has used, with no need to guess. So we'll call
unthrottle at the end of every request that was previously throttled.

Writing the code this way also has the advantage that we need one less continuation in
the common case of the database not being throttled.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-06-20 18:34:19 -04:00
Glauber Costa
01a658f51d LSA: helper function for region_group
current hierarchy walk converted, but more users will come.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-06-15 22:26:50 -04:00
Glauber Costa
741aa16748 LSA: allow a region_group to have a threshold for throttling specified
Allocations will still be allowed if made directly, but callers will have the
choice (in an upcoming patch) to proceed only if memory is below this threshold.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-06-15 22:26:50 -04:00
Glauber Costa
7cd0c0731e region_group: delete move constructor
Tomek correctly points out that since we are now using "this" in lambda
captures, we should make the region_group not movable. We currently define a
move constructor, but there are no users. So we should just remove them.

copy constructor is already deleted, and so are the copy and move assignment
operators. So by removing the move constructor, we should be fine.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-06-15 22:26:50 -04:00
Tomasz Grabiec
86b76171a8 lsa: Use the same step in both internal and external reclamations 2016-06-14 15:13:15 +02:00
Tomasz Grabiec
d74d902a01 lsa: Make reclamation step configurable 2016-06-14 15:13:14 +02:00
Piotr Jastrzebski
136b8148d2 Use idle CPU to compact LSA memory
Register an idle CPU handler that compacts a single segment
every time there's nothing better to execute on CPU.

Signed-off-by: Piotr Jastrzebski <piotr@scylladb.com>
Message-Id: <c26aa608a1e0752fb9e6db1833ef3ba1de95f161.1464169748.git.piotr@scylladb.com>
2016-05-26 12:43:53 +03:00
Pekka Enberg
38a54df863 Fix pre-ScyllaDB copyright statements
People keep tripping over the old copyrights and copy-pasting them to
new files. Search and replace "Cloudius Systems" with "ScyllaDB".

Message-Id: <1460013664-25966-1-git-send-email-penberg@scylladb.com>
2016-04-08 08:12:47 +03:00
Tomasz Grabiec
a0cba3c86f logalloc: Introduce tracker::occupancy()
Returns occupancy information for all memory allocated by LSA, including
segment pools / zones.
2016-03-22 16:28:10 +01:00
Tomasz Grabiec
529c8b8858 logalloc: Rename tracker::occupancy() to region_occupancy() 2016-03-22 14:56:44 +01:00
Paweł Dziepak
83b004b2fb lsa: avoid fragmenting memory
Originally, lsa allocated each segment independently what could result
in high memory fragmentation. As a result many compaction and eviction
passes may be needed to release a sufficiently big contiguous memory
block.

These problems are solved by introduction of segment zones, contiguous
groups of segments. All segments are allocated from zones and the
algorithm tries to keep the number of zones to a minimum. Moreover,
segments can be migrated between zones or inside a zone in order to deal
with fragmentation inside zone.

Segment zones can be shrunk but cannot grow. Segment pool keeps a tree
containing all zones ordered by their base addresses. This tree is used
only by the memory reclamer. There is also a list of zones that have
at least one free segments that is used during allocation.

Segment allocation doesn't have any preferences which segment (and zone)
to choose. Each zone contains a free list of unused segments. If there
are no zones with free segments a new one is created.

Segment reclamation migrates segments from the zones higher in memory
to the ones at lower addresses. The remaining zones are shrunk until the
requested number of segments is reclaimed.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
2015-12-08 19:31:40 +01:00
Avi Kivity
1c425d6b50 logalloc: allow allocating_section code blocks to return references 2015-11-15 19:10:24 +02:00
Avi Kivity
d5cf0fb2b1 Add license notices 2015-09-20 10:43:39 +03:00
Tomasz Grabiec
8e1b3e5475 lsa: Remove underscore from local variable names 2015-09-10 12:40:12 +03:00
Avi Kivity
6d0a2b5075 logalloc: don't invalidate merged region
A region being merged can still be in use; but after merging, compaction_lock
and the reclaim counter will no longer work.  This can lead to
use-after-compact-without-re-lookup errors.

Fix by making the source region be the same as the target region; they
will share compaction locks and reclaim counters, so lookup avoidance
will still work correctly.

Fixes #286.
2015-09-08 08:55:44 +02:00
Tomasz Grabiec
3b441416fa lsa: Make segment size publicly accessible
Some tests depend on segment size.
2015-09-06 21:25:44 +02:00
Tomasz Grabiec
c82325a76c lsa: Make region evictor signal forward progress
In some cases region may be in a state where it is not empty and
nothing could be evicted from it. For example when creating the first
entry, reclaimer may get invoked during creation before it gets
linked. We therefore can't rely on emptiness as a stop condition for
reclamation, the evction function shall signal us if it made forward
progress.
2015-09-06 21:25:44 +02:00
Tomasz Grabiec
d022a1a4a3 lsa: Introduce allocating_section
Related to #259. In some cases we need to allocate memory and hold
reclaim lock at the same time. If that region holds most of the
reclaimable memory, allocations inside that code section may
fail. allocating_section is a work-around of the problem. It learns
how big reserves shold be from past execution of critical section and
tries to ensure proper reserves before entering the section.
2015-09-06 21:24:59 +02:00
Tomasz Grabiec
870e9e5729 lsa: Replace compaction_lock with broader reclaim_lock
Disabling compaction of a region is currently done in order to keep
the references valid. But disabling only compaction is not enough, we
also need to disable eviction, as it also invalidates
references. Rather than introducing another type of lock, compaction
and eviction are controlled together, generalized as "reclaiming"
(hence the reclaim_lock).
2015-09-01 17:29:04 +03:00
Tomasz Grabiec
d20fae96a2 lsa: Make reclaimer run synchronously with allocations
The goal is to make allocation less likely to fail. With async
reclaimer there is an implicit bound on the amount of memory that can
be allocated between deferring points. This bound is difficult to
enforce though. Sync reclaimer lifts this limitation off.

Also, allocations which could not be satisfied before because of
fragmentation now will have higher chances of succeeding, although
depending on how much memory is fragmented, that could involve
evicting a lot of segments from cache, so we should still avoid them.

Downside of sync reclaiming is that now references into regions may be
invalidated not only across deferring points but at any allocation
site. compaction_lock can be used to pin data, preferably just
temporarily.
2015-08-31 21:50:18 +02:00
Tomasz Grabiec
6105c05dbe lsa: Introduce compaction_lock helper 2015-08-31 21:50:17 +02:00
Tomasz Grabiec
42dce17c82 lsa: Fix documentation for eviction functions 2015-08-31 21:50:17 +02:00
Tomasz Grabiec
110a55886c lsa: Introduce region::compaction_counter() 2015-08-31 13:58:42 +02:00
Tomasz Grabiec
9ad3dbe592 lsa: Add region::compaction_enabled() 2015-08-31 13:58:42 +02:00
Tomasz Grabiec
048387782a lsa: Rename region::set_compactible() to set_compaction_enabled()
To avoid confusion with region_impl::is_compactible() when the getter
is added.
2015-08-31 13:58:42 +02:00
Avi Kivity
9ed2bbb25c lsa: introduce region_group
A region_group is a nestable group of regions, for cumulative statistics
purposes.
2015-08-19 19:36:40 +03:00
Avi Kivity
71aad57ca8 lsa: make region::impl a top-level class
Makes using forward declarations possible.
2015-08-19 14:43:17 +03:00
Tomasz Grabiec
ef549ae5a5 lsa: Reclaim space from evictable regions incrementally
When LSA reclaimer cannot reclaim more space by compaction, it
will reclaim data by evicting from evictable regions.

Currently the only evictable region is the one owned by the row cache.
2015-08-08 09:59:24 +02:00
Tomasz Grabiec
6ae0747fe5 lsa: Use size_t for sizes 2015-08-06 18:40:06 +02:00
Tomasz Grabiec
df6f0c35df utils: lsa: Add reclaimer hook which compacts regions 2015-08-06 14:05:15 +02:00
Tomasz Grabiec
5a9e296803 utils: lsa: Introduce log-structured allocator 2015-08-06 14:05:15 +02:00