Commit Graph

75 Commits

Author SHA1 Message Date
Tomasz Grabiec
1e50f85288 database: Make soft-pressure memtable flusher not consider already flushed memtables
The flusher picks the memtable list which contains the largest region
according to region_impl::evictable_occupancy().total_space(), which
follows region::occupancy().total_space(). But only the latest
memtable in the list can start flushing. It can happen that the
memtable corresponding to the largest region was already flushed to an
sstable (flush permit released), but not yet fsynced or moved to
cache, so it's still in the memtable list.

The latest memtable in the winning list may be small, or empty, in
which case the soft pressure flusher will not be able to make much
progress. There could be other memtable lists with non-empty
(flushable) latest memtables. This can lead to writes unnecessarily
blocking on dirty.

I observed this for the system memtable group, where it's easy for the
memtables to overshoot small soft pressure limits. The flusher kept
trying to flush empty memtables, while the previous non-empty memtable
was still in the group.

The CPU scheduler makes this worse, because it runs memtable_to_cache
in a separate scheduling group, so it further defers in time the
removal of the flushed memtable from the memtable list.

This patch fixes the problem by making regions corresponding to
memtables which started flushing report evictable_occupancy() as 0, so
that they're picked by the flusher last.

Fixes #3716.
Message-Id: <1535040132-11153-2-git-send-email-tgrabiec@scylladb.com>
2018-08-26 11:02:34 +03:00
Avi Kivity
0fc54aab98 logalloc: run releaser() in user-provided scheduling group
Let the user specify which scheduling group should run the
releaser, since it is running functions on the user's behalf.

Perhaps a cleaner interface is to require the user to call
a long-running function for the releaser, and so we'd just
inherit its scheduling group, but that's a much bigger change.
2018-07-31 11:57:58 +03:00
Gleb Natapov
b38ced0fcd Configure logalloc memory size during initialization 2018-06-11 15:34:14 +03:00
Tomasz Grabiec
4fdd61f1b0 lsa: Expose counters for allocation and compaction throughput
Allow observing amplification induced by segment compaction.
2018-05-15 21:49:01 +02:00
Avi Kivity
2baa16b371 logalloc: introduce prime_segment_pool()
To segregate std and lsa allocations, we prime the segment pool
during initialization so that lsa will release lower-addressed
memory to std, rather than lsa and std competing for memory at
random addresses.

However, tests often evict all of lsa memory for their own
purposes, which defeats this priming.

Extract the functionality into a new prime_segment_pool()
function for use in tests that rely on allocation segregation.
2018-04-07 14:52:58 +03:00
Avi Kivity
54db0f3d30 logalloc: reduce segment size to 128k
Reducing the segment size reduces the time needed to compact segments,
and increases the number of segments that can be compacted (and so
the probability of finding low-occupancy segments).

128k is the size of I/O buffers and of thread stacks, so we can't
go lower than that without more significant changes.
2018-04-07 14:52:58 +03:00
Avi Kivity
c9aa9f0d86 Revert "logalloc: capture current scheduling group for deferring function"
This reverts commit 3b53f922a3. It's broken
in two ways:

 1. concrete_allocating_function::allocate()'ss caller,
    region_group::start_releaser() loop, will delete the object
    as soon as it returns; however we scheduled some work depending
    on `this` in a separate continuation (via with_scheduling_group())
 2. the calling loop's termination condition depends on the work being
    done immediately, not later.
2018-03-29 16:08:12 +03:00
Glauber Costa
3b53f922a3 logalloc: capture current scheduling group for deferring function
When we call run_when_memory_available, it is entirely possible that
the caller is doing that inside a scheduling_group. If we don't defer
we will execute correctly. But if we do defer, the current code will
execute - in the future - with the default scheduling group.

This patch fixes that by capturing the caller scheduling group and
making sure the function is executed later using it.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2018-03-20 16:58:35 -04:00
Paweł Dziepak
dcd79af8ed lsa: optimise disabling reclamation and invalidation counter
Most of the lsa gory details are hidden in utils/logalloc.cc. That
includes the actual implementation of a lsa region: region_impl.

However, there is code in the hot path that often accesses the
_reclaiming_enabled member as well as its base class
allocation_strategy.

In order to optimise those accesses another class is introduced:
basic_region_impl that inherits from allocation_strategy and is a base
of region_impl. It is defined in utils/logalloc.hh so that it is
publicly visible and its member functions are inlineable from anywhere
in the code. This class is supposed to be as small as possible, but
contain all members and functions that are accessed from the fast path
and should be inlined.
2018-01-30 18:33:26 +01:00
Paweł Dziepak
d825ae37bf lsa: split alloc section into reserving and reclamation-disabled parts
Allocating sections reserves certain amount of memory, then disables
reclamation and attempts to perform given operation. If that fails due
to std::bad_alloc the reserve is increased and the operation is retried.

Reserving memory is expensive while just disabling reclamation isn't.
Moreover, the code that runs inside the section needs to be safely
retryable. This means that we want the amount of logic running with
reclamation disabled as small as possible, even if it means entering and
leaving the section multiple times.

In order to reduce the performance penalty of such solution the memory
reserving and reclamation disabling parts of the allocating sections are
separated.
2018-01-30 18:33:26 +01:00
Tomasz Grabiec
5c85e9c2db lsa: Expose max_zone_segments for tests 2018-01-16 13:17:20 +01:00
Tomasz Grabiec
99708cc498 lsa: Expose tracker::non_lsa_used_space()
So that it can be used in unit tests.
2018-01-16 13:17:20 +01:00
Glauber Costa
80c4a211d8 consolidate timeout_clock
At the moment, various different subsystems use their different
ideas of what a timeout_clock is. This makes it a bit harder to pass
timeouts between them because although most are actually a lowres_clock,
that is not guaranteed to be the case. As a matter of fact, the timeout
for restricted reads is expressed as nanoseconds, which is not a valid
duration in the lowres_clock.

As a first step towards fixing this, we'll consolidate all of the
existing timeout_clocks in one, now called db::timeout_clock. Other
things that tend to be expressed in terms of that clock--like the fact
that the maximum time_point means no timeout and a semaphore that
wait()s with that resolution are also moved to the common header.

In the upcoming patch we will fix the restricted reader timeouts to
be expressed in terms of the new timeout_clock.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2018-01-11 12:07:41 -05:00
Tomasz Grabiec
8d69d217af lsa: Guarantee invalidated references on allocating section retry
There is existing code (e.g. use of partition_snapshot_row_cursor in
cache_streamed_mutation) which assumes that references will be
invalidated when bad_alloc is thrown from allocating_section. That is
currently the case because on retry we will attempt memory reclamation
which will invalidate references either through compaction or
eviction. Make this guarantee explicit.
2017-11-13 20:55:13 +01:00
Tomasz Grabiec
87be474c19 lsa: Move reclaim counter concept to allocation_strategy level
So that generic code can detect invalidation of references. Also, to
allow reusing the same mechanism for signalling external reference
invalidation.
2017-09-13 17:38:08 +02:00
Tomasz Grabiec
5d2f2bc90b lsa: Mark region::merge() as noexcept
It seems to satisfy this, and row_cache::do_update() will rely on it to simplify
error handling.

Message-Id: <1504023113-30374-1-git-send-email-tgrabiec@scylladb.com>
2017-08-29 19:17:17 +03:00
Avi Kivity
ebaeefa02b Merge seatar upstream (seastar namespace)
- introcduced "seastarx.hh" header, which does a "using namespace seastar";
 - 'net' namespace conflicts with seastar::net, renamed to 'netw'.
 - 'transport' namespace conflicts with seastar::transport, renamed to
   cql_transport.
 - "logger" global variables now conflict with logger global type, renamed
   to xlogger.
 - other minor changes
2017-05-21 12:26:15 +03:00
Tomasz Grabiec
7aa286439f lsa: Add getter for region's eviction function 2017-04-20 14:51:42 +02:00
Avi Kivity
844529fe33 logalloc: avoid auto in function argument declaration
'auto' in a non-lambda function argument is not legal C++, and is hard
to read besides.  Replace with the right type.

Since the right type is private, add some friendship.
2017-04-17 23:18:44 +03:00
Tomasz Grabiec
4ab8b255da lsa: Allow adjusting reserves in allocating_section 2017-03-16 10:21:10 +01:00
Avi Kivity
7a00dd6985 Merge "Avoid avalanche of tasks after memtable flush" from Tomasz
"Before, the logic for releasing writes blocked on dirty worked like this:

  1) When region group size changes and it is not under pressure and there
     are some requests blocked, then schedule request releasing task

  2) request releasing task, if no pressure, runs one request and if there are
     still blocked requests, schedules next request releasing task

If requests don't change the size of the region group, then either some request
executes or there is a request releasing task scheduled. The amount of scheduled
tasks is at most 1, there is a single releasing thread.

However, if requests themselves would change the size of the group, then each
such change would schedule yet another request releasing thread, growing the task
queue size by one.

The group size can also change when memory is reclaimed from the groups (e.g.
when contains sparse segments). Compaction may start many request releasing
threads due to group size updates.

Such behavior is detrimental for performance and stability if there are a lot
of blocked requests. This can happen on 1.5 even with modest concurrency
because timed out requests stay in the queue. This is less likely on 1.6 where
they are dropped from the queue.

The releasing of tasks may start to dominate over other processes in the
system. When the amount of scheduled tasks reaches 1000, polling stops and
server becomes unresponsive until all of the released requests are done, which
is either when they start to block on dirty memory again or run out of blocked
requests. It may take a while to reach pressure condition after memtable flush
if it brings virtual dirty much below the threshold, which is currently the
case for workloads with overwrites producing sparse regions.

I saw this happening in a write workload from issue #2021 where the number of
request releasing threads grew into thousands.

Fix by ensuring there is at most one request releasing thread at a time. There
will be one releasing fiber per region group which is woken up when pressure is
lifted. It executes blocked requests until pressure occurs."

* tag 'tgrabiec/lsa-single-threaded-releasing-v2' of github.com:cloudius-systems/seastar-dev:
  tests: lsa: Add test for reclaimer starting and stopping
  tests: lsa: Add request releasing stress test
  lsa: Avoid avalanche releasing of requests
  lsa: Move definitions to .cc
  lsa: Simplify hard pressure notification management
  lsa: Do not start or stop reclaiming on hard pressure
  tests: lsa: Adjust to take into account that reclaimers are run synchronously
  lsa: Document and annotate reclaimer notification callbacks
  tests: lsa: Use with_timeout() in quiesce()
2017-02-02 17:49:31 +02:00
Tomasz Grabiec
e40fb438f5 lsa: Avoid avalanche releasing of requests
Before, the logic for releasing writes blocked on dirty worked like
this:

  1) When region group size changes and it is not under pressure and
     there are some requests blocked, then schedule request releasing
     task

  2) request releasing task, if no pressure, runs one request and if
     there are still blocked requests, schedules next request
     releasing task

If requests don't change the size of the region group, then either
some request executes or there is a request releasing task
scheduled. The amount of scheduled tasks is at most 1, there is a
single thread of excution.

However, if requests themselves would change the size of the group,
then each such change would schedule yet another request releasing
thread, growing the task queue size by one.

The group size can also change when memory is reclaimed from the
groups (e.g.  when contains sparse segments). Compaction may start
many request releasing threads due to group size updates.

Such behavior is detrimental for performance and stability if there
are a lot of blocked requests. This can happen on 1.5 even with modest
concurrency becuase timed out requests stay in the queue. This is less
likely on 1.6 where they are dropped from the queue.

The releasing of tasks may start to dominate over other processes in
the system. When the amount of scheduled tasks reaches 1000, polling
stops and server becomes unresponsive until all of the released
requests are done, which is either when they start to block on dirty
memory again or run out of blocked requests. It may take a while to
reach pressure condition after memtable flush if it brings virtual
dirty much below the threshold, which is currently the case for
workloads with overwrites producing sparse regions.

Refs #2021.

Fix by ensuring there is at most one request releasing thread at a
time. There will be one releasing fiber per region group which is
woken up when pressure is lifted. It executes blocked requests until
pressure occurs.

The logic for notification across hierachy was replaced by calling
region_group::notify_relief() from region_group::update() on the
broadest relieved group.
2017-02-01 17:41:55 +01:00
Tomasz Grabiec
d55baa0cd1 lsa: Move definitions to .cc 2017-02-01 17:41:55 +01:00
Tomasz Grabiec
8f8b111b33 lsa: Simplify hard pressure notification management
The hard pressure was only signalled on region group when
run_when_memory_available() was called after the pressure condition
was met.

So the following loop is always an infinite loop rather than stopping
when engouh is allocated to cause pressure:

   while (!gr.under_pressure()) {
       region.allocate(...);
   }

It's cleaner if pressure notification works not only if
run_when_memory_available() is used but whenever conditino changes,
like we do for the soft pressure.

There is comment in run_when_memory_available() which gives reasons
why notifications are called from there, but I think those reasons no
longer hold:

 - we already notify on soft pressure conditions from update(), and if
   that is safe, notifying about hard pressure should also be safe. I
   checked and it looks safe to me.

 - avoiding notification in the rare case when we stopped writing
   right after crossing the threshold doesn't seem benefitial. It's
   unlikely in the first place, and one could argue it's better to
   actually flush now so that when writes resume they will not block.
2017-02-01 17:41:55 +01:00
Tomasz Grabiec
9aa1be5d08 lsa: Do not start or stop reclaiming on hard pressure
We already call these when crossing the soft threshold. We shouldn't
stop reclaiming when hard pressure is gone because soft pressure may
still be present. Calling start_reclaiming() on hard pressure is
unnecessary because soft pressure also starts it, and when there is
hard pressure there is also soft pressure.
2017-02-01 17:40:15 +01:00
Amnon Heiman
45b6070832 Merge seastar upstream
* seastar 397685c...c1dbd89 (13):
  > lowres_clock: drop cache-line alignment for _timer
  > net/packet: add missing include
  > Merge "Adding histogram and description support" from Amnon
  > reactor: Fix the error: cannot bind 'std::unique_ptr' lvalue to 'std::unique_ptr&&'
  > Set the option '--server' of tests/tcp_sctp_client to be required
  > core/memory: Remove superfluous assignment
  > core/memory: Remove dead code
  > core/reactor: Use logger instead of cerr
  > fix inverted logic in overprovision parameter
  > rpc: fix timeout checking condition
  > rpc: use lowres_clock instead of high resolution one
  > semaphore: make semaphore's clock configurable
  > rpc: detect timedout outgoing packets earlier

Includes treewide change to accomodate rpc changing its timeout clock
to lowres_clock.

Includes fixup from Amnon:

collectd api should use the metrics getters

As part of a preperation of the change in the metrics layer, this change
the way the collectd api uses the metrics value to use the getters
instead of calling the member directly.

This will be important when the internal implementation will changed
from union to variant.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1485457657-17634-1-git-send-email-amnon@scylladb.com>
2017-02-01 14:39:08 +02:00
Tomasz Grabiec
ed9ff19467 lsa: Document and annotate reclaimer notification callbacks
They are called from region_group::update(), so must be alloc-free and
noexcept.
2017-01-30 19:18:07 +01:00
Amnon Heiman
e19fa02a17 remove scollectd from headers
As the metrics migration progressed, some include to scollectd.hh left
behind.

Because of the nature of the scollecd implementation those include
brings alot of code with them to the header files and eventually to many
source file.

This patch remove those include and add a missing include to
storage_proxy.cc.

The reason the compiler didn't complain is an indication to the
problematic nature of those include in the first place.

Before this patch, change in metrics.hh would cause 169 files to
compile, after this change 17.

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Message-Id: <1484667536-2185-1-git-send-email-amnon@scylladb.com>
2017-01-17 17:39:47 +02:00
Tomasz Grabiec
e14caaef60 utils/logalloc: Add ability to timeout run_when_memory_available() task 2016-11-29 16:40:58 +01:00
Avi Kivity
176fca5775 logalloc: use correct header for unique_ptr
<bits/unique_ptr.hh> is a libstdc++ internal header.  USe <memory> instead.
2016-11-27 23:08:04 +02:00
Glauber Costa
f86c9e36f4 logalloc: allow region group reclaimer to specify a soft limit
The region_group_reclaimer will let us know every time we are over the
limit we have specified for memory usage.

However, For some applications, we would be interested in knowing about
memory build up earlier, so we can start doing something about it before
we reach that condition.

This patch introduce soft limit notifications for the
region_group_reclaimer. After this patch is applied, start_reclaim() is
called earlier, and stop_reclaim() later, after the soft condition is
abated.

There are methods that allow one to easily test if the pressure
condition is a soft limit condition or a hard, threshold condition and
act accordingly. Whether to act on both conditions or just one of them
is up to the application.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-11-16 21:20:23 -05:00
Paweł Dziepak
b8d737ff0a tests/row_cache_test: verify that eviction follows lru
Refs #1847.

Signed-off-by: Paweł Dziepak <pdziepak@scylladb.com>
Message-Id: <1479231555-28191-1-git-send-email-pdziepak@scylladb.com>
2016-11-15 18:57:54 +01:00
Tomasz Grabiec
6548132423 lsa: Make logalloc::tracker::full_compaction() compact all reclaimable regions
is_compactible() will pass on very small regions. full_compaction() is
only used in tests to force objects to be moved due to compaction, so
we want all reclaimable regions to be compacted.
2016-10-18 11:16:08 +02:00
Tomasz Grabiec
4357d0a6d9 db: Add counter for writes blocked on dirty memory
There is already queue_length-requests_blocked_memory, but it's a
gauge so does not reflect what happened between the sampling points.

total_operations-requests_blocked_memory will allow to see if there
were any (and how many) requests which were blocked by dirty memory.

Message-Id: <1476098616-12682-1-git-send-email-tgrabiec@scylladb.com>
2016-10-10 14:25:22 +03:00
Glauber Costa
86aa0b830d LSA: allow a group to query its own region group
Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-10-04 10:39:10 -04:00
Glauber Costa
f5fd6bd714 LSA: export information about size of the throttle queue
Also add information about for how long has the oldest been sitting in the
queue. This is part of the backpressure work to allow us to throttle incoming
requests if we won't have memory to process them. Shortages can happen in all
sorts of places, and it is useful when designing and testing the solutions to
know where they are, and how bad they are.

This counter is named for consistency after similar counters from transport/.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-09-27 12:09:08 -04:00
Glauber Costa
fe6a0d97d1 logalloc: make sure allocations in release_requests don't recurse back into the allocator
Calls like later() and with_gate() may allocate memory, although that is not
very common. This can create a problem in the sense that it will potentially
recurse and bring us back to the allocator during free - which is the very thing
we are trying to avoid with the call to later().

This patch wraps the relevant calls in the reclaimer lock. This do mean that the
allocation may fail if we are under severe pressure - which includes having
exhausted all reserved space - but at least we won't recurse back to the
allocator.

To make sure we do this as early as possible, we just fold both release_requests
and do_release_requests into a single function

Thanks Tomek for the suggestion.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <980245ccc17960cf4fcbbfedb29d1878a98d85d8.1470254846.git.glauber@scylladb.com>
2016-08-04 11:16:53 +02:00
Glauber Costa
ad58691afb logalloc: make sure blocked requests memory allocations are served from the standar allocator
Issue 1510 describes a scenario in which, under load, we allocate memory within
release_requests() leading to a reentry into an invalid state in our
blocked requests' shared_promise.

This is not easy to trigger since not all allocations will actually get to the
point in which they need a new segment, let alone have that happening during
another allocator call.

Having those kinds of reentry is something we have always sought to avoid with
release_requests(): this is the reason why most of the actual routine is
deferred after a call to later().

However, that is a trick we cannot use for updating the state of the blocked
requests' shared_promise: we can't guarantee when is that going to run, and we
always need a valid shared_promise, in a valid state, waiting for new requests
to hook into.

The solution employed by this patch is to make sure that no allocation
operations whatsoever happen during the initial part of release_requests on
behalf of the shared promise.  Allocation is now deferred to first use, which
relieves release_requests() from all allocation duties. All it needs to do is
free the old object and signal to the its user that an allocation is needed (by
storing {} into the shared_promise).

Fixes #1510

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <49771e51426f972ddbd4f3eeea3cdeef9cc3b3c6.1470238168.git.glauber@scylladb.com>
2016-08-03 20:40:30 +02:00
Tomasz Grabiec
9476bc5a31 Introduce --abort-on-lsa-bad-alloc command line option
Useful for triggerring core dump on allocation failure inside LSA,
which makes it easier to debug allocation failures. They normally
don't cause aborts, just fail the current operation, which makes it
hard to figure out what was the cause of allocation failure.

Message-Id: <1470233631-18508-1-git-send-email-tgrabiec@scylladb.com>
2016-08-03 17:26:44 +03:00
Tomasz Grabiec
e783b58e3b Merge branch 'glommer/LSA-throttler-v6' from git@github.com:glommer/scylla.gi
From Glauber:

This is my new take at the "Move throttler to the LSA" series, except
this one don't actually move anything anywhere: I am leaving all
memtable conversion out, and instead I am sending just the LSA bits +
LSA active reclaim. This should help us see where we are going, and
then we can discuss all memtable changes in a series on its own,
logically separated (and hopefully already integrated with virtual
dirty).

[tgrabiec: trivial merge conflicts in logalloc.cc]
2016-06-21 10:22:26 +02:00
Glauber Costa
579d121db8 LSA: export largest region
We now keep the regions sorted by size, and the children region groups as well.
Internally, the LSA has all information it needs to make size-based reclaim
decisions. However, we don't do reclaim internally, but rather warn our user
that a pressure situation is mounted.

The user of a region_group doesn't need to evict the largest region in case of
pressure and is free to do whatever it chooses - including nothing. But more
likely than not, taking into account which region is the largest makes sense.

This patch puts together this last missing piece of the puzzle, and exports the
information we have internally to the user.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-06-20 18:51:00 -04:00
Glauber Costa
38a402307d LSA: enhance region_group reclaimer
We are currently just allowing the region_group to specify a throttle_threshold,
that triggers throttling when a certain amount of memory is reached. We would
like to notify the callers that such condition is reached, so that the callers
can do something to alleviate it - like triggering flushes of their structures.

The approach we are taking here is to pass a reclaimer instance. Any user of a
region_group can specialize its methods start_reclaiming and stop_reclaiming
that will be called when the region_group becomes under pressure or ceases to
be, respectively.

Now that we have such facility, it makes more sense to move the
throttle_threshold here than having it separately.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-06-20 18:50:59 -04:00
Glauber Costa
6404028c6a LSA: move subgroups to a heap as well
When we decide to evict from a specific region_group due to excessive memory
usage, we must also consider looking at each of their children (subgroups). It
could very well be that most of memory is used by one of the subgroups, and
we'll have to evict from there.

We also want to make sure we are evicting from the biggest region of all, and
not the biggest region in the biggest region_group. To understand why this is
important, consider the case in which the regions are memtables associated with
dirty region groups. It could be that a very big memtable was recently flushed,
and a fairly small one took its place. That region group is still quite large
because the memtable hasn't finished flushing yet, but that doesn't mean we
should evict from it.

To allow us to efficiently pick which region is the largest, each root of each
subtree will keep track of its maximal score, defined as the maximum between our
largest region total_space and the maximum maximal score of subtrees.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-06-20 18:50:13 -04:00
Glauber Costa
e1eab5c845 LSA: store regions in a heap for regions_group
Currently, the regions in a region group are organized in a simple vector.
We can do better by using a binomial heap, as we do for segments, and then
updating when there is change. Internally to the LSA, we are in good position
to always know when change happens, so that's really the best way to do it.

The end game here, is to easily call for the reclaim of the largest offending
region (potentially asynchronously). Because of that, we aren't really interested
in the region occupancy, but in the region reclaimable occuppancy instead: that's
simply equal to the occupancy if the region is reclaimable, and 0 otherwise. Doing
that effectively lists all non reclaimable regions in the end of the heap, in no
particular order.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-06-20 18:50:13 -04:00
Glauber Costa
54d4d46cf7 LSA: move throttling code to LSA.
The database code uses a throttling function to make sure that memory
used for the dirty region never is over the limit. We track that with
a region group, so it makes sense to move this as generic functionality
into LSA.

This patch implements the LSA-side functionality and a later patch will
convert the current memtable throttler to use it.

Unlike the current throttling mechanism, we'll not use a timer-based
mechanism here. Aside from being more generic and friendlier towards
other users, this is a good change for current memtable by itself.

The constants - 10ms and 1MB chosen by the current throttler are arbitrary, and we
would be better off without them. Let's discuss the merits of each separately:

1) 10ms timer: If we are throttling, we expect somebody to flush the memtables
for memory to be released. Since we are in position to know exactly when a memtable
was written, thus releasing memory, we can just call unthrottle at that point, instead
of using a timer.

2) 1MB release threshold: we do that because we have no idea how much memory a request
will use, so we put the cut somehow. However, because of 1) we don't call unthrottle
through a timer anymore, and do it directly instead. This means that we can just execute
the request and see how much memory it has used, with no need to guess. So we'll call
unthrottle at the end of every request that was previously throttled.

Writing the code this way also has the advantage that we need one less continuation in
the common case of the database not being throttled.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-06-20 18:34:19 -04:00
Glauber Costa
01a658f51d LSA: helper function for region_group
current hierarchy walk converted, but more users will come.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-06-15 22:26:50 -04:00
Glauber Costa
741aa16748 LSA: allow a region_group to have a threshold for throttling specified
Allocations will still be allowed if made directly, but callers will have the
choice (in an upcoming patch) to proceed only if memory is below this threshold.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-06-15 22:26:50 -04:00
Glauber Costa
7cd0c0731e region_group: delete move constructor
Tomek correctly points out that since we are now using "this" in lambda
captures, we should make the region_group not movable. We currently define a
move constructor, but there are no users. So we should just remove them.

copy constructor is already deleted, and so are the copy and move assignment
operators. So by removing the move constructor, we should be fine.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
2016-06-15 22:26:50 -04:00
Tomasz Grabiec
86b76171a8 lsa: Use the same step in both internal and external reclamations 2016-06-14 15:13:15 +02:00
Tomasz Grabiec
d74d902a01 lsa: Make reclamation step configurable 2016-06-14 15:13:14 +02:00