Right now the controller adjusts its shares based on how big the backlog
is in comparison to shard memory. We have seen in some tests that if the
dataset becomes too big, this may cause compactions to dominate.
While we may change the input altogether in future versions, I'd like to
propose a quick change for the time being: move the high point from 10x
memory size to 30x memory size. This will cause compactions to increase
in shares more slowly.
While 30 is as much a magic number as the 10 before it, it allows us to
err on the side of caution, with compactions not becoming aggressive
enough to overly disrupt workloads.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
There are situations in which we want the controllers to stop working
altogether. Usually that's when we have an unimplemented controller or
some exception.
We want to return fixed shares in this case, but this is a very
different situation from when we want fixed shares for *one* backlog
tracker: we want to return fixed shares, yes, but if we disable 200
backlog trackers (because they all failed, for instance), we don't want
that fixed number x 200 to be our backlog.
So a mechanism to globally disable the controller is still warranted,
and infinity is a good way to represent that. It's a float that the
controller can easily test against. But actually using infinity in the
code is confusing. People reading it may interpret it as the opposite of
what it means, taking it to be just "a very large backlog".
Let's turn that into a constant instead. It will help us convey meaning.
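A minimal sketch of the idea, assuming the sentinel is a named constant equal to infinity and that one disabled tracker disables the sum. The names `disable_backlog`, `total_backlog` and `controller_disabled` are illustrative and may not match the code.

```cpp
#include <limits>
#include <vector>

// Hypothetical names; a sketch of the intent, not the actual code.
// A named constant reads as "controller disabled" rather than "huge backlog".
constexpr float disable_backlog = std::numeric_limits<float>::infinity();

// Sum the per-tracker backlogs. If any tracker reports the sentinel, the
// result is the sentinel itself, never a fixed value multiplied by the
// number of disabled trackers.
inline float total_backlog(const std::vector<float>& tracker_backlogs) {
    float total = 0.0f;
    for (float b : tracker_backlogs) {
        if (b == disable_backlog) {
            return disable_backlog;
        }
        total += b;
    }
    return total;
}

// The controller can test against the constant and fall back to fixed shares.
inline bool controller_disabled(float backlog) {
    return backlog == disable_backlog;
}
```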
Signed-off-by: Glauber Costa <glauber@scylladb.com>
There was recently an attempt to add minimum shares to major compactions
which ended up being harder than it should be due to all the plumbing
necessary to call the compaction controller from inside the compaction
manager, since it is currently a database object. We had this problem
again when trying to return fixed shares in case of an exception.
Taking a step back, all of those problems stem from the fact that the
compaction controller really shouldn't be a part of the database: since
it deals with compactions and their consequences, it is a lot more
natural to have it inside the compaction manager to begin with.
Once we do that, all the aforementioned problems go away. So let's move
it there, where it belongs.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
There are some situations in which we want to force a specific amount of
shares and don't have a backlog. We can provide a function to get that
from the controller.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Commit 1671d9c433 (not on any release branch)
accidentally bumped the idle memtable flush cpu shares to 100 (representing
10%), causing flushes to be too when they don't consume too much cpu.
Fixes #3243.
Message-Id: <20180408104601.9607-1-avi@scylladb.com>
So far we have had an I/O controller, for compactions and memtables, and
a CPU controller, for memtables only -- since the scheduling was still
quota-based.
Now that the CPU scheduler is fully functional, it is time to do away
with the differences and integrate them both into one. We now have a
memtable controller and a compaction controller, and they control both
CPU and I/O.
In the future, we may want to control processes that don't do one of
them, like cache updates. If that ever happens, we'll try to make
controlling one of them optional. But for now, since the I/O and CPU
controllers for our main two processes would look exactly the same, we
should integrate them.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
We have merged the I/O controller without this, but we want to integrate
the CPU and I/O controllers into one. Currently, the quota can be
statically set for the CPU controller. For now, until we gain more
experience with it, we should allow a static value to override the
controller's output as well.
That is particularly important since we don't yet control some
strategies like LCS and the time-based ones. Users in the field may be
using one of those strategies with a static value for background quota.
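The override can be pictured with the following minimal sketch. The function `effective_shares` and its parameters are hypothetical; how the static value is actually configured and plumbed is not shown here.

```cpp
#include <optional>

// Hypothetical helper; a sketch of the override, not the actual code.
// If the user configured a static value, it wins over whatever the
// controller computed; otherwise the controller's output is used.
inline float effective_shares(std::optional<float> static_shares,
                              float controller_output) {
    return static_shares.value_or(controller_output);
}
```

Keeping the override as a separate optional, rather than as a special controller output value, means that an unset option leaves the controller's behavior untouched.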
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Right now CPU and I/O controllers have slightly different control points
for no good reason. Let's use the CPU controller ones as the standard, as
we have been using it in the field for longer and trust it more.
The end goal is to fully integrate them.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
It no longer makes sense now that we have the full scheduler +
controllers. In its place, we will provide an option to statically set
the controller's shares as a safeguard against us getting this wrong.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Set up scheduling groups for streaming, compaction, memtable flush, query,
and commitlog.
The background writer scheduling group is retired; it is split into
the memtable flush and compaction groups.
Comments from Glauber:
This patch is based on a patch from Avi with the same subject, but the
differences are significant enough that I reset authorship. In
particular:
1) A bug/regression is fixed with the boundary calculations for the
memtable controller sampling function.
2) A leftover is removed, where after flushing a memtable we would
go back to the main group before going to the cache group again.
3) As per Tomek's suggestion, the submission of compactions
itself is now run in the compaction scheduling group. Having that
working is what changes this patch the most: we now store the
scheduling group in the compaction manager and let the compaction
manager itself enforce the scheduling group.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
thread_scheduling_groups are converted to plain scheduling_group. Due to
differences in initialization (scheduling_group initialization defers), we
create the scheduling_groups in main.cc and propagate them to users via
a new class database_config.
The sstable writer loses its thread_scheduling_group parameter and instead
inherits scheduling from its caller.
Since shares are in the 1-1000 range vs. 0-1 for thread scheduling quotas,
the flush controller was adjusted to return values within the higher ranges.
The algorithm and principle of operation are the same as the CPU
controller's. It is, however, always enabled, and it operates on
I/O shares.
I/O-bound workloads are expected to hit the maximum once virtual
dirty fills up and stay there while the load is steady.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Compactions can be heavy disk users, and the I/O scheduler can always
guarantee that they get their fair share of the disk.
Such a fair share can, however, be a lot more than what compactions
actually need. This patch draws on the controller infrastructure to adjust the
I/O shares that the compaction class will get so that compaction
bandwidth is dynamically adjusted.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
The control algorithm we are using for memtables has proven itself
quite successful. We will very likely use the same for other processes,
like compactions.
Make the code a bit more generic, so that a new controller only has to
set the desired parameters.
Signed-off-by: Glauber Costa <glauber@scylladb.com>