Commit Graph

14 Commits

Author SHA1 Message Date
Takuya ASADA
6eb9344cb3 dist: introduce scylla-tune-sched.service to tune kernel scheduler
On /usr/lib/sysctl.d/99-scylla-sched.conf, we have some sysctl settings to
tune the scheduler for lower latency.
This is mostly to prevent softirq threads processing tcp and reactor threads
from injecting latency into each other.
However, these parameters are moved to debugfs from linux-5.13+, so we lost
scheduler tuneing on recent kernels.

To support tuning recent kernel, let's add a new service which support
to configure both sysctl and debugfs.
The service named scylla-tune-sched.service
The service will unconditionally enables when installed, on older kernel
it will tune via sysctl, on recent kernel it will tune via debugfs.

Fixes #16077

Closes scylladb/scylladb#16122
2023-12-04 19:29:46 +02:00
Avi Kivity
2de168e568 dist: sysctl: increase vm.vfs_cache_pressure
Our usage of inodes is dual:

 - the Index.db and Data.db components are pinned in memory as
   the files are open
 - all other components are read once and never looked at again

As such, tune the kernel to prefer evicting dcache/inodes to
memory pages. The default is 100, so the value of 2000 increases
it by a factor of 20.

Ref https://github.com/scylladb/scylladb/issues/14506

Closes #14509
2023-07-10 21:24:57 +03:00
Takuya ASADA
06c28585f9 dist: raise fs.file-max and fs.nr_open to enough size for scylla
Currently, we configure LimitNOFILE on scylla-server.service, but we
don't configure fs.nr_open and fs.file-max.
When fs.nr_open or fs.file-max are smaller than LimitNOFILE, we may fail
to allocate FDs.
To fix this issue, raise fs.file-max and fs.nr_open to enogh size for
scylla.

Fixes #9461

Closes #9461
2021-10-12 12:47:35 +03:00
Avi Kivity
2cfc517874 main, test: adjust number of networking iocbs
Seastar's default limit of 10,000 iocbs per shard is too low for
some workload (it places an upper bound on the number of idle
connections, above which a crash occurs). Use the new Seastar
feature to raise the default to 50000.

Also multiply the global reservation by 5, and round it upwards
so the number is less weird. This prevents io_setup() from failing.

For tests, the reservation is reduced since they don't create large
numbers of connections. This reduces surprise test failures when they
are run on machines that haven't been adjusted.

Fixes #9051

Closes #9052
2021-07-18 14:38:44 +03:00
Yaron Kaikov
dd453ffe6a install.sh: Setup aio-max-nr upon installation
This is a follow up change to #8512.

Let's add aio conf file during scylla installation process and make sure
we also remove this file when uninstall Scylla

As per Avi Kivity's suggestion, let's set aio value as static
configuration, and make it large enough to work with 500 cpus.

Closes #8650
2021-05-24 14:24:20 +03:00
Takuya ASADA
d0297c599a dist: tune fs.aio-max-nr based on the number of cpus
Current aio-max-nr is set up statically to 1048576 in
/etc/sysctl.d/99-scylla-aio.conf.
This is sufficient for most use cases, but falls short on larger machines
such as i3en.24xlarge on AWS that has 96 vCPUs.

We need to tune the parameter based on the number of cpus, instead of
static setting.

Fixes #8133

Signed-off-by: Takuya ASADA <syuu@scylladb.com>

Closes #8188
2021-03-01 14:18:24 +02:00
Avi Kivity
390e07d591 dist: sysctl: configure more inotify instances
Since f3bcd4d205 ("Merge 'Support SSL Certificate Hot
Reloading' from Calle"), we reload certificates as they are
modified on disk. This uses inotify, which is limited by a
sysctl fs.inotify.max_user_instances, with a default of 128.

This is enough for 64 shards only, if both rpc and cql are
encrypted; above that startup fails.

Increase to 1200, which is enough for 6 instances * 200 shards.

Fixes #7700.

Closes #7701
2020-11-26 23:44:48 +02:00
Avi Kivity
9c63cd8da5 sysctl: reduce kernel tendency to swap anonymous pages relative to page cache (#5417)
The vm.swappiness sysctl controls the kernel's prefernce for swapping
anonymous memory vs page cache. Since Scylla uses very large amounts
of anonymous memory, and tiny amounts of page cache, the correct setting
is to prefer swapping page cache. If the kernel swaps anonymous memory
the reactor will stall until the page fault is satisfied. On the other
hand, page cache pages usually belong to other applications, usually
backup processes that read Scylla files.

This setting has been used in production in Scylla Cloud for a while
with good results.

Users can opt out by not installing the scylla-kernel-conf package
(same as with the other kernel tunables).
2019-12-08 13:04:25 +02:00
Takuya ASADA
950dbdb466 dist/common/sysctl.d: add new conf file to set fs.aio-max-nr
We need raise fs.aio-max-nr to larger value since Seastar may allocates
more then 65535 AIO events (= kernel default value)

Fixes #3842

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20181023030449.15445-1-syuu@scylladb.com>
2018-10-23 11:01:07 +03:00
Glauber Costa
14b9aa2285 reduce kernel scheduler wakeup granularity
We set the scheduler wakeup granularity to 500usec, because that is the
difference in runtime we want to see from a waking task before it
preempts the running task (which will usually be Scylla). Scheduling
other processes less often is usually good for Scylla, but in this case,
one of the "other processes" is also a Scylla thread, the one we have
been using for marking ticks after we have abandoned signals.

However, there is an artifact from the Linux scheduler that causes those
preemption to be missed if the wakeup granularity is exactly twice as
small as the sched_latency. Our sched_latency is set to 1ms, which
represents the maximum time period in which we will run all runnable
tasks.

We want to keep the sched_latency at 1ms, so we will reduce the wakeup
granularity so to something slightly lower than 500usec, to make sure
that such artifact won't affect the scheduler calculations. 499.99usec
will do - according to my tests, but we will reduce it to a round
number.

Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20170427135039.8350-1-glauber@scylladb.com>
2017-04-27 18:11:35 +03:00
Benoît Canet
4def1f4524 dist: sysctl.d: Disable automatic numa balancing
On NUMA hardware, autonuma may reduce performance by
unmapping memory.

Since we do manual NUMA placement, autonuma will not
help anything.

We ought to disable it by setting the kernel.numa_balancing
sysctl to 0.

Fixes: #1120

Signed-of-by: Benoît Canet <benoit@scylladb.com>
Message-Id: <1466006345-9972-1-git-send-email-benoit@scylladb.com>
2016-06-15 19:11:00 +03:00
Avi Kivity
e515933c70 dist: tune scheduler for lower latency
Scylla-jmx and collectd can preempt scylla and induce long latencies.  Tune
the scheduler to provide lower latencies.

Since when the support processes are not running we normally do not context
switch (one thread per core, remember?), there should be no effect on
throughput.

The tunings are provided in a separate package, which can be uninstalled
if the server is shared with other applications which are negatively
affected by the tuning.

Fixes #1218.
Message-Id: <1464529625-12825-1-git-send-email-avi@scylladb.com>
2016-05-30 08:42:19 +03:00
Takuya ASADA
8886fe7393 dist: use systemd-coredump on Fedora/CentOS, create symlink /var/lib/scylla/coredump -> /var/lib/systemd/coredump when we mounted RAID
Use systemd-coredump for coredump if distribution is CentOS/RHEL/Fedora, and make symlink from RAID to /var/lib/systemd/coredump if RAID is mounted.
2016-01-11 14:20:50 +00:00
Takuya ASADA
9b4d0592fa dist: enable coredump, save it to /var/lib/scylla/coredump
Enables coredump, save it to /var/lib/scylla/coredump

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2015-12-17 18:20:27 +09:00