scylladb

Author	SHA1	Message	Date
Takuya ASADA	6eb9344cb3	dist: introduce scylla-tune-sched.service to tune kernel scheduler In /usr/lib/sysctl.d/99-scylla-sched.conf, we have some sysctl settings that tune the scheduler for lower latency. This is mostly to prevent softirq threads processing TCP and reactor threads from injecting latency into each other. However, these parameters were moved to debugfs in Linux 5.13+, so we lost scheduler tuning on recent kernels. To support tuning recent kernels, let's add a new service that supports configuring both sysctl and debugfs. The service is named scylla-tune-sched.service. It is unconditionally enabled when installed: on older kernels it tunes via sysctl, and on recent kernels it tunes via debugfs. Fixes #16077 Closes scylladb/scylladb#16122	2023-12-04 19:29:46 +02:00
Avi Kivity	2de168e568	dist: sysctl: increase vm.vfs_cache_pressure Our usage of inodes is dual: - the Index.db and Data.db components are pinned in memory as the files are open - all other components are read once and never looked at again As such, tune the kernel to prefer evicting dcache/inodes to memory pages. The default is 100, so the value of 2000 increases it by a factor of 20. Ref https://github.com/scylladb/scylladb/issues/14506 Closes #14509	2023-07-10 21:24:57 +03:00
Takuya ASADA	06c28585f9	dist: raise fs.file-max and fs.nr_open to enough size for scylla Currently, we configure LimitNOFILE on scylla-server.service, but we don't configure fs.nr_open and fs.file-max. When fs.nr_open or fs.file-max is smaller than LimitNOFILE, we may fail to allocate FDs. To fix this issue, raise fs.file-max and fs.nr_open to values large enough for scylla. Fixes #9461 Closes #9461	2021-10-12 12:47:35 +03:00
Avi Kivity	2cfc517874	main, test: adjust number of networking iocbs Seastar's default limit of 10,000 iocbs per shard is too low for some workloads (it places an upper bound on the number of idle connections, above which a crash occurs). Use the new Seastar feature to raise the default to 50,000. Also multiply the global reservation by 5, and round it upwards so the number is less weird. This prevents io_setup() from failing. For tests, the reservation is reduced since they don't create large numbers of connections. This reduces surprise test failures when tests are run on machines that haven't been adjusted. Fixes #9051 Closes #9052	2021-07-18 14:38:44 +03:00
Yaron Kaikov	dd453ffe6a	install.sh: Setup aio-max-nr upon installation This is a follow-up change to #8512. Let's add an aio conf file during the Scylla installation process, and make sure we also remove this file when uninstalling Scylla. As per Avi Kivity's suggestion, let's set the aio value as a static configuration, and make it large enough to work with 500 CPUs. Closes #8650	2021-05-24 14:24:20 +03:00
Takuya ASADA	d0297c599a	dist: tune fs.aio-max-nr based on the number of cpus Currently, aio-max-nr is set statically to 1048576 in /etc/sysctl.d/99-scylla-aio.conf. This is sufficient for most use cases, but falls short on larger machines such as the AWS i3en.24xlarge, which has 96 vCPUs. We need to tune the parameter based on the number of CPUs instead of using a static setting. Fixes #8133 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Closes #8188	2021-03-01 14:18:24 +02:00
Avi Kivity	390e07d591	dist: sysctl: configure more inotify instances Since `f3bcd4d205` ("Merge 'Support SSL Certificate Hot Reloading' from Calle"), we reload certificates as they are modified on disk. This uses inotify, which is limited by the sysctl fs.inotify.max_user_instances, with a default of 128. If both rpc and cql are encrypted, this is only enough for 64 shards; above that, startup fails. Increase it to 1200, which is enough for 6 instances * 200 shards. Fixes #7700. Closes #7701	2020-11-26 23:44:48 +02:00
Avi Kivity	9c63cd8da5	sysctl: reduce kernel tendency to swap anonymous pages relative to page cache (#5417) The vm.swappiness sysctl controls the kernel's preference for swapping anonymous memory vs. page cache. Since Scylla uses very large amounts of anonymous memory and tiny amounts of page cache, the correct setting is to prefer swapping page cache. If the kernel swaps anonymous memory, the reactor will stall until the page fault is satisfied. On the other hand, page cache pages usually belong to other applications, typically backup processes that read Scylla files. This setting has been used in production in Scylla Cloud for a while with good results. Users can opt out by not installing the scylla-kernel-conf package (same as with the other kernel tunables).	2019-12-08 13:04:25 +02:00
Takuya ASADA	950dbdb466	dist/common/sysctl.d: add new conf file to set fs.aio-max-nr We need to raise fs.aio-max-nr to a larger value, since Seastar may allocate more than 65535 AIO events (= kernel default value). Fixes #3842 Signed-off-by: Takuya ASADA <syuu@scylladb.com> Message-Id: <20181023030449.15445-1-syuu@scylladb.com>	2018-10-23 11:01:07 +03:00
Glauber Costa	14b9aa2285	reduce kernel scheduler wakeup granularity We set the scheduler wakeup granularity to 500usec, because that is the difference in runtime we want to see from a waking task before it preempts the running task (which will usually be Scylla). Scheduling other processes less often is usually good for Scylla. In this case, however, one of the "other processes" is also a Scylla thread: the one we have been using for marking ticks since we abandoned signals. An artifact of the Linux scheduler causes those preemptions to be missed if the wakeup granularity is exactly half of sched_latency. Our sched_latency is set to 1ms, which represents the maximum time period in which we will run all runnable tasks. We want to keep sched_latency at 1ms, so we will reduce the wakeup granularity to something slightly lower than 500usec, to make sure that this artifact won't affect the scheduler calculations. According to my tests, 499.99usec will do, but we will reduce it to a round number. Signed-off-by: Glauber Costa <glauber@scylladb.com> Message-Id: <20170427135039.8350-1-glauber@scylladb.com>	2017-04-27 18:11:35 +03:00
Benoît Canet	4def1f4524	dist: sysctl.d: Disable automatic numa balancing On NUMA hardware, autonuma may reduce performance by unmapping memory. Since we do manual NUMA placement, autonuma will not help anything. We ought to disable it by setting the kernel.numa_balancing sysctl to 0. Fixes: #1120 Signed-off-by: Benoît Canet <benoit@scylladb.com> Message-Id: <1466006345-9972-1-git-send-email-benoit@scylladb.com>	2016-06-15 19:11:00 +03:00
Avi Kivity	e515933c70	dist: tune scheduler for lower latency Scylla-jmx and collectd can preempt scylla and induce long latencies. Tune the scheduler to provide lower latencies. Since when the support processes are not running we normally do not context switch (one thread per core, remember?), there should be no effect on throughput. The tunings are provided in a separate package, which can be uninstalled if the server is shared with other applications which are negatively affected by the tuning. Fixes #1218. Message-Id: <1464529625-12825-1-git-send-email-avi@scylladb.com>	2016-05-30 08:42:19 +03:00
Takuya ASADA	8886fe7393	dist: use systemd-coredump on Fedora/CentOS, create symlink /var/lib/scylla/coredump -> /var/lib/systemd/coredump when we mounted RAID Use systemd-coredump for coredumps if the distribution is CentOS/RHEL/Fedora, and create a symlink from RAID to /var/lib/systemd/coredump if RAID is mounted.	2016-01-11 14:20:50 +00:00
Takuya ASADA	9b4d0592fa	dist: enable coredump, save it to /var/lib/scylla/coredump Enable coredumps and save them to /var/lib/scylla/coredump. Signed-off-by: Takuya ASADA <syuu@scylladb.com>	2015-12-17 18:20:27 +09:00

14 Commits
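Commit 6eb9344cb3 describes a service that tunes the scheduler via sysctl on older kernels and via debugfs on 5.13+. A minimal sketch of that selection logic follows. It assumes the following:

- debugfs is mounted at /sys/kernel/debug.
- The relocated files drop the `sched_` prefix under /sys/kernel/debug/sched/.

The helper names and the mapping table are hypothetical, not taken from the actual scylla-tune-sched.service. Some of these files may be absent on newer kernels, so check them against the running kernel.

```python
import re

# Hypothetical mapping from pre-5.13 sysctl names to the debugfs files that
# replaced them (assumed names; verify on the target kernel).
SYSCTL_TO_DEBUGFS = {
    "kernel.sched_latency_ns": "latency_ns",
    "kernel.sched_min_granularity_ns": "min_granularity_ns",
    "kernel.sched_wakeup_granularity_ns": "wakeup_granularity_ns",
}


def parse_kernel_version(release: str) -> tuple[int, int]:
    """Extract (major, minor) from a `uname -r` string such as '5.15.0-91-generic'."""
    m = re.match(r"(\d+)\.(\d+)", release)
    if m is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return int(m.group(1)), int(m.group(2))


def tunable_path(sysctl_name: str, release: str) -> str:
    """Return the file a tuning service would write for this tunable."""
    if parse_kernel_version(release) >= (5, 13):
        # 5.13+: scheduler tunables live in debugfs (assumed layout).
        return "/sys/kernel/debug/sched/" + SYSCTL_TO_DEBUGFS[sysctl_name]
    # Older kernels: the /proc/sys path mirrors the dotted sysctl name.
    return "/proc/sys/" + sysctl_name.replace(".", "/")
```

On a live host, `release` would come from `platform.release()` or `uname -r`. Keeping the version check in one function makes the sysctl/debugfs split explicit.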
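Commit 06c28585f9 notes that LimitNOFILE only helps if fs.nr_open and fs.file-max are at least as large. A hypothetical pre-flight check could compare the three values. `read_sysctl_int` and `fd_limits_ok` are illustrative names, not part of Scylla's tooling.

```python
from pathlib import Path


def read_sysctl_int(name: str) -> int:
    """Read an integer sysctl, e.g. 'fs.nr_open' -> /proc/sys/fs/nr_open (Linux only)."""
    path = Path("/proc/sys", *name.split("."))
    return int(path.read_text().split()[0])


def fd_limits_ok(limit_nofile: int, nr_open: int, file_max: int) -> bool:
    """True if neither kernel ceiling is below the service's LimitNOFILE.

    fs.nr_open caps the per-process RLIMIT_NOFILE that can be set, and
    fs.file-max caps the number of open file handles system-wide.
    """
    return limit_nofile <= nr_open and limit_nofile <= file_max
```

On a live host, one might call `fd_limits_ok(limit, read_sysctl_int("fs.nr_open"), read_sysctl_int("fs.file-max"))`, where `limit` is whatever LimitNOFILE the unit file sets. The pure comparison is kept separate from the /proc reads so it can be checked without root or Linux.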
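The sizing in commit 390e07d591 is simple division, because fs.inotify.max_user_instances is a per-user budget shared by all shards. The sketch below reproduces both figures from the message. It assumes, as the message implies, a fixed number of inotify instances per shard: 2 when both rpc and cql are encrypted, and 6 in the new sizing. The function name is hypothetical.

```python
def max_shards(max_user_instances: int, instances_per_shard: int) -> int:
    """Largest shard count whose inotify instances fit in the per-user limit."""
    return max_user_instances // instances_per_shard


# Default limit, with rpc and cql both encrypted (2 instances per shard).
print(max_shards(128, 2))   # 64
# New limit from the commit, sized for 6 instances per shard.
print(max_shards(1200, 6))  # 200
```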
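Commit 950dbdb466 raises fs.aio-max-nr because Seastar may request more AIO events than the kernel default allows. As documented in io_setup(2), the call fails with EAGAIN when the requested nr_events exceeds the events still available under /proc/sys/fs/aio-max-nr. The sketch below models that check in simplified form; the kernel's exact accounting has more detail. The per-shard event count is a hypothetical figure, not Seastar's actual reservation.

```python
def io_setup_would_fit(nr_events: int, aio_nr: int, aio_max_nr: int) -> bool:
    """Simplified model: io_setup(nr_events) succeeds only if the events already
    allocated system-wide (fs.aio-nr) plus the new request stay within fs.aio-max-nr."""
    return aio_nr + nr_events <= aio_max_nr


# Hypothetical: 16 shards, each requesting 10_000 events against a 65_536 limit.
PER_SHARD_EVENTS = 10_000  # illustrative figure only
aio_nr = 0
for shard in range(16):
    if not io_setup_would_fit(PER_SHARD_EVENTS, aio_nr, 65_536):
        # prints: shard 6: io_setup would fail with EAGAIN
        print(f"shard {shard}: io_setup would fail with EAGAIN")
        break
    aio_nr += PER_SHARD_EVENTS
```

Because the budget is global while requests are made per shard, the number of shards that can start shrinks as the per-shard request grows. This is why later commits in this list size aio-max-nr for large CPU counts.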
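Commit 14b9aa2285 keeps sched_latency at 1ms and lowers the wakeup granularity to strictly below half of it, since preemptions were being missed at exactly half. The sketch below is a hypothetical validation helper that expresses this constraint in nanoseconds, the unit of the `*_ns` scheduler tunables. The commit does not state the final round value, so the sketch checks only the inequality.

```python
SCHED_LATENCY_NS = 1_000_000  # 1ms, as stated in the commit


def wakeup_granularity_ok(wakeup_gran_ns: int, sched_latency_ns: int = SCHED_LATENCY_NS) -> bool:
    """Require the wakeup granularity to be strictly below half of sched_latency.

    Exactly half (500usec for a 1ms latency) is the case the commit
    reports as triggering missed preemptions.
    """
    return 2 * wakeup_gran_ns < sched_latency_ns


print(wakeup_granularity_ok(500_000))  # False: exactly half
print(wakeup_granularity_ok(499_990))  # True: 499.99usec, which the author's tests found sufficient
```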