Currently, we configure LimitNOFILE on scylla-server.service, but we
don't configure fs.nr_open and fs.file-max.
When fs.nr_open or fs.file-max are smaller than LimitNOFILE, we may fail
to allocate FDs.
To fix this issue, raise fs.file-max and fs.nr_open to a value large enough for
scylla.
Fixes #9461
Closes #9461
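A minimal sketch of the failure condition. fs.nr_open caps the per-process RLIMIT_NOFILE, and fs.file-max caps open files system-wide, so both must be at least LimitNOFILE. It assumes the three values have already been read. The function name `check_fd_limits` is hypothetical.

```shell
#!/bin/sh
# Report which kernel-wide FD limit (if any) is below the unit's LimitNOFILE.
check_fd_limits() {
    limit_nofile=$1   # LimitNOFILE from scylla-server.service
    nr_open=$2        # value of fs.nr_open (per-process ceiling)
    file_max=$3       # value of fs.file-max (system-wide ceiling)
    if [ "$nr_open" -lt "$limit_nofile" ]; then
        echo "raise fs.nr_open"
    elif [ "$file_max" -lt "$limit_nofile" ]; then
        echo "raise fs.file-max"
    else
        echo "ok"
    fi
}

# On a live host the inputs could come from (verify on your systemd version;
# a LimitNOFILE of "infinity" would need special handling):
#   check_fd_limits "$(systemctl show -p LimitNOFILE --value scylla-server)" \
#       "$(cat /proc/sys/fs/nr_open)" "$(cat /proc/sys/fs/file-max)"
```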
Listing /etc/systemd/system/*.mount as ghost files seems incorrect,
since the user may want to keep using the RAID volume / coredump directory after
uninstalling Scylla, or may want to upgrade to the enterprise version.
Also, we listed two types of files as ghost files, which should be handled differently:
1. files automatically generated by the postinst scriptlet
2. files generated by the user invoking scylla_setup
The package should remove only type 1, since type 2 is generated by the user's decision.
However, just dropping the .mount files from the %files section causes another
problem: rpm removes these files during upgrade, instead of during
uninstall (#8924).
To fix both problems, specify the .mount files as "%ghost %config".
This keeps the files across both package upgrade and package removal.
See scylladb/scylla-enterprise#1780
Closes #8810
Closes #8924
Closes #8959
This reverts commit a677c46672. It causes an
upgrade from a version that did not have the commit to a version that
does have the commit to lose the .mount files, since they change
from being owned by the package (via %ghost) to not being owned.
Fixes #8924.
Listing /etc/systemd/system/*.mount as ghost files seems incorrect,
since the user may want to keep using the RAID volume / coredump directory after
uninstalling Scylla, or may want to upgrade to the enterprise version.
Also, we listed two types of files as ghost files, which should be handled differently:
1. files automatically generated by the postinst scriptlet
2. files generated by the user invoking scylla_setup
The package should remove only type 1, since type 2 is generated by the user's decision.
See scylladb/scylla-enterprise#1780
Closes #8810
The Red Hat packages were missing two things: first, the metapackage
did not depend on the python3 package at all, and second, the
scylla-server package dependencies did not include a version as part
of the dependency, which can cause problems during upgrade.
Doing both of the things listed here is a bit of overkill, as either
one of them alone would solve the problem described in #XXXX,
but both should be applied in order to express the correct concept.
Fixes #8829
Closes #8832
This is a follow-up change to #8512.
Let's add the aio conf file during the Scylla installation process and make sure
we also remove this file when uninstalling Scylla.
As per Avi Kivity's suggestion, let's set the aio value as a static
configuration, and make it large enough to work with 500 CPUs.
Closes #8650
Currently, aio-max-nr is set statically to 1048576 in
/etc/sysctl.d/99-scylla-aio.conf.
This is sufficient for most use cases, but falls short on larger machines
such as the i3en.24xlarge on AWS, which has 96 vCPUs.
We need to tune the parameter based on the number of CPUs instead of
using a static setting.
Fixes #8133
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Closes #8188
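For scale, the static value leaves roughly 1048576 / 96 ≈ 10,922 AIO events per vCPU on an i3en.24xlarge. The sketch below sizes fs.aio-max-nr from the CPU count instead. The per-CPU figure `PER_CPU_AIO=11000` and the function name `aio_conf_line` are assumptions for illustration, not Scylla's actual requirement.

```shell
#!/bin/sh
# Hypothetical per-CPU AIO budget; replace with the real per-shard requirement.
PER_CPU_AIO=${PER_CPU_AIO:-11000}
# The current static value from /etc/sysctl.d/99-scylla-aio.conf.
STATIC_DEFAULT=1048576

# Print a sysctl line sized for $1 CPUs, never going below the static value.
aio_conf_line() {
    ncpus=$1
    want=$((ncpus * PER_CPU_AIO))
    if [ "$want" -lt "$STATIC_DEFAULT" ]; then
        want=$STATIC_DEFAULT
    fi
    echo "fs.aio-max-nr = $want"
}

# Example: aio_conf_line "$(nproc)" > /etc/sysctl.d/99-scylla-aio.conf
```

Keeping the static value as a floor means small machines are configured exactly as before; only large machines get a higher limit.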
The Fedora version of the systemd macros does not work correctly on CentOS 7,
since CentOS 7 does not support the "file trigger" feature.
To fix the issue, we stop using the systemd macros and call systemctl
directly.
See scylladb/scylla-jmx#94
Closes #8005
To support environments without Internet access, we need to ship the node_exporter binary
in the scylla-server package instead of downloading it from the Internet.
Related #7765
Fixes #2190
Closes #7796
Add the seastar-cpu-map.sh to the SBINFILES variable, which is used to
create symbolic links to scripts so that they appear in $PATH.
Please note that there are additional Python scripts (like perftune.py),
which are not in $PATH. That's because Python scripts are handled
separately in "install.sh" and no Python script has a "sbin" symlink. We
might want to change this in the future, though.
Fixes #6731
Closes #7809
tuned 2.11.0-9 and later writes to kernel.sched_wakeup_granularity_ns
and other sysctl tunables that we so laboriously tuned, dropping
performance by a factor of 5 (due to increased latency). Fix by
obsoleting tuned during install (in effect, we are a better tuned,
at least for us).
Not needed for .deb, since Debian/Ubuntu do not install tuned by
default.
Fixes #7696
Closes #7776
We have "Conflicts: kernel < 3.10.0-514" on the rpm package to make sure
the environment is running a newer kernel.
However, users may run a non-standard kernel with a different package name,
like kernel-ml or kernel-uek.
In such environments the Conflicts tag does not work correctly:
even when the system is running a newer kernel, rpm only checks the version
of the "kernel" package.
To avoid this issue, we need to drop the Conflicts tag.
Fixes #7675
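The underlying problem is that rpm compares package versions, not the kernel that is actually booted. A minimal sketch of the comparison that matters, assuming GNU `sort -V` for version ordering. `kernel_at_least` is a hypothetical helper, not part of the packaging.

```shell
#!/bin/sh
# Print "yes" if running kernel release $1 is at least version $2.
# Relies on GNU sort -V (version sort) to order the two strings.
kernel_at_least() {
    running=$1
    minimum=$2
    lowest=$(printf '%s\n%s\n' "$minimum" "$running" | sort -V | head -n 1)
    if [ "$lowest" = "$minimum" ]; then
        echo yes
    else
        echo no
    fi
}

# Example: kernel_at_least "$(uname -r)" 3.10.0-514
```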
Since f3bcd4d205 ("Merge 'Support SSL Certificate Hot
Reloading' from Calle"), we reload certificates as they are
modified on disk. This uses inotify, which is limited by a
sysctl fs.inotify.max_user_instances, with a default of 128.
This is enough for 64 shards only, if both rpc and cql are
encrypted; above that startup fails.
Increase to 1200, which is enough for 6 instances * 200 shards.
Fixes #7700.
Closes #7701
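This sketch shows the sizing arithmetic. The numbers above imply one inotify instance per encrypted endpoint per shard (128 / 64 = 2 with both RPC and CQL encrypted). The new limit covers 6 instances on each of 200 shards. `inotify_instances_needed` is a hypothetical helper.

```shell
#!/bin/sh
# Total inotify instances needed = instances per shard * number of shards.
inotify_instances_needed() {
    per_shard=$1
    shards=$2
    echo $((per_shard * shards))
}

inotify_instances_needed 2 64    # prints 128: RPC + CQL on 64 shards hits the default
inotify_instances_needed 6 200   # prints 1200: the new limit

# The matching sysctl line:
echo "fs.inotify.max_user_instances = $(inotify_instances_needed 6 200)"
```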
When we introduced dependencies.conf, we mistakenly listed it in the rpm as %ghost,
but it should be a normal file that is installed along with the package.
Fixes #7703
Closes #7704
We require a kernel that is at least 3.10.0-514, because older
kernels have an XFS-related bug that causes data corruption. However
this Requires: clause pulls in a kernel even in Docker installation,
where it (and especially the associated firmware) occupies a lot of
space.
Change to a Conflicts: instead. This prevents installation when
the really old kernel is present, but doesn't pull it in for the
Docker image.
Closes #7502
Except for scylla-python3, each Scylla package has its own git repository, the same package script filenames, and the same build directory structure.
To keep the python3 files in the scylla repository, we created 'python3' directories in multiple locations, added '-python3'-suffixed files, and used a deeper build directory to avoid conflicting with the scylla-server package build.
We should move all scylla-python3 related files to a new repository, scylla-python3.
To keep compatibility with the current Jenkins script, we provide the packages in the
build/ directory for now.
Fixes #6751
Since scylla-cpupower.service isn't installed by the .rpm package but created
by the setup script, it is better to place it in /etc rather than /usr/lib.
We already do the same for scylla-server.service.d/*.conf, *.mount, and
*.swap created by the setup scripts.
Amazon Linux 2 has /usr/bin/cpupower, but unlike CentOS 7 it does not have
cpupower.service.
We need to provide the .service file when the distribution is Amazon Linux 2.
Fixes #5977
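A minimal sketch of the distribution check. It assumes Amazon Linux 2 identifies itself with `ID="amzn"` and `VERSION_ID="2"` in os-release. The helper `needs_own_cpupower_service` is hypothetical, and the setup script may detect the distribution differently.

```shell
#!/bin/sh
# Print "yes" when the host is Amazon Linux 2, where cpupower.service must be
# provided by us. Takes an optional os-release path (default /etc/os-release).
needs_own_cpupower_service() {
    os_release=${1:-/etc/os-release}
    (
        # Subshell keeps the sourced variables out of the caller's environment.
        ID=
        VERSION_ID=
        . "$os_release"
        if [ "$ID" = "amzn" ] && [ "$VERSION_ID" = "2" ]; then
            echo yes
        else
            echo no
        fi
    )
}
```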
To build a unified relocatable package easily, we may want to merge the tarballs into a single tarball like this:
zcat *.tar.gz | gzip -c > scylla-unified.tar.gz
But this is not possible with the current relocatable package format, since multiple files conflict: install.sh, SCYLLA-*-FILE, dist/, README.md, etc.
To support this, we need to archive everything inside a directory when building the relocatable package.
Since this modifies the relocatable package format, we need to provide a way to
detect the format version.
To do this, we add a new file ".relocatable_package_version" at the top of the
archive, containing the version number "2".
Fixes #6315
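A minimal sketch of how an installer could read the format version from an extracted package. It assumes, beyond what the text states, that a missing `.relocatable_package_version` means the original format, treated here as version 1. `relocatable_format` is a hypothetical name.

```shell
#!/bin/sh
# Print the relocatable package format version of an extracted package dir.
relocatable_format() {
    dir=$1
    if [ -f "$dir/.relocatable_package_version" ]; then
        cat "$dir/.relocatable_package_version"
    else
        echo 1   # assumption: no version file means the pre-versioned format
    fi
}
```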
We use pystache to parametrize our scylla.spec, but pystache is not
present in Fedora 32. Fortunately rpm provides its own template mechanism,
and this patch switches to using it:
- no longer install pystache
- pass parameters via rpm "-D" options
- use 0/1 for conditionals instead of true/false as per rpm conventions
- sanitize the "product" variable to not contain dashes
- change the .spec file to use rpm templating: %{...} and %if ... %endif
instead of mustache templating
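A sketch of the flag conversion, assuming a build script that still holds true/false flags. rpm `%if` conditionals expect 0/1, so each flag is mapped before being passed with `-D`. The helper `rpm_bool` and the macro name `with_feature` are hypothetical.

```shell
#!/bin/sh
# Map a true/false style flag to the 0/1 value that rpm %if conditionals expect.
rpm_bool() {
    case "$1" in
        true|yes|1) echo 1 ;;
        *) echo 0 ;;
    esac
}

# Example (hypothetical macro name):
#   rpmbuild -ba -D "with_feature $(rpm_bool "$with_feature")" scylla.spec
# and in scylla.spec:
#   %if %{with_feature}
#   ...
#   %endif
```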
To use install.sh as the Scylla install script without .rpm/.deb packages,
we need to provide a way to upgrade Scylla, not just install it.
With the --upgrade option, install.sh does not overwrite config files.
When the old and new config files differ, it installs the new one as
<filename>.new in the same directory.
If they are exactly the same, it does nothing.
To implement this, the rewriting of the api_ui_dir/api_doc_dir paths in scylla.yaml
moved from the .rpm/.deb scriptlets to install.sh.
Fixes #5874
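A minimal sketch of the config-file rule described above, using `cmp -s` to test for identical content. `install_config` is a hypothetical name. The first branch, which installs the file when none exists yet, is an assumption, since the text only covers the upgrade case.

```shell
#!/bin/sh
# Install config file $1 at $2 following the --upgrade rules.
install_config() {
    src=$1
    dest=$2
    if [ ! -e "$dest" ]; then
        cp "$src" "$dest"          # assumption: fresh install, nothing to preserve
    elif cmp -s "$src" "$dest"; then
        :                          # identical: nothing to do
    else
        cp "$src" "$dest.new"      # differs: keep the user's file, ship <filename>.new
    fi
}
```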
In some environments systemd-coredump does not work with a symlinked directory;
we can use a bind mount instead.
Also, it is better to check that systemd-coredump works by generating a coredump.
To fix #5916, drop scylla_coredump_setup from the .rpm %post scriptlet.
Fixes #5753
Fixes #5916
This reverts commit 65aadad9a6. It causes
crashes (due to the coredump test) during package install, since scylla_coredump_setup
is called from rpm postinstall. The test should be done only from scylla_setup (and
the user should be warned).
Fixes #5916.
To make it easy to install Scylla with install.sh, it needs to run the following steps:
- add scylla user/group
- configure scylla.yaml
- run scylla_post_install.sh
But we don't want to run them when building .rpm/.deb packages,
so we also need to add a --packaging option to skip them.
Fixes #5830
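A sketch of the gating, assuming the steps run in the order listed. The function name `post_install_steps` is hypothetical, and each echo stands in for the real step.

```shell
#!/bin/sh
# Run the host-level install steps unless --packaging is given
# (the .rpm/.deb build case). Other arguments are ignored here.
post_install_steps() {
    packaging=false
    for arg in "$@"; do
        case "$arg" in
            --packaging) packaging=true ;;
        esac
    done
    if [ "$packaging" = false ]; then
        echo "add scylla user/group"
        echo "configure scylla.yaml"
        echo "run scylla_post_install.sh"
    fi
}
```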
In some environments systemd-coredump does not work with a symlinked directory;
we can use a bind mount instead.
Also, it is better to check that systemd-coredump works by generating a coredump.
Fixes #5753
By default, `/usr/lib/rpm/find-debuginfo.sh` tampers with
the binary's build-id when stripping its debug info, since it is passed
the `--build-id-seed <version>.<release>` option.
To prevent that, we need to set the following macros:
- unset `_unique_build_ids`
- set `_no_recompute_build_ids` to 1
Fixes #5881
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
On Debian, we don't list xfsprogs/mdadm as package dependencies; the
scylla_raid_setup script installs them instead.
Since xfsprogs/mdadm are only needed to construct the RAID, we can move these
dependencies to scylla_raid_setup too.
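A sketch of a check scylla_raid_setup could run before building the array. It probes for `mkfs.xfs` (shipped by xfsprogs) and `mdadm`. The helper name is hypothetical, and installing the reported packages is left to the distribution's package manager.

```shell
#!/bin/sh
# Print the packages that still need to be installed before RAID setup.
missing_raid_packages() {
    missing=""
    command -v mkfs.xfs >/dev/null 2>&1 || missing="$missing xfsprogs"
    command -v mdadm >/dev/null 2>&1 || missing="$missing mdadm"
    echo "${missing# }"   # drop the leading space
}
```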
rpm compression uses xz, which is painfully slow. Adjust the
compression settings to run on all threads.
The xz utility documentation suggests that 0 threads is
equivalent to all CPUs, but apparently the library interface
(which rpmbuild uses) doesn't think the same way.
Message-Id: <20200101141544.1054176-1-avi@scylladb.com>
Since we merged /usr/lib/scylla with /opt/scylladb, we removed
/usr/lib/scylla and replaced it with a symlink pointing to /opt/scylladb.
However, RPM does not support replacing a directory with a symlink;
we worked around this with a dirty hack in an RPM scriptlet, but it causes
multiple issues on upgrade/downgrade.
(See: https://docs.fedoraproject.org/en-US/packaging-guidelines/Directory_Replacement/)
To minimize Scylla upgrade/downgrade issues on the user side, it is better
to keep the /usr/lib/scylla directory.
Instead of creating a single symlink /usr/lib/scylla -> /opt/scylladb,
we can create symlinks for each setup script, like
/usr/lib/scylla/<script> -> /opt/scylladb/scripts/<script>.
Fixes #5522
Fixes #4585
Fixes #4611
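A minimal sketch of the per-script symlinks. Both directories are passed in so it can run in a scratch area. `link_setup_scripts` is a hypothetical name, and the real packaging may list scripts explicitly rather than globbing.

```shell
#!/bin/sh
# Create /usr/lib/scylla/<script> -> /opt/scylladb/scripts/<script> style links,
# keeping the compatibility directory a real directory.
link_setup_scripts() {
    scripts_dir=$1   # e.g. /opt/scylladb/scripts
    compat_dir=$2    # e.g. /usr/lib/scylla
    mkdir -p "$compat_dir"
    for script in "$scripts_dir"/*; do
        [ -f "$script" ] || continue   # skip subdirectories and an unmatched glob
        ln -sf "$script" "$compat_dir/$(basename "$script")"
    done
}
```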
The vm.swappiness sysctl controls the kernel's preference for swapping
anonymous memory vs page cache. Since Scylla uses very large amounts
of anonymous memory, and tiny amounts of page cache, the correct setting
is to prefer swapping page cache. If the kernel swaps anonymous memory,
the reactor will stall until the page fault is satisfied. On the other
hand, page cache pages usually belong to other applications, typically
backup processes that read Scylla files.
This setting has been used in production in Scylla Cloud for a while
with good results.
Users can opt out by not installing the scylla-kernel-conf package
(same as with the other kernel tunables).
Merged pull request https://github.com/scylladb/scylla/pull/5310 from
Avi Kivity:
This is a minor update as gcc and boost versions did not change. A notable
update is patchelf 0.10, which adds support for large binaries.
A few minor issues exposed by the update are fixed in preparatory patches.
Patches:
dist: rpm: correct systemd post-uninstall scriptlet
build: force xz compression on rpm binary payload
tools: toolchain: update to Fedora 31
By default rpm uses dwz to merge the debug info from various
binaries. Unfortunately, it looks like addr2line has not been updated
to handle this:
# This works
$ addr2line -e build/release/scylla 0x1234567
$ dwz -m build/release/common.debug build/release/scylla.debug build/release/iotune.debug
# Now this fails
$ addr2line -e build/release/scylla 0x1234567
I think the issue is
https://sourceware.org/bugzilla/show_bug.cgi?id=23652
Fixes #5289
Signed-off-by: Rafael Ávila de Espíndola <espindola@scylladb.com>
Message-Id: <20191123015734.89331-1-espindola@scylladb.com>
Fedora 31 switched the default compression to zstd, which isn't readable
by some older rpm distributions (CentOS 7 in particular). Tell it to use
the older xz compression instead, so packages produced on Fedora 31 can
be installed on older distributions.
The post-uninstall scriptlet requires a parameter, but older versions
of rpm survived without it. Fedora 31's rpm is more strict, so supply
this parameter.
Since systemd units can override parameters using drop-in units, we don't need
mustache templates for them.
Also, drop the --disttype and --target options from install.sh since they are no
longer required, and introduce --sysconfdir instead for non-Red Hat distributions.
It is well known that seastar applications, like Scylla, do not play
well with external processes: CPU usage from external processes may
confuse the I/O and CPU schedulers and create stalls.
We have also recently seen that memory usage from other applications'
anonymous and page cache memory can bring the system to OOM.
Linux has a very good infrastructure for resource control contributed by
amazingly bright engineers in the form of cgroup controllers. This
infrastructure is exposed by systemd in the form of slices: a
hierarchical structure to which controllers can be attached.
In true systemd fashion, the hierarchy is implicit in the filenames of the
slice files. A "-" symbol defines the hierarchy, so the files that this
patch presents, scylla-server and scylla-helper, essentially create a
"scylla" cgroup at the top level with "server" and "helper" children.
Later we mark the services needed to run scylla as belonging to one
or the other through the Slice= directive.
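To make the naming rule concrete, the following sketch expands a slice name into its cgroup path relative to the cgroup root (for example under /sys/fs/cgroup on a unified-hierarchy host). It ignores systemd's name escaping and the special root slice `-.slice`. `slice_cgroup_path` is a hypothetical helper.

```shell
#!/bin/sh
# Expand e.g. scylla-server.slice into /scylla.slice/scylla-server.slice:
# each "-" in the name adds one level of nesting.
slice_cgroup_path() (
    name=${1%.slice}
    path=""
    prefix=""
    IFS=-          # split the name on "-"; the subshell body keeps this local
    for part in $name; do
        prefix=${prefix:+$prefix-}$part   # "scylla", then "scylla-server"
        path="$path/$prefix.slice"
    done
    echo "$path"
)
```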
Scylla DBAs can benefit from this setup by using the systemd-run
utility to fire ad-hoc commands.
Let's say for example that someone wants to hypothetically run a backup
and transfer files to an external object store like S3, making sure that
the amount of page cache used won't create swap pressure leading to
database timeouts.
One can then run something like:
```
sudo systemd-run --uid=`id -u scylla` --gid=`id -g scylla` -t --slice=scylla-helper.slice /path/to/my/magical_backup_tool
```
(or even better, the backup tool can itself be a systemd timer)
Changes from last version:
- No longer use the CPUQuota
- Minor typo fixes
- postinstall fixup for small machines
Benchmark results:
==================
Test: read from disk, with 100% disk util using a single i3.xlarge (4 vCPUs).
We have to fill the cache as we read, so this should stress CPU, memory and
disk I/O.
cassandra-stress command:
```
cassandra-stress read no-warmup duration=5m -rate threads=20 -node 10.2.209.188 -pop dist=uniform\(1..150000000\)
```
Baseline results:
```
Results:
Op rate : 13,830 op/s [READ: 13,830 op/s]
Partition rate : 13,830 pk/s [READ: 13,830 pk/s]
Row rate : 13,830 row/s [READ: 13,830 row/s]
Latency mean : 1.4 ms [READ: 1.4 ms]
Latency median : 1.4 ms [READ: 1.4 ms]
Latency 95th percentile : 2.4 ms [READ: 2.4 ms]
Latency 99th percentile : 2.8 ms [READ: 2.8 ms]
Latency 99.9th percentile : 3.4 ms [READ: 3.4 ms]
Latency max : 12.0 ms [READ: 12.0 ms]
Total partitions : 4,149,130 [READ: 4,149,130]
Total errors : 0 [READ: 0]
Total GC count : 0
Total GC memory : 0.000 KiB
Total GC time : 0.0 seconds
Avg GC time : NaN ms
StdDev GC time : 0.0 ms
Total operation time : 00:05:00
```
Question 1:
===========
Does putting scylla in a special slice affect its performance?
Results with Scylla running in a slice:
```
Results:
Op rate : 13,811 op/s [READ: 13,811 op/s]
Partition rate : 13,811 pk/s [READ: 13,811 pk/s]
Row rate : 13,811 row/s [READ: 13,811 row/s]
Latency mean : 1.4 ms [READ: 1.4 ms]
Latency median : 1.4 ms [READ: 1.4 ms]
Latency 95th percentile : 2.2 ms [READ: 2.2 ms]
Latency 99th percentile : 2.6 ms [READ: 2.6 ms]
Latency 99.9th percentile : 3.3 ms [READ: 3.3 ms]
Latency max : 23.2 ms [READ: 23.2 ms]
Total partitions : 4,151,409 [READ: 4,151,409]
Total errors : 0 [READ: 0]
Total GC count : 0
Total GC memory : 0.000 KiB
Total GC time : 0.0 seconds
Avg GC time : NaN ms
StdDev GC time : 0.0 ms
Total operation time : 00:05:00
```
*Conclusion*: No significant change
Question 2:
===========
What happens when there is a CPU hog running in the same server as scylla?
CPU hog:
```
taskset -c 0 /bin/sh -c "while true; do true; done" &
taskset -c 1 /bin/sh -c "while true; do true; done" &
taskset -c 2 /bin/sh -c "while true; do true; done" &
taskset -c 3 /bin/sh -c "while true; do true; done" &
sleep 330
```
Scenario 1: CPU hog runs freely:
```
Results:
Op rate : 2,939 op/s [READ: 2,939 op/s]
Partition rate : 2,939 pk/s [READ: 2,939 pk/s]
Row rate : 2,939 row/s [READ: 2,939 row/s]
Latency mean : 6.8 ms [READ: 6.8 ms]
Latency median : 5.3 ms [READ: 5.3 ms]
Latency 95th percentile : 11.0 ms [READ: 11.0 ms]
Latency 99th percentile : 14.9 ms [READ: 14.9 ms]
Latency 99.9th percentile : 17.1 ms [READ: 17.1 ms]
Latency max : 26.3 ms [READ: 26.3 ms]
Total partitions : 884,460 [READ: 884,460]
Total errors : 0 [READ: 0]
Total GC count : 0
Total GC memory : 0.000 KiB
Total GC time : 0.0 seconds
Avg GC time : NaN ms
StdDev GC time : 0.0 ms
Total operation time : 00:05:00
```
Scenario 2: CPU hog runs inside scylla-helper slice
```
Results:
Op rate : 13,527 op/s [READ: 13,527 op/s]
Partition rate : 13,527 pk/s [READ: 13,527 pk/s]
Row rate : 13,527 row/s [READ: 13,527 row/s]
Latency mean : 1.5 ms [READ: 1.5 ms]
Latency median : 1.4 ms [READ: 1.4 ms]
Latency 95th percentile : 2.4 ms [READ: 2.4 ms]
Latency 99th percentile : 2.9 ms [READ: 2.9 ms]
Latency 99.9th percentile : 3.8 ms [READ: 3.8 ms]
Latency max : 18.7 ms [READ: 18.7 ms]
Total partitions : 4,069,934 [READ: 4,069,934]
Total errors : 0 [READ: 0]
Total GC count : 0
Total GC memory : 0.000 KiB
Total GC time : 0.0 seconds
Avg GC time : NaN ms
StdDev GC time : 0.0 ms
Total operation time : 00:05:00
```
*Conclusion*: With systemd slice we can keep the performance very close to
baseline
Question 3:
===========
What happens when there is an I/O hog running in the same server as scylla?
I/O hog: (Data in the cluster is 2x size of memory)
```
while true; do
find /var/lib/scylla/data -type f -exec grep glauber {} +
done
```
Scenario 1: I/O hog runs freely:
```
Results:
Op rate : 7,680 op/s [READ: 7,680 op/s]
Partition rate : 7,680 pk/s [READ: 7,680 pk/s]
Row rate : 7,680 row/s [READ: 7,680 row/s]
Latency mean : 2.6 ms [READ: 2.6 ms]
Latency median : 1.3 ms [READ: 1.3 ms]
Latency 95th percentile : 7.8 ms [READ: 7.8 ms]
Latency 99th percentile : 10.9 ms [READ: 10.9 ms]
Latency 99.9th percentile : 16.9 ms [READ: 16.9 ms]
Latency max : 40.8 ms [READ: 40.8 ms]
Total partitions : 2,306,723 [READ: 2,306,723]
Total errors : 0 [READ: 0]
Total GC count : 0
Total GC memory : 0.000 KiB
Total GC time : 0.0 seconds
Avg GC time : NaN ms
StdDev GC time : 0.0 ms
Total operation time : 00:05:00
```
Scenario 2: I/O hog runs in the scylla-helper systemd slice:
```
Results:
Op rate : 13,277 op/s [READ: 13,277 op/s]
Partition rate : 13,277 pk/s [READ: 13,277 pk/s]
Row rate : 13,277 row/s [READ: 13,277 row/s]
Latency mean : 1.5 ms [READ: 1.5 ms]
Latency median : 1.4 ms [READ: 1.4 ms]
Latency 95th percentile : 2.4 ms [READ: 2.4 ms]
Latency 99th percentile : 2.9 ms [READ: 2.9 ms]
Latency 99.9th percentile : 3.5 ms [READ: 3.5 ms]
Latency max : 183.4 ms [READ: 183.4 ms]
Total partitions : 3,984,080 [READ: 3,984,080]
Total errors : 0 [READ: 0]
Total GC count : 0
Total GC memory : 0.000 KiB
Total GC time : 0.0 seconds
Avg GC time : NaN ms
StdDev GC time : 0.0 ms
Total operation time : 00:05:00
```
*Conclusion*: With systemd slice we can keep the performance very close to
baseline
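To put the conclusions in numbers, the op rates above can be expressed as a percentage of the 13,830 op/s baseline. The sketch uses only figures reported in these results. `pct_of_baseline` is a hypothetical helper.

```shell
#!/bin/sh
# Op rate of each scenario as a percentage of the baseline run.
baseline=13830
pct_of_baseline() {
    awk -v r="$1" -v b="$baseline" 'BEGIN { printf "%.1f\n", 100 * r / b }'
}

pct_of_baseline 13811   # 99.9  Scylla in a slice, no hog
pct_of_baseline 2939    # 21.3  CPU hog, unconfined
pct_of_baseline 13527   # 97.8  CPU hog in scylla-helper.slice
pct_of_baseline 7680    # 55.5  I/O hog, unconfined
pct_of_baseline 13277   # 96.0  I/O hog in scylla-helper.slice
```

Throughput alone does not capture the I/O-hog-in-slice run's 183.4 ms max latency, so the tail latencies above are still worth reading alongside these ratios.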
Signed-off-by: Glauber Costa <glauber@scylladb.com>
There are systemd-related steps done in both rpm and deb builds.
Move that to a script so we avoid duplication.
The tests are so far a bit specific to the distributions, so they
need to be adapted a bit.
Also note that this also fixes a bug with rpm as a side-effect:
rpm does not call daemon-reload after potentially changing the
systemd files (it is only implied during postun operations, that
happen during uninstall). daemon-reload was called explicitly for
debian packages, and now it is called for both.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Since commit ac9b115a8f, install.sh requires specifying a single package with --pkg, and there is no way to select all packages.
Running install.sh without --pkg should select all packages.
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190731013245.5857-1-syuu@scylladb.com>
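A sketch of the intended option handling. `select_packages` is hypothetical, and the package list is illustrative (names taken from elsewhere in this log), not the list install.sh actually uses.

```shell
#!/bin/sh
# Hypothetical package list, for illustration only.
ALL_PKGS="scylla-server scylla-kernel-conf"

# Print the selected packages: the --pkg value if given, otherwise all of them.
select_packages() {
    pkg=""
    while [ $# -gt 0 ]; do
        case "$1" in
            --pkg)
                if [ $# -lt 2 ]; then
                    echo "--pkg requires an argument" >&2
                    return 1
                fi
                pkg=$2
                shift 2
                ;;
            *) shift ;;
        esac
    done
    if [ -z "$pkg" ]; then
        echo "$ALL_PKGS"
    else
        echo "$pkg"
    fi
}
```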