The BlockIOWeight and CPUShares are deprecated. They are only used on
RHEL 7, which has reached end-of-life. Their replacements, IOWeight
and CPUWeight, are already set in the file.
Remove the deprecated settings to reduce noise in the logs.
Closesscylladb/scylladb#27222
Python 3.14 changed the multiprocessing fork mode to "forkserver",
presumably for good reasons. However, it conflicts with our
relocatable Python system. "forkserver" forks and execs a Python
process at startup, but it does this without supplying our relocated
ld.so. The system ld.so detects a conflict and crashes.
Fix this by switching back to "fork", which is sufficient for
housekeeping's modest needs.
Closesscylladb/scylladb#26831
There are some environment which has corrupted NUMA topology
information, such as some instance types on AWS EC2 with specific Linux
kernel images.
On such environment, we cannot get HW information correctly from hwloc,
so we cannot proceed optimization on perftune.
To avoid causing script error, check NUMA topology information and skip
running perftune if the information corrupted.
Related scylladb/seastar#2925Closesscylladb/scylladb#26344
In f828fe0d59 ("setup: add the lazytime XFS version") we added the
lazytime mount option to /var/lib/scylla, but it was quickly reverted
(8f5e80e61a) as it caused a regression on CentOS 7.
We reinstate it now with a kernel version check. This will avoid
the lazytime mount option on CentOS 7, which is unsupported anyway.
The lazytime option avoids marking the inode as dirty if it's only for the
purpose of updating mtime/ctime. This won't help much while writing sstables
(since the write also updates extent information), but may help a little
with with commitlog writes, since those are pure overwrites.
It likely won't help with the RWF_NOWAIT violations seen in [1], since
those are likely due to in-memory locking, not flushing dirty inodes
to disk.
Tested with an install to Ubuntu 24.04 LTS followed by a scylla_setup run.
The lazytime option was added the the .mount file and showed up in
the live mount.
[1] https://github.com/scylladb/seastar/issues/2974
Closes scylladb/scylladb#26436
Fixes#26002
We noticed during work on scylladb/seastar#2802 that on i7i family
(later proved that it's valid for i4i family as well),
the disks are reporting the physical sector sizes incorrectly
as 512bytes, whilst we proved we can render much better write IOPS with
4096bytes.
This is not the case on AWS i3en family where the reported 512bytes
physical sector size is also the size we can achieve the best write IOPS.
This patch works around this issue by changing `scylla_io_setup` to parse
the instance type out of `/sys/devices/virtual/dmi/id/product_name`
and run iotune with the correct request size based on the instance type.
Signed-off-by: Robert Bindar <robert.bindar@scylladb.com>
Closesscylladb/scylladb#25315
This is not needed on clean disks, which is often the case with cloud instances, but can be useful on bare metal servers with disks that were used before.
Therefore, the default is to skip blkdiscard operation, which makes overall installation faster.
If the user wishes to run it anyway, use the newly introduced --blkdiscard option of scylla_raid_setup to perform it.
Note: since we either perform online discard or schedule fstrim, the (previously used) space will gradually get trimmed, this way or another.
Fixes: https://github.com/scylladb/scylladb/issues/24470
Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>
Closesscylladb/scylladb#24579
Since it is requirement for Red Hat OpenShift Certification, we need to
run the container as non-root user.
Related scylladb/scylla-pkg#4858
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
The Intel Optimizaton Manual states that branches with relative offsets
greater than 2GB suffer a penalty. They cite a 6% improvement when this
is avoided. Our code doesn't rely heavily on dynamically linked
libraries, so I don't expect a similar win, but it's still better to do
it than not.
Eliminate long branches by asking the dynamic linker to restrict itself
to the lower 4GB of the address space. I saw that it maps libraries
at 1GB+ addresses, so this satisfies the limitation.
Fix is from the Intel Optimization Manual as well.
This change was ported from ScyllaDB Enterprise.
Closesscylladb/scylladb#22498
bash error handling and reporting is atrocious. Without -e it will
just ignore errors. With -e it will stop on errors, but not report
where the error happened (apart from exiting itself with an error code).
Improve that with the `trap ERR` command. Note that this won't be invoked
on intentional error exit with `exit 1`.
We apply this on every bash script that contains -e or that it appears
trivial to set it in. Non-trivial scripts without -e are left unmodified,
since they might intentionally invoke failing scripts.
Closesscylladb/scylladb#22747
The block size of 1k is significantly increasing metadata overhead with xfs since it reserves space upfront for btree expansion. With CRC disabled, this reservation doesn't happen. Smaller btree blocks reduce the fanout factor, increasing btree height and the reservation size. So block size implies a trade-off between write amplification and metadata size. Bigger blocks, smaller metadata, more write ampl. Smaller blocks, more metadata, and less write ampl.
Let's disable both `rmapbt` and `relink` since we replicate data, and we can afford to rebuild a replica on local corruption.
Fixes: https://github.com/scylladb/scylladb/issues/22028Closesscylladb/scylladb#22072
Use raw string literals to prevent syntax warnings when using regular
expressions with backslash-based patterns.
The original code triggered a SyntaxWarning in developer mode (`python3 -Xdev`)
due to unescaped backslash characters in regex patterns like '\s'. While
CPython typically interprets these silently, strict Python parsing modes
raise warnings about potentially unintended escape sequences.
This change adds the `r` prefix to string literals containing regex patterns,
ensuring consistent behavior across different Python runtime configurations
and eliminating unnecessary syntax warning like:
```
/opt/scylladb/scripts/libexec/scylla_io_setup:41: SyntaxWarning: invalid escape sequence '\s'
pattern = re.compile(_nocomment + r"CPUSET=\s*\"" + _reopt(_cpuset) + _reopt(_smp) + "\s*\"")
```
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#21839
Since we dropped scylla-jmx at 3cd2a61, Wants=scylla-jmx.service is not
needed anymore.
Also we have issue on nonroot mode installation with this line (#21720),
we need to drop this now.
Fixes#21720Closesscylladb/scylladb#21721
After merged 5a470b2, we found that scylla_raid_setup fails on offline mode
installation.
This is because pkg_install() just print error and exit script on offline mode, instead of installing packages since offline mode not supposed able to connect
internet.
Seems like it occur because of missing "policycoreutils-python-utils"
package, which is the package for "semange" command.
So we need to implement the relabeling patch without using the command.
Fixes#21441
before this change, we specify the KillMode of the scylla-service
service unit explicitly to "process". according to
according to
https://www.freedesktop.org/software/systemd/man/latest/systemd.kill.html,
> If set to process, only the main process itself is killed (not recommended!).
and the document suggests use "control-group" over "process".
but scylla server is not a multi-process server, it is a multi-threaded
server. so it should not make any difference even if we switch to
the recommended "control-group".
in the light that we've been seeing "defunct" scylla process after
stopping the scylla service using systemd. we are wondering if we should
try to change the `KillMode` to "control-group", which is the default
value of this setting.
in this change, we just drop the setting so that the systemd stops the
service by stopping all processes in the control group of this unit
are stopped.
Refs scylladb/scylladb#21507
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#21508
This collector reads nvme temperature sensor, which was observed to
cause bad performance on Azure cloud following the reading of the
sensor for ~6 seconds. During the event, we can see elevated system
time (up to 30%) and softirq time. CPU utilization is high, with
nvm_queue_rq taking several orders of magnitude more time than
normally. There are signs of contention, we can see
__pv_queued_spin_lock_slowpath in the perf profile, called. This
manifests as latency spikes and potentially also throughput drop due
to reduced CPU capacity.
By default, the monitoring stack queries it once every 60s.
Closesscylladb/scylladb#21165
On RHEL9, systemd-coredump fails to coredump on /var/lib/scylla/coredump because the service only have write acess with systemd_coredump_var_lib_t. To make it writable, we need to add file context rule for /var/lib/scylla/coredump, and run restorecon on /var/lib/scylla.
Fixes#19325Closesscylladb/scylladb#20528
* github.com:scylladb/scylladb:
scylla_raid_setup: configure SELinux file context
scylla_coredump_setup: fix SELinux configuration for RHEL9
Since JMX server is deprecated, drop them from submodule, build system
and package definition.
Related scylladb/scylla-tools-java#370
Related #14856
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Closesscylladb/scylladb#17969
On RHEL9, systemd-coredump fails to coredump on /var/lib/scylla/coredump
because the service only have write acess with systemd_coredump_var_lib_t.
To make it writable, we need to add file context rule for
/var/lib/scylla/coredump, and run restorecon on /var/lib/scylla.
Fixes#20573
Seems like specific version of systemd pacakge on RHEL9 has a bug on
SELinux configuration, it introduced "systemd-container-coredump" module
to provide rule for systemd-coredump, but not enabled by default.
We have to manually load it, otherwise it causes permission error.
Fixes#19325
On very large node, LimitNOFILES=80000 may not enough size, it can cause
"Too many files" error.
To avoid that, let's increase LimitNOFILES on scylla_setup stage,
generate optimal value calurated from memory size and number of cpus.
Closesscylladb/scylla-enterprise#4304Closesscylladb/scylladb#20443
before this change, when running `scylla-housekeeping`:
```
/opt/scylladb/scripts/libexec/scylla-housekeeping:122: SyntaxWarning: invalid escape sequence '\s'
match = re.search(".*http.?://repositories.*/scylladb/([^/\s]+)/.*/([^/\s]+)/scylladb-.*", line)
```
we could have the warning above. because `\s` is not a valid escape
sequence, but the Python interpreter accepts it as two separated
characters of `\s` after complaining. but it's still annoying.
so, let's use a raw string here.
Refs scylladb/scylladb#20317
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#20359
as we have an API for restore a keyspace / table, let's expose this feature
with nodetool. so we can exercise it without the help of scylla-manager
or 3rd-party tools with a user-friendly interface.
in this change:
* add a new subcommand named "restore" to nodetool
* add test to verify its interaction with the API server
* update the document accordingly.
* the bash completion script is updated accordingly.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
as we have an API for backup a keyspace, let's expose this feature
with nodetool. so we can exercise it without the help of scylla-manager
or 3rd-party tools with a user-friendly interface.
in this change:
* add a new subcommand named "backup" to nodetool
* add test to verify its interaction with the API server
* add two more route to the REST API mock server, as
the test is using /task_manager/wait_task/{task_id} API.
for the sake of completeness, the route for
/task_manager/{part1} is added as well.
* update the document accordingly.
* the bash completion script is updated accordingly.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
This reverts commit c3bea539b6.
Since it breaking offline-installer artifact-tests. Also, it seems that we should have merged it in the first place since we don't need scylla-housekeeping checks for offline-installer
Closesscylladb/scylladb#19976
Since Python 3.12, version parsing becomes strict, parse_version() does
not accept the version string like '6.1.0~dev'.
To fix this, we need to pass acceptable version string to parse_version() like
'6.1.0.dev0', which is allowed on Python version scheme.
Also, release canditate version like '6.0.0~rc3' has same issue, it
should be replaced to '6.0.0rc3' to compare in parse_version().
reference: https://packaging.python.org/en/latest/specifications/version-specifiers/Fixes#19564Closesscylladb/scylladb#19572
This reverts commit 65fbf72ed0, since
it breaks scylla-housekeeping and SCT because the patch modified
version string.
We shoudn't modify version string directly, need to pass
modified string just for parse_version() instead.
On Ubuntu/Debian, we have to install systemd-coredump before
running has_ztd(), since it detect ZSTD support by running coredumpctl.
Move pkg_install('systemd-coredump') to the head of the script.
Fixes#19643Closesscylladb/scylladb#19648
We disabled coredump compression by default because it was too slow,
but recent versions of systemd-coredump supports faster zstd based compression,
so let's enable compression by default when zstd support detected.
Related scylladb/scylla-machine-image#462Closesscylladb/scylladb#18854
ConfigParser.readfp was deprecated in Python 3.2 and removed in Python 3.12.
Under Fedora 40, the container fails to launch because it cannot parse its
configuration.
Fix by using the newer read_file().
Closesscylladb/scylladb#19236
The default limit of open file descriptors
per process may be too small for iotune on
certain machines with large number of cores.
In such case iotune reports failure due to
unability to create files or to set up seastar
framework.
This change configures the limit of open file
descriptors before running iotune to ensure
that the failure does not occur.
The limit is set via 'resource.setrlimit()' in
the parent process. The limit is then inherited
by the child process.
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
Closesscylladb/scylladb#18546
Following
b8634fb244
machine image started to fail with the following error:
```
10:44:59 ␛[0;32m googlecompute.gce: scylla-jmx package is not installed.␛[0m
10:44:59 ␛[1;31m==> googlecompute.gce: Traceback (most recent call last):␛[0m
10:44:59 ␛[1;31m==> googlecompute.gce: File "/home/ubuntu/scylla_install_image", line 135, in <module>␛[0m
10:44:59 ␛[1;31m==> googlecompute.gce: run('/opt/scylladb/scripts/scylla_setup --no-coredump-setup --no-sysconfig-setup --no-raid-setup --no-io-setup --no-ec2-check --no-swap-setup --no-cpuscaling-setup --no-ntp-setup', shell=True, check=True)␛[0m
10:44:59 ␛[1;31m==> googlecompute.gce: File "/usr/lib/python3.10/subprocess.py", line 526, in run␛[0m
10:44:59 ␛[1;31m==> googlecompute.gce: raise CalledProcessError(retcode, process.args,␛[0m
10:44:59 ␛[1;31m==> googlecompute.gce: subprocess.CalledProcessError: Command '/opt/scylladb/scripts/scylla_setup --no-coredump-setup --no-sysconfig-setup --no-raid-setup --no-io-setup --no-ec2-check --no-swap-setup --no-cpuscaling-setup --no-ntp-setup' returned non-zero exit status 1.␛[0m
```
It seems we no longer need to verify that jmx and tools-java packages are installed.
Closesscylladb/scylladb#18494
* tools/java b810e8b00e...4ee15fd9ea (1):
> install.sh: don't install nodetool into /usr/bin
Add a bin/nodetool and install it to bin/ in install.sh. This script
simply forwards to scylla nodetool and it is the replacement for the
Java nodetool, which is dropped from the java-tools's install.sh, in the
submodule update also included in this patch.
With this change, we now hardwire the usage of the native nodetool, as
*the* nodetool, with the intermediary nodetool wrapper script removed
from the picture.
Bash completion was copied from the java tools repository and it is now
installed by the scylla package, together with nodetool.
The Java nodetool is still available as as a fall-back, in case the
native nodetool has problems, at the path of
/opt/scylladb/share/cassandra/bin/nodetool.
Testing
I tested upgrades on a DEB and RPM distro: Ubuntu and Fedora.
First I installed scylla-5.4, then I installed the packages for this PR.
On Ubuntu, I had to use dpkg -i --auto-deconfigure, otherwise, dpkg would
refuse to install the new packages because they break the old ones. No
extra flags were required on Fedora.
In both cases, /usr/bin/nodetool was changed from a thunk calling the
Java nodetool (from 5.4) to the native launcher script from this PR.
/opt/scylladb/share/cassandra/bin/nodetool remained in place and still
works after the upgrade.
I also verified that --nonroot installs also work. Nodetool works both
when called with an absolute path, or when ~/scylladb/bin is added to
$PATH.
Fixes: #18226Fixes: #17412Closesscylladb/scylladb#18255
[avi: reset submodule to actual hash we ended up with]
the quote of "The minimum block size for crc enabled filesystems is
1024" comes from the output of mkfs.xfs, let's quote the source for
better maintainability.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#17094