scylla_raid_setup may fail on Ubuntu minimal image since it calls
update-initramfs without installing.
(cherry picked from commit b6dedf1ee1)
Closesscylladb/scylladb#19871
The default limit of open file descriptors
per process may be too small for iotune on
certain machines with large number of cores.
In such case iotune reports failure due to
unability to create files or to set up seastar
framework.
This change configures the limit of open file
descriptors before running iotune to ensure
that the failure does not occur.
The limit is set via 'resource.setrlimit()' in
the parent process. The limit is then inherited
by the child process.
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
Closesscylladb/scylladb#18546
Following
b8634fb244
machine image started to fail with the following error:
```
10:44:59 ␛[0;32m googlecompute.gce: scylla-jmx package is not installed.␛[0m
10:44:59 ␛[1;31m==> googlecompute.gce: Traceback (most recent call last):␛[0m
10:44:59 ␛[1;31m==> googlecompute.gce: File "/home/ubuntu/scylla_install_image", line 135, in <module>␛[0m
10:44:59 ␛[1;31m==> googlecompute.gce: run('/opt/scylladb/scripts/scylla_setup --no-coredump-setup --no-sysconfig-setup --no-raid-setup --no-io-setup --no-ec2-check --no-swap-setup --no-cpuscaling-setup --no-ntp-setup', shell=True, check=True)␛[0m
10:44:59 ␛[1;31m==> googlecompute.gce: File "/usr/lib/python3.10/subprocess.py", line 526, in run␛[0m
10:44:59 ␛[1;31m==> googlecompute.gce: raise CalledProcessError(retcode, process.args,␛[0m
10:44:59 ␛[1;31m==> googlecompute.gce: subprocess.CalledProcessError: Command '/opt/scylladb/scripts/scylla_setup --no-coredump-setup --no-sysconfig-setup --no-raid-setup --no-io-setup --no-ec2-check --no-swap-setup --no-cpuscaling-setup --no-ntp-setup' returned non-zero exit status 1.␛[0m
```
It seems we no longer need to verify that jmx and tools-java packages are installed.
Closesscylladb/scylladb#18494
* tools/java b810e8b00e...4ee15fd9ea (1):
> install.sh: don't install nodetool into /usr/bin
Add a bin/nodetool and install it to bin/ in install.sh. This script
simply forwards to scylla nodetool and it is the replacement for the
Java nodetool, which is dropped from the java-tools's install.sh, in the
submodule update also included in this patch.
With this change, we now hardwire the usage of the native nodetool, as
*the* nodetool, with the intermediary nodetool wrapper script removed
from the picture.
Bash completion was copied from the java tools repository and it is now
installed by the scylla package, together with nodetool.
The Java nodetool is still available as as a fall-back, in case the
native nodetool has problems, at the path of
/opt/scylladb/share/cassandra/bin/nodetool.
Testing
I tested upgrades on a DEB and RPM distro: Ubuntu and Fedora.
First I installed scylla-5.4, then I installed the packages for this PR.
On Ubuntu, I had to use dpkg -i --auto-deconfigure, otherwise, dpkg would
refuse to install the new packages because they break the old ones. No
extra flags were required on Fedora.
In both cases, /usr/bin/nodetool was changed from a thunk calling the
Java nodetool (from 5.4) to the native launcher script from this PR.
/opt/scylladb/share/cassandra/bin/nodetool remained in place and still
works after the upgrade.
I also verified that --nonroot installs also work. Nodetool works both
when called with an absolute path, or when ~/scylladb/bin is added to
$PATH.
Fixes: #18226Fixes: #17412Closesscylladb/scylladb#18255
[avi: reset submodule to actual hash we ended up with]
the quote of "The minimum block size for crc enabled filesystems is
1024" comes from the output of mkfs.xfs, let's quote the source for
better maintainability.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#17094
apt_install() / apt_uninstall() may fail if background process running
apt operation, such as unattended-upgrades.
To avoid this, we need to add two things:
1. For apt-get install / remove, we need to option "DPkg::Lock::Timeout=-1"
to wait for dpkg lock.
2. For apt-get update, there is no option to wait for cache lock.
Therefore, we need to implement retry-loop to wait for apt-get update
succeed.
Fixes#16537Closesscylladb/scylladb#16561
Since we decided to drop CentOS7 support from latest version of Scylla, now we can drop CentOS7 specific codes from packaging scripts and setup scripts.
Related scylladb/scylla-enterprise#3502Closesscylladb/scylladb#16365
* github.com:scylladb/scylladb:
scylla-server.service: switch deprecated PermissionsStartsOnly to ExecStartPre=+
dist: drop legacy control group parameters
scylla-server.slice: Drop workaround for MemorySwapMax=0 bug
dist: move AmbientCapabilities to scylla-server.service
Revert "scylla_setup: add warning for CentOS7 default kernel"
[avi: CentOS 7 reached EOL on June 2024]
On dffadabb94 we mistakenly added
"if args.overwrite_unit_file", but the option is comming from unmerged
patch.
So we need to drop this to fix script error.
Fixes#16331Closesscylladb/scylladb#16358
Since we dropped support CentOS7, now we always can use AmbientCapabilities
without systemd version check, so we can move it from capabilities.conf
to scylla-server.service.
Although, we still cannnot hardcode CAP_PERFMON since it is too new,
only newer kernel supported this, so keep it on scylla_post_install.sh
This patch fixes error check and speed up swap allocation.
Following patches are included:
- scylla_swap_setup: run error check before allocating swap
avoid create swapfile before running error check
- scylla_swap_setup: use fallocate on ext4
this inclease swap allocation speed on ext4
Closesscylladb/scylladb#12668
* github.com:scylladb/scylladb:
scylla_swap_setup: use fallocate on ext4
scylla_swap_setup: run error check before allocating swap
Add CAP_PERFMON to AmbientCapabilities in capabilities.conf, to enable
perf_event based stall detector in Seastar.
However, on Debian/Ubuntu CAP_PERFMON with non-root user does not work
because it sets kernel.perf_event_paranoid=4 which disallow all non-root
user access.
(On Debian it kernel.perf_event_paranoid=3)
So we need to configure kernel.perf_event_paranoid=2 on these distros.
see: https://askubuntu.com/questions/1400874/what-does-perf-paranoia-level-four-do
Also, CAP_PERFMON is only available on linux-5.8+, older kernel does not
have this capability.
To enable older kernel environment such as CentOS7, we need to configure
kernel.perf_event_paranoid=1 to allow non-root user access even without
the capability.
Fixes#15743Closesscylladb/scylladb#16070
Fixes some more typos as found by codespell run on the code. In this commit, there are more user-visible errors.
Refs: https://github.com/scylladb/scylladb/issues/16255Closesscylladb/scylladb#16289
* github.com:scylladb/scylladb:
Update unified/build_unified.sh
Update main.cc
Update dist/common/scripts/scylla-housekeeping
Typos: fix typos in code
On /usr/lib/sysctl.d/99-scylla-sched.conf, we have some sysctl settings to
tune the scheduler for lower latency.
This is mostly to prevent softirq threads processing tcp and reactor threads
from injecting latency into each other.
However, these parameters are moved to debugfs from linux-5.13+, so we lost
scheduler tuneing on recent kernels.
To support tuning recent kernel, let's add a new service which support
to configure both sysctl and debugfs.
The service named scylla-tune-sched.service
The service will unconditionally enables when installed, on older kernel
it will tune via sysctl, on recent kernel it will tune via debugfs.
Fixes#16077Closesscylladb/scylladb#16122
Fixes some typos as found by codespell run on the code.
In this commit, I was hoping to fix only comments, not user-visible alerts, output, etc.
Follow-up commits will take care of them.
Refs: https://github.com/scylladb/scylladb/issues/16255
Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>
Currently, the NIC prompt on scylla_setupshows up virtual devices such as
VLAN devices and bridge devices, but perftune.py does not support them.
To prevent causing error while running scylla_setup, we should stop listing
these devices from the NIC prompt.
closes#6757Closesscylladb/scylladb#15958
On non-interactive mode setup, RHEL/CentOS7 old kernel check causes
"Setup aborted", this is not what we want.
We should keep warning but proceed setup, so default value of the kernel
check should be True, since it will automatically applied on
non-interactive mode.
Fixes#16045Closesscylladb/scylladb#16100
without adding `WantedBy=scylla-server.service` in
var-lib-systemd-coredump, if we starts `scylla-server.service`,
it does not necessarily starts `var-lib-systemd-coredump`
even if the latter is installed.
with `WantedBy=scylla-server.service` in var-lib-systemd-coredump,
if we starts `scylla-server.service`, var-lib-systemd-coredump
will be started also. and `Before=scylla-server.service` ensures
that, before `scylla-server.service` is started,
var-lib-systemd-coredump is already ready.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#15984
Since CentOS7 default kernel is too old, has performance issues and also
has some bugs, we have been recommended to use kernel-ml kernel.
Let's check kernel version in scylla_setup and print warning if the
kernel is CentOS7 default one.
related #7365Closesscylladb/scylladb#15705
Unlike yum, "apt-get install" may fails because package cache is outdated.
Let's check package cache mtime and run "apt-get update" if it's too old.
Fixes#4059Closesscylladb/scylladb#15960
Currently, "yum install scylla" causes conflict when ABRT is installed.
To avoid this behavior and keep using systemd-coredump for scylla
coredump, let's drop "Conflicts: abrt" from rpm and
add "Conflicts=abrt-ccpp.service" to systemd unit.
Fixes#892Closesscylladb/scylladb#15691
systemd man page says:
systemd-fstab-generator(3) automatically adds dependencies of type Before= to
all mount units that refer to local mount points for this target unit.
So "Before=local-fs.taget" is the correct dependency for local mount
points, but we currently specify "After=local-fs.target", it should be
fixed.
Also replaced "WantedBy=multi-user.target" with "WantedBy=local-fs.target",
since .mount are not related with multi-user but depends local
filesystems.
Fixes#8761Closesscylladb/scylladb#15647
On some environment such as VMware instance, /dev/disk/by-uuid/<UUID> is
not available, scylla_raid_setup will fail while mounting volume.
To avoid failing to mount /dev/disk/by-uuid/<UUID>, fetch all available
paths to mount the disk and fallback to other paths like by-partuuid,
by-id, by-path or just using real device path like /dev/md0.
To get device path, and also to dumping device status when UUID is not
available, this will introduce UdevInfo class which communicate udev
using pyudev.
Related #11359Closesscylladb/scylladb#13803
Disabling fstrim.timer was for avoid running fstrim on /var/lib/scylla from
both scylla-fstrim.timer and fstrim.timer, but fstrim.timer actually never do
that, since it is only looking on fstab entries, not our systemd unit.
To run fstrim correctly on rootfs and other filesystems not related
scylla, we should stop disabling fstrim.timer.
Fixes#15176
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Closes#15177
This argument was dead since its introduction and 'discard' was
always configured regardless of its value.
This patch allows actually configuring things using this argument.
Fixes#14963Closes#14964
Our usage of inodes is dual:
- the Index.db and Data.db components are pinned in memory as
the files are open
- all other components are read once and never looked at again
As such, tune the kernel to prefer evicting dcache/inodes to
memory pages. The default is 100, so the value of 2000 increases
it by a factor of 20.
Ref https://github.com/scylladb/scylladb/issues/14506Closes#14509
Currently, scylla_fstrim_setup does not start scylla-fstrim.timer and
just enables it, so the timer starts only after rebooted.
This is incorrect behavior, we start start it during the setup.
Also, unmask is unnecessary for enabling the timer.
Fixes#14249Closes#14252
The discussion on the thread says, when we reformat a volume with another
filesystem, kernel and libblkid may skip to populate /dev/disk/by-* since it
detected two filesystem signatures, because mkfs.xxx did not cleared previous
filesystem signature.
To avoid this, we need to run wipefs before running mkfs.
Note that this runs wipefs twice, for target disks and also for RAID device.
wipefs for RAID device is needed since wipefs on disks doesn't clear filesystem signatures on /dev/mdX (we may see previous filesystem signature on /dev/mdX when we construct RAID volume multiple time on same disks).
Also dropped -f option from mkfs.xfs, it will check wipefs is working as we
expected.
Fixes#13737
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Closes#13738
The commitlog api originally implied that
the commitlog_directory would contain files
from a single commitlog instance. This is
checked in segment_manager::list_descriptors,
if it encounters a file with an unknown
prefix, an exception occurs in
commitlog::descriptor::descriptor, which is
logged with the WARN level.
A new schema commitlog was added recently,
which shares the filesystem directory with
the main commitlog. This causes warnings
to be emitted on each boot. This patch
solves the warnings problem by moving
the schema commitlog to a separate directory.
In addition, the user can employ the new
schema_commitlog_directory parameter to move
the schema commitlog to another disk drive.
By default, the schema commitlog directory is
nested in the commitlog_directory. This can help
avoid problems during an upgrade if the
commitlog_directory in the custom scylla.yaml
is located on a separate disk partition.
This is expected to be released in 5.3.
As #13134 (raft tables->schema commitlog)
is also scheduled for 5.3, and it already
requires a clean rolling restart (no cl
segments to replay), we don't need to
specifically handle upgrade here.
Fixes: #11867
We currently configure only TimeoutStartSec, but probably it's not
enough to prevent coredump timeout, since TimeoutStartSec is maximum
waiting time for service startup, and there is another directive to
specify maximum service running time (RuntimeMaxSec).
To fix the problem, we should specify RunTimeMaxSec and TimeoutSec (it
configures both TimeoutStartSec and TimeoutStopSec).
Fixes#5430Closes#12757
We stop using fallocate for allocating swap since it does not work on
xfs (#6650).
However, dd is much slower than fallocate since it filling data on the
file, let's use fallocate when filesystem is ext4 since it actually
works and faster.
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
We should run error check before running dd, otherwise it will left
swapfile on disk without completing swap setup.
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Since we moved all IaaS code to scylla-machine-image, we nolonger need
AMI variable on sysconfig file or --ami parameter on setup scripts,
and also never used /etc/scylla/ami_disabled.
So let's drop all of them from Scylla core core.
Related with scylladb/scylla-machine-image#61
Closes#12043
--online-discard option defined as string parameter since it doesn't
specify "action=", but has default value in boolean (default=True).
It breaks "provisioning in a similar environment" since the code
supposed boolean value should be "action='store_true'" but it's not.
We should change the type of the option to int, and also specify
"choices=[0, 1]" just like --io-setup does.
Fixes#11700Closes#11831
Currently, --disks options does not allow symlinks such as
/dev/disk/by-uuid/* or /dev/disk/azure/*.
To allow using them, is_unused_disk() should resolve symlink to
realpath, before evaluating the disk path.
Fixes#11634Closes#11646
Even we have __escape() for escaping " middle of the value to writing
sysconfig file, we didn't unescape for reading from sysconfig file.
So adding __unescape() and call it on get().
It seems like distribution original sysconfig files does not use double
quote to set the parameter when the value does not contain space.
Adding function to detect spaces in the value, don't usedouble quote
when it not detected.
Fixes#9149
We added UUID device file existance check on #11399, we expect UUID
device file is created before checking, and we wait for the creation by
"udevadm settle" after "mkfs.xfs".
However, we actually getting error which says UUID device file missing,
it probably means "udevadm settle" doesn't guarantee the device file created,
on some condition.
To avoid the error, use var-lib-scylla.mount to wait for UUID device
file is ready, and run the file existance check when the service is
failed.
Fixes#11617Closes#11666
Fixes a regression introduced in 80917a1054:
"scylla_prepare: stop generating 'mode' value in perftune.yaml"
When cpuset.conf contains a "full" CPU set the negation of it from
the "full" CPU set is going to generate a zero mask as a irq_cpu_mask.
This is an illegal value that will eventually end up in the generated
perftune.yaml, which in line will make the scylla service fail to start
until the issue is resolved.
In such a case a irq_cpu_mask must represent a "full" CPU set mimicking
a former 'MQ' mode.
Fixes#11701
Tested:
- Manually on a 2 vCPU VM in an 'auto-selection' mode.
- Manually on a large VM (48 vCPUs) with an 'MQ' manually
enforced.
Message-Id: <20221004004237.2961246-1-vladz@scylladb.com>
"
This series adds a long waited transition of our auto-generation
code to irq_cpu_mask instead of 'mode' in perftune.yaml.
And then it fixes a regression in scylla_prepare perftune.yaml
auto-generation logic.
"
* 'scylla_prepare_fix_regression-v1' of https://github.com/vladzcloudius/scylla:
scylla_prepare + scylla_cpuset_setup: make scylla_cpuset_setup idempotent without introducing regressions
scylla_prepare: stop generating 'mode' value in perftune.yaml
Just like 4a8ed4c, we also need to wait for udev event completion to
create /dev/disk/by-uuid/$UUID for newly formatted disk, to mount the
disk just after formatting.
Fixes#11359