On acaf0bb we applied out() just for perftune.py because we had issue #10390
with this script.
But the issue can happen with other commands too, let's apply it to all
commands which uses capture_output.
related #10390Closes#10414
We currently does not able to get any error message from subprocess when we specified capture_output=True on subprocess.run().
This is because CalledProcessError does not print stdout/stderr when it raised, and we don't catch the exception, we just let python to cause Traceback.
Result of that, we only able to know exit status and failed command but
not able to get stdout/stderr.
This is problematic especially working on perftune.py bug, since the
script should caused Traceback but we never able to see it.
To resolve this, add wrapper function "out()" for capture output, and
print stdout/stderr with error message inside the function.
Fixes#10390Closes#10391
Previous versions of Docker image runs scylla as root, but cb19048
accidently modified it to scylla user.
To keep compatibility we need to revert this to root.
Fixes#10261Closes#10325
We changed supervisor service name at cb19048, but this breaks
compatibility with scylla-operator.
To fix the issue we need to revert the service name to previous one.
Fixes#10269Closes#10323
Currently, we just passes entire output of perftune.py when getting CPU
mask from the script, but it may cause parse error since the script may
also print warning message.
To avoid that, we need to extract CPU mask from the output.
Fixes#10082Closes#10107
Since our Docker image moved to Ubuntu, we mistakenly copy
dist/docker/etc/sysconfig/scylla-server to /etc/sysconfig, which is not
used in Ubuntu (it should be /etc/default).
So /etc/default/scylla-server is just default configuration of
scylla-server .deb package, --log-to-stdout is 0, same as normal installation.
We don't want keep the duplicated configuration file anyway,
so let's drop dist/docker/etc/sysconfig/scylla-server and configure
/etc/default/scylla-server in build_docker.sh.
Fixes#10270Closes#10280
On e1b15ba, we introduce user-friendly error message when Exception
occured while generating perftune.yaml.
However, it becomes difficult to investigate bugs since we dropped
traceback.
To resolve this problem, let's print both traceback and user-friendly
messages.
Related #10050Closes#10140
If the Docker startup script is passed both "--alternator-port" and
"--alternator-https-port", a combination which is supposed to be
allowed, it passes to Scylla the "--alternator-address" option twice.
This isn't necessary, and worse - not allowed.
So this patch fixes the scyllasetup.py script to only pass this
parameter once.
Fixes#10016.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220202165814.1700047-1-nyh@scylladb.com>
Currently, cloud related code have cross-dependencies between
scylla and scylla-machine-image.
It is not good way to implement, and single change can break both
package.
To resolve the issue, we need to move all cloud related code to
scylla-machine-image, and remove them from scylla repository.
Change list:
- move cloud part of scylla_util.py to scylla-machine-image
- move cloud part of scylla_io_setup to scylla-machine-image
- move scylla_ec2_check to scylla-machine-image
- move cloud part of scylla_bootparam_setup to scylla-machine-image
Closes#9957
Currently, --swap-size does not able to specify exact file size because
the option takes parameter only in GB.
To fix the limitation, let's add --swpa-size-bytes to specify swap size
in bytes.
We need this to implement preallocate swapfile while building IaaS
image.
see scylladb/scylla-machine-image#285
Closes#9971
We found that monitor mode of mdadm does not work on RAID0, and it is
not a bug, expected behavior according to RHEL developer.
Therefore, we should stop enabling mdmonitor when RAID0 is specified.
Fixes#9540
This reverts commit 0d8f932f0b,
because RHEL developer explains this is not a bug, it's expected behavior.
(mdadm --monitor does not start when RAID level is 0)
see: https://bugzilla.redhat.com/show_bug.cgi?id=2031936
So we should stop downgrade mdadm package and modify our script not to
enable mdmonitor.service on RAID0, use it only for RAID5.
Instead of lengthy blurbs, switch to single-line, machine-readable
standardized (https://spdx.dev) license identifiers. The Linux kernel
switched long ago, so there is strong precedent.
Three cases are handled: AGPL-only, Apache-only, and dual licensed.
For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0),
reasoning that our changes are extensive enough to apply our license.
The changes we applied mechanically with a script, except to
licenses/README.md.
Closes#9937
As of now, 'systemd_unit.available' works ok only when provided
unit is present.
It raises Exception instead of returning boolean
when provided systemd unit is absent.
So, make it return boolean in both cases.
Fixes https://github.com/scylladb/scylla/issues/9848Closes#9849
On newer version of systemd-coredump, coredump handled in
systemd-coredump@.service, and may causes timeout while running the
systemd unit, like this:
systemd[1]: systemd-coredump@xxxx.service: Service reached runtime time limit. Stopping.
To prevent that, we need to override TimeoutStartSec=infinity.
Fixes#9837Closes#9841
On CentOS8, mdmonitor.service does not works correctly when using
mdadm-4.1-15.el8.x86_64 and later versions.
Until we find a solution, let's pinning the package version to older one
which does not cause the issue (4.1-14.el8.x86_64).
Fixes#9540Closes#9782
Currently, for AWS instances in `is_supported_instance_class()` other than
i3* and *gd (for example: m5d), scylla_io_setup neither provides
preconfigured values for io_properties.yaml nor runs iotune nor fails.
This silently results in a broken io_properties.yaml, like so:
disks:
- mountpoint: /var/lib/scylla
Fix that.
Closes#9660
The downloaded packages might be deleted autotically after installation,
then we will provide an incomplete installer to user.
This patch changed to config apt to keep the downloaded packages before
installation.
Signed-off-by: Amos Kong <kongjianjun@gmail.com>
Closes#9592
Online discard asks the disk to erase flash memory cells as soon
as files are deleted. This gives the disk more freedom to choose
where to place new files, so it improves performance.
On older kernel versions, and on really bad disks, this can reduce
performance so we add an option to disable it.
Since fstrim is pointless when online discard is enabled, we
don't configure it if online discard is selected.
I tested it on an AWS i3.large instance, the flag showd up in
`mount` after configuration.
Closes#9608
On scylla_unit.py, we provide `systemd_unit.is_active()` to return `systemctl is-active` output.
When we introduced systemd_unit class, we just returned `systemctl is-active` output as string, but we changed the return value to bool after that (2545d7fd43).
This was because `if unit.is_active():` always becomes True even it returns "failed" or "inactive", to avoid such scripting bug.
However, probably this was mistake.
Because systemd unit state is not 2 state, like "start" / "stop", there are many state.
And we already using multiple unit state ("activating", "failed", "inactive", "active") in our Cloud image login prompt:
https://github.com/scylladb/scylla-machine-image/blob/next/common/scylla_login#L135
After we merged 2545d7fd43, the login prompt is broken, because it does not return string as script expected (https://github.com/scylladb/scylla-machine-image/issues/241).
I think we should revert 2545d7fd43, it should return exactly same value as `systemctl is-active` says.
Fixes#9627Fixesscylladb/scylla-machine-image#241Closes#9628
* github.com:scylladb/scylla:
scylla_ntp_setup: use string in systemd_unit.is_active()
Revert "scylla_util.py: return bool value on systemd_unit.is_active()"
When building a docker we relay on `VERSION` value from
`SCYLLA-VERSION-GEN` . For `rc` releases only there is a different
between the configured version (X.X.rcX) and the actualy debian package
we generate (X.X~rcX)
Using a similar solution as i did in dcb10374a5Fixes: #9616Closes#9617
Since cqlsh requires UTF-8 locale, we should configure default locale
correctly, on both directly executed shell with docker and via SSH.
(Directly executed shell means "docker exec -ti <image> /bin/bash")
For SSH, we need to set correct parameter on /etc/default/locale, which
can set by update-locale command.
However, directly executed shell won't load this parameter, because it
configured at PAM but we skip login on this case.
To fix this issue, we also need to set locale variables on container
image configuration (ENV in Dockerfile, --env in buildah).
Fixes#9570Closes#9587
GKE metadata server does not provide same metadata as GCE, we should not
return True on is_gce().
So try to fetch machine-type from metadata server, return False if it
404 not found.
Fixes#9471
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Closes#9582
Currently, scylla-server fails to start on ARM instances because scylla_io_setup does not have preset parameters even instance type added to 'supported instance'.
To fix this, we need to add io parameter preset on scylla_io_setup.
Also, we mistakenly added EBS only instances at a004b1da30, need to remove them.
Instrances does not have ephemeral disk should be 'unsupported instance', we still run our AMI on it, but we print warning message on login prompt, and user requires to run scylla_io_setup.
Fixes#9493Closes#9532
* github.com:scylladb/scylla:
scylla_util.py: remove EBS only ARM instances from support instance list
scylla_io_setup: support ARM instances on AWS
Since we required ephemeral disks for our AMI, these EBS only ARM
instances cannot add in it is 'supported instance' list.
We still able to run our AMI on these instance types but login message
warns it is 'unsupported instance type', and requires to run
scylla_io_setup manually.
Currently, aws_instance.ephemeral_disks() returns both ephemeral disks
and EBS disks on Nitro System.
This is because both are attached as NVMe disks, we need to add disk
type detection code on NVMe handle logic.
Fixes#9440Closes#9462
Currently, we configure LimitNOFILE on scylla-server.service, but we
don't configure fs.nr_open and fs.file-max.
When fs.nr_open or fs.file-max are smaller than LimitNOFILE, we may fail
to allocate FDs.
To fix this issue, raise fs.file-max and fs.nr_open to enogh size for
scylla.
Fixes#9461Closes#9461
Just like EBS disks for EC2, we want to use persistent disk on GCE.
We won't recommend to use it, but still need to support it.
Related scylladb/scylla-machine-image#215
Closes#9395
On Ubuntu, scaling_governor becomes powersave after rebooted, even we configured cpufrequtils.
This is because ondemand.service, it unconditionally change scaling_governor to ondemand or powersave.
cpufrequtils will start before ondemand.service, scaling_governor overwrite by ondemand.service.
To configure scaling_governor correctly, we have to disable this service.
Fixes#9324Closes#9325
To building Ubuntu AMI with CPU scaling configuration, we need force
running mode for scylla_cpuscaling_setup, which run setup without
checking scaling_governor support.
See scylladb/scylla-machine-image#204
Closes#9326
i3en.xlarge is currently not getting tuned properly. A quick test using
Scylla AMI ( ami-07a31481e4394d346 ) reveals that the storage
capabilities under this instance are greatly reduced:
$ grep iops /etc/scylla.d/io_properties.yaml
read_iops: 257024
write_iops: 174080
This patch corrects this typo, in such a way that iotune now properly
tunes this instance type.
Closes#9298
The (oddly-placed) document dist/docker/debian/README.md explains how a
developer can build a Scylla docker image using a self-built Scylla
executable.
While the document begins by saying that you can "build your own
Scylla in whatever build mode you prefer, e.g., dev.", the rest of the
instructions don't fit this example mode "dev" - the second command does
"ninja dist-deb" which builds *all* modes, while the third command
forgets to pass the mode at all (and therefore defaults to "release").
The forth command doesn't work at all, and became irrelevant during
a recent rewrite in commit e96ff3d.
This patch modifies the document to fix those problems.
It ends with an example of how to run the resulting docker image
(this is usually the purpose of building a docker image - to run it
and test it). I did this example using podman because I couldn't get
it to work in docker. Later we can hopefully add the corresponding
docker example.
Fixes#9263.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210829182608.355748-1-nyh@scylladb.com>
Our docker image accepts various command-line arguments and translates
them into Scylla arguments. For example, Alternator's getting-started
document has the following example:
```
docker run --name scylla -d -p 8000:8000 scylladb/scylla-nightly:latest
--alternator-port=8000 --alternator-write-isolation=always```
Recently, this stopped working and the extra arguments at the end were
just ignored.
It turns out that this is a regression caused by commit
e96ff3d82d that changed our docker image
creation process from Dockerfile to buildah. While the entry point
specified in Dockerfile was a string, the same string in buildah has
a strange meaning (an entry point which can't take arguments) and to
get the original meaning, the entry point needs to be a JSON array.
This is kind-of explained in https://github.com/containers/buildah/issues/732.
So changing the entry point from a string to a JSON array fixes the
regression, and we can again pass arguments to Scylla's docker image.
Fixes#9247.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210829180328.354109-1-nyh@scylladb.com>
This is side effect of allowing to run scylla_io_setup in nonroot mode,
the script able to run in non-root user even the installation is not
nonroot mode.
Result of that, the script finally failed to write io_properties.yaml
and causes permission denied. Since the evaluation takes long time, we
should run permission check before starting it.
We need to add root privilege check again, but skip it on nonroot mode.
Fixes#8915Closes#8984
supervisor scripts for Docker and supervisor scripts for offline
installer are almost same, drop Docker one and share same code to
deduplicate them.
Closes#9143Fixes#9194
On some environment /sys/devices/system/cpu/cpufreq/policy0/scaling_governor
does not exist even it supported CPU scaling.
Instead, /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor is
avaliable on both environment, so we should switch to it.
Fixes#9191Closes#9193
Even we switched to Ubuntu based container image, housekeeping still
using yum repository.
It should be switched to apt repository.
Fixes#9144Closes#9147