Commit Graph

1429 Commits

Author SHA1 Message Date
Takuya ASADA
48b6aec16a scripts: use "out()" function for all capture_output subprocesses
On acaf0bb we applied out() just for perftune.py because we had issue #10390
with this script.
But the issue can happen with other commands too, let's apply it to all
commands which uses capture_output.

related #10390

Closes #10414
2022-04-26 13:56:52 +03:00
Takuya ASADA
acaf0bb88a scripts: print perftune.py error message when capture_output=True
We currently does not able to get any error message from subprocess when we specified capture_output=True on subprocess.run().
This is because CalledProcessError does not print stdout/stderr when it raised, and we don't catch the exception, we just let python to cause Traceback.
Result of that, we only able to know exit status and failed command but
not able to get stdout/stderr.

This is problematic especially working on perftune.py bug, since the
script should caused Traceback but we never able to see it.

To resolve this, add wrapper function "out()" for capture output, and
print stdout/stderr with error message inside the function.

Fixes #10390

Closes #10391
2022-04-18 14:06:51 +03:00
Takuya ASADA
f95a531407 docker: run scylla as root
Previous versions of Docker image runs scylla as root, but cb19048
accidently modified it to scylla user.
To keep compatibility we need to revert this to root.

Fixes #10261

Closes #10325
2022-04-04 17:25:13 +03:00
Takuya ASADA
41edc045d9 docker: revert scylla-server.conf service name change
We changed supervisor service name at cb19048, but this breaks
compatibility with scylla-operator.
To fix the issue we need to revert the service name to previous one.

Fixes #10269

Closes #10323
2022-04-03 19:18:18 +03:00
Alexey Kartashov
d86c3a8061 dist/docker: fix incorrect locale value
Docker build script contains an incorrect locale specification for LC_ALL setting,
this commit fixes that.

Fixes #10310

Closes #10321
2022-04-03 14:24:54 +03:00
Takuya ASADA
59adf05951 scylla_sysconfig_setup: avoid perse error on perftune.py --get-cpu-mask
Currently, we just passes entire output of perftune.py when getting CPU
mask from the script, but it may cause parse error since the script may
also print warning message.

To avoid that, we need to extract CPU mask from the output.

Fixes #10082

Closes #10107
2022-03-28 16:31:14 +03:00
Takuya ASADA
bdefea7c82 docker: enable --log-to-stdout which mistakenly disabled
Since our Docker image moved to Ubuntu, we mistakenly copy
dist/docker/etc/sysconfig/scylla-server to /etc/sysconfig, which is not
used in Ubuntu (it should be /etc/default).
So /etc/default/scylla-server is just default configuration of
scylla-server .deb package, --log-to-stdout is 0, same as normal installation.

We don't want keep the duplicated configuration file anyway,
so let's drop dist/docker/etc/sysconfig/scylla-server and configure
/etc/default/scylla-server in build_docker.sh.

Fixes #10270

Closes #10280
2022-03-27 14:50:10 +03:00
Takuya ASADA
59c72d5d60 scylla_prepare: print Traceback with current user-friendly messages
On e1b15ba, we introduce user-friendly error message when Exception
occured while generating perftune.yaml.
However, it becomes difficult to investigate bugs since we dropped
traceback.
To resolve this problem, let's print both traceback and user-friendly
messages.

Related #10050

Closes #10140
2022-03-20 16:55:18 +02:00
Nadav Har'El
cb6630040d docker: don't repeat "--alternator-address" option twice
If the Docker startup script is passed both "--alternator-port" and
"--alternator-https-port", a combination which is supposed to be
allowed, it passes to Scylla the "--alternator-address" option twice.
This isn't necessary, and worse - not allowed.

So this patch fixes the scyllasetup.py script to only pass this
parameter once.

Fixes #10016.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220202165814.1700047-1-nyh@scylladb.com>
2022-02-02 23:26:11 +02:00
Takuya ASADA
c2ccdac297 move cloud related code from scylla repository to scylla-machine-image
Currently, cloud related code have cross-dependencies between
scylla and scylla-machine-image.
It is not good way to implement, and single change can break both
package.

To resolve the issue, we need to move all cloud related code to
scylla-machine-image, and remove them from scylla repository.

Change list:
 - move cloud part of scylla_util.py to scylla-machine-image
 - move cloud part of scylla_io_setup to scylla-machine-image
 - move scylla_ec2_check to scylla-machine-image
 - move cloud part of scylla_bootparam_setup to scylla-machine-image

Closes #9957
2022-02-01 11:26:59 +02:00
Takuya ASADA
218dd3851c scylla_swap_setup: add --swap-size-bytes
Currently, --swap-size does not able to specify exact file size because
the option takes parameter only in GB.
To fix the limitation, let's add --swpa-size-bytes to specify swap size
in bytes.
We need this to implement preallocate swapfile while building IaaS
image.

see scylladb/scylla-machine-image#285

Closes #9971
2022-01-31 18:32:32 +02:00
Takuya ASADA
32f2eb63ac scylla_raid_setup: use mdmonitor only when RAID level > 0
We found that monitor mode of mdadm does not work on RAID0, and it is
not a bug, expected behavior according to RHEL developer.
Therefore, we should stop enabling mdmonitor when RAID0 is specified.

Fixes #9540
2022-01-26 22:33:07 +09:00
Takuya ASADA
cd57815fff Revert "scylla_raid_setup: workaround for mdmonitor.service issue on CentOS8"
This reverts commit 0d8f932f0b,
because RHEL developer explains this is not a bug, it's expected behavior.
(mdadm --monitor does not start when RAID level is 0)
see: https://bugzilla.redhat.com/show_bug.cgi?id=2031936

So we should stop downgrade mdadm package and modify our script not to
enable mdmonitor.service on RAID0, use it only for RAID5.
2022-01-26 22:33:06 +09:00
Avi Kivity
fcb8d040e8 treewide: use Software Package Data Exchange (SPDX) license identifiers
Instead of lengthy blurbs, switch to single-line, machine-readable
standardized (https://spdx.dev) license identifiers. The Linux kernel
switched long ago, so there is strong precedent.

Three cases are handled: AGPL-only, Apache-only, and dual licensed.
For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0),
reasoning that our changes are extensive enough to apply our license.

The changes we applied mechanically with a script, except to
licenses/README.md.

Closes #9937
2022-01-18 12:15:18 +01:00
Valerii Ponomarov
12fa68fe67 scylla_util: return boolean calling systemd_unit.available
As of now, 'systemd_unit.available' works ok only when provided
unit is present.
It raises Exception instead of returning boolean
when provided systemd unit is absent.

So, make it return boolean in both cases.

Fixes https://github.com/scylladb/scylla/issues/9848

Closes #9849
2021-12-28 15:14:04 +02:00
Takuya ASADA
6a834261fb scylla_coredump_setup: prevent coredump timeout on systemd-coredump@.service
On newer version of systemd-coredump, coredump handled in
systemd-coredump@.service, and may causes timeout while running the
systemd unit, like this:
  systemd[1]: systemd-coredump@xxxx.service: Service reached runtime time limit. Stopping.
To prevent that, we need to override TimeoutStartSec=infinity.

Fixes #9837

Closes #9841
2021-12-27 13:58:07 +02:00
Takuya ASADA
0d8f932f0b scylla_raid_setup: workaround for mdmonitor.service issue on CentOS8
On CentOS8, mdmonitor.service does not works correctly when using
mdadm-4.1-15.el8.x86_64 and later versions.
Until we find a solution, let's pinning the package version to older one
which does not cause the issue (4.1-14.el8.x86_64).

Fixes #9540

Closes #9782
2021-12-27 12:07:34 +02:00
Takuya ASADA
7064ae3d90 dist: fix scylla-housekeeping uuid file chmod call
Should use chmod() on a file, not fchmod()

Fixes #9683

Closes #9802
2021-12-27 11:47:06 +02:00
Takuya ASADA
6870938842 scylla_raid_setup: fix typo
Closes #9790
2021-12-14 11:15:23 +02:00
Takuya ASADA
ea20f89c56 dist: allow running scylla-housekeeping with strict umask setting
To avoid failing scylla-housekeeping in strict umask environment,
we need to chmod a+r on repository file and housekeeping.uuid.

Fixes #9683

Closes #9739
2021-12-05 20:46:46 +02:00
Takuya ASADA
097a6ee245 dist: add support im4gn/is4gen instance on AWS
Add support next-generation, storage-optimized ARM64 instance types.

Fixes #9711

Closes #9730
2021-12-05 13:20:01 +02:00
Michał Chojnowski
08f7b81b36 dist: scylla_io_setup: run iotune for supported but not preconfigured AWS instance types
Currently, for AWS instances in `is_supported_instance_class()` other than
i3* and *gd (for example: m5d), scylla_io_setup neither provides
preconfigured values for io_properties.yaml nor runs iotune nor fails.
This silently results in a broken io_properties.yaml, like so:

disks:
  - mountpoint: /var/lib/scylla

Fix that.

Closes #9660
2021-11-24 18:28:13 +02:00
Amos Kong
32e62252e1 debian/build_offline_installer.sh: config apt to keep downloaded packages
The downloaded packages might be deleted autotically after installation,
then we will provide an incomplete installer to user.

This patch changed to config apt to keep the downloaded packages before
installation.

Signed-off-by: Amos Kong <kongjianjun@gmail.com>

Closes #9592
2021-11-16 17:47:01 +02:00
Avi Kivity
a19d00ef9b dist: scylla_raid_setup: mount XFS with online discard
Online discard asks the disk to erase flash memory cells as soon
as files are deleted. This gives the disk more freedom to choose
where to place new files, so it improves performance.

On older kernel versions, and on really bad disks, this can reduce
performance so we add an option to disable it.

Since fstrim is pointless when online discard is enabled, we
don't configure it if online discard is selected.

I tested it on an AWS i3.large instance, the flag showd up in
`mount` after configuration.

Closes #9608
2021-11-15 14:16:08 +02:00
Avi Kivity
c17101604f Merge 'Revert "scylla_util.py: return bool value on systemd_unit.is_active()"' from Takuya ASADA
On scylla_unit.py, we provide `systemd_unit.is_active()` to return `systemctl is-active` output.
When we introduced systemd_unit class, we just returned `systemctl is-active` output as string, but we changed the return value to bool after that (2545d7fd43).
This was because `if unit.is_active():` always becomes True even it returns "failed" or "inactive", to avoid such scripting bug.
However, probably this was mistake.
Because systemd unit state is not 2 state, like "start" / "stop", there are many state.

And we already using multiple unit state ("activating", "failed", "inactive", "active") in our Cloud image login prompt:
https://github.com/scylladb/scylla-machine-image/blob/next/common/scylla_login#L135
After we merged 2545d7fd43, the login prompt is broken, because it does not return string as script expected (https://github.com/scylladb/scylla-machine-image/issues/241).

I think we should revert 2545d7fd43, it should return exactly same value as `systemctl is-active` says.

Fixes #9627
Fixes scylladb/scylla-machine-image#241

Closes #9628

* github.com:scylladb/scylla:
  scylla_ntp_setup: use string in systemd_unit.is_active()
  Revert "scylla_util.py: return bool value on systemd_unit.is_active()"
2021-11-15 13:56:28 +02:00
Takuya ASADA
279fabe9b4 scylla_ntp_setup: use string in systemd_unit.is_active()
Since we reverted 2545d7fd43, we need to
use string instead of bool value.
2021-11-15 19:50:31 +09:00
Takuya ASADA
d646673705 Revert "scylla_util.py: return bool value on systemd_unit.is_active()"
This reverts commit 2545d7fd43.

Fixes #9627
Fixes scylladb/scylla-machine-image#241
2021-11-15 19:50:31 +09:00
Yaron Kaikov
060a91431d dist/docker/debian/build_docker.sh: debian version fix for rc releases
When building a docker we relay on `VERSION` value from
`SCYLLA-VERSION-GEN` . For `rc` releases only there is a different
between the configured version (X.X.rcX) and the actualy debian package
we generate (X.X~rcX)

Using a similar solution as i did in dcb10374a5

Fixes: #9616

Closes #9617
2021-11-11 22:13:26 +02:00
Takuya ASADA
546e4adf9e dist/docker: configure default locale correctly
Since cqlsh requires UTF-8 locale, we should configure default locale
correctly, on both directly executed shell with docker and via SSH.
(Directly executed shell means "docker exec -ti <image> /bin/bash")

For SSH, we need to set correct parameter on /etc/default/locale, which
can set by update-locale command.
However, directly executed shell won't load this parameter, because it
configured at PAM but we skip login on this case.
To fix this issue, we also need to set locale variables on container
image configuration (ENV in Dockerfile, --env in buildah).

Fixes #9570

Closes #9587
2021-11-07 17:03:12 +02:00
Takuya ASADA
201a97e4a4 dist/docker: fix bashrc filename for Ubuntu
For Debian variants, correct filename is /etc/bash.bashrc.

Fixes #9588

Closes #9589
2021-11-07 17:01:13 +02:00
Takuya ASADA
9b4cf8c532 scylla_util.py: On is_gce(), return False when it's on GKE
GKE metadata server does not provide same metadata as GCE, we should not
return True on is_gce().
So try to fetch machine-type from metadata server, return False if it
404 not found.

Fixes #9471

Signed-off-by: Takuya ASADA <syuu@scylladb.com>

Closes #9582
2021-11-04 12:49:06 +02:00
Avi Kivity
075ceb8918 Merge 'AWS: add scylla_io_setup preset parameters for ARM instances' from Takuya ASADA
Currently, scylla-server fails to start on ARM instances because scylla_io_setup does not have preset parameters even instance type added to 'supported instance'.
To fix this, we need to add io parameter preset on scylla_io_setup.

Also, we mistakenly added EBS only instances at a004b1da30, need to remove them.
Instrances does not have ephemeral disk should be 'unsupported instance', we still run our AMI on it, but we print warning message on login prompt, and user requires to run scylla_io_setup.

Fixes #9493

Closes #9532

* github.com:scylladb/scylla:
  scylla_util.py: remove EBS only ARM instances from support instance list
  scylla_io_setup: support ARM instances on AWS
2021-11-03 10:19:59 +02:00
Takuya ASADA
4a96a8145e scylla_util.py: remove EBS only ARM instances from support instance list
Since we required ephemeral disks for our AMI, these EBS only ARM
instances cannot add in it is 'supported instance' list.
We still able to run our AMI on these instance types but login message
warns it is 'unsupported instance type', and requires to run
scylla_io_setup manually.
2021-11-03 10:26:42 +09:00
Takuya ASADA
4e8060ba72 scylla_io_setup: support ARM instances on AWS
Add preset parameters for AWS ARM intances.

Fixes #9493
2021-11-03 10:26:42 +09:00
Takuya ASADA
c9499230c3 docker: add stopwaitsecs
We need stopwaitsecs just like we do TimeoutStpSec=900 on
scylla-server.service, to avoid timeout on scylla-server shutdown.

Fixes #9485

Closes #9545
2021-10-31 20:38:10 +02:00
Takuya ASADA
13ffe3c094 scylla_util.py: detect ephemeral/EBS disks correctly on Nitro System
Currently, aws_instance.ephemeral_disks() returns both ephemeral disks
and EBS disks on Nitro System.
This is because both are attached as NVMe disks, we need to add disk
type detection code on NVMe handle logic.

Fixes #9440

Closes #9462
2021-10-28 08:58:25 +03:00
Takuya ASADA
06c28585f9 dist: raise fs.file-max and fs.nr_open to enough size for scylla
Currently, we configure LimitNOFILE on scylla-server.service, but we
don't configure fs.nr_open and fs.file-max.
When fs.nr_open or fs.file-max are smaller than LimitNOFILE, we may fail
to allocate FDs.
To fix this issue, raise fs.file-max and fs.nr_open to enogh size for
scylla.

Fixes #9461

Closes #9461
2021-10-12 12:47:35 +03:00
Takuya ASADA
3b798afc1e scylla_io_setup: handle nr_disks on GCP correctly
nr_disks is int, should not be string.

Fixes #9429

Closes #9430
2021-10-06 12:31:38 +03:00
Takuya ASADA
9c830297ac scylla_util.py: add persistent disk support for GCE
Just like EBS disks for EC2, we want to use persistent disk on GCE.
We won't recommend to use it, but still need to support it.

Related scylladb/scylla-machine-image#215

Closes #9395
2021-10-03 17:58:18 +03:00
Takuya ASADA
d87b80ad14 scylla_util.py: add persistent disk support for Azure Just like EBS disks for EC2, we want to use persistent disk on Azure. We won't recommend to use it, but still need to support it.
Related https://github.com/scylladb/scylla-machine-image/issues/218

Closes #9417
2021-10-03 17:56:31 +03:00
Takuya ASADA
cd7fe9a998 scylla_cpuscaling_setup: disable ondemand.service on Ubuntu
On Ubuntu, scaling_governor becomes powersave after rebooted, even we configured cpufrequtils.
This is because ondemand.service, it unconditionally change scaling_governor to ondemand or powersave.
cpufrequtils will start before ondemand.service, scaling_governor overwrite by ondemand.service.
To configure scaling_governor correctly, we have to disable this service.

Fixes #9324

Closes #9325
2021-09-29 10:32:34 +03:00
Beni Peled
e873bdbfe9 docker: fix entrypoint issue
This commit fixes [0] which is about extra (redundant) keyword adds to
the `--entrypoint` and causes scylla-server to fail to start

[0] https://github.com/scylladb/scylla-pkg/issues/2395

Closes #9350
Fixes #9355
2021-09-19 15:39:08 +03:00
Takuya ASADA
f928dced0c scylla_cpuscaling_setup: add --force option
To building Ubuntu AMI with CPU scaling configuration, we need force
running mode for scylla_cpuscaling_setup, which run setup without
checking scaling_governor support.

See scylladb/scylla-machine-image#204

Closes #9326
2021-09-13 18:45:46 +03:00
Felipe Mendes
1b8dff63c3 iotune - Fix i3en.xlarge check
i3en.xlarge is currently not getting tuned properly. A quick test using
Scylla AMI ( ami-07a31481e4394d346 ) reveals that the storage
capabilities under this instance are greatly reduced:

$ grep iops /etc/scylla.d/io_properties.yaml
  read_iops: 257024
  write_iops: 174080

This patch corrects this typo, in such a way that iotune now properly
tunes this instance type.

Closes #9298
2021-09-07 10:44:39 +03:00
Nadav Har'El
d7474ddff3 dist/docker: fix errors in README.md
The (oddly-placed) document dist/docker/debian/README.md explains how a
developer can build a Scylla docker image using a self-built Scylla
executable.

While the document begins by saying that you can "build your own
Scylla in whatever build mode you prefer, e.g., dev.", the rest of the
instructions don't fit this example mode "dev" - the second command does
"ninja dist-deb" which builds *all* modes, while the third command
forgets to pass the mode at all (and therefore defaults to "release").
The forth command doesn't work at all, and became irrelevant during
a recent rewrite in commit e96ff3d.

This patch modifies the document to fix those problems.

It ends with an example of how to run the resulting docker image
(this is usually the purpose of building a docker image - to run it
and test it). I did this example using podman because I couldn't get
it to work in docker. Later we can hopefully add the corresponding
docker example.

Fixes #9263.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210829182608.355748-1-nyh@scylladb.com>
2021-08-30 08:36:33 +03:00
Nadav Har'El
ed7106ebd7 docker: fix regression of docker image ignoring command-line arguments
Our docker image accepts various command-line arguments and translates
them into Scylla arguments. For example, Alternator's getting-started
document has the following example:

  ```
  docker run --name scylla -d -p 8000:8000 scylladb/scylla-nightly:latest
  --alternator-port=8000 --alternator-write-isolation=always```

Recently, this stopped working and the extra arguments at the end were
just ignored.

It turns out that this is a regression caused by commit
e96ff3d82d that changed our docker image
creation process from Dockerfile to buildah. While the entry point
specified in Dockerfile was a string, the same string in buildah has
a strange meaning (an entry point which can't take arguments) and to
get the original meaning, the entry point needs to be a JSON array.
This is kind-of explained in https://github.com/containers/buildah/issues/732.

So changing the entry point from a string to a JSON array fixes the
regression, and we can again pass arguments to Scylla's docker image.

Fixes #9247.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20210829180328.354109-1-nyh@scylladb.com>
2021-08-30 08:26:15 +03:00
Takuya ASADA
5b62bebbb6 scylla_io_setup: check root privilege on root mode
This is side effect of allowing to run scylla_io_setup in nonroot mode,
the script able to run in non-root user even the installation is not
nonroot mode.

Result of that, the script finally failed to write io_properties.yaml
and causes permission denied.  Since the evaluation takes long time, we
should run permission check before starting it.

We need to add root privilege check again, but skip it on nonroot mode.

Fixes #8915

Closes #8984
2021-08-22 16:49:40 +03:00
Takuya ASADA
cb19048186 docker: use dist/common/supervisor script for docker
supervisor scripts for Docker and supervisor scripts for offline
installer are almost same, drop Docker one and share same code to
deduplicate them.

Closes #9143

Fixes #9194
2021-08-16 13:36:14 +03:00
Takuya ASADA
e5bb88b69a scylla_cpuscaling_setup: change scaling_governor path
On some environment /sys/devices/system/cpu/cpufreq/policy0/scaling_governor
does not exist even it supported CPU scaling.
Instead, /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor is
avaliable on both environment, so we should switch to it.

Fixes #9191

Closes #9193
2021-08-11 15:31:14 +03:00
Takuya ASADA
b822c642e5 docker: fix housekeeping --repo-files to apt repository
Even we switched to Ubuntu based container image, housekeeping still
using yum repository.
It should be switched to apt repository.

Fixes #9144

Closes #9147
2021-08-09 07:47:03 +03:00