Commit Graph

603 Commits

Author SHA1 Message Date
Takuya ASADA
48b6aec16a scripts: use "out()" function for all capture_output subprocesses
On acaf0bb we applied out() just for perftune.py because we had issue #10390
with this script.
But the issue can happen with other commands too, let's apply it to all
commands which uses capture_output.

related #10390

Closes #10414
2022-04-26 13:56:52 +03:00
Takuya ASADA
acaf0bb88a scripts: print perftune.py error message when capture_output=True
We currently does not able to get any error message from subprocess when we specified capture_output=True on subprocess.run().
This is because CalledProcessError does not print stdout/stderr when it raised, and we don't catch the exception, we just let python to cause Traceback.
Result of that, we only able to know exit status and failed command but
not able to get stdout/stderr.

This is problematic especially working on perftune.py bug, since the
script should caused Traceback but we never able to see it.

To resolve this, add wrapper function "out()" for capture output, and
print stdout/stderr with error message inside the function.

Fixes #10390

Closes #10391
2022-04-18 14:06:51 +03:00
Takuya ASADA
59adf05951 scylla_sysconfig_setup: avoid perse error on perftune.py --get-cpu-mask
Currently, we just passes entire output of perftune.py when getting CPU
mask from the script, but it may cause parse error since the script may
also print warning message.

To avoid that, we need to extract CPU mask from the output.

Fixes #10082

Closes #10107
2022-03-28 16:31:14 +03:00
Takuya ASADA
59c72d5d60 scylla_prepare: print Traceback with current user-friendly messages
On e1b15ba, we introduce user-friendly error message when Exception
occured while generating perftune.yaml.
However, it becomes difficult to investigate bugs since we dropped
traceback.
To resolve this problem, let's print both traceback and user-friendly
messages.

Related #10050

Closes #10140
2022-03-20 16:55:18 +02:00
Takuya ASADA
c2ccdac297 move cloud related code from scylla repository to scylla-machine-image
Currently, cloud related code have cross-dependencies between
scylla and scylla-machine-image.
It is not good way to implement, and single change can break both
package.

To resolve the issue, we need to move all cloud related code to
scylla-machine-image, and remove them from scylla repository.

Change list:
 - move cloud part of scylla_util.py to scylla-machine-image
 - move cloud part of scylla_io_setup to scylla-machine-image
 - move scylla_ec2_check to scylla-machine-image
 - move cloud part of scylla_bootparam_setup to scylla-machine-image

Closes #9957
2022-02-01 11:26:59 +02:00
Takuya ASADA
218dd3851c scylla_swap_setup: add --swap-size-bytes
Currently, --swap-size does not able to specify exact file size because
the option takes parameter only in GB.
To fix the limitation, let's add --swpa-size-bytes to specify swap size
in bytes.
We need this to implement preallocate swapfile while building IaaS
image.

see scylladb/scylla-machine-image#285

Closes #9971
2022-01-31 18:32:32 +02:00
Takuya ASADA
32f2eb63ac scylla_raid_setup: use mdmonitor only when RAID level > 0
We found that monitor mode of mdadm does not work on RAID0, and it is
not a bug, expected behavior according to RHEL developer.
Therefore, we should stop enabling mdmonitor when RAID0 is specified.

Fixes #9540
2022-01-26 22:33:07 +09:00
Takuya ASADA
cd57815fff Revert "scylla_raid_setup: workaround for mdmonitor.service issue on CentOS8"
This reverts commit 0d8f932f0b,
because RHEL developer explains this is not a bug, it's expected behavior.
(mdadm --monitor does not start when RAID level is 0)
see: https://bugzilla.redhat.com/show_bug.cgi?id=2031936

So we should stop downgrade mdadm package and modify our script not to
enable mdmonitor.service on RAID0, use it only for RAID5.
2022-01-26 22:33:06 +09:00
Avi Kivity
fcb8d040e8 treewide: use Software Package Data Exchange (SPDX) license identifiers
Instead of lengthy blurbs, switch to single-line, machine-readable
standardized (https://spdx.dev) license identifiers. The Linux kernel
switched long ago, so there is strong precedent.

Three cases are handled: AGPL-only, Apache-only, and dual licensed.
For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0),
reasoning that our changes are extensive enough to apply our license.

The changes we applied mechanically with a script, except to
licenses/README.md.

Closes #9937
2022-01-18 12:15:18 +01:00
Valerii Ponomarov
12fa68fe67 scylla_util: return boolean calling systemd_unit.available
As of now, 'systemd_unit.available' works ok only when provided
unit is present.
It raises Exception instead of returning boolean
when provided systemd unit is absent.

So, make it return boolean in both cases.

Fixes https://github.com/scylladb/scylla/issues/9848

Closes #9849
2021-12-28 15:14:04 +02:00
Takuya ASADA
6a834261fb scylla_coredump_setup: prevent coredump timeout on systemd-coredump@.service
On newer version of systemd-coredump, coredump handled in
systemd-coredump@.service, and may causes timeout while running the
systemd unit, like this:
  systemd[1]: systemd-coredump@xxxx.service: Service reached runtime time limit. Stopping.
To prevent that, we need to override TimeoutStartSec=infinity.

Fixes #9837

Closes #9841
2021-12-27 13:58:07 +02:00
Takuya ASADA
0d8f932f0b scylla_raid_setup: workaround for mdmonitor.service issue on CentOS8
On CentOS8, mdmonitor.service does not works correctly when using
mdadm-4.1-15.el8.x86_64 and later versions.
Until we find a solution, let's pinning the package version to older one
which does not cause the issue (4.1-14.el8.x86_64).

Fixes #9540

Closes #9782
2021-12-27 12:07:34 +02:00
Takuya ASADA
7064ae3d90 dist: fix scylla-housekeeping uuid file chmod call
Should use chmod() on a file, not fchmod()

Fixes #9683

Closes #9802
2021-12-27 11:47:06 +02:00
Takuya ASADA
6870938842 scylla_raid_setup: fix typo
Closes #9790
2021-12-14 11:15:23 +02:00
Takuya ASADA
ea20f89c56 dist: allow running scylla-housekeeping with strict umask setting
To avoid failing scylla-housekeeping in strict umask environment,
we need to chmod a+r on repository file and housekeeping.uuid.

Fixes #9683

Closes #9739
2021-12-05 20:46:46 +02:00
Takuya ASADA
097a6ee245 dist: add support im4gn/is4gen instance on AWS
Add support next-generation, storage-optimized ARM64 instance types.

Fixes #9711

Closes #9730
2021-12-05 13:20:01 +02:00
Michał Chojnowski
08f7b81b36 dist: scylla_io_setup: run iotune for supported but not preconfigured AWS instance types
Currently, for AWS instances in `is_supported_instance_class()` other than
i3* and *gd (for example: m5d), scylla_io_setup neither provides
preconfigured values for io_properties.yaml nor runs iotune nor fails.
This silently results in a broken io_properties.yaml, like so:

disks:
  - mountpoint: /var/lib/scylla

Fix that.

Closes #9660
2021-11-24 18:28:13 +02:00
Avi Kivity
a19d00ef9b dist: scylla_raid_setup: mount XFS with online discard
Online discard asks the disk to erase flash memory cells as soon
as files are deleted. This gives the disk more freedom to choose
where to place new files, so it improves performance.

On older kernel versions, and on really bad disks, this can reduce
performance so we add an option to disable it.

Since fstrim is pointless when online discard is enabled, we
don't configure it if online discard is selected.

I tested it on an AWS i3.large instance, the flag showd up in
`mount` after configuration.

Closes #9608
2021-11-15 14:16:08 +02:00
Takuya ASADA
279fabe9b4 scylla_ntp_setup: use string in systemd_unit.is_active()
Since we reverted 2545d7fd43, we need to
use string instead of bool value.
2021-11-15 19:50:31 +09:00
Takuya ASADA
d646673705 Revert "scylla_util.py: return bool value on systemd_unit.is_active()"
This reverts commit 2545d7fd43.

Fixes #9627
Fixes scylladb/scylla-machine-image#241
2021-11-15 19:50:31 +09:00
Takuya ASADA
9b4cf8c532 scylla_util.py: On is_gce(), return False when it's on GKE
GKE metadata server does not provide same metadata as GCE, we should not
return True on is_gce().
So try to fetch machine-type from metadata server, return False if it
404 not found.

Fixes #9471

Signed-off-by: Takuya ASADA <syuu@scylladb.com>

Closes #9582
2021-11-04 12:49:06 +02:00
Avi Kivity
075ceb8918 Merge 'AWS: add scylla_io_setup preset parameters for ARM instances' from Takuya ASADA
Currently, scylla-server fails to start on ARM instances because scylla_io_setup does not have preset parameters even instance type added to 'supported instance'.
To fix this, we need to add io parameter preset on scylla_io_setup.

Also, we mistakenly added EBS only instances at a004b1da30, need to remove them.
Instrances does not have ephemeral disk should be 'unsupported instance', we still run our AMI on it, but we print warning message on login prompt, and user requires to run scylla_io_setup.

Fixes #9493

Closes #9532

* github.com:scylladb/scylla:
  scylla_util.py: remove EBS only ARM instances from support instance list
  scylla_io_setup: support ARM instances on AWS
2021-11-03 10:19:59 +02:00
Takuya ASADA
4a96a8145e scylla_util.py: remove EBS only ARM instances from support instance list
Since we required ephemeral disks for our AMI, these EBS only ARM
instances cannot add in it is 'supported instance' list.
We still able to run our AMI on these instance types but login message
warns it is 'unsupported instance type', and requires to run
scylla_io_setup manually.
2021-11-03 10:26:42 +09:00
Takuya ASADA
4e8060ba72 scylla_io_setup: support ARM instances on AWS
Add preset parameters for AWS ARM intances.

Fixes #9493
2021-11-03 10:26:42 +09:00
Takuya ASADA
13ffe3c094 scylla_util.py: detect ephemeral/EBS disks correctly on Nitro System
Currently, aws_instance.ephemeral_disks() returns both ephemeral disks
and EBS disks on Nitro System.
This is because both are attached as NVMe disks, we need to add disk
type detection code on NVMe handle logic.

Fixes #9440

Closes #9462
2021-10-28 08:58:25 +03:00
Takuya ASADA
3b798afc1e scylla_io_setup: handle nr_disks on GCP correctly
nr_disks is int, should not be string.

Fixes #9429

Closes #9430
2021-10-06 12:31:38 +03:00
Takuya ASADA
9c830297ac scylla_util.py: add persistent disk support for GCE
Just like EBS disks for EC2, we want to use persistent disk on GCE.
We won't recommend to use it, but still need to support it.

Related scylladb/scylla-machine-image#215

Closes #9395
2021-10-03 17:58:18 +03:00
Takuya ASADA
d87b80ad14 scylla_util.py: add persistent disk support for Azure Just like EBS disks for EC2, we want to use persistent disk on Azure. We won't recommend to use it, but still need to support it.
Related https://github.com/scylladb/scylla-machine-image/issues/218

Closes #9417
2021-10-03 17:56:31 +03:00
Takuya ASADA
cd7fe9a998 scylla_cpuscaling_setup: disable ondemand.service on Ubuntu
On Ubuntu, scaling_governor becomes powersave after rebooted, even we configured cpufrequtils.
This is because ondemand.service, it unconditionally change scaling_governor to ondemand or powersave.
cpufrequtils will start before ondemand.service, scaling_governor overwrite by ondemand.service.
To configure scaling_governor correctly, we have to disable this service.

Fixes #9324

Closes #9325
2021-09-29 10:32:34 +03:00
Takuya ASADA
f928dced0c scylla_cpuscaling_setup: add --force option
To building Ubuntu AMI with CPU scaling configuration, we need force
running mode for scylla_cpuscaling_setup, which run setup without
checking scaling_governor support.

See scylladb/scylla-machine-image#204

Closes #9326
2021-09-13 18:45:46 +03:00
Felipe Mendes
1b8dff63c3 iotune - Fix i3en.xlarge check
i3en.xlarge is currently not getting tuned properly. A quick test using
Scylla AMI ( ami-07a31481e4394d346 ) reveals that the storage
capabilities under this instance are greatly reduced:

$ grep iops /etc/scylla.d/io_properties.yaml
  read_iops: 257024
  write_iops: 174080

This patch corrects this typo, in such a way that iotune now properly
tunes this instance type.

Closes #9298
2021-09-07 10:44:39 +03:00
Takuya ASADA
5b62bebbb6 scylla_io_setup: check root privilege on root mode
This is side effect of allowing to run scylla_io_setup in nonroot mode,
the script able to run in non-root user even the installation is not
nonroot mode.

Result of that, the script finally failed to write io_properties.yaml
and causes permission denied.  Since the evaluation takes long time, we
should run permission check before starting it.

We need to add root privilege check again, but skip it on nonroot mode.

Fixes #8915

Closes #8984
2021-08-22 16:49:40 +03:00
Takuya ASADA
e5bb88b69a scylla_cpuscaling_setup: change scaling_governor path
On some environment /sys/devices/system/cpu/cpufreq/policy0/scaling_governor
does not exist even it supported CPU scaling.
Instead, /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor is
avaliable on both environment, so we should switch to it.

Fixes #9191

Closes #9193
2021-08-11 15:31:14 +03:00
Nadav Har'El
9662de85f5 Merge 'Azure snitch support' from Pekka Enberg
This add support for Azure snitch. The work is an adaptation of
AzureSnitch for Apache Cassandra by Yoshua Wakeham:

https://raw.githubusercontent.com/yoshw/cassandra/9387-trunk/src/java/org/apache/cassandra/locator/AzureSnitch.java

Also change `production_snitch_base` to protect against
a snitch implementation setting DC and rack to an empty string,
which Lubos' says can happen on Azure.

Fixes #8593

Closes #9084

* github.com:scylladb/scylla:
  scylla_util: Use AzureSnitch on Azure
  production_snitch_base: Fallback for empty DC or rack strings
  azure_snitch: Azure snitch support
2021-08-03 22:52:05 +03:00
Eduardo Benzecri
f196a4131a scylla_setup: Fix outdated message
Message changed according to what 'scylla_bootparam_setup' currently does
(set a clock source at boot time) instead of of what it used to do in
the past (setting huge pages).

Closes #9116.
2021-08-02 16:04:38 +03:00
Pekka Enberg
ef5b2934e8 scylla_util: Use AzureSnitch on Azure
Fixes #8593
2021-07-28 14:07:42 +03:00
Takuya ASADA
42fd73d033 scylla_setup: add RAID5 support
This supports optional RAID5 support on scylla_setup.

Fixes #9076

Closes #9093
2021-07-27 12:49:29 +03:00
Yaron Kaikov
a004b1da30 scylla_util:add AWS arm based instance to supported list
Today we have a Scylla AMI image based on x86 archituctre only.
Following the work we did in https://github.com/scylladb/scylla-machine-image/pull/153 we can build
ARM based AMI image

Let's add ARM based instance to supported list

Closes #9064
2021-07-22 15:48:29 +03:00
Avi Kivity
a55b434a2b treewide: extent copyright statements to present day 2021-06-06 19:18:49 +03:00
Yaron Kaikov
6a447db8a8 scylla_util.py: Fix Azure support for machine-image
In https://github.com/scylladb/scylla/pull/7807 we added support for
Azure instance in Scylla.

The following changes are required in order machine-image to work:
1) fix wrong metadata URL and updating metadata path values (was
   intreduce in
f627fcbb0c)
2) fix function naming which been used my machine image
3) add missing function which are reuqired by mahcine-image
4) cleanup unused functions

Closes #8596
2021-06-06 09:21:23 +03:00
Lubos Kosco
777771df34 scylla_util.py: Relax GCE setup NVMe device checks
We don't want to fail I/O setup if there are more than one NVMe devices
mounted as root nor if there are no NVMe devices.

Fixes #8032

Closes #8444
2021-06-06 09:21:23 +03:00
Yaron Kaikov
dd453ffe6a install.sh: Setup aio-max-nr upon installation
This is a follow up change to #8512.

Let's add aio conf file during scylla installation process and make sure
we also remove this file when uninstall Scylla

As per Avi Kivity's suggestion, let's set aio value as static
configuration, and make it large enough to work with 500 cpus.

Closes #8650
2021-05-24 14:24:20 +03:00
Takuya ASADA
3d307919c3 scylla_raid_setup: use /dev/disk/by-uuid to specify filesystem
Currently, var-lib-scylla.mount may fails because it can start before
MDRAID volume initialized.
We may able to add "After=dev-disk-by\x2duuid-<uuid>.device" to wait for
device become available, but systemd manual says it automatically
configure dependency for mount unit when we specify filesystem path by
"absolute path of a device node".

So we need to replace What=UUID=<uuid> to What=/dev/disk/by-uuid/<uuid>.

Fixes #8279

Closes #8681
2021-05-24 14:24:08 +03:00
Yaron Kaikov
588a065304 scylla_io_setup: configure "aio-max-nr" before iotune
On severl instance types in AWS and Azure, we get the following failure
during scylla_io_setup process:
```
ERROR 2021-04-14 07:50:35,666 [shard 5] seastar - Could not setup Async
I/O: Resource temporarily unavailable. The most common cause is not
enough request capacity in /proc/sys/fs/aio-max-nr. Try increasing that
number or reducing the amount of logical CPUs available for your
application
```

We have scylla_prepare:configure_io_slots() running before the
scylla-server.service start, but the scylla_io_setup is taking place
before

1) Let's move configure_io_slots() to scylla_util.py since both
   scylla_io_setup and scylla_prepare are import functions from it
2) cleanup scylla_prepare since we don't need the same function twice
3) Let's use configure_io_slots() during scylla_io_setup to avoid such
failure

Fixes: #8587

Closes #8512
2021-05-11 18:39:10 +03:00
Avi Kivity
6977064693 dist: scylla_raid_setup: reduce xfs block size to 1k
Since Linux 5.12 [1], XFS is able to to asynchronously overwrite
sub-block ranges without stalling. However, we want good performance
on older Linux versions, so this patch reduces the block size to the
minimum possible.

That turns out to be 1024 for crc-protected filesystems (which we want)
and it can also not be smaller than the sector size. So we fetch the
sector size and set the block size to that if it is larger than 512.
Most SSDs have a sector size of 512, so this isn't a problem.

Tested on AWS i3.large.

Fixes #8156.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ed1128c2d0c87e5ff49c40f5529f06bc35f4251b

Closes #8585
2021-05-05 16:07:50 +03:00
Lubos Kosco
c26bcf29f9 scylla_io_setup: add disk properties for L Azure instances 2021-05-04 13:13:05 +02:00
Lubos Kosco
f627fcbb0c scylla_util.py: add new class for Azure cloud support 2021-05-04 13:12:42 +02:00
Takuya ASADA
c9324634ca scylla_raid_setup: enabling mdmonitor.service on Debian variants
On Debian variants, mdmonitor.service cannnot enable because it missing
[Install] section, so 'systemctl enable mdmonitor.service' will fail,
not able to run mdmonitor after the system restarted.

To force running the service, add Wants=mdmonitor.service on
var-lib-scylla.mount.

Fixes #8494

Closes #8530
2021-04-28 11:32:27 +03:00
Peter Veentjer
c255903fb0 dist: Added r5b to ena instance_class.
The r5b instances also have ena support. For a confirmation
that all r5b instances have ena, go to the following page:

https://instances.vantage.sh/

Select the r5b and add the 'enhanced networking' column. Then
it will show that for every r5b type there is ena support

Closes #8546
2021-04-27 15:39:24 +03:00
Pekka Enberg
0ddbed2513 dist: Add support for disabling writeback cache
This adds support for disabling writeback cache by adding a new
DISABLE_WRITEBACK_CACHE option to "scylla-server" sysconfig file, which
makes the "scylla_prepare" script (that is run before Scylla starts up)
call perftune.py with appropriate parameters. Also add a
"--disable-writeback-cache" option to "scylla_sysconfig_setup", which
can be called by scylla-machine image scripts, for example.

Refs: #7341
Tests: dtest (next-gating)

Closes #8526
2021-04-22 11:24:49 +03:00