Commit Graph

625 Commits

Author SHA1 Message Date
Takuya ASADA
9acdd3af23 dist: drop deprecated AMI parameters on setup scripts
Since we moved all IaaS code to scylla-machine-image, we no longer need
the AMI variable in the sysconfig file or the --ami parameter on setup scripts,
and /etc/scylla/ami_disabled was never used.
So let's drop all of them from Scylla core.

Related with scylladb/scylla-machine-image#61

Closes #12043
2022-11-23 17:56:13 +02:00
Takuya ASADA
acc408c976 scylla_setup: fix incorrect type definition on --online-discard option
The --online-discard option is defined as a string parameter since it doesn't
specify "action=", but its default value is a boolean (default=True).
This breaks "provisioning in a similar environment", since the code
expects a boolean value, which would require "action='store_true'".

We should change the type of the option to int, and also specify
"choices=[0, 1]" just like --io-setup does.

Fixes #11700

Closes #11831
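
A minimal sketch of the type problem described above, not the actual scylla_setup code. The parser name, the "--online-discard-old" option and the default of 1 are assumptions made for illustration; only "type=int" and "choices=[0, 1]" come from the message.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser; the real scylla_setup defines many more options.
    parser = argparse.ArgumentParser(prog="scylla_setup_sketch")
    # Old form: no action= and no type=, so argparse stores the raw string.
    # Any non-empty string, including "0", is truthy in Python.
    parser.add_argument("--online-discard-old", default=True)
    # Fixed form: an integer restricted to 0 or 1 (default value assumed).
    parser.add_argument("--online-discard", type=int, choices=[0, 1], default=1)
    return parser
```

With the old form, passing "0" still evaluates as enabled; with the fixed form the value is the integer 0 and anything other than 0 or 1 is rejected by argparse.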
2022-11-08 08:40:44 +02:00
Takuya ASADA
464b5de99b scylla_setup: allow symlink to --disks option
Currently, the --disks option does not allow symlinks such as
/dev/disk/by-uuid/* or /dev/disk/azure/*.

To allow using them, is_unused_disk() should resolve the symlink to its
real path before evaluating the disk path.
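
The symlink resolution step can be sketched as follows. resolve_disk_path() and demo() are hypothetical helpers; how is_unused_disk() uses the resolved path is not shown in the message and is not reproduced here.

```python
import os
import tempfile


def resolve_disk_path(path: str) -> str:
    """Return the canonical path, following any symlinks."""
    return os.path.realpath(path)


def demo() -> bool:
    # Build a throwaway "device" file and a by-uuid style symlink to it.
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "nvme0n1")
        link = os.path.join(tmp, "by-uuid-link")
        open(target, "w").close()
        os.symlink(target, link)
        # The symlink and the target resolve to the same canonical path.
        return resolve_disk_path(link) == os.path.realpath(target)
```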

Fixes #11634

Closes #11646
2022-10-28 07:24:11 +03:00
Takuya ASADA
cd6030d5df scylla_util.py: adding unescape for sysconfig_parser
Even though we have __escape() for escaping " in the middle of a value when
writing the sysconfig file, we didn't unescape when reading from the sysconfig file.
So add __unescape() and call it in get().
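
A sketch of the escape/unescape pair as a round trip. The backslash escape form is an assumption; the message only says that " inside a value is escaped on write and must be unescaped on read. The function names are illustrative, not the private methods of sysconfig_parser.

```python
def escape(value: str) -> str:
    # Assumed escape form: prefix each double quote with a backslash.
    return value.replace('"', '\\"')


def unescape(value: str) -> str:
    # Inverse of escape(): turn each backslash-quote pair back into a quote.
    return value.replace('\\"', '"')
```

Reading back a value that was written through escape() then returns the original string, which is the property the commit restores.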
2022-10-27 16:39:47 +09:00
Takuya ASADA
de57433bcf scylla_util.py: on sysconfig_parser, don't use double quote when it's possible
It seems that the distributions' original sysconfig files do not use double
quotes to set a parameter when the value does not contain a space.
Add a function to detect spaces in the value, and don't use double quotes
when none are detected.
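
The quoting rule can be sketched like this. format_assignment() is a hypothetical helper, and treating any whitespace character as a "space" is an assumption.

```python
import re


def format_assignment(key: str, value: str) -> str:
    # Quote only when the value contains whitespace (assumed definition
    # of "contains space"); otherwise emit the bare value.
    if re.search(r"\s", value):
        return f'{key}="{value}"'
    return f"{key}={value}"
```

For example, a value such as "no" is written bare, while a value holding several space-separated arguments keeps its double quotes.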

Fixes #9149
2022-10-27 16:36:27 +09:00
Takuya ASADA
a938b009ca scylla_raid_setup: run uuidpath existence check only after mount failed
We added a UUID device file existence check in #11399. We expected the UUID
device file to be created before the check, and we wait for its creation with
"udevadm settle" after "mkfs.xfs".

However, we are actually getting an error saying the UUID device file is missing,
which probably means "udevadm settle" doesn't guarantee that the device file is
created under some conditions.

To avoid the error, use var-lib-scylla.mount to wait until the UUID device
file is ready, and run the file existence check only when the service has
failed.

Fixes #11617

Closes #11666
2022-10-25 08:54:21 +03:00
Vlad Zolotarov
8195dab92a scylla_prepare: correctly handle a former 'MQ' mode
Fixes a regression introduced in 80917a1054:
"scylla_prepare: stop generating 'mode' value in perftune.yaml"

When cpuset.conf contains a "full" CPU set, subtracting it from
the "full" CPU set generates a zero mask as the irq_cpu_mask.
This is an illegal value that will eventually end up in the generated
perftune.yaml, which in turn will make the scylla service fail to start
until the issue is resolved.

In such a case the irq_cpu_mask must represent a "full" CPU set, mimicking
the former 'MQ' mode.
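
The mask arithmetic can be sketched with plain integers as bit masks. auto_irq_cpu_mask() is a hypothetical name, and the integer representation is an assumption; scylla_prepare may represent CPU masks differently.

```python
def auto_irq_cpu_mask(all_mask: int, cpuset_mask: int) -> int:
    # IRQ CPUs are all CPUs except those listed in cpuset.conf.
    irq_mask = all_mask & ~cpuset_mask
    if irq_mask == 0:
        # cpuset.conf covers every CPU: fall back to the full set,
        # mimicking the former 'MQ' mode instead of emitting a zero mask.
        return all_mask
    return irq_mask
```

On a 2 vCPU machine with both CPUs in cpuset.conf the result is the full mask 0b11 rather than 0.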

Fixes #11701
Tested:
 - Manually on a 2 vCPU VM in an 'auto-selection' mode.
 - Manually on a large VM (48 vCPUs) with an 'MQ' manually
   enforced.
Message-Id: <20221004004237.2961246-1-vladz@scylladb.com>
2022-10-06 17:43:37 +03:00
Avi Kivity
372eadf542 Merge "perftune related improvements in scylla_* scripts" from Vlad Zolotarov
"
This series adds a long-awaited transition of our auto-generation
code to irq_cpu_mask instead of 'mode' in perftune.yaml.

And then it fixes a regression in scylla_prepare perftune.yaml
auto-generation logic.
"

* 'scylla_prepare_fix_regression-v1' of https://github.com/vladzcloudius/scylla:
  scylla_prepare + scylla_cpuset_setup: make scylla_cpuset_setup idempotent without introducing regressions
  scylla_prepare: stop generating 'mode' value in perftune.yaml
2022-10-02 13:25:13 +03:00
Takuya ASADA
8835a34ab6 scylla_raid_setup: prevent mount failed for /var/lib/scylla
Just like 4a8ed4c, we also need to wait for udev event completion so that
/dev/disk/by-uuid/$UUID is created for the newly formatted disk, to mount the
disk just after formatting.

Fixes #11359
2022-08-27 03:27:44 +09:00
Takuya ASADA
40134efee4 scylla_raid_setup: check uuid and device path are valid
Added code to make sure the UUID and the UUID-based device path are valid.
2022-08-27 03:08:31 +09:00
Vlad Zolotarov
c538cc2372 scylla_prepare + scylla_cpuset_setup: make scylla_cpuset_setup idempotent without introducing regressions
This patch fixes the regression introduced by 3a51e78 which broke
a very important contract: perftune.yaml should not be "touched"
by Scylla scriptology unless explicitly requested.

And a call for scylla_cpuset_setup is such an explicit request.

The issue that the offending patch was intending to fix was that
cpuset.conf was always generated anew for every call of
scylla_cpuset_setup - even if a resulting cpuset.conf would come
out exactly the same as the one present on the disk before the call.

And since the original code was following the contract mentioned above
it was also deleting perftune.yaml every time too.
However, this was just an unavoidable side-effect of that cpuset.conf
re-generation.

The above also means that if scylla_cpuset_setup doesn't write to cpuset.conf
we should not "touch" perftune.yaml and vice versa.

This patch implements exactly that together with reverting the dangerous
logic introduced by 3a51e78.
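
A sketch of the contract described above, under the assumption that "touching" perftune.yaml means deleting it so it is regenerated. The helper names, file paths and the cpuset.conf contents are illustrative only.

```python
import os
import tempfile


def write_if_changed(path: str, content: str) -> bool:
    """Write content to path only if it differs; report whether we wrote."""
    try:
        with open(path) as f:
            if f.read() == content:
                return False
    except FileNotFoundError:
        pass
    with open(path, "w") as f:
        f.write(content)
    return True


def update_cpuset(cpuset_conf: str, perftune_yaml: str, content: str) -> bool:
    # perftune.yaml is removed only when cpuset.conf was actually rewritten.
    changed = write_if_changed(cpuset_conf, content)
    if changed and os.path.exists(perftune_yaml):
        os.remove(perftune_yaml)
    return changed


def demo() -> tuple:
    with tempfile.TemporaryDirectory() as tmp:
        cpuset = os.path.join(tmp, "cpuset.conf")
        perftune = os.path.join(tmp, "perftune.yaml")
        first = update_cpuset(cpuset, perftune, "cpus: 1-3\n")
        open(perftune, "w").close()
        second = update_cpuset(cpuset, perftune, "cpus: 1-3\n")  # same value
        kept = os.path.exists(perftune)
        third = update_cpuset(cpuset, perftune, "cpus: 1-7\n")  # new value
        removed = not os.path.exists(perftune)
        return first, second, kept, third, removed
```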

Fixes #11385
Fixes #10121
2022-08-25 13:03:02 -04:00
Vlad Zolotarov
80917a1054 scylla_prepare: stop generating 'mode' value in perftune.yaml
Modern perftune.py supports a more generic way of defining IRQ CPUs:
'irq_cpu_mask'.

This patch makes our auto-generation code create a perftune.yaml
that uses this new parameter instead of using outdated 'mode'.

As a side effect, this change eliminates the notion of "incorrect"
value in cpuset.conf - every value is valid now as long as it fits into
the 'all' CPU set of the specific machine.

Auto-generated 'irq_cpu_mask' is going to include all bits from 'all'
CPU mask except those defined in cpuset.conf.

Fixes #9903
2022-08-25 13:02:57 -04:00
Takuya ASADA
ce87e15ecf scylla_prepare: fix Exception when SET_NIC_AND_DISKS=no and SET_CLOCKSOURCE=yes
We shouldn't call get_tune_mode() when NIC tuning is disabled.

fixes #10412

Closes #10959
2022-07-05 14:52:52 +03:00
Takuya ASADA
7501465b7c scylla_util.py: change debug log directory to /var/tmp/scylla
The current debug log is a bit difficult to collect in CI: to find the debug log
we must know which script caused the Exception,
because the filename does not include a prefix and the specified
directory is shared with other programs.

To make things easier, let's change the debug log directory to /var/tmp/scylla.

Closes #10730
2022-07-05 14:49:00 +03:00
Takuya ASADA
3a51e7820a scylla_cpuset_setup: stop deleting perftune.yaml and skip update cpuset.conf when same parameter specified
To make scylla setup scripts easier to handle in Ansible, stop deleting
perftune.yaml and detect cpuset.conf changes by mtime of the file.
Also, skip update cpuset.conf when same parameter specified.

Fixes #10121

Closes #10312
2022-06-23 10:28:36 +03:00
Israel Fruchter
d2ca2455db scripts/scylla_util.py: introduce back user/group arguments for out()
Since #10467 removed the user/group parameters needed for the housekeeping
call, we need to reintroduce them.

Fixes: #10804

Closes #10818
2022-06-16 13:50:17 +03:00
Takuya ASADA
5643c6de56 scylla_util.py: fix "systemctl is-active" causes error
In 48b6aec16a we mistakenly allowed
check=True on systemd_unit.is_active(); it should be check=False.
We check the unit's status by the output string of "systemctl is-active",
which returns "active" or "inactive".
But the systemctl command returns a non-zero status when it returns
"inactive", so we are getting an Exception here.
To fix this, we need a new option "ignore_error=True" for out(),
and use it in systemd_unit.is_active().
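
A minimal sketch of an out() helper with an ignore_error switch. The signature and body are assumptions; only the option name "ignore_error" and the non-zero exit status on "inactive" come from the message.

```python
import subprocess
import sys


def out(cmd: list, ignore_error: bool = False) -> str:
    # check=True raises CalledProcessError on a non-zero exit status;
    # ignore_error=True turns that off so the caller can inspect the output.
    result = subprocess.run(cmd, capture_output=True, text=True,
                            check=not ignore_error)
    return result.stdout.strip()


# Stand-in for "systemctl is-active <unit>" on an inactive unit:
# prints "inactive" and exits with a non-zero status.
FAKE_IS_ACTIVE = [sys.executable, "-c",
                  "import sys; print('inactive'); sys.exit(3)"]
```

Calling out(FAKE_IS_ACTIVE, ignore_error=True) returns the string "inactive", while the default call raises subprocess.CalledProcessError.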

Fixes #10455

Closes #10467
2022-06-13 13:45:50 +03:00
Takuya ASADA
ad2344a864 scylla_coredump_setup: support new format of Storage field
The Storage field of "coredumpctl info" changed in systemd v248: it now appends
"(present)" to the end of the line when the coredump file is available.

Fixes #10669

Closes #10714
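
One way to accept both forms of the Storage line is an optional "(present)" suffix in the pattern. This is a sketch, not the regex used in scylla_coredump_setup; the sample paths in the test are made up.

```python
import re
from typing import Optional

# Matches "Storage: <path>" with or without a trailing "(present)".
STORAGE_RE = re.compile(r"^\s*Storage:\s*(\S+)(?:\s+\(present\))?\s*$",
                        re.MULTILINE)


def coredump_path(info: str) -> Optional[str]:
    """Extract the coredump file path from "coredumpctl info" output."""
    match = STORAGE_RE.search(info)
    return match.group(1) if match else None
```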
2022-06-07 02:21:32 +03:00
Takuya ASADA
b6003989f9 scylla_setup: stop using sudo -u, use user/group parameter on subprocess module
To run scylla-housekeeping we currently use "sudo -u scylla <cmd>" to switch
to the scylla user, but it fails in some environments.
Since recent versions of Python 3 support switching users in the subprocess module,
let's use the native Python way and drop sudo.

Fixes #10483

Closes #10538
2022-05-19 17:21:35 +03:00
Takuya ASADA
883b97d8b2 dist/common/scripts: generate debug log when exception occurred
Using the traceback_with_variables module, generate a more detailed traceback
with variables in the debug log.
This will help fix bugs which are hard to reproduce.

Closes #10472

[avi: regenerate frozen toolchain]
2022-05-17 13:18:27 +03:00
Takuya ASADA
00ce34c29b scylla_prepare: describe error more correctly
Currently our error message in scylla_prepare says "Exception occurred
while creating perftune.yaml" even when perftune.yaml has already been generated
and the error occurred after that.
To describe the error more correctly, add another error message for errors that
occur after perftune.yaml is generated.

see scylladb/scylla-enterprise#2201

Closes #10575
2022-05-16 20:05:58 +03:00
Takuya ASADA
a9dfe5a8f4 scylla_sysconfig_setup: handle >=32CPUs correctly
It seems 59adf05 has a bug: the regex pattern only handles the cpuset pattern
for the first 32 CPUs and ignores the rest.
We should extend the regex pattern to handle all CPUs.
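
A sketch of a pattern that accepts more than one mask word. It assumes the mask is printed on its own line as comma-separated hexadecimal words (one word per 32 CPUs); that output format and the function name are assumptions, not taken from perftune.py or scylla_sysconfig_setup.

```python
import re

# One "0x..." word, optionally followed by more comma-separated words.
CPU_MASK_RE = re.compile(r"^(0x[0-9a-fA-F]+(?:,0x[0-9a-fA-F]+)*)$",
                         re.MULTILINE)


def extract_cpu_mask(output: str) -> str:
    """Pick the CPU mask line out of output that may also hold warnings."""
    match = CPU_MASK_RE.search(output)
    if match is None:
        raise ValueError("no CPU mask found in output")
    return match.group(1)
```

A pattern that stops after the first word would return only the first 32 CPUs of a two-word mask; the repeated group keeps every word.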

Fixes #10523

Closes #10524
2022-05-11 14:46:30 +02:00
Takuya ASADA
48b6aec16a scripts: use "out()" function for all capture_output subprocesses
In acaf0bb we applied out() only to perftune.py because we had issue #10390
with this script.
But the issue can happen with other commands too, so let's apply it to all
commands which use capture_output.

related #10390

Closes #10414
2022-04-26 13:56:52 +03:00
Takuya ASADA
acaf0bb88a scripts: print perftune.py error message when capture_output=True
We are currently unable to get any error message from a subprocess when we specify capture_output=True on subprocess.run().
This is because CalledProcessError does not print stdout/stderr when it is raised, and we don't catch the exception; we just let Python print a Traceback.
As a result, we only know the exit status and the failed command, but
cannot see stdout/stderr.

This is especially problematic when working on a perftune.py bug, since the
script should have caused a Traceback but we are never able to see it.

To resolve this, add a wrapper function "out()" for capturing output, and
print stdout/stderr with the error message inside the function.
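
A sketch of such a wrapper. It is named out_verbose() here to make clear that it is an illustration and not the out() function from scylla_util.py; the message layout is an assumption.

```python
import subprocess
import sys


def out_verbose(cmd: list) -> str:
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
        return result.stdout.strip()
    except subprocess.CalledProcessError as e:
        # The default traceback shows only the command and exit status,
        # so print the captured streams before re-raising.
        print(f"Command {e.cmd} failed with exit status {e.returncode}",
              file=sys.stderr)
        print(f"----- stdout -----\n{e.stdout}", file=sys.stderr)
        print(f"----- stderr -----\n{e.stderr}", file=sys.stderr)
        raise
```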

Fixes #10390

Closes #10391
2022-04-18 14:06:51 +03:00
Takuya ASADA
59adf05951 scylla_sysconfig_setup: avoid parse error on perftune.py --get-cpu-mask
Currently, we just pass the entire output of perftune.py when getting the CPU
mask from the script, but this may cause a parse error since the script may
also print warning messages.

To avoid that, we need to extract CPU mask from the output.

Fixes #10082

Closes #10107
2022-03-28 16:31:14 +03:00
Takuya ASADA
59c72d5d60 scylla_prepare: print Traceback with current user-friendly messages
In e1b15ba, we introduced a user-friendly error message for when an Exception
occurs while generating perftune.yaml.
However, it became difficult to investigate bugs since we dropped
the traceback.
To resolve this problem, let's print both the traceback and the user-friendly
messages.

Related #10050

Closes #10140
2022-03-20 16:55:18 +02:00
Takuya ASADA
c2ccdac297 move cloud related code from scylla repository to scylla-machine-image
Currently, cloud related code has cross-dependencies between
scylla and scylla-machine-image.
This is not a good design, and a single change can break both
packages.

To resolve the issue, we need to move all cloud related code to
scylla-machine-image, and remove them from scylla repository.

Change list:
 - move cloud part of scylla_util.py to scylla-machine-image
 - move cloud part of scylla_io_setup to scylla-machine-image
 - move scylla_ec2_check to scylla-machine-image
 - move cloud part of scylla_bootparam_setup to scylla-machine-image

Closes #9957
2022-02-01 11:26:59 +02:00
Takuya ASADA
218dd3851c scylla_swap_setup: add --swap-size-bytes
Currently, --swap-size cannot specify an exact file size because
the option takes its parameter only in GB.
To fix the limitation, let's add --swap-size-bytes to specify the swap size
in bytes.
We need this to preallocate the swapfile while building the IaaS
image.

see scylladb/scylla-machine-image#285

Closes #9971
2022-01-31 18:32:32 +02:00
Takuya ASADA
32f2eb63ac scylla_raid_setup: use mdmonitor only when RAID level > 0
We found that the monitor mode of mdadm does not work on RAID0, and it is
not a bug but expected behavior according to a RHEL developer.
Therefore, we should stop enabling mdmonitor when RAID0 is specified.

Fixes #9540
2022-01-26 22:33:07 +09:00
Takuya ASADA
cd57815fff Revert "scylla_raid_setup: workaround for mdmonitor.service issue on CentOS8"
This reverts commit 0d8f932f0b,
because a RHEL developer explained this is not a bug but expected behavior.
(mdadm --monitor does not start when RAID level is 0)
see: https://bugzilla.redhat.com/show_bug.cgi?id=2031936

So we should stop downgrading the mdadm package and modify our script not to
enable mdmonitor.service on RAID0, use it only for RAID5.
2022-01-26 22:33:06 +09:00
Avi Kivity
fcb8d040e8 treewide: use Software Package Data Exchange (SPDX) license identifiers
Instead of lengthy blurbs, switch to single-line, machine-readable
standardized (https://spdx.dev) license identifiers. The Linux kernel
switched long ago, so there is strong precedent.

Three cases are handled: AGPL-only, Apache-only, and dual licensed.
For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0),
reasoning that our changes are extensive enough to apply our license.

The changes were applied mechanically with a script, except to
licenses/README.md.

Closes #9937
2022-01-18 12:15:18 +01:00
Valerii Ponomarov
12fa68fe67 scylla_util: return boolean calling systemd_unit.available
As of now, 'systemd_unit.available' works ok only when provided
unit is present.
It raises Exception instead of returning boolean
when provided systemd unit is absent.

So, make it return boolean in both cases.

Fixes https://github.com/scylladb/scylla/issues/9848

Closes #9849
2021-12-28 15:14:04 +02:00
Takuya ASADA
6a834261fb scylla_coredump_setup: prevent coredump timeout on systemd-coredump@.service
On newer versions of systemd-coredump, coredumps are handled in
systemd-coredump@.service, which may time out while running the
systemd unit, like this:
  systemd[1]: systemd-coredump@xxxx.service: Service reached runtime time limit. Stopping.
To prevent that, we need to override TimeoutStartSec=infinity.

Fixes #9837

Closes #9841
2021-12-27 13:58:07 +02:00
Takuya ASADA
0d8f932f0b scylla_raid_setup: workaround for mdmonitor.service issue on CentOS8
On CentOS8, mdmonitor.service does not work correctly when using
mdadm-4.1-15.el8.x86_64 and later versions.
Until we find a solution, let's pin the package version to an older one
which does not cause the issue (4.1-14.el8.x86_64).

Fixes #9540

Closes #9782
2021-12-27 12:07:34 +02:00
Takuya ASADA
7064ae3d90 dist: fix scylla-housekeeping uuid file chmod call
Should use chmod() on a file, not fchmod()

Fixes #9683

Closes #9802
2021-12-27 11:47:06 +02:00
Takuya ASADA
6870938842 scylla_raid_setup: fix typo
Closes #9790
2021-12-14 11:15:23 +02:00
Takuya ASADA
ea20f89c56 dist: allow running scylla-housekeeping with strict umask setting
To avoid failing scylla-housekeeping in strict umask environment,
we need to chmod a+r on repository file and housekeeping.uuid.
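
The "chmod a+r" step can be sketched in Python as adding the three read bits to the existing mode. make_world_readable() and demo() are hypothetical helpers; the files the real scripts act on are the repository file and housekeeping.uuid named above.

```python
import os
import stat
import tempfile


def make_world_readable(path: str) -> None:
    # Equivalent of "chmod a+r": keep the current bits, add r for u/g/o.
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode | stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)


def demo() -> int:
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        os.chmod(path, 0o600)  # what a strict umask such as 0o077 leaves
        make_world_readable(path)
        return stat.S_IMODE(os.stat(path).st_mode)
    finally:
        os.remove(path)
```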

Fixes #9683

Closes #9739
2021-12-05 20:46:46 +02:00
Takuya ASADA
097a6ee245 dist: add support im4gn/is4gen instance on AWS
Add support for next-generation, storage-optimized ARM64 instance types.

Fixes #9711

Closes #9730
2021-12-05 13:20:01 +02:00
Michał Chojnowski
08f7b81b36 dist: scylla_io_setup: run iotune for supported but not preconfigured AWS instance types
Currently, for AWS instances in `is_supported_instance_class()` other than
i3* and *gd (for example: m5d), scylla_io_setup neither provides
preconfigured values for io_properties.yaml nor runs iotune nor fails.
This silently results in a broken io_properties.yaml, like so:

disks:
  - mountpoint: /var/lib/scylla

Fix that.

Closes #9660
2021-11-24 18:28:13 +02:00
Avi Kivity
a19d00ef9b dist: scylla_raid_setup: mount XFS with online discard
Online discard asks the disk to erase flash memory cells as soon
as files are deleted. This gives the disk more freedom to choose
where to place new files, so it improves performance.

On older kernel versions, and on really bad disks, this can reduce
performance so we add an option to disable it.

Since fstrim is pointless when online discard is enabled, we
don't configure it if online discard is selected.

I tested it on an AWS i3.large instance, the flag showed up in
`mount` after configuration.

Closes #9608
2021-11-15 14:16:08 +02:00
Takuya ASADA
279fabe9b4 scylla_ntp_setup: use string in systemd_unit.is_active()
Since we reverted 2545d7fd43, we need to
use string instead of bool value.
2021-11-15 19:50:31 +09:00
Takuya ASADA
d646673705 Revert "scylla_util.py: return bool value on systemd_unit.is_active()"
This reverts commit 2545d7fd43.

Fixes #9627
Fixes scylladb/scylla-machine-image#241
2021-11-15 19:50:31 +09:00
Takuya ASADA
9b4cf8c532 scylla_util.py: On is_gce(), return False when it's on GKE
The GKE metadata server does not provide the same metadata as GCE, so we should
not return True from is_gce().
So try to fetch machine-type from the metadata server, and return False if it
responds with 404 Not Found.

Fixes #9471

Signed-off-by: Takuya ASADA <syuu@scylladb.com>

Closes #9582
2021-11-04 12:49:06 +02:00
Avi Kivity
075ceb8918 Merge 'AWS: add scylla_io_setup preset parameters for ARM instances' from Takuya ASADA
Currently, scylla-server fails to start on ARM instances because scylla_io_setup does not have preset parameters, even though the instance types were added to 'supported instance'.
To fix this, we need to add io parameter presets to scylla_io_setup.

Also, we mistakenly added EBS-only instances in a004b1da30 and need to remove them.
Instances that do not have an ephemeral disk should be 'unsupported instance'; we can still run our AMI on them, but we print a warning message at the login prompt, and the user is required to run scylla_io_setup.

Fixes #9493

Closes #9532

* github.com:scylladb/scylla:
  scylla_util.py: remove EBS only ARM instances from support instance list
  scylla_io_setup: support ARM instances on AWS
2021-11-03 10:19:59 +02:00
Takuya ASADA
4a96a8145e scylla_util.py: remove EBS only ARM instances from support instance list
Since we require ephemeral disks for our AMI, these EBS-only ARM
instances cannot be added to its 'supported instance' list.
We are still able to run our AMI on these instance types, but the login message
warns that it is an 'unsupported instance type' and requires running
scylla_io_setup manually.
2021-11-03 10:26:42 +09:00
Takuya ASADA
4e8060ba72 scylla_io_setup: support ARM instances on AWS
Add preset parameters for AWS ARM instances.

Fixes #9493
2021-11-03 10:26:42 +09:00
Takuya ASADA
13ffe3c094 scylla_util.py: detect ephemeral/EBS disks correctly on Nitro System
Currently, aws_instance.ephemeral_disks() returns both ephemeral disks
and EBS disks on Nitro System.
This is because both are attached as NVMe disks; we need to add disk
type detection code to the NVMe handling logic.

Fixes #9440

Closes #9462
2021-10-28 08:58:25 +03:00
Takuya ASADA
3b798afc1e scylla_io_setup: handle nr_disks on GCP correctly
nr_disks is int, should not be string.

Fixes #9429

Closes #9430
2021-10-06 12:31:38 +03:00
Takuya ASADA
9c830297ac scylla_util.py: add persistent disk support for GCE
Just like EBS disks for EC2, we want to use persistent disk on GCE.
We don't recommend using it, but still need to support it.

Related scylladb/scylla-machine-image#215

Closes #9395
2021-10-03 17:58:18 +03:00
Takuya ASADA
d87b80ad14 scylla_util.py: add persistent disk support for Azure
Just like EBS disks for EC2, we want to use persistent disk on Azure.
We won't recommend to use it, but still need to support it.

Related https://github.com/scylladb/scylla-machine-image/issues/218

Closes #9417
2021-10-03 17:56:31 +03:00