Commit Graph

560 Commits

Author SHA1 Message Date
Takuya ASADA
53b0aaa4e8 scylla_raid_setup: revert workaround patch and stop using mdmonitor
We found that monitor mode of mdadm does not work on RAID0, and it is
not a bug, expected behavior according to RHEL developer.
Therefore, we should revert workaround patch which downgrades mdadm,
and stop enabling mdmonitor since we always use RAID0.

See #9540

Closes #10077
2022-02-17 18:36:42 +02:00
Avi Kivity
8f45f65b09 Merge 'scylla_raid_setup: use mdmonitor only when RAID level > 0' from Takuya ASADA
We found that monitor mode of mdadm does not work on RAID0, and it is
not a bug, expected behavior according to RHEL developer.
Therefore, we should stop enabling mdmonitor when RAID0 is specified.

Fixes #9540

----

This reverts 0d8f932 and introduce correct fix.

Closes #9970

* github.com:scylladb/scylla:
  scylla_raid_setup: use mdmonitor only when RAID level > 0
  Revert "scylla_raid_setup: workaround for mdmonitor.service issue on CentOS8"

(cherry picked from commit df22396a34)
2022-01-27 10:24:53 +02:00
Takuya ASADA
b56b9f5ed5 scylla_raid_setup: workaround for mdmonitor.service issue on CentOS8
On CentOS8, mdmonitor.service does not works correctly when using
mdadm-4.1-15.el8.x86_64 and later versions.
Until we find a solution, let's pinning the package version to older one
which does not cause the issue (4.1-14.el8.x86_64).

Fixes #9540

Closes #9782

(cherry picked from commit 0d8f932f0b)
2021-12-28 11:38:24 +02:00
Takuya ASADA
f7e5339c14 scylla_io_setup: support ARM instances on AWS
Add preset parameters for AWS ARM intances.

Fixes #9493

(cherry picked from commit 4e8060ba72)
2021-11-15 13:35:15 +02:00
Takuya ASADA
50ce5bef2c scylla_util.py: On is_gce(), return False when it's on GKE
GKE metadata server does not provide same metadata as GCE, we should not
return True on is_gce().
So try to fetch machine-type from metadata server, return False if it
404 not found.

Fixes #9471

Signed-off-by: Takuya ASADA <syuu@scylladb.com>

Closes #9582

(cherry picked from commit 9b4cf8c532)
2021-11-15 13:18:02 +02:00
Takuya ASADA
766e16f19e scylla_io_setup: handle nr_disks on GCP correctly
nr_disks is int, should not be string.

Fixes #9429

Closes #9430

(cherry picked from commit 3b798afc1e)
2021-11-15 13:06:30 +02:00
Takuya ASADA
f5f5b9a307 scylla_raid_setup: enabling mdmonitor.service on Debian variants
On Debian variants, mdmonitor.service cannnot enable because it missing
[Install] section, so 'systemctl enable mdmonitor.service' will fail,
not able to run mdmonitor after the system restarted.

To force running the service, add Wants=mdmonitor.service on
var-lib-scylla.mount.

Fixes #8494

Closes #8530

(cherry picked from commit c9324634ca)
2021-10-12 13:58:49 +03:00
Takuya ASADA
18b8388958 scylla_cpuscaling_setup: add --force option
To building Ubuntu AMI with CPU scaling configuration, we need force
running mode for scylla_cpuscaling_setup, which run setup without
checking scaling_governor support.

See scylladb/scylla-machine-image#204

Closes #9326

(cherry picked from commit f928dced0c)
2021-10-05 16:20:10 +03:00
Takuya ASADA
56b24818ec scylla_cpuscaling_setup: disable ondemand.service on Ubuntu
On Ubuntu, scaling_governor becomes powersave after rebooted, even we configured cpufrequtils.
This is because ondemand.service, it unconditionally change scaling_governor to ondemand or powersave.
cpufrequtils will start before ondemand.service, scaling_governor overwrite by ondemand.service.
To configure scaling_governor correctly, we have to disable this service.

Fixes #9324

Closes #9325

(cherry picked from commit cd7fe9a998)
2021-10-03 14:08:22 +03:00
Takuya ASADA
9338f6b6b8 scylla_cpuscaling_setup: change scaling_governor path
On some environment /sys/devices/system/cpu/cpufreq/policy0/scaling_governor
does not exist even it supported CPU scaling.
Instead, /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor is
avaliable on both environment, so we should switch to it.

Fixes #9191

Closes #9193

(cherry picked from commit e5bb88b69a)
2021-08-12 12:10:04 +03:00
Takuya ASADA
d92a26636a dist: add DefaultDependencies=no to .mount units
To avoid ordering cycle error on Ubuntu, add DefaultDependencies=no
on .mount units.

Fixes #8482

Closes #8495

(cherry picked from commit 0b01e1a167)
2021-05-31 12:11:51 +03:00
Yaron Kaikov
7445bfec86 install.sh: Setup aio-max-nr upon installation
This is a follow up change to #8512.

Let's add aio conf file during scylla installation process and make sure
we also remove this file when uninstall Scylla

As per Avi Kivity's suggestion, let's set aio value as static
configuration, and make it large enough to work with 500 cpus.

Closes #8650

Refs: #8713

(cherry picked from commit dd453ffe6a)
2021-05-27 14:12:19 +03:00
Yaron Kaikov
b077b198bf scylla_io_setup: configure "aio-max-nr" before iotune
On severl instance types in AWS and Azure, we get the following failure
during scylla_io_setup process:
```
ERROR 2021-04-14 07:50:35,666 [shard 5] seastar - Could not setup Async
I/O: Resource temporarily unavailable. The most common cause is not
enough request capacity in /proc/sys/fs/aio-max-nr. Try increasing that
number or reducing the amount of logical CPUs available for your
application
```

We have scylla_prepare:configure_io_slots() running before the
scylla-server.service start, but the scylla_io_setup is taking place
before

1) Let's move configure_io_slots() to scylla_util.py since both
   scylla_io_setup and scylla_prepare are import functions from it
2) cleanup scylla_prepare since we don't need the same function twice
3) Let's use configure_io_slots() during scylla_io_setup to avoid such
failure

Fixes: #8587

Closes #8512

Refs: #8713

(cherry picked from commit 588a065304)
2021-05-27 14:11:17 +03:00
Takuya ASADA
b81919dbe2 scylla_raid_setup: use /dev/disk/by-uuid to specify filesystem
Currently, var-lib-scylla.mount may fails because it can start before
MDRAID volume initialized.
We may able to add "After=dev-disk-by\x2duuid-<uuid>.device" to wait for
device become available, but systemd manual says it automatically
configure dependency for mount unit when we specify filesystem path by
"absolute path of a device node".

So we need to replace What=UUID=<uuid> to What=/dev/disk/by-uuid/<uuid>.

Fixes #8279

Closes #8681

(cherry picked from commit 3d307919c3)
2021-05-24 17:23:56 +03:00
Takuya ASADA
6f678ab7ff aws: initialize self._disks['ebs'] when no EBS disks
Seems like aws_instance.ebs_disks() causes traceback when no EBS disks
available, need to initialize with empty list.

Fixes #8365

Closes #8366
2021-03-29 17:21:14 +03:00
Takuya ASADA
d9a625c842 scylla_setup: don't run node-exporter setup when it's not installed
We need to run package existance check before run setup of
node-exporter.

Fixes #8276

Closes #8278
2021-03-18 11:24:18 +01:00
Takuya ASADA
e8cfd5114f scylla_coredump_setup: support SLES
SLES requires to install systemd-coredump package and enable
systemd-coredump.socket to use systemd-coredump.
2021-03-15 19:19:56 +09:00
Takuya ASADA
13871ff1f8 scylla_setup: use rpm to check package availability for SLES
Use rpm to check scylla packages installed on SLES.
2021-03-15 19:18:44 +09:00
Takuya ASADA
e3b5ffcf14 dist: install optional packages for SLES
Support SUSE original package manager 'zypper' for pkg_install()
function.
2021-03-15 19:17:48 +09:00
Takuya ASADA
af8eae317b scylla_coredump_setup: avoid coredump failure when hard limit of coredump is set to zero
On the environment hard limit of coredump is set to zero, coredump test
script will fail since the system does not generate coredump.
To avoid such issue, set ulimit -c 0 before generating SEGV on the script.

Note that scylla-server.service can generate coredump even ulimit -c 0
because we set LimitCORE=infinity on its systemd unit file.

Fixes #8238

Closes #8245
2021-03-10 19:28:10 +02:00
Takuya ASADA
2d9feaacea scylla_raid_setup: don't abort using raiddev when array_state is 'clear'
On Ubuntu 20.04 AMI, scylla_raid_setup --raiddev /dev/md0 causes
'/dev/md0 is already using' (issue #7627).
So we merged the patch to find free mdX (587b909).

However, look into /proc/mdstat of the AMI, it actually says no active md device available:

ubuntu@ip-10-0-0-43:~$ cat /proc/mdstat
Personalities :
unused devices: <none>

We currently decide mdX is used when os.path.exists('/sys/block/mdX/md/array_state') == True,
but according to kernel doc, the file may available even array is STOPPED:

    clear

        No devices, no size, no level
        Writing is equivalent to STOP_ARRAY ioctl
https://www.kernel.org/doc/html/v4.15/admin-guide/md.html

So we should also check array_state != 'clear', not just array_state
existance.

Fixes #8219

Closes #8220
2021-03-07 18:30:11 +02:00
Takuya ASADA
53c7600da8 dist: increase fs.aio-max-nr value for other apps
Current fs.aio-max-nr value cpu_count() * 11026 is exact size of scylla
uses, if other apps on the environment also try to use aio, aio slot
will be run out.
So increase value +65536 for other apps.

Related #8133

Closes #8228
2021-03-07 12:11:36 +02:00
Takuya ASADA
870c3a28c1 scylla_setup: strip spaces of comma separated list
On RAID prompt, we can type disk list something like this:
 /dev/sda1,/dev/sdb1,/dev/sdc1,/dev/sdd1

However, if the list has spaces in the list, it doesn't work:
 /dev/sda1, /dev/sdb1, /dev/sdc1, /dev/sdd1

Because the script mistakenly recognize the space part of a device path.
So we need strip() the input for each item.

Fixes #8174

Closes #8190
2021-03-02 12:48:18 +02:00
Takuya ASADA
d0297c599a dist: tune fs.aio-max-nr based on the number of cpus
Current aio-max-nr is set up statically to 1048576 in
/etc/sysctl.d/99-scylla-aio.conf.
This is sufficient for most use cases, but falls short on larger machines
such as i3en.24xlarge on AWS that has 96 vCPUs.

We need to tune the parameter based on the number of cpus, instead of
static setting.

Fixes #8133

Signed-off-by: Takuya ASADA <syuu@scylladb.com>

Closes #8188
2021-03-01 14:18:24 +02:00
Takuya ASADA
4cf9b6988e scylla_coredump_setup: don't run apt-get when systemd-coredump is already installed
Check systemd-coredump existance before running apt-get install
systemd-coredump.

Closes #8185
2021-03-01 09:38:51 +02:00
Takuya ASADA
f3a82f4685 scylla_setup: allow running scylla_setup with strict umask setting
We currently deny running scylla_setup when umask != 0022.
To remove this limitation, run os.chmod(0o644) on every file creation
to allow reading from scylla user.

Note that perftune.yaml is not really needed to set 0644 since perftune.py is
running in root user, but setting it to align permission with other files.

Fixes #8049

Closes #8119
2021-02-25 16:42:45 +02:00
Takuya ASADA
32d4ec6b8a scylla_util.py: resolve /dev/root to get actual device on aws
When psutil.disk_paritions() reports / is /dev/root, aws_instance mistakenly
reports root partition is part of ephemeral disks, and RAID construction will
fail.
This prevents the error and reports correct free disks.

Fixes #8055

Closes #8040
2021-02-18 20:25:45 +02:00
Shlomi Livne
718976e794 scylla_io_setup did not configure pre tuned gce instances correctly
scylla_io_setup condition for nr_disks was using the bitwise operator
(&) instead of logical and operator (and) causing the io_properties
files to have incorrect values

Fixes #7341

Reviewed-by: Lubos Kosco <lubos@scylladb.com>
Signed-off-by: Shlomi Livne <shlomi@scylladb.com>

Closes #8019
2021-02-11 11:06:00 +02:00
Benny Halevy
55e3df8a72 dist: scylla_util: prevent IndexError when no ephemeral_disks were found
Currently we call firstNvmeSize before checking that we have enough
(at least 1) ephemeral disks.  When none are found, we hit the following
error (see #7971):
```
File "/opt/scylladb/scripts/libexec/scylla_io_setup", line 239, in
if idata.is_recommended_instance():
File "/opt/scylladb/scripts/scylla_util.py", line 311, in is_recommended_instance
diskSize = self.firstNvmeSize
File "/opt/scylladb/scripts/scylla_util.py", line 291, in firstNvmeSize
firstDisk = ephemeral_disks[0]
IndexError: list index out of range
```

This change reverses the order and first checks that we found
enough disks before getting the fist disk size.

Fixes #7971

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #8027
2021-02-03 11:30:18 +02:00
Takuya ASADA
5d527bd17e scylla_ntp_setup: use chrony on all distributions
To simplify scylla_ntp_setup, use chrony on all distributions.

Closes #7922
2021-01-24 11:45:58 +02:00
Takuya ASADA
7a74f8cd2e dist: use sysconfig_parser to parse gentoo config file
Use sysconfig_parser instead of regex, to improve code readability.
2021-01-13 21:34:23 +09:00
Takuya ASADA
2a4d293841 dist: add package name translation
Translate package name from CentOS package to different distribution
package name, to use single package name for pkg_install().
2021-01-13 21:27:14 +09:00
Takuya ASADA
0a9843842d dist: support SLES/OpenSUSE
Add support SLES/OpenSUSE on setup script.
2021-01-13 19:32:46 +09:00
Takuya ASADA
e8f74e800c dist: show warning on unsupported distributions
Add warning message on unsupported distributions, for
scylla_cpuscaling_setup and scylla_ntp_setup.
2021-01-13 19:32:45 +09:00
Takuya ASADA
2f344cf50d dist: drop Ubuntu 14.04 code
We don't support Ubuntu 14.04 anymore, drop them
2021-01-13 19:32:45 +09:00
Takuya ASADA
8e59f70080 dist: move back is_amzn2() to scylla_util.py
Distribution detection functions should be placed same place,
so move back it to scylla_util.py
2021-01-13 19:32:45 +09:00
Takuya ASADA
921b1676c0 dist: rename is_gentoo_variant() to is_gentoo()
is_redhat_variant() is the function to detect RHEL/CentOS/Fedora/OEL,
and is_debian_variant() is the function to detect Debian/Ubuntu.
Unlike these functions, is_gentoo_variant() does not detect "Gentoo variants",
we should rename it to is_gentoo().
2021-01-13 19:32:45 +09:00
Takuya ASADA
fffa8f5ded dist: support Arch Linux
Add support Arch Linux on setup script.
2021-01-13 19:32:45 +09:00
Takuya ASADA
0d11f9463d dist: make sysconfig directory detectable
Currently, install.sh provide a way to customize sysconfig directory,
but sysconfig directory is hardcoded on script.
Also, /etc/sysconfig seems correct to use default value, but current
code specify /etc/default as non-redhat distributions.

Instead of hardcoding, generate generate python script in install.sh
to save specified sysconfig directory path in python code.
2021-01-13 19:32:45 +09:00
Amos Kong
9adc6f68ee scylla_setup: cleanup if judgments
This patch merged two nested if judgments.

Signed-off-by: Amos Kong <amos@scylladb.com>
2020-12-26 04:45:25 +08:00
Amos Kong
632b01ce4e scylla_setup: enable node_exporter for offline installation
node_exporter had been added to scylla-server package by commit
95197a09c9.

So we can enable it by default for offline installation.

Signed-off-by: Amos Kong <amos@scylladb.com>
2020-12-25 10:54:31 +08:00
Takuya ASADA
95197a09c9 dist: add node_exporter to scylla-server package
To connection-less environment, we need to add node_exporter binary
to scylla-server package, not downloading it from internet.

Related #7765
Fixes #2190

Closes #7796
2020-12-24 11:44:13 +02:00
Aleksandr Bykov
e74dc311e7 dist: scylla_util: fix aws_instance.ebs_disks method
aws_instance.ebs_disks() method should return ebs disk
instead of ephemeral

Signed-off-by: Aleksandr Bykov <alex.bykov@scylladb.com>

Closes #7780
2020-12-13 17:33:37 +02:00
Avi Kivity
461c9826de Merge 'scylla_setup: fix wrong command suggestion' from Takuya ASADA
scylla_setup command suggestion does not shows an argument of --io-setup,
because we mistakely stores bool value on it (recognized as 'store_true').
We always need to print '--io-setup X' on the suggestion instead.

Also, --nic is currently ignored by command suggestion, need to print it just like other options.

Related #7395

Closes #7724

* github.com:scylladb/scylla:
  scylla_setup: print --swap-directory and --swap-size on command suggestion
  scylla_setup: print --nic on command suggestion
  scylla_setup: fix wrong command suggestion on --io-setup scylla_setup command suggestion does not shows an argument of --io-setup, because we mistakely stores bool value on it (recognized as 'store_true'). We always need to print '--io-setup X' on the suggestion instead.
2020-12-08 13:58:55 +02:00
Lubos Kosco
a0b1474bba scylla_util.py: Increase disk to ram ratio for GCP
Increase accepted disk-to-RAM ratio to 105 to accomodate even 7.5GB of
RAM for one NVMe log various reasons for not recommending the instance
type.

Fixes #7587

Closes #7600
2020-12-08 11:20:30 +02:00
Takuya ASADA
c3abba1913 scylla_setup: print --swap-directory and --swap-size on command suggestion
We need to print --swap-directory and --swap-size on command suggestion just like other options.

Related #7395
2020-12-07 18:40:59 +09:00
Takuya ASADA
582a3ffb2f scylla_setup: print --nic on command suggestion
We need to print --nic on command suggestion just like other options.

Related #7395
2020-12-07 18:40:59 +09:00
Avi Kivity
2fd895a367 Merge 'dist/common/scripts/scylla_setup: Optionally config rsyslog destination' from Amnon Heiman
This patch adds an option to scylla_setup to configure an rsyslog destination.

The monitoring stack has an option to get information from rsyslog it
requires that rsyslog on the scylla machines will send the trace line to
it.

The configuration will be in a Scylla configuration file, so it is safe to run it multiple times.

Fixes #7589

Signed-off-by: Amnon Heiman <amnon@scylladb.com>

Closes #7634

* github.com:scylladb/scylla:
  dist/common/scripts/scylla_setup: Optionally config rsyslog destination
  Adding dist/common/scripts/scylla_rsyslog_setup utility
2020-12-01 13:12:32 +02:00
Amnon Heiman
4036cecdea dist/common/scripts/scylla_setup: Optionally config rsyslog destination
This patch adds an option to scylla_setup to configure an rsyslog
destination.

The monitoring stack has an option to get information from rsyslog, it
requires that rsyslog on the scylla machines will send the trace line to
it.

If the /etc/rsyslog.d/ directory exists (that means the current system
runs rsyslog) it will ask if to add rsyslog configuration and if yes, it
would run scylla_rsyslog_setup.

Fixes #7589

Signed-off-by: Amnon Heiman <amnon@scylladb.com>
2020-12-01 12:33:37 +02:00
Takuya ASADA
572d6b2a4e scylla_setup: fix wrong command suggestion on --io-setup
scylla_setup command suggestion does not shows an argument of --io-setup,
because we mistakely stores bool value on it (recognized as 'store_true').
We always need to print '--io-setup X' on the suggestion instead.

Related #7395
2020-12-01 07:23:55 +09:00