Commit Graph

516 Commits

Author SHA1 Message Date
Takuya ASADA
e4b42e622e scylla_coredump_setup: avoid coredump failure when hard limit of coredump is set to zero
On the environment hard limit of coredump is set to zero, coredump test
script will fail since the system does not generate coredump.
To avoid such issue, set ulimit -c 0 before generating SEGV on the script.

Note that scylla-server.service can generate coredump even ulimit -c 0
because we set LimitCORE=infinity on its systemd unit file.

Fixes #8238

Closes #8245

(cherry picked from commit af8eae317b)
2021-06-13 18:27:13 +03:00
Yaron Kaikov
76ec7513f1 install.sh: Setup aio-max-nr upon installation
This is a follow up change to #8512.

Let's add aio conf file during scylla installation process and make sure
we also remove this file when uninstall Scylla

As per Avi Kivity's suggestion, let's set aio value as static
configuration, and make it large enough to work with 500 cpus.

Closes #8650

Refs: #8713

(cherry picked from commit dd453ffe6a)
2021-06-13 14:15:24 +03:00
Yaron Kaikov
f36f7035c8 scylla_io_setup: configure "aio-max-nr" before iotune
On severl instance types in AWS and Azure, we get the following failure
during scylla_io_setup process:
```
ERROR 2021-04-14 07:50:35,666 [shard 5] seastar - Could not setup Async
I/O: Resource temporarily unavailable. The most common cause is not
enough request capacity in /proc/sys/fs/aio-max-nr. Try increasing that
number or reducing the amount of logical CPUs available for your
application
```

We have scylla_prepare:configure_io_slots() running before the
scylla-server.service start, but the scylla_io_setup is taking place
before

1) Let's move configure_io_slots() to scylla_util.py since both
   scylla_io_setup and scylla_prepare are import functions from it
2) cleanup scylla_prepare since we don't need the same function twice
3) Let's use configure_io_slots() during scylla_io_setup to avoid such
failure

Fixes: #8587

Closes #8512

Refs: #8713

(cherry picked from commit 588a065304)
2021-06-13 14:14:59 +03:00
Takuya ASADA
e1c993fc13 scylla_raid_setup: use /dev/disk/by-uuid to specify filesystem
Currently, var-lib-scylla.mount may fails because it can start before
MDRAID volume initialized.
We may able to add "After=dev-disk-by\x2duuid-<uuid>.device" to wait for
device become available, but systemd manual says it automatically
configure dependency for mount unit when we specify filesystem path by
"absolute path of a device node".

So we need to replace What=UUID=<uuid> to What=/dev/disk/by-uuid/<uuid>.

Fixes #8279

Closes #8681

(cherry picked from commit 3d307919c3)
2021-05-24 17:24:18 +03:00
Takuya ASADA
beb2bcb8bd dist: increase fs.aio-max-nr value for other apps
Current fs.aio-max-nr value cpu_count() * 11026 is exact size of scylla
uses, if other apps on the environment also try to use aio, aio slot
will be run out.
So increase value +65536 for other apps.

Related #8133

Closes #8228

(cherry picked from commit 53c7600da8)
2021-04-25 16:16:31 +03:00
Takuya ASADA
8255b7984d dist: tune fs.aio-max-nr based on the number of cpus
Current aio-max-nr is set up statically to 1048576 in
/etc/sysctl.d/99-scylla-aio.conf.
This is sufficient for most use cases, but falls short on larger machines
such as i3en.24xlarge on AWS that has 96 vCPUs.

We need to tune the parameter based on the number of cpus, instead of
static setting.

Fixes #8133

Signed-off-by: Takuya ASADA <syuu@scylladb.com>

Closes #8188

(cherry picked from commit d0297c599a)
2021-04-25 16:16:26 +03:00
Takuya ASADA
c7781f8c9e node_exporter_install: fix bad owner of node_exporter
node_exporter files as installed with weird ownership, 3434:3434.
This is because upstream node_exporter tar.gz contain the owner
information, but we should overwrite it to valid one.

Fixes #6222

Closes #8379
2021-04-01 17:34:19 +03:00
Benny Halevy
2898e98733 dist: scylla_util: prevent IndexError when no ephemeral_disks were found
Currently we call firstNvmeSize before checking that we have enough
(at least 1) ephemeral disks.  When none are found, we hit the following
error (see #7971):
```
File "/opt/scylladb/scripts/libexec/scylla_io_setup", line 239, in
if idata.is_recommended_instance():
File "/opt/scylladb/scripts/scylla_util.py", line 311, in is_recommended_instance
diskSize = self.firstNvmeSize
File "/opt/scylladb/scripts/scylla_util.py", line 291, in firstNvmeSize
firstDisk = ephemeral_disks[0]
IndexError: list index out of range
```

This change reverses the order and first checks that we found
enough disks before getting the fist disk size.

Fixes #7971

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes #8027

(cherry picked from commit 55e3df8a72)
2021-03-21 12:20:19 +02:00
Takuya ASADA
f316e1db54 scylla_util.py: resolve /dev/root to get actual device on aws
When psutil.disk_paritions() reports / is /dev/root, aws_instance mistakenly
reports root partition is part of ephemeral disks, and RAID construction will
fail.
This prevents the error and reports correct free disks.

Fixes #8055

Closes #8040

(cherry picked from commit 32d4ec6b8a)
2021-02-21 16:23:21 +02:00
Shlomi Livne
da2c5fd549 scylla_io_setup did not configure pre tuned gce instances correctly
scylla_io_setup condition for nr_disks was using the bitwise operator
(&) instead of logical and operator (and) causing the io_properties
files to have incorrect values

Fixes #7341

Reviewed-by: Lubos Kosco <lubos@scylladb.com>
Signed-off-by: Shlomi Livne <shlomi@scylladb.com>

Closes #8019

(cherry picked from commit 718976e794)
2021-02-14 13:10:19 +02:00
Evgeniy Naydanov
47b121130a scylla_raid_setup: try /dev/md[0-9] if no --raiddev provided
If scylla_raid_setup script called without --raiddev argument
then try to use any of /dev/md[0-9] devices instead of only
one /dev/md0.  Do it in this way because on Ubuntu 20.04
/dev/md0 used by OS already.

Closes #7628

(cherry picked from commit 587b909c5c)

Fixes #7627.
2021-01-03 16:46:16 +02:00
Takuya ASADA
15f55141ec scylla_raid_setup: use sysfs to detect existing RAID volume
We may not able to detect existing RAID volume by device file existance,
we should use sysfs instead to make sure it's running.

Fixes #7383

Closes #7399

(cherry picked from commit fc1c4f2261)
2021-01-03 16:45:36 +02:00
Aleksandr Bykov
50c01f7331 dist: scylla_util: fix aws_instance.ebs_disks method
aws_instance.ebs_disks() method should return ebs disk
instead of ephemeral

Signed-off-by: Aleksandr Bykov <alex.bykov@scylladb.com>

Closes #7780

(cherry picked from commit e74dc311e7)
2020-12-16 11:58:19 +02:00
Lubos Kosco
3b617164dc scylla_util.py: Increase disk to ram ratio for GCP
Increase accepted disk-to-RAM ratio to 105 to accomodate even 7.5GB of
RAM for one NVMe log various reasons for not recommending the instance
type.

Fixes #7587

Closes #7600

(cherry picked from commit a0b1474bba)
2020-12-09 09:38:01 +02:00
Takuya ASADA
d966e2d500 install.sh: set PATH for relocatable CLI tools in python thunk
We currently set PATH for relocatable CLI tools in scylla_util.run() and
scylla_util.out(), but it doesn't work for perftune.py, since it's not part of
Scylla, does not use scylla_util module.
We can set PATH in python thunk instead, it can set PATH for all python scripts.

Fixes #7350

(cherry picked from commit 5867af4edd)
2020-11-29 11:52:24 +02:00
Lubos Kosco
542a7d28a3 scylla_util.py: fix metadata gcp call for disks to get details
disk parsing expects output from recursive listing of GCP
metadata REST call, the method used to do it by default,
but now it requires a boolean flag to run in recursive mode

Fixes #7684

Closes #7685

(cherry picked from commit 4d0587ed11)
2020-11-29 11:39:23 +02:00
Lubos Kosco
708588bf8b scylla_util.py: properly parse GCP instances without size
fixes #7577

Closes #7592

(cherry picked from commit 5c488b6e9a)
2020-11-16 14:02:36 +02:00
Bentsi Magidovich
4896ce0fd4 scylla_util.py: fix exception handling in curl
Retry mechanism didn't work when URLError happend. For example:

  urllib.error.URLError: <urlopen error [Errno 101] Network is unreachable>

Let's catch URLError instead of HTTP since URLError is a base exception
for all exceptions in the urllib module.

Fixes: #7569

Closes #7567

(cherry picked from commit 956b97b2a8)
2020-11-11 14:18:05 +02:00
Avi Kivity
9d84b1f13d Merge 'improvements for GCE image' from Bentsi
when logging in to the GCE instance that is created from the GCE image it takes 10 seconds to understand that we are not running on AWS. Also, some unnecessary debug logging messages are printed:
```
bentsi@bentsi-G3-3590:~/devel/scylladb$ ssh -i ~/.ssh/scylla-qa-ec2 bentsi@35.196.8.86
Warning: Permanently added '35.196.8.86' (ECDSA) to the list of known hosts.
Last login: Sun Nov  1 22:14:57 2020 from 108.128.125.4

   _____            _ _       _____  ____
  / ____|          | | |     |  __ \|  _ \
 | (___   ___ _   _| | | __ _| |  | | |_) |
  \___ \ / __| | | | | |/ _` | |  | |  _ <
  ____) | (__| |_| | | | (_| | |__| | |_) |
 |_____/ \___|\__, |_|_|\__,_|_____/|____/
               __/ |
              |___/

Version:
       666.development-0.20201101.6be9f4938
Nodetool:
	nodetool help
CQL Shell:
	cqlsh
More documentation available at:
	http://www.scylladb.com/doc/
By default, Scylla sends certain information about this node to a data collection server. For information, see http://www.scylladb.com/privacy/

WARNING:root:Failed to grab http://169.254.169.254/latest/...
WARNING:root:Failed to grab http://169.254.169.254/latest/...
    Initial image configuration failed!

To see status, run
 'systemctl status scylla-image-setup'

[bentsi@artifacts-gce-image-jenkins-db-node-aa57409d-0-1 ~]$

```
this PR fixes this

Closes #7523

* github.com:scylladb/scylla:
  scylla_util.py: remove unnecessary logging
  scylla_util.py: make is_aws_instance faster
  scylla_util.py: added ability to control sleep time between retries in curl()

(cherry picked from commit 7a3376907e)
2020-11-11 14:17:59 +02:00
Takuya ASADA
fe2d6765f9 node_exporter_install: upgrade to latest release
We currently uses outdated version of node_exporter, let's upgrade to latest
version.

Fixes #7427
2020-10-25 13:59:14 +02:00
Takuya ASADA
d5ff82dc61 scylla_setup: skip iotune when developer_mode is enabled
When developer mode automatically enabled on nonroot mode, we should skip
iotune since the parameter won't be used.

Closes #7327
2020-10-12 11:08:10 +03:00
Bentsi Magidovich
7be252e929 dist: fix incorrect AWS user-data url
we used http://169.254.169.254/latest/meta-data/user-data
but correct one http://169.254.169.254/latest/user-data
Fixes: https://github.com/scylladb/scylla-machine-image/issues/63

Closes #7388
2020-10-11 18:20:54 +03:00
Takuya ASADA
0f786f05fe install.sh: logging to scylla-server.log when journalctl --user does not work
On some environment such as CentOS8, journalctl --user -xe does not work
since journald is running in volatile mode.
The issue cannnot fix in non-root mode, as a workaround we should logging to
a file instead of journal.

Also added scylla_logrotate to ExecStartPre which rename previous log file,
since StandardOutput=file:/path/to/file will erase existing file when service
restarted.

Fixes #7131

Closes #7326
2020-10-05 13:17:27 +03:00
Takuya ASADA
d611d74905 dist/common/scripts/scylla_setup: force developer mode on nonroot when NOFILE is too low
On Ubuntu 16/18 and Debian 9, LimitNOFILE is set to 4096 and not able to override from
user unit.
To run scylla-server in such environment, we need to turn on developer mode and show
warnings.

Fixes #7133

Closes #7323
2020-10-04 10:16:30 +03:00
Takuya ASADA
8504332e17 scylla_setup: skip offline warnings on nonroot mode
Since most of the scripts requires root privilege, we don't shows up offline
warning on nonroot mode.

Fixes #7286

Closes #7287
2020-09-29 13:30:13 +03:00
Pekka Enberg
1adf2cc848 Revert "scylla_ntp_setup: use chrony on all distributions"
This reverts commit 8366d2231d because it
causes the following "scylla_setup" failure on Ubuntu 16.04:

  Command: 'sudo /usr/lib/scylla/scylla_setup --nic ens5 --disks /dev/nvme0n1  --swap-directory / '
  Exit code: 1
  Stdout:
  Setting up libtomcrypt0:amd64 (1.17-7ubuntu0.1) ...
  Setting up chrony (2.1.1-1ubuntu0.1) ...
  Creating '_chrony' system user/group for the chronyd daemon…
  Creating config file /etc/chrony/chrony.conf with new version
  Processing triggers for libc-bin (2.23-0ubuntu11.2) ...
  Processing triggers for ureadahead (0.100.0-19.1) ...
  Processing triggers for systemd (229-4ubuntu21.29) ...
  501 Not authorised
  NTP setup failed.
  Stderr:
  chrony.service is not a native service, redirecting to systemd-sysv-install
  Executing /lib/systemd/systemd-sysv-install enable chrony
  Traceback (most recent call last):
  File "/opt/scylladb/scripts/libexec/scylla_ntp_setup", line 63, in <module>
  run('chronyc makestep')
  File "/opt/scylladb/scripts/scylla_util.py", line 504, in run
  return subprocess.run(cmd, stdout=stdout, stderr=stderr, shell=shell, check=exception, env=scylla_env).returncode
  File "/opt/scylladb/python3/lib64/python3.8/subprocess.py", line 512, in run
  raise CalledProcessError(retcode, process.args,
  subprocess.CalledProcessError: Command '['chronyc', 'makestep']' returned non-zero exit status 1.
2020-09-29 11:23:23 +03:00
Takuya ASADA
8366d2231d scylla_ntp_setup: use chrony on all distributions
To simplify scylla_ntp_setup, use chrony on all distributions.
2020-09-27 12:30:02 +03:00
Takuya ASADA
eae2aa58fa dist/common/scripts: move back get_set_nic_and_disks_config_value to scylla_util.py
The function mistakenly moved to scylla_sysconfig_setup but it also referenced
from scylla_prepare, move back to scylla_util.py

Fixes #7276

Closes #7280
2020-09-25 13:05:43 +03:00
Pekka Enberg
9a19c028e4 scylla_kernel_check: Switch to os.mkdirs() function
Commit 8e1f7d4fc7 ("dist/common/scripts: drop makedirs(), use
os.makedirs()") dropped the "mkdirs()" function, but forgot to convert
the caller in scylla_kernel_check to os.mkdirs().

Message-Id: <20200923104510.230244-1-penberg@scylladb.com>
2020-09-23 13:54:43 +03:00
Takuya ASADA
f243ccfa89 scylla_cpuscaling_setup: add Install section for scylla-cpupower.service
Install section requires for enable/disable services.

Fixes #7230
2020-09-22 10:18:01 +02:00
Takuya ASADA
48223022f7 scylla_util.py: de-duplicate code on parse_scylla_dirs_with_default() and get_scylla_dirs()
Seems like parse_scylla_dirs_with_default() and get_scylla_dirs() shares most of
the code, de-duplicate it.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2020-09-20 00:50:37 +09:00
Takuya ASADA
f8321bc66a scylla_util.py: remove rmtree() and redhat_version() since these are unused 2020-09-20 00:50:05 +09:00
Takuya ASADA
8e1f7d4fc7 dist/common/scripts: drop makedirs(), use os.makedirs()
Since os.makedirs() has exist_ok option, no need to create wrapper function.
2020-09-20 00:48:06 +09:00
Takuya ASADA
85f76e80b4 dist/common/scripts: drop hex2list.py since it's nolonger used
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2020-09-20 00:45:25 +09:00
Takuya ASADA
0f5c83f73d dist/common/scripts: drop is_systemd() since we nolonger support non-systemd environment 2020-09-20 00:45:02 +09:00
Takuya ASADA
82701dc5ed dist/common/scripts: drop dist_name() and dist_ver()
It can be replaced with distro.name() and distro.version().

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2020-09-20 00:42:27 +09:00
Takuya ASADA
79d8192dc7 dist/common/scripts: move functions that are only called from single file
scylla_util.py is a library for common functions across setup scripts,
it should not include private function of single file.
So move all those functions to caller file.

Signed-off-by: Takuya ASADA <syuu@scylladb.com>
2020-09-20 00:42:19 +09:00
Avi Kivity
12ca7f7ace Merge 'dist/common/scripts: skip internet access on offline installation' from Takuya ASADA
We need to skip internet access on offline installation.
To do this we need following changes:
 - prevent running yum/apt for each script
 - set default "NO" for scripts it requires package installation
 - set default "NO" for scripts it requires internet access, such as NTP

See #7153

Closes #7224

* syuu1228-offline_setup:
  dist/common/scripts: skip internet access on offline installation
  scylla_ntp_setup: use shutil.witch() to lookup command
2020-09-17 12:14:20 +03:00
Bentsi Magidovich
12e018ad04 scylla_util.py: added GCE instance/image support 2020-09-16 11:32:57 +03:00
Bentsi Magidovich
5aef44fc82 scylla_util.py: make max_retries as a curl function argument 2020-09-16 11:25:01 +03:00
Bentsi Magidovich
1f31e976cc scylla_util.py: adding timeout to curl function 2020-09-16 11:25:01 +03:00
Bentsi Magidovich
f5d97afaa2 scylla_util.py: styling fixes to curl function
- rename deprecated logging.warn to logging.warning
- remove redundant round brackets in the if statement
2020-09-16 11:25:01 +03:00
Bentsi Magidovich
a24ec2686f scylla_util.py: change default value for headers argument in curl function
- It was set to {} that is incorrect and can lead to unexpected behavior
https://docs.python-guide.org/writing/gotchas/#mutable-default-arguments
- Order of the arguments changed to more convinient way
2020-09-16 11:25:01 +03:00
Takuya ASADA
db9e6f50f3 dist/common/scripts: skip internet access on offline installation
We need to skip internet access on offline installation.
To do this we need following changes:
 - prevent running yum/apt for each script
 - set default "NO" for scripts it requires package installation
 - set default "NO" for scripts it requires internet access, such as NTP

See #7153
Fixes #7182
2020-09-16 10:05:20 +09:00
Takuya ASADA
ca8f0ff588 scylla_ntp_setup: use shutil.witch() to lookup command
The command installed directory may different between distributions,
we can abstract the difference using shutil.witch().
Also the script become simpler than passing full path to os.path.exists().
2020-09-16 10:04:23 +09:00
Takuya ASADA
5f541fbdc5 scylla_setup: drop hugepages package installation
hugepages and libhugetlbfs-bin packages is only required for DPDK mode,
and unconditionally installation causes error on offline mode, so drop it.

Fixes #7182
2020-09-14 17:05:09 +03:00
Takuya ASADA
e1b15ba09e dist/common/scripts: abort scylla_prepare with better error message
When configuration files for perftune contain invalid parameter, scylla_prepare
may cause traceback because error handling is not eough.

Throw all errors from create_perftune_conf(), catch them on scylla_prepare,
print user understandable error.

Fixes #6847
2020-09-08 23:39:34 +03:00
Takuya ASADA
4deb245198 scylla_ntp_setup: don't install ntpd package when it's already exists
Don't install ntpd package when it's already exists.

Related with #7153
2020-09-08 13:59:04 +03:00
Takuya ASADA
cb221ac393 scylla_setup: verify package correctly on offline install
do_verify_package written only for .rpm/.deb, does not working correctly
for offline install(including nonroot).
We should check file existance for the environment, not by package existance
using rpm/dpkg.

Fixes #7075
2020-08-24 20:10:36 +09:00
Takuya ASADA
c71e5f244a scylla_util.py: implement is_offline() to detect offline installed image
Like is_nonroot(), detect offline installed image using install.sh.
2020-08-24 20:10:36 +09:00