Commit Graph

1383 Commits

Author SHA1 Message Date
Takuya ASADA
5b62bebbb6 scylla_io_setup: check root privilege on root mode
This is a side effect of allowing scylla_io_setup to run in nonroot mode:
the script can now be run by a non-root user even when the installation is
not in nonroot mode.

As a result, the script eventually fails to write io_properties.yaml
with a permission-denied error.  Since the evaluation takes a long time, we
should run the permission check before starting it.

We need to add the root privilege check again, but skip it in nonroot mode.
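
The check described above could look roughly like the following. This is only a sketch: the function name and the way nonroot mode is passed in are hypothetical and not taken from scylla_io_setup.

```python
def may_run_io_setup(is_nonroot_install: bool, euid: int) -> bool:
    """Return True when the script is allowed to proceed.

    A nonroot installation never needs root; any other installation
    requires an effective UID of 0.
    """
    if is_nonroot_install:
        return True
    return euid == 0
```

A caller would pass `os.geteuid()` as `euid` and exit early, before the long evaluation starts, when the function returns False.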

Fixes #8915

Closes #8984
2021-08-22 16:49:40 +03:00
Takuya ASADA
cb19048186 docker: use dist/common/supervisor script for docker
The supervisor scripts for Docker and for the offline installer are almost
the same, so drop the Docker ones and share the same code to deduplicate
them.

Closes #9143

Fixes #9194
2021-08-16 13:36:14 +03:00
Takuya ASADA
e5bb88b69a scylla_cpuscaling_setup: change scaling_governor path
On some environments /sys/devices/system/cpu/cpufreq/policy0/scaling_governor
does not exist even though CPU scaling is supported.
In contrast, /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor is
available in both environments, so we should switch to it.
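
A minimal sketch of the detection after this change, assuming the only question is whether the sysfs file exists. The function name and the injectable `exists` parameter are illustrative, not the script's actual code.

```python
import os

# Both paths are quoted from the commit message.
POLICY_PATH = "/sys/devices/system/cpu/cpufreq/policy0/scaling_governor"
PER_CPU_PATH = "/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor"


def cpu_scaling_supported(exists=os.path.exists) -> bool:
    """Probe the per-CPU path, which is present in both kinds of environment."""
    return exists(PER_CPU_PATH)
```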

Fixes #9191

Closes #9193
2021-08-11 15:31:14 +03:00
Takuya ASADA
b822c642e5 docker: fix housekeeping --repo-files to apt repository
Even though we switched to an Ubuntu-based container image, housekeeping is
still using the yum repository.
It should be switched to the apt repository.

Fixes #9144

Closes #9147
2021-08-09 07:47:03 +03:00
Nadav Har'El
9662de85f5 Merge 'Azure snitch support' from Pekka Enberg
This adds support for the Azure snitch. The work is an adaptation of
AzureSnitch for Apache Cassandra by Yoshua Wakeham:

https://raw.githubusercontent.com/yoshw/cassandra/9387-trunk/src/java/org/apache/cassandra/locator/AzureSnitch.java

Also change `production_snitch_base` to protect against
a snitch implementation setting DC and rack to an empty string,
which Lubos says can happen on Azure.

Fixes #8593

Closes #9084

* github.com:scylladb/scylla:
  scylla_util: Use AzureSnitch on Azure
  production_snitch_base: Fallback for empty DC or rack strings
  azure_snitch: Azure snitch support
2021-08-03 22:52:05 +03:00
Eduardo Benzecri
f196a4131a scylla_setup: Fix outdated message
Change the message to match what 'scylla_bootparam_setup' currently does
(set a clock source at boot time) instead of what it used to do in
the past (setting huge pages).

Closes #9116.
2021-08-02 16:04:38 +03:00
Takuya ASADA
3ecdd15777 dist/debian: keep sysconfdir.conf for scylla-housekeeping on 'remove'
Same as 4309785: dpkg does not re-install conffiles once they have been
removed by the user, so sysconfdir.conf for scylla-housekeeping goes missing
on rollback.
To prevent this, we need to stop removing the drop-in file directory on
'remove'.

Fixes #9109

Closes #9110
2021-07-29 12:32:21 +03:00
Pekka Enberg
ef5b2934e8 scylla_util: Use AzureSnitch on Azure
Fixes #8593
2021-07-28 14:07:42 +03:00
Takuya ASADA
fdc786b451 install.sh: add supervisor support
Bring supervisor support from dist/docker to install.sh, and make it
installable from the relocatable package.
This makes it possible to use supervisor in nonroot / offline environments,
and also makes the relocatable package able to run in a Docker environment.

Related #8849

Closes #8918
2021-07-27 12:51:29 +03:00
Takuya ASADA
42fd73d033 scylla_setup: add RAID5 support
This adds optional RAID5 support to scylla_setup.

Fixes #9076

Closes #9093
2021-07-27 12:49:29 +03:00
Yaron Kaikov
a004b1da30 scylla_util:add AWS arm based instance to supported list
Today we have a Scylla AMI image based on the x86 architecture only.
Following the work we did in https://github.com/scylladb/scylla-machine-image/pull/153 we can build
an ARM-based AMI image.

Let's add ARM-based instances to the supported list.

Closes #9064
2021-07-22 15:48:29 +03:00
Avi Kivity
2cfc517874 main, test: adjust number of networking iocbs
Seastar's default limit of 10,000 iocbs per shard is too low for
some workloads (it places an upper bound on the number of idle
connections, above which a crash occurs). Use the new Seastar
feature to raise the default to 50000.

Also multiply the global reservation by 5, and round it upwards
so the number is less weird. This prevents io_setup() from failing.

For tests, the reservation is reduced since they don't create large
numbers of connections. This reduces surprise test failures when they
are run on machines that haven't been adjusted.

Fixes #9051

Closes #9052
2021-07-18 14:38:44 +03:00
Yaron Kaikov
aa7c40ba50 dist: build docker based on ubuntu 20.04 OS
Today our docker image is based on CentOS 7. Since CentOS will be EOL in
2024 and no longer has a stable release stream, let's move our docker image to be based on Ubuntu 20.04.

Based on the work done in https://github.com/scylladb/scylla/pull/8730,
let's build our docker image based on local packages using buildah

Closes #8849
2021-07-12 13:32:03 +03:00
Takuya ASADA
def81807aa scylla-fstrim.timer: drop BindsTo=scylla-server.service
To avoid restarting scylla-server.service unexpectedly, drop BindsTo=
from scylla-fstrim.timer.

Fixes #8921

Closes #8973
2021-07-07 17:36:24 +03:00
Takuya ASADA
f19ebe5709 dist/redhat: fix systemd unit name of scylla-node-exporter
systemd unit name of scylla-node-exporter is
scylla-node-exporter.service, not node-exporter.service.

Fixes #8966

Closes #8967
2021-07-05 18:06:51 +03:00
Takuya ASADA
f71f9786c7 dist: stop removing /etc/systemd/system/*.mount on package uninstall
Listing /etc/systemd/system/*.mount as ghost files seems incorrect,
since the user may want to keep using the RAID volume / coredump directory after
uninstalling Scylla, or may want to upgrade to the enterprise version.

Also, we mixed two types of files as ghost files; they should be handled differently:
 1. automatically generated by the postinst scriptlet
 2. generated by a user-invoked scylla_setup

The package should remove only 1, since 2 is generated by user decision.

However, just dropping .mount from the %files section causes another
problem: rpm will remove these files during upgrade, instead of
uninstall (#8924).

To fix both problems, specify the .mount files as "%ghost %config".
This keeps the files on both package upgrade and package removal.

See scylladb/scylla-enterprise#1780

Closes #8810
Closes #8924

Closes #8959
2021-07-05 18:03:51 +03:00
Avi Kivity
0d87744ba0 Revert "dist: stop removing /etc/systemd/system/*.mount on package uninstall"
This reverts commit a677c46672. It causes
upgrade from a version that did not have the commit to a version that
does have the commit to lose the .mount files, since they change
from being owned by the package (via %ghost) to not being owned.

Fixes #8924.
2021-07-01 08:55:54 +03:00
Takuya ASADA
a677c46672 dist: stop removing /etc/systemd/system/*.mount on package uninstall
Listing /etc/systemd/system/*.mount as ghost files seems incorrect,
since the user may want to keep using the RAID volume / coredump directory after
uninstalling Scylla, or may want to upgrade to the enterprise version.

Also, we mixed two types of files as ghost files; they should be handled differently:
 1. automatically generated by the postinst scriptlet
 2. generated by a user-invoked scylla_setup

The package should remove only 1, since 2 is generated by user decision.

See scylladb/scylla-enterprise#1780

Closes #8810
2021-06-21 14:53:54 +03:00
Eliran Sinvani
9bfb2754eb dist: rpm: Add specific versioning and python3 dependency
The Red Hat packages were missing two things: first, the metapackage
did not depend on the python3 package at all, and second, the
scylla-server package dependencies didn't contain a version as part
of the dependency, which can cause problems during upgrade.
Doing both of the things listed here is a bit of an overkill, as either
one of them separately would solve the problem described in #XXXX,
but both should be applied in order to express the correct concept.

Fixes #8829

Closes #8832
2021-06-09 20:02:43 +03:00
Avi Kivity
a55b434a2b treewide: extent copyright statements to present day 2021-06-06 19:18:49 +03:00
Yaron Kaikov
6a447db8a8 scylla_util.py: Fix Azure support for machine-image
In https://github.com/scylladb/scylla/pull/7807 we added support for
Azure instances in Scylla.

The following changes are required in order for machine-image to work:
1) fix the wrong metadata URL and update the metadata path values (introduced
   in f627fcbb0c)
2) fix the naming of functions that are used by machine-image
3) add missing functions which are required by machine-image
4) clean up unused functions

Closes #8596
2021-06-06 09:21:23 +03:00
Lubos Kosco
777771df34 scylla_util.py: Relax GCE setup NVMe device checks
We don't want to fail I/O setup if there is more than one NVMe device
mounted as root, nor if there are no NVMe devices.

Fixes #8032

Closes #8444
2021-06-06 09:21:23 +03:00
Avi Kivity
e96ff3d82d dist: add new docker building process
The new process has the following differences from the Dockerfile
based image:

 - Using buildah commands instead of a Dockerfile. This is more flexible
   since we don't need to pack everything into a "build context" and
   transfer it to the container; instead we interact with the container
   as we build it.
 - Using packages instead of a remote yum repository. This makes it
   easy to create an image in one step (no need to create a repository,
   promote, then download the packages back via yum). It means that
   the image cannot be upgraded via yum, but container images are
   usually just replaced with a new version.
 - Build output is an OCI archive (e.g. a tarball), not a docker image
   in a local repository. This means the build process can later be
   integrated into ninja, since the artifact is just a file. The file
   can be uploaded into a repository or made available locally with
   skopeo.
 - any build mode is supported, not just release. This can be used
   for quick(er) testing with dev mode.

I plan to integrate it further into the build system, but currently
this is blocked on a buildah bug [1].

[1] https://github.com/containers/buildah/issues/3262

Closes #8730
2021-05-31 10:05:22 +03:00
Yaron Kaikov
dd453ffe6a install.sh: Setup aio-max-nr upon installation
This is a follow up change to #8512.

Let's add the aio conf file during the Scylla installation process and make sure
we also remove this file when uninstalling Scylla.

As per Avi Kivity's suggestion, let's set the aio value as a static
configuration, and make it large enough to work with 500 cpus.

Closes #8650
2021-05-24 14:24:20 +03:00
Takuya ASADA
3d307919c3 scylla_raid_setup: use /dev/disk/by-uuid to specify filesystem
Currently, var-lib-scylla.mount may fail because it can start before
the MDRAID volume is initialized.
We may be able to add "After=dev-disk-by\x2duuid-<uuid>.device" to wait for
the device to become available, but the systemd manual says it automatically
configures the dependency for a mount unit when we specify the filesystem path as
the "absolute path of a device node".

So we need to replace What=UUID=<uuid> with What=/dev/disk/by-uuid/<uuid>.
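
For illustration, a mount unit using the device-node form might be rendered like this. Only the `What=` line reflects the commit message; the function name, the other unit fields, and the default arguments are assumptions made for the sketch.

```python
def render_mount_unit(uuid: str, where: str = "/var/lib/scylla", fstype: str = "xfs") -> str:
    """Render a .mount unit whose What= is an absolute device node path."""
    what = f"/dev/disk/by-uuid/{uuid}"  # device node path instead of UUID=<uuid>
    return "\n".join([
        "[Unit]",
        "Description=Scylla data directory (illustrative)",
        "",
        "[Mount]",
        f"What={what}",
        f"Where={where}",
        f"Type={fstype}",
        "",
        "[Install]",
        "WantedBy=multi-user.target",
        "",
    ])
```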

Fixes #8279

Closes #8681
2021-05-24 14:24:08 +03:00
Takuya ASADA
838acb44d0 scylla-fstrim.timer: fix wrong description from 'daily' to 'weekly'
It is scheduled weekly, not daily.

Fixes #8633

Closes #8644
2021-05-14 16:02:12 +02:00
Yaron Kaikov
588a065304 scylla_io_setup: configure "aio-max-nr" before iotune
On several instance types in AWS and Azure, we get the following failure
during the scylla_io_setup process:
```
ERROR 2021-04-14 07:50:35,666 [shard 5] seastar - Could not setup Async
I/O: Resource temporarily unavailable. The most common cause is not
enough request capacity in /proc/sys/fs/aio-max-nr. Try increasing that
number or reducing the amount of logical CPUs available for your
application
```

We have scylla_prepare:configure_io_slots() running before
scylla-server.service starts, but scylla_io_setup takes place
before that.

1) Let's move configure_io_slots() to scylla_util.py since both
   scylla_io_setup and scylla_prepare import functions from it
2) Clean up scylla_prepare since we don't need the same function twice
3) Let's use configure_io_slots() during scylla_io_setup to avoid such
   failures

Fixes: #8587

Closes #8512
2021-05-11 18:39:10 +03:00
Avi Kivity
6977064693 dist: scylla_raid_setup: reduce xfs block size to 1k
Since Linux 5.12 [1], XFS is able to asynchronously overwrite
sub-block ranges without stalling. However, we want good performance
on older Linux versions, so this patch reduces the block size to the
minimum possible.

That turns out to be 1024 for crc-protected filesystems (which we want)
and it can also not be smaller than the sector size. So we fetch the
sector size and set the block size to that if it is larger than 512.
Most SSDs have a sector size of 512, so this isn't a problem.
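
The block size rule in the paragraph above amounts to taking the larger of 1024 and the sector size. A small sketch of that arithmetic follows; the function and constant names are hypothetical, and how the sector size is fetched is left out.

```python
MIN_CRC_BLOCK_SIZE = 1024  # minimum block size for crc-protected XFS, per the message


def xfs_block_size(sector_size: int) -> int:
    """Smallest usable block size: at least 1024 and never below the sector size."""
    return max(MIN_CRC_BLOCK_SIZE, sector_size)
```

With the common 512-byte sector this yields 1024; a 4096-byte sector yields 4096.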

Tested on AWS i3.large.

Fixes #8156.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ed1128c2d0c87e5ff49c40f5529f06bc35f4251b

Closes #8585
2021-05-05 16:07:50 +03:00
Lubos Kosco
c26bcf29f9 scylla_io_setup: add disk properties for L Azure instances 2021-05-04 13:13:05 +02:00
Lubos Kosco
f627fcbb0c scylla_util.py: add new class for Azure cloud support 2021-05-04 13:12:42 +02:00
Takuya ASADA
c9324634ca scylla_raid_setup: enabling mdmonitor.service on Debian variants
On Debian variants, mdmonitor.service cannot be enabled because it is missing
an [Install] section, so 'systemctl enable mdmonitor.service' will fail and
mdmonitor will not run after the system is restarted.

To force the service to run, add Wants=mdmonitor.service to
var-lib-scylla.mount.

Fixes #8494

Closes #8530
2021-04-28 11:32:27 +03:00
Peter Veentjer
c255903fb0 dist: Added r5b to ena instance_class.
The r5b instances also have ena support. For a confirmation
that all r5b instances have ena, go to the following page:

https://instances.vantage.sh/

Select the r5b and add the 'enhanced networking' column. Then
it will show that for every r5b type there is ena support

Closes #8546
2021-04-27 15:39:24 +03:00
Pekka Enberg
0ddbed2513 dist: Add support for disabling writeback cache
This adds support for disabling writeback cache by adding a new
DISABLE_WRITEBACK_CACHE option to "scylla-server" sysconfig file, which
makes the "scylla_prepare" script (that is run before Scylla starts up)
call perftune.py with appropriate parameters. Also add a
"--disable-writeback-cache" option to "scylla_sysconfig_setup", which
can be called by scylla-machine image scripts, for example.

Refs: #7341
Tests: dtest (next-gating)

Closes #8526
2021-04-22 11:24:49 +03:00
Takuya ASADA
00dcaf2896 dist/debian: rename .default file correctly
In a 'product != scylla' environment, we have a bug with .default file
(sysconfig file) handling.
Since the .default file should be installed under its original name, the package
name may not match the .default filename.
(ex: default file is /etc/default/scylla-node-exporter, but
     package name is scylla-enterprise-node-exporter)
When the filename doesn't match the package name, the file should be renamed
as follows:
  <package name>.<filename>.default
We already do this for the .service file, but mistakenly haven't handled
the .default file, so let's add it too.
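
The renaming rule can be sketched as a small helper. The function name is hypothetical, and the behavior for the matching case (plain `<package name>.default`) is an assumption based on the usual debhelper naming rather than something the message states.

```python
def debian_default_filename(package: str, filename: str) -> str:
    """Name of the packaging-side .default file for a given package.

    When the installed filename differs from the package name, use
    <package name>.<filename>.default as described in the message.
    """
    if package == filename:
        return f"{package}.default"  # assumed: no rename needed when names match
    return f"{package}.{filename}.default"
```

For the example in the message, `debian_default_filename("scylla-enterprise-node-exporter", "scylla-node-exporter")` gives `scylla-enterprise-node-exporter.scylla-node-exporter.default`.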

Related scylladb/scylla-enterprise#1718
Fixes #8527

Closes #8528
2021-04-21 14:24:21 +03:00
Takuya ASADA
0b01e1a167 dist: add DefaultDependencies=no to .mount units
To avoid ordering cycle error on Ubuntu, add DefaultDependencies=no
on .mount units.

Fixes #8482

Closes #8495
2021-04-19 09:06:42 +03:00
Avi Kivity
80529f7097 Revert "nonroot: generate scylla_sysconfdir.py correctly"
This reverts commit e991e01f2e. It
breaks installation on CentOS 7.

Fixes #8456.
2021-04-12 16:19:39 +03:00
Takuya ASADA
735c83b27f scylla_ntp_setup: detect already installed ntp client
In the current implementation, we may re-run NTP configuration even when it is
already configured.
Also, the system may be configured with a non-default NTP client; we just
ignore that and configure the default NTP client.

This patch minimizes unnecessary re-configuration of the NTP client.
It runs in the following order:
 1. Check whether an NTP client is already running. If it is running, skip setup
 2. Check whether an NTP client is already installed. If it is installed, use it
 3. If no NTP client package is installed,
    - if it's CentOS, install chrony
    - if it's on other distributions, install systemd-timesyncd
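
The order above can be sketched as a pure decision function. Everything here is illustrative: the function name, the action labels, and the way running and installed clients are passed in are assumptions, not the code of scylla_ntp_setup.

```python
from typing import Sequence, Tuple


def choose_ntp_action(running: Sequence[str], installed: Sequence[str], distro: str) -> Tuple[str, str]:
    """Return (action, client) following the three steps in the message."""
    if running:
        return ("skip", running[0])      # 1. a client is already running
    if installed:
        return ("use", installed[0])     # 2. a client is already installed
    if distro == "centos":
        return ("install", "chrony")     # 3. nothing installed, CentOS
    return ("install", "systemd-timesyncd")  # 3. nothing installed, other distributions
```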

Related with #8344, #8339
2021-04-08 22:52:02 +09:00
Takuya ASADA
2545d7fd43 scylla_util.py: return bool value on systemd_unit.is_active()
Currently, 'if unit.is_active():' is always True since is_active()
returns its result as a string (active, inactive, unknown).
To avoid such scripting bugs, change the return value to bool.
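
The pitfall is that any non-empty string is truthy in Python. A minimal sketch of the bool-returning form, with a hypothetical helper that takes the status text as input rather than querying systemd:

```python
def is_active_from_output(output: str) -> bool:
    """Map a status string such as 'active' or 'inactive' to a bool."""
    return output.strip() == "active"


# The old behavior: the string 'inactive' still passes an 'if' test.
old_style_result = "inactive"
old_style_is_truthy = bool(old_style_result)
```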
2021-04-08 21:54:05 +09:00
Takuya ASADA
0b2c1edddc scylla_ntp_setup: support systemd-timesyncd
On Ubuntu/Debian, systemd-timesyncd is the default NTP client and is installed
by default.
So use it instead of installing chrony.

Fixes #8339

Closes #8344
2021-04-06 15:28:34 +03:00
Takuya ASADA
e991e01f2e nonroot: generate scylla_sysconfdir.py correctly
We have a scripting bug: when /var/log/journal exists, install.sh does not generate scylla_sysconfdir.py.
Stop generating scylla_sysconfdir.py inside the if/else condition and do it
unconditionally in install.sh; also drop the pre-generated
scylla_sysconfdir.py from dist/common/scripts.

Also, $rsysconfdir is the correct path to point to the nonroot mode sysconfdir,
instead of $sysconfdir.

Fixes #8385

Closes #8386
2021-04-05 15:31:12 +03:00
Avi Kivity
fb890889cc version: prepare for the 4.6 cycle 2021-04-01 20:40:52 +03:00
Takuya ASADA
3af31eebeb scylla_setup: stop hardcode product name on scylla_setup
Stop hardcoding the product name in scylla_setup; dynamically generate
scylla_product.py in install.sh.

Fixes #8367

Closes #8384
2021-04-01 15:07:58 +03:00
Takuya ASADA
6f678ab7ff aws: initialize self._disks['ebs'] when no EBS disks
It seems aws_instance.ebs_disks() causes a traceback when no EBS disks are
available, so we need to initialize it with an empty list.
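
A sketch of the idea, assuming the disks are kept in a dict keyed by disk kind. The function and the kind labels other than 'ebs' are hypothetical; the point is only that the 'ebs' key exists even when no EBS disk is found.

```python
def group_disks(devices):
    """Group (kind, name) pairs by kind, with 'ebs' always present."""
    disks = {"ebs": []}  # pre-initialized so disks['ebs'] never raises KeyError
    for kind, name in devices:
        disks.setdefault(kind, []).append(name)
    return disks
```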

Fixes #8365

Closes #8366
2021-03-29 17:21:14 +03:00
Takuya ASADA
d9a625c842 scylla_setup: don't run node-exporter setup when it's not installed
We need to run a package existence check before running the setup of
node-exporter.

Fixes #8276

Closes #8278
2021-03-18 11:24:18 +01:00
Takuya ASADA
e8cfd5114f scylla_coredump_setup: support SLES
SLES requires installing the systemd-coredump package and enabling
systemd-coredump.socket to use systemd-coredump.
2021-03-15 19:19:56 +09:00
Takuya ASADA
13871ff1f8 scylla_setup: use rpm to check package availability for SLES
Use rpm to check whether scylla packages are installed on SLES.
2021-03-15 19:18:44 +09:00
Takuya ASADA
e3b5ffcf14 dist: install optional packages for SLES
Support SUSE's native package manager 'zypper' in the pkg_install()
function.
2021-03-15 19:17:48 +09:00
Takuya ASADA
af8eae317b scylla_coredump_setup: avoid coredump failure when hard limit of coredump is set to zero
In an environment where the hard limit of coredump is set to zero, the coredump test
script will fail since the system does not generate a coredump.
To avoid this issue, set ulimit -c 0 before generating SEGV in the script.

Note that scylla-server.service can generate a coredump even with ulimit -c 0
because we set LimitCORE=infinity in its systemd unit file.

Fixes #8238

Closes #8245
2021-03-10 19:28:10 +02:00
Takuya ASADA
2d9feaacea scylla_raid_setup: don't abort using raiddev when array_state is 'clear'
On the Ubuntu 20.04 AMI, scylla_raid_setup --raiddev /dev/md0 causes
'/dev/md0 is already using' (issue #7627).
So we merged the patch to find a free mdX (587b909).

However, looking into /proc/mdstat on the AMI, it actually says no active md device is available:

ubuntu@ip-10-0-0-43:~$ cat /proc/mdstat
Personalities :
unused devices: <none>

We currently decide mdX is in use when os.path.exists('/sys/block/mdX/md/array_state') == True,
but according to the kernel doc, the file may be available even when the array is STOPPED:

    clear

        No devices, no size, no level
        Writing is equivalent to STOP_ARRAY ioctl
https://www.kernel.org/doc/html/v4.15/admin-guide/md.html

So we should also check array_state != 'clear', not just array_state
existence.
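
The combined check could be sketched as below. The function name and the `sysfs_root` parameter (added so the logic can be exercised outside a real /sys) are assumptions; the path layout and the 'clear' state come from the message.

```python
import os


def md_in_use(mdname: str, sysfs_root: str = "/sys/block") -> bool:
    """Treat an md device as in use only if array_state exists and is not 'clear'."""
    state_file = os.path.join(sysfs_root, mdname, "md", "array_state")
    if not os.path.exists(state_file):
        return False
    with open(state_file) as f:
        return f.read().strip() != "clear"
```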

Fixes #8219

Closes #8220
2021-03-07 18:30:11 +02:00
Takuya ASADA
53c7600da8 dist: increase fs.aio-max-nr value for other apps
The current fs.aio-max-nr value, cpu_count() * 11026, is exactly the amount Scylla
uses; if other apps in the environment also try to use aio, the aio slots
will run out.
So increase the value by 65536 for other apps.
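
As a worked example of the new value, using the two numbers from the message (the function and constant names are made up for the sketch):

```python
AIO_SLOTS_PER_CPU = 11026           # per-CPU amount Scylla uses, from the message
AIO_RESERVE_FOR_OTHER_APPS = 65536  # extra headroom added by this change


def aio_max_nr(cpus: int) -> int:
    """fs.aio-max-nr value: Scylla's own need plus a reserve for other apps."""
    return cpus * AIO_SLOTS_PER_CPU + AIO_RESERVE_FOR_OTHER_APPS
```

On an 8-CPU machine this gives 8 * 11026 + 65536 = 153744, compared with 88208 before the change.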

Related #8133

Closes #8228
2021-03-07 12:11:36 +02:00