Currently --ami does not check instance types, creates invalid
io_properties.yaml on unsupported instance types.
It actually won't occur on AMI startup, since scylla_ami_setup only
invoke scylla_io_setup --ami when the instance is supported, so we don't
get the issue on startup, but we still get when we run scylla_io_setup
manually.
It's better to check instance type on scylla_io_setup, too.
Refs #5438
VERSION_ID of centos7 is "7", but VERSION_ID of oel7.7 is "7.7"
scylla_ntp_setup doesn't work on OEL7.7 for ValueError.
- ValueError: invalid literal for int() with base 10: '7.7'
This patch changed redhat_version() to return version string, and compare
with parse_version().
Fixes#5433
Signed-off-by: Amos Kong <amos@scylladb.com>
A Linux machine typically has multiple clocksources with distinct
performances. Setting a high-performant clocksource might result in
better performance for ScyllaDB, so this should be considered whenever
starting it up.
This patch introduces the possibility of enforcing optimized Linux
clocksource to Scylla's setup/start-up processes. It does so by adding
an interactive question about enforcing clocksource setting to scylla_setup,
which modifies the parameter "CLOCKSOURCE" in scylla_server configuration
file. This parameter is read by perftune.py which, if set to "yes", proceeds
to (non persistently) setting the clocksource. On x86, TSC clocksource is used.
Fixes#4474Fixes#5474Fixes#5480
"
These series solves an issue with scylla_setup and prevent it from
waiting forever if housekeeping cannot look for the new Scylla version.
Fixes#5302
It should be backported to versions that support offline installations.
"
* 'scylla_setup_timeout' of git://github.com/amnonh/scylla:
scylla_setup: do not wait forever if no reply is return housekeeping
scylla_util.py: Add optional timeout to out function
When scylla is installed without a network connectivity, the test if a
newer version is available can cause scylla_setup to wait forever.
This patch adds a limit to the time scylla_setup will wait for a reply.
When there is no reply, the relevent error will be shown that it was
unable to check for newer version, but this will not block the setup
script.
Fixes#5302
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
This reverts commit 4333b37f9e. It breaks upgrades,
and the user question is not informative enough for the user to make a correct
decision.
Fixes#5478.
Fixes#5480.
A Linux machine typically has multiple clocksources with distinct
performances. Setting a high-performant clocksource might result in
better performance for ScyllaDB, so this should be considered whenever
starting it up.
This patch introduces the possibility of enforcing optimized Linux
clocksource to Scylla's setup/start-up processes. It does so by adding
an interactive question about enforcing clocksource setting to scylla_setup,
which modifies the parameter "CLOCKSOURCE" in scylla_server configuration
file. This parameter is read by perftune.py which, if set to "yes", proceeds
to (non persistently) setting the clocksource. On x86, TSC clocksource is
used.
Fixes#4474
We may able to use chrony setup script on future version of RHEL/CentOS,
it better to run chrony setup when RHEL version >= 8, not only 8.
Note that on Fedora it still provides ntp/ntpdate package, so we run
ntp setup on it for now. (same on debian variants)
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20191203192812.5861-1-syuu@scylladb.com>
Currently interactive RAID setup prompt does not list virtio-blk devices due to
following reasons:
- We fail matching '-p' option on 'lsblk --help' output since misusage of
regex functon, list_block_devices() always skipping to use lsblk output.
- We don't check existance of /dev/vd* when we skipping to use lsblk.
- We mistakenly excluded virtio-blk devices on 'lsblk -pnr' output using '-e'
option, but we actually needed them.
To fix the problem we need to use re.search() instead of re.match() to match
'-p' option on 'lsblk --help', need to add '/dev/vd*' on block device list,
then need to stop '-e 252' option on lsblk which excludes virtio-blk.
Additionally, it better to parse 'TYPE' field of lsblk output, we should skip
'loop' devices and 'rom' devices since these are not disk devices.
Fixes#4066
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20191201160143.219456-1-syuu@scylladb.com>
Currently we support to assign workdir from scylla.yaml, and we use many
hardcode '/var/lib/scylla' in setup scripts.
Some setup scripts get scylla directories by parsing scylla.yaml, introduced
parse_scylla_dirs_with_default() that adds default values if scylla directories
aren't assigned in scylla.yaml
Signed-off-by: Amos Kong <amos@scylladb.com>
Currently we are checking an invalid path of some default scylla directories,
the directories don't exist, so the tune will always be skipped. It caused by
two problem.
Problem 1: paths of default directories is invalid
Introduced by commit 5ec191536e, we try to tune some scylla default directories
if they exist. But the directory paths we try are wrong.
For example:
- What we check: /var/lib/scylla/commitlog_directory
- Correct one: /var/lib/scylla/commitlog
Problem 2: wrong path join
Introduced by commit 31ddb2145a, default_path might be replaced from
'/var/lib/scylla/' to '/var/lib/scylla'.
Our code tries to check an invalid path that is wrongly join, eg:
'/var/lib/scyllacommitlog'
Signed-off-by: Amos Kong <amos@scylladb.com>
The default values of data_file_directories and commitlog_directory were
commented by commit e0f40ed16a. It causes scylla_util.py:get_scylla_dirs() to
fail in checking the values.
This patch changed get_scylla_dirs() to return default data/commitlog
directories if they aren't set.
Fixes#5358
Reviewed-by: Pavel Emelyanov <xemul@scylladb.com>
Signed-off-by: Amos Kong <amos@scylladb.com>
Currently scylla_io_setup will skip in scylla_setup, because we didn't support
those new instance types.
I manually executed scylla_io_setup, and the scylla-server started and worked
well.
Let's apply this patch first, then check if there is some new problem in
ami-test.
Signed-off-by: Amos Kong <amos@scylladb.com>
It is useful to have an option to limit the execution time of a shell
script.
This patch adds an optional timeout parameter, if a parameter will be
provided a command will return and failure if the duration is passed.
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Since nonroot mode requires to run everything on non-privileged user,
most of setup scripts does not able to use nonroot mode.
We only provide following functions on nonroot mode:
- EC2 check
- IO setup
- Node exporter installer
- Dev mode setup
Rest of functions will be skipped on scylla_setup.
To implement nonroot mode on setup scripts, scylla_util provides
utility functions to abstract difference of directory structure between normal
installation and nonroot mode.
Scylla requires the CLMUL and SSE 4.2 instruction sets and will fail without them.
There is a check in main(), but that happens after the code is running and it may
already be too late. Add a check in scylla_prepare which runs before the main
executable.
The scylla_blocktune.py has a FIXME for btrfs from 2016, which is no
longer relevant for Scylla deployments, as Red Hat dropped support for
the file system in 2017.
Message-Id: <20190823114013.31112-1-penberg@scylladb.com>
Don't let perftune.py print false alarm error message when we calculate
a compute CPU set for tuning modes.
This may happen when we calculate a CPU set for non-MQ tuning modes on
small systems on which these modes are forbidden because they would
result in a zero CPU set, e.g. sq_split on a system with a single
physical core.
We are going to utilize a newly introduced --get-cpu-mask-quiet execution
mode introduced to the seastar/script/perftune.py by the
"perftune.py: introduce --get-cpu-mask-quiet" series which would return
a zero CPU set if that's what it turns out to be instead of exiting with
an error what --get-cpu-mask would do in such a case.
The rest of scylla_util.py logic is going to handle a zero CPU set
returned by get_mode_cpuset() correctly.
Fixes#4211Fixes#4443
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Message-Id: <20190731212901.9510-1-vladz@scylladb.com>
Currently NIC selection prompt on scylla_setup just proceed setup when
user just pressed Enter key on the prompt.
The prompt should ask NIC name again until user input correct NIC name.
Fixes#4517
Message-Id: <20190617124925.11559-1-syuu@scylladb.com>
We used to use /opt/scylladb just for Scylla build toolchain and
dependency libraries, not for Scylla main package.
But since we merged relocatable package, Scylla main binary and
dependency libraries are all located under /opt/scylladb, only
setup scripts remained on /usr/lib/scylla.
It strange to keep using both /usr/lib/<app name> and /opt/<app name>,
we should merge them into single place.
Message-Id: <20190614011038.17827-1-syuu@scylladb.com>
To avoid 'Bad permmisons' error when user changed default umask, we need
to verify system umask is acceptable for scylla-server.
Fixes#4157
Message-Id: <20190612130343.6043-1-syuu@scylladb.com>
Apparently we are having some issues running iotune in the i3en instances,
as the values not always make sense. We believe it is something that XFS
is doing, and running fio directly on the device (no filesystem) provides
more meaningful results (more according to AWS published expected values).
For now, let's use fio instead. In this patch I have ran fio for our 4
dimensions in each of the three types of disks (large, xlarge, 3xlarge).
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20190524111454.27956-1-glauber@scylladb.com>
Right now if the user tries to execute this in an unrecognized OS, the
following will be thrown:
Traceback (most recent call last):
File "/usr/lib/scylla/libexec/scylla_setup", line 214, in <module>
do_verify_package('scylla-enterprise-jmx')
File "/usr/lib/scylla/libexec/scylla_setup", line 73, in do_verify_package
if res != 0:
UnboundLocalError: local variable 'res' referenced before assignment
It would be a lot nicer to exit gracefully and print a messge saying what
is going on. This was caught when running on OEL, which the previous patch
fixed. Still, there are other unknown OS out there the users may try to run
on.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Oracle Linux is a RHEL-like distribution and we support it just fine, but our
new incarnation of scylla_setup is failing to recognize it.
os-release for OEL is a bit different. It doesn't have an ID_LIKE string, and
only shows an ID string, which is set to 'ol'. So let's recognize this.
Fixes: #4493
Branches: 3.1
Signed-off-by: Glauber Costa <glauber@scylladb.com>
AWS just released their new instances, the i3en instances. The instance
is verified already to work well with scylla, the only adjustments that
we need is advertise that we support it, and pre-fill the disk
information according to the performance numbers obtained by running the
instance.
Fixes#4486
Branches: 3.1
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20190508170831.6003-1-glauber@scylladb.com>
Whenever the iotune_args array uses "--smp", it needs cpudata.smp()
which returns an integer instead of a string. So when iotune_args is
passed to subprocess.check_call(), it actually throws "TypeError:
expected str, bytes or os.PathLike object, not int" but
"%s did not pass validation tests, it may not be on XFS..." is shown as
the exception.
Even though the user inputs correct arguments, it might still throw an
error and confuse the user that he/she has not passed the right
arguments.
One simple fix is to use str(cpudata.smp()) instead of cpudata.smp().
Signed-off-by: JP-Reddy <guthijp.reddy@gmail.com>
Message-Id: <20190406070118.48477-1-guthijp.reddy@gmail.com>
scylla-housekeeping always wants to run in the installation to check if
we are running the latest version. This happens regardless of whether or
not we said yes or no to the housekeeping scylla_setup question - as
that question only deals with whether or not we want to do this through
a timer.
It is fine to try to run scylla-housekeeping, as long as we time it out.
The current code doesn't.
The naive solution is to add a timeout parameter to urllib.request.open.
However, that timeout is not respected and in my tests I saw real
timeouts up to four times higher the timeout we set. For a reasonable 5s
timeout, this mean a 20s real timeout which can lead to a very bad user
experience. This seems to be a known problem with this module according
to a quick Google search.
This patch then takes a slightly more complex solution and uses
multiprocess to enforce a well-defined user-visible timeout.
Fixes#3980
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20190506122335.5707-1-glauber@scylladb.com>
The setup script asks the user whether or not housekeeping should
be called, and in the first time the script is executed this decision
is respected.
However if the script is invoked again, that decision is not respected.
This is because the check has the form:
if (housekeeping_cfg_file_exists) {
version_check = ask_user();
}
if (version_check) { do_version_check() } else { dont_do_it() }
When it should have the form:
if (housekeeping_cfg_file_exists) {
version_check = ask_user();
if (version_check) { do_version_check() } else { dont_do_it() }
}
(Thanks python)
This is problematic in systems that are not connected to the internet, since
housekeeping will fail to run and crash the setup script.
Fixes#4462
Branches: master, branch-3.1
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20190502034211.18435-1-glauber@scylladb.com>
Newer versions of RHEL ship the os-release file with newlines in the
end, which our script was not prepared to handle. As such, scylla_setup
would fail.
This patch makes our OS detection robust against that.
Fixes#4473
Branches: master, branch-3.1
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20190502152224.31307-1-glauber@scylladb.com>
perftune.py executes hwloc-calc, the command is now provided as
relocatable binary, placed under /opt/scylladb/bin.
So we need to add the directory to PATH when calling
subprocess.check_output(), but our utility function already do that,
switch to it.
Fixes#4443
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190418124345.24973-1-syuu@scylladb.com>
Since we moved relocatable .rpm now Scylla able to run on Amazon Linux
2.
However, is_redhat_variant() on scylla_util.py does not works on Amazon
Linux 2, since it does not have /etc/redhat-release.
So we need to switch to /etc/os-release, use ID_LIKE to detect Redhat
variants/Debian variants.
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Message-Id: <20190417115634.9635-1-syuu@scylladb.com>
Although curl is widely available, there is no reason to depend on it.
There are mainly two users, as indicated by grep:
1) scylla-housekeeping
2) scripts within the AMI
3) docker image
The AMI has its own RPM and it already depends on curl. While we could
get rid of the curl dependency there too, we can do that later. Docker
is its own thing and it only needs it at build time anyway.
For the main scylla repo, this patch changes scylla-housekeeping so as
not to depend on the curl binary and use urllib directly instead. We can
then remove curl from our dependency list.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
Message-Id: <20190411125642.9754-1-glauber@scylladb.com>
"
Calculation of IO properties is slightly wrong for i3.metal, because we get
the number of disks wrong. The reason for that is our check for ephemeral nvme
disks, that pre-date the time in which root devices were exposed as nvme devices
(nitro and metal instances).
"
toolchain updated with python3-psutil
* 'ec2fixes' of github.com:glommer/scylla:
scylla_util.py: do not include root disks in ephemeral list
scylla-python3: include the psutil module
fix typo in scylla_ec2_check
Nitro instances (and metal ones) put their root device in nvme (as a
protocol. it is still EBS). Our algorithm so far has relied on parsing
the nvme devices to figure out which ones are ephemeral but it will
break for those instances.
Out of our supported instances so far, the i3.metal is the only one
in which this breaks.
Signed-off-by: Glauber Costa <glauber@scylladb.com>
When we call perftune.py in order to get a particular mode's cpu set
(e.g. mode=sq_split) it may fail and print an error message to stderr because
there are too few CPUs for a particular configuration mode (e.g. when
there are only 2 CPUs and the mode is sq_split).
We already treat these situations correctly however we let the
corresponding perftune.py error message get out into the syslog.
This is definitely confusing, stressful and annoying.
Let's not let these messages out.
Fixes#4211
Signed-off-by: Vlad Zolotarov <vladz@scylladb.com>
Message-Id: <20190325220018.22824-1-vladz@scylladb.com>
Collect /etc/redhat-release as well as os-release from relevant
hosts. The problem with os-release is that it doesn't contain the
minor version of the EL OS family. Since this is only present in
Red Hat distributions and derivatives, it will not be collected
in Debian derivatives.
Another approach is to use lsb_release -a but it will not provide
anything more useful than os-release on Debian and lsb needs to be
installed on EL derivatives first.
Fixes#4093
Message-Id: <20190225204727.20805-4-dyasny@scylladb.com>