--online-discard option defined as string parameter since it doesn't
specify "action=", but has default value in boolean (default=True).
It breaks "provisioning in a similar environment" since the code
supposed boolean value should be "action='store_true'" but it's not.
We should change the type of the option to int, and also specify
"choices=[0, 1]" just like --io-setup does.
Fixes#11700Closes#11831
Currently, --disks options does not allow symlinks such as
/dev/disk/by-uuid/* or /dev/disk/azure/*.
To allow using them, is_unused_disk() should resolve symlink to
realpath, before evaluating the disk path.
Fixes#11634Closes#11646
Even we have __escape() for escaping " middle of the value to writing
sysconfig file, we didn't unescape for reading from sysconfig file.
So adding __unescape() and call it on get().
It seems like distribution original sysconfig files does not use double
quote to set the parameter when the value does not contain space.
Adding function to detect spaces in the value, don't usedouble quote
when it not detected.
Fixes#9149
We added UUID device file existance check on #11399, we expect UUID
device file is created before checking, and we wait for the creation by
"udevadm settle" after "mkfs.xfs".
However, we actually getting error which says UUID device file missing,
it probably means "udevadm settle" doesn't guarantee the device file created,
on some condition.
To avoid the error, use var-lib-scylla.mount to wait for UUID device
file is ready, and run the file existance check when the service is
failed.
Fixes#11617Closes#11666
This add support stripped binary installation for relocatable package.
After this change, scylla and unified packages only contain stripped binary,
and introduce "scylla-debuginfo" package for debug symbol.
On scylla-debuginfo package, install.sh script will extract debug symbol
at /opt/scylladb/<dir>/.debug.
Note that we need to keep unstripped version of relocatable package for rpm/deb,
otherwise rpmbuild/debuild fails to create debug symbol package.
This version is renamed to scylla-unstripped-$version-$release.$arch.tar.gz.
See #8918
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Closes#9005
Fixes a regression introduced in 80917a1054:
"scylla_prepare: stop generating 'mode' value in perftune.yaml"
When cpuset.conf contains a "full" CPU set the negation of it from
the "full" CPU set is going to generate a zero mask as a irq_cpu_mask.
This is an illegal value that will eventually end up in the generated
perftune.yaml, which in line will make the scylla service fail to start
until the issue is resolved.
In such a case a irq_cpu_mask must represent a "full" CPU set mimicking
a former 'MQ' mode.
Fixes#11701
Tested:
- Manually on a 2 vCPU VM in an 'auto-selection' mode.
- Manually on a large VM (48 vCPUs) with an 'MQ' manually
enforced.
Message-Id: <20221004004237.2961246-1-vladz@scylladb.com>
To recap: the Nix devenv ({default,shell,flake}.nix and friends) in Scylla is a nicer (for those who consider it so, that is) alternative to dbuild: a completely deterministic build environment without Docker.
In theory we could support much more (creating installable packages, container images, various deployment affordances, etc. -- Nix is, among other things, a kind of parallel-to-everything-else devops realm) but there is clearly no demand and besides duplicating the work the release team is already doing (and doing just fine, needless to say) would be pointless and wasteful.
This PR reflects the accumulated changes that I have been carrying locally for the past year or so. The version currently in master _probably_ can still build Scylla, but that Scylla certainly would not pass unit tests.
What the previous paragraph seems to mean is, apparently I'm the only active user of Nix devenv for Scylla. Which, in turn, presents some obvious questions for the maintainers:
- Does this need to live in the Scylla source at all? (The changes to non-Nix-specific parts are minimal and unobtrusive, but they are still changes)
- If it's left in, who is going to maintain it going forward, should more users somehow appear? (I'm perfectly willing to fix things up when alerted, but no timeliness guarantees)
Closes#9557
* github.com:scylladb/scylladb:
nix: add README.md
build: improvements & upgrades to Nix dev environment
build: allow setting SCYLLA_RELEASE from outside
"
This series adds a long waited transition of our auto-generation
code to irq_cpu_mask instead of 'mode' in perftune.yaml.
And then it fixes a regression in scylla_prepare perftune.yaml
auto-generation logic.
"
* 'scylla_prepare_fix_regression-v1' of https://github.com/vladzcloudius/scylla:
scylla_prepare + scylla_cpuset_setup: make scylla_cpuset_setup idempotent without introducing regressions
scylla_prepare: stop generating 'mode' value in perftune.yaml
* Add some more useful stuff to the shell environment, so it actually
works for debugging & post-mortem analysis.
* Wrap ccache & distcc transparently (distcc will be used unless
NODISTCC is set to a non-empty value in the environment; ccache will
be used if CCACHE_DIR is not empty).
* Package the Scylla Python driver (instead of the C* one).
* Catch up to misc build/test requirements (including optional) by
requiring or custom-packaging: wasmtime 0.29.0, cxxbridge,
pytest-asyncio, liburing.
* Build statically-linked zstd in a saner and more idiomatic fashion.
* In pure builds (where sources lack Git metadata), derive
SCYLLA_RELEASE from source hash.
* Refactor things for more parameterization.
* Explicitly stub out installPhase (seeing that "nix build" succeeds
up to installPhase means we didn't miss any dependencies).
* Add flake support.
* Add copious comments.
Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
Ubuntu 20.04 has less than 3 years of OS support remaining.
We should switch to Ubuntu 22.04 to reduce the need for OS upgrades in newly installed clusters.
Closes#11440
Just like 4a8ed4c, we also need to wait for udev event completion to
create /dev/disk/by-uuid/$UUID for newly formatted disk, to mount the
disk just after formatting.
Fixes#11359
This patch fixes the regression introduced by 3a51e78 which broke
a very important contract: perftune.yaml should not be "touched"
by Scylla scriptology unless explicitly requested.
And a call for scylla_cpuset_setup is such an explicit request.
The issue that the offending patch was intending to fix was that
cpuset.conf was always generated anew for every call of
scylla_cpuset_setup - even if a resulting cpuset.conf would come
out exactly the same as the one present on the disk before tha call.
And since the original code was following the contract mentioned above
it was also deleting perftune.yaml every time too.
However, this was just an unavoidable side-effect of that cpuset.conf
re-generation.
The above also means that if scylla_cpuset_setup doesn't write to cpuset.conf
we should not "touch" perftune.yaml and vise versa.
This patch implements exactly that together with reverting the dangerous
logic introduced by 3a51e78.
Fixes#11385Fixes#10121
Modern perftune.py supports a more generic way of defining IRQ CPUs:
'irq_cpu_mask'.
This patch makes our auto-generation code create a perftune.yaml
that uses this new parameter instead of using outdated 'mode'.
As a side effect, this change eliminates the notion of "incorrect"
value in cpuset.conf - every value is valid now as long as it fits into
the 'all' CPU set of the specific machine.
Auto-generated 'irq_cpu_mask' is going to include all bits from 'all'
CPU mask except those defined in cpuset.conf.
Fixes#9903
On recent version of systemd, StandardOutput=syslog is obsolete.
We should use StandardOutput=journal instead, but since it's default value,
so we can just drop it.
Fixes#11322Closes#11339
Current debug log is bit difficult to collect in CI, to find the debug log
we must know which script caused Exception.
Because the filename does not include prefix, and also specified
directory is shared with other programs.
To make things more easily, let's change debug log directory to /var/tmp/scylla.
Closes#10730
To make scylla setup scripts easier to handle in Ansible, stop deleting
perftune.yaml and detect cpuset.conf changes by mtime of the file.
Also, skip update cpuset.conf when same parameter specified.
Fixes#10121Closes#10312
On 48b6aec16a we mistakenly allowed
check=True on systemd_unit.is_active(), it should be check=False.
We check unit's status by "systemctl is-active" output string,
it returns "active" or "inactive".
But systemctl command returns non-zero status when it returning
"inactive", so we are getting Exception here.
To fix this, we need new option "ignore_error=True" for out(),
and use it in systemd_unit.is_active().
Fixes#10455Closes#10467
Storage field of "coredumpctl info" changed at systemd-v248, it added
"(present)" on the end of line when coredump file available.
Fixes#10669Closes#10714
To run scylla-housekeeping we currently use "sudo -u scylla <cmd>" to switch
scylla user, but it fails on some environment.
Since recent version of Python 3 supports to switch user on subprocess module,
let's use python native way and drop sudo.
Fixes#10483Closes#10538
Using traceback_with_variables module, generate more detail traceback
with variables into debug log.
This will help fixing bugs which is hard to reproduce.
Closes#10472
[avi: regenerate frozen toolchain]
Currently our error message on scylla_prepare says "Exception occurred
while creating perftune.yaml", even perftune.yaml is already generated,
and error occurred after that.
To describe error more correctly, add another error message after
perftune.yaml generated.
see scylladb/scylla-enterprise#2201
Closes#10575
Seems like 59adf05 has a bug, the regex pattern only handles first
32CPUs cpuset pattern, and ignores rest.
We should extend regex pattern to handle all CPUs.
Fixes#10523Closes#10524
On acaf0bb we applied out() just for perftune.py because we had issue #10390
with this script.
But the issue can happen with other commands too, let's apply it to all
commands which uses capture_output.
related #10390Closes#10414
We currently does not able to get any error message from subprocess when we specified capture_output=True on subprocess.run().
This is because CalledProcessError does not print stdout/stderr when it raised, and we don't catch the exception, we just let python to cause Traceback.
Result of that, we only able to know exit status and failed command but
not able to get stdout/stderr.
This is problematic especially working on perftune.py bug, since the
script should caused Traceback but we never able to see it.
To resolve this, add wrapper function "out()" for capture output, and
print stdout/stderr with error message inside the function.
Fixes#10390Closes#10391
Previous versions of Docker image runs scylla as root, but cb19048
accidently modified it to scylla user.
To keep compatibility we need to revert this to root.
Fixes#10261Closes#10325
We changed supervisor service name at cb19048, but this breaks
compatibility with scylla-operator.
To fix the issue we need to revert the service name to previous one.
Fixes#10269Closes#10323
Currently, we just passes entire output of perftune.py when getting CPU
mask from the script, but it may cause parse error since the script may
also print warning message.
To avoid that, we need to extract CPU mask from the output.
Fixes#10082Closes#10107
Since our Docker image moved to Ubuntu, we mistakenly copy
dist/docker/etc/sysconfig/scylla-server to /etc/sysconfig, which is not
used in Ubuntu (it should be /etc/default).
So /etc/default/scylla-server is just default configuration of
scylla-server .deb package, --log-to-stdout is 0, same as normal installation.
We don't want keep the duplicated configuration file anyway,
so let's drop dist/docker/etc/sysconfig/scylla-server and configure
/etc/default/scylla-server in build_docker.sh.
Fixes#10270Closes#10280
On e1b15ba, we introduce user-friendly error message when Exception
occured while generating perftune.yaml.
However, it becomes difficult to investigate bugs since we dropped
traceback.
To resolve this problem, let's print both traceback and user-friendly
messages.
Related #10050Closes#10140
If the Docker startup script is passed both "--alternator-port" and
"--alternator-https-port", a combination which is supposed to be
allowed, it passes to Scylla the "--alternator-address" option twice.
This isn't necessary, and worse - not allowed.
So this patch fixes the scyllasetup.py script to only pass this
parameter once.
Fixes#10016.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220202165814.1700047-1-nyh@scylladb.com>
Currently, cloud related code have cross-dependencies between
scylla and scylla-machine-image.
It is not good way to implement, and single change can break both
package.
To resolve the issue, we need to move all cloud related code to
scylla-machine-image, and remove them from scylla repository.
Change list:
- move cloud part of scylla_util.py to scylla-machine-image
- move cloud part of scylla_io_setup to scylla-machine-image
- move scylla_ec2_check to scylla-machine-image
- move cloud part of scylla_bootparam_setup to scylla-machine-image
Closes#9957
Currently, --swap-size does not able to specify exact file size because
the option takes parameter only in GB.
To fix the limitation, let's add --swpa-size-bytes to specify swap size
in bytes.
We need this to implement preallocate swapfile while building IaaS
image.
see scylladb/scylla-machine-image#285
Closes#9971
We found that monitor mode of mdadm does not work on RAID0, and it is
not a bug, expected behavior according to RHEL developer.
Therefore, we should stop enabling mdmonitor when RAID0 is specified.
Fixes#9540
This reverts commit 0d8f932f0b,
because RHEL developer explains this is not a bug, it's expected behavior.
(mdadm --monitor does not start when RAID level is 0)
see: https://bugzilla.redhat.com/show_bug.cgi?id=2031936
So we should stop downgrade mdadm package and modify our script not to
enable mdmonitor.service on RAID0, use it only for RAID5.
Instead of lengthy blurbs, switch to single-line, machine-readable
standardized (https://spdx.dev) license identifiers. The Linux kernel
switched long ago, so there is strong precedent.
Three cases are handled: AGPL-only, Apache-only, and dual licensed.
For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0),
reasoning that our changes are extensive enough to apply our license.
The changes we applied mechanically with a script, except to
licenses/README.md.
Closes#9937
As of now, 'systemd_unit.available' works ok only when provided
unit is present.
It raises Exception instead of returning boolean
when provided systemd unit is absent.
So, make it return boolean in both cases.
Fixes https://github.com/scylladb/scylla/issues/9848Closes#9849
On newer version of systemd-coredump, coredump handled in
systemd-coredump@.service, and may causes timeout while running the
systemd unit, like this:
systemd[1]: systemd-coredump@xxxx.service: Service reached runtime time limit. Stopping.
To prevent that, we need to override TimeoutStartSec=infinity.
Fixes#9837Closes#9841
On CentOS8, mdmonitor.service does not works correctly when using
mdadm-4.1-15.el8.x86_64 and later versions.
Until we find a solution, let's pinning the package version to older one
which does not cause the issue (4.1-14.el8.x86_64).
Fixes#9540Closes#9782