Commit Graph

1458 Commits

Author SHA1 Message Date
Takuya ASADA
acc408c976 scylla_setup: fix incorrect type definition on --online-discard option
--online-discard option defined as string parameter since it doesn't
specify "action=", but has default value in boolean (default=True).
It breaks "provisioning in a similar environment" since the code
supposed boolean value should be "action='store_true'" but it's not.

We should change the type of the option to int, and also specify
"choices=[0, 1]" just like --io-setup does.

Fixes #11700

Closes #11831
2022-11-08 08:40:44 +02:00
Takuya ASADA
464b5de99b scylla_setup: allow symlink to --disks option
Currently, --disks options does not allow symlinks such as
/dev/disk/by-uuid/* or /dev/disk/azure/*.

To allow using them, is_unused_disk() should resolve symlink to
realpath, before evaluating the disk path.

Fixes #11634

Closes #11646
2022-10-28 07:24:11 +03:00
Takuya ASADA
cd6030d5df scylla_util.py: adding unescape for sysconfig_parser
Even we have __escape() for escaping " middle of the value to writing
sysconfig file, we didn't unescape for reading from sysconfig file.
So adding __unescape() and call it on get().
2022-10-27 16:39:47 +09:00
Takuya ASADA
de57433bcf scylla_util.py: on sysconfig_parser, don't use double quote when it's possible
It seems like distribution original sysconfig files does not use double
quote to set the parameter when the value does not contain space.
Adding function to detect spaces in the value, don't usedouble quote
when it not detected.

Fixes #9149
2022-10-27 16:36:27 +09:00
Takuya ASADA
a938b009ca scylla_raid_setup: run uuidpath existance check only after mount failed
We added UUID device file existance check on #11399, we expect UUID
device file is created before checking, and we wait for the creation by
"udevadm settle" after "mkfs.xfs".

However, we actually getting error which says UUID device file missing,
it probably means "udevadm settle" doesn't guarantee the device file created,
on some condition.

To avoid the error, use var-lib-scylla.mount to wait for UUID device
file is ready, and run the file existance check when the service is
failed.

Fixes #11617

Closes #11666
2022-10-25 08:54:21 +03:00
Takuya ASADA
49d5e51d76 reloc: add support stripped binary installation for relocatable package
This add support stripped binary installation for relocatable package.
After this change, scylla and unified packages only contain stripped binary,
and introduce "scylla-debuginfo" package for debug symbol.
On scylla-debuginfo package, install.sh script will extract debug symbol
at /opt/scylladb/<dir>/.debug.

Note that we need to keep unstripped version of relocatable package for rpm/deb,
otherwise rpmbuild/debuild fails to create debug symbol package.
This version is renamed to scylla-unstripped-$version-$release.$arch.tar.gz.

See #8918

Signed-off-by: Takuya ASADA <syuu@scylladb.com>

Closes #9005
2022-10-13 15:11:32 +02:00
Vlad Zolotarov
8195dab92a scylla_prepare: correctly handle a former 'MQ' mode
Fixes a regression introduced in 80917a1054:
"scylla_prepare: stop generating 'mode' value in perftune.yaml"

When cpuset.conf contains a "full" CPU set the negation of it from
the "full" CPU set is going to generate a zero mask as a irq_cpu_mask.
This is an illegal value that will eventually end up in the generated
perftune.yaml, which in line will make the scylla service fail to start
until the issue is resolved.

In such a case a irq_cpu_mask must represent a "full" CPU set mimicking
a former 'MQ' mode.

Fixes #11701
Tested:
 - Manually on a 2 vCPU VM in an 'auto-selection' mode.
 - Manually on a large VM (48 vCPUs) with an 'MQ' manually
   enforced.
Message-Id: <20221004004237.2961246-1-vladz@scylladb.com>
2022-10-06 17:43:37 +03:00
Botond Dénes
f4540ef0d6 Merge 'Upgrade nix devenv' from Michael Livshin
To recap: the Nix devenv ({default,shell,flake}.nix and friends) in Scylla is a nicer (for those who consider it so, that is) alternative to dbuild: a completely deterministic build environment without Docker.

In theory we could support much more (creating installable packages, container images, various deployment affordances, etc. -- Nix is, among other things, a kind of parallel-to-everything-else devops realm) but there is clearly no demand and besides duplicating the work the release team is already doing (and doing just fine, needless to say) would be pointless and wasteful.

This PR reflects the accumulated changes that I have been carrying locally for the past year or so.  The version currently in master _probably_ can still build Scylla, but that Scylla certainly would not pass unit tests.

What the previous paragraph seems to mean is, apparently I'm the only active user of Nix devenv for Scylla.  Which, in turn, presents some obvious questions for the maintainers:

- Does this need to live in the Scylla source at all?  (The changes to non-Nix-specific parts are minimal and unobtrusive, but they are still changes)
- If it's left in, who is going to maintain it going forward, should more users somehow appear?  (I'm perfectly willing to fix things up when alerted, but no timeliness guarantees)

Closes #9557

* github.com:scylladb/scylladb:
  nix: add README.md
  build: improvements & upgrades to Nix dev environment
  build: allow setting SCYLLA_RELEASE from outside
2022-10-03 09:40:09 +03:00
Avi Kivity
372eadf542 Merge "perftune related improvements in scylla_* scripts" from Vlad Zolotarov
"
This series adds a long waited transition of our auto-generation
code to irq_cpu_mask instead of 'mode' in perftune.yaml.

And then it fixes a regression in scylla_prepare perftune.yaml
auto-generation logic.
"

* 'scylla_prepare_fix_regression-v1' of https://github.com/vladzcloudius/scylla:
  scylla_prepare + scylla_cpuset_setup: make scylla_cpuset_setup idempotent without introducing regressions
  scylla_prepare: stop generating 'mode' value in perftune.yaml
2022-10-02 13:25:13 +03:00
Michael Livshin
d178ac17dc nix: add README.md
Signed-off-by: Michael Livshin <repo@cmm.kakpryg.net>
2022-10-02 12:26:02 +03:00
Michael Livshin
7bd13be3f2 build: improvements & upgrades to Nix dev environment
* Add some more useful stuff to the shell environment, so it actually
  works for debugging & post-mortem analysis.

* Wrap ccache & distcc transparently (distcc will be used unless
  NODISTCC is set to a non-empty value in the environment; ccache will
  be used if CCACHE_DIR is not empty).

* Package the Scylla Python driver (instead of the C* one).

* Catch up to misc build/test requirements (including optional) by
  requiring or custom-packaging: wasmtime 0.29.0, cxxbridge,
  pytest-asyncio, liburing.

* Build statically-linked zstd in a saner and more idiomatic fashion.

* In pure builds (where sources lack Git metadata), derive
  SCYLLA_RELEASE from source hash.

* Refactor things for more parameterization.

* Explicitly stub out installPhase (seeing that "nix build" succeeds
  up to installPhase means we didn't miss any dependencies).

* Add flake support.

* Add copious comments.

Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
2022-10-02 11:47:16 +03:00
Yaron Kaikov
27e326652b build_docker.sh:fix python2 dependency
Following the revert of b004da9d1b which solved https://github.com/scylladb/scylla-pkg/issues/3094

updating docker dependency to match `scylla-tools-java` requirements

Closes #11522
2022-09-12 13:33:06 +03:00
Yaron Kaikov
9f9ee8a812 build_docker.sh: Build docker based on Ubuntu:22.04
Ubuntu 20.04 has less than 3 years of OS support remaining.

We should switch to Ubuntu 22.04 to reduce the need for OS upgrades in newly installed clusters.

Closes #11440
2022-09-04 14:00:27 +03:00
Takuya ASADA
8835a34ab6 scylla_raid_setup: prevent mount failed for /var/lib/scylla
Just like 4a8ed4c, we also need to wait for udev event completion to
create /dev/disk/by-uuid/$UUID for newly formatted disk, to mount the
disk just after formatting.

Fixes #11359
2022-08-27 03:27:44 +09:00
Takuya ASADA
40134efee4 scylla_raid_setup: check uuid and device path are valid
Added code to check make sure uuid and uuid based device path are valid.
2022-08-27 03:08:31 +09:00
Vlad Zolotarov
c538cc2372 scylla_prepare + scylla_cpuset_setup: make scylla_cpuset_setup idempotent without introducing regressions
This patch fixes the regression introduced by 3a51e78 which broke
a very important contract: perftune.yaml should not be "touched"
by Scylla scriptology unless explicitly requested.

And a call for scylla_cpuset_setup is such an explicit request.

The issue that the offending patch was intending to fix was that
cpuset.conf was always generated anew for every call of
scylla_cpuset_setup - even if a resulting cpuset.conf would come
out exactly the same as the one present on the disk before tha call.

And since the original code was following the contract mentioned above
it was also deleting perftune.yaml every time too.
However, this was just an unavoidable side-effect of that cpuset.conf
re-generation.

The above also means that if scylla_cpuset_setup doesn't write to cpuset.conf
we should not "touch" perftune.yaml and vise versa.

This patch implements exactly that together with reverting the dangerous
logic introduced by 3a51e78.

Fixes #11385
Fixes #10121
2022-08-25 13:03:02 -04:00
Vlad Zolotarov
80917a1054 scylla_prepare: stop generating 'mode' value in perftune.yaml
Modern perftune.py supports a more generic way of defining IRQ CPUs:
'irq_cpu_mask'.

This patch makes our auto-generation code create a perftune.yaml
that uses this new parameter instead of using outdated 'mode'.

As a side effect, this change eliminates the notion of "incorrect"
value in cpuset.conf - every value is valid now as long as it fits into
the 'all' CPU set of the specific machine.

Auto-generated 'irq_cpu_mask' is going to include all bits from 'all'
CPU mask except those defined in cpuset.conf.

Fixes #9903
2022-08-25 13:02:57 -04:00
Takuya ASADA
60e8f5743c systemd: drop StandardOutput=syslog
On recent version of systemd, StandardOutput=syslog is obsolete.
We should use StandardOutput=journal instead, but since it's default value,
so we can just drop it.

Fixes #11322

Closes #11339
2022-08-22 10:47:37 +03:00
Yaron Kaikov
c42c5111eb SCYLLA-VERSION-GEN: use semver-compatible version
Setting Scylla to use semantic versioning. (Ref: https://semver.org/)

Closes: https://github.com/scylladb/scylla/issues/9543

Closes #10957
2022-07-25 18:06:28 +03:00
Takuya ASADA
ce87e15ecf scylla_prepare: fix Exception when SET_NIC_AND_DISKS=no and SET_CLOCKSOURCE=yes
We shouldn't call get_tune_mode() when NIC tuning is disabled.

fixes #10412

Closes #10959
2022-07-05 14:52:52 +03:00
Takuya ASADA
7501465b7c scylla_util.py: change debug log directory to /var/tmp/scylla
Current debug log is bit difficult to collect in CI, to find the debug log
we must know which script caused Exception.
Because the filename does not include prefix, and also specified
directory is shared with other programs.

To make things more easily, let's change debug log directory to /var/tmp/scylla.

Closes #10730
2022-07-05 14:49:00 +03:00
Takuya ASADA
3a51e7820a scylla_cpuset_setup: stop deleting perftune.yaml and skip update cpuset.conf when same parameter specified
To make scylla setup scripts easier to handle in Ansible, stop deleting
perftune.yaml and detect cpuset.conf changes by mtime of the file.
Also, skip update cpuset.conf when same parameter specified.

Fixes #10121

Closes #10312
2022-06-23 10:28:36 +03:00
Israel Fruchter
d2ca2455db scripts/scylla_util.py: introduce back user/group arguments for out()
since #10467 remove the user/group parameters needed for the housekeeping
call, need to introuce them back

Fixes: #10804

Closes #10818
2022-06-16 13:50:17 +03:00
Takuya ASADA
5643c6de56 scylla_util.py: fix "systemctl is-active" causes error
On 48b6aec16a we mistakenly allowed
check=True on systemd_unit.is_active(), it should be check=False.
We check unit's status by "systemctl is-active" output string,
it returns "active" or "inactive".
But systemctl command returns non-zero status when it returning
"inactive", so we are getting Exception here.
To fix this, we need new option "ignore_error=True" for out(),
and use it in systemd_unit.is_active().

Fixes #10455

Closes #10467
2022-06-13 13:45:50 +03:00
Takuya ASADA
ad2344a864 scylla_coredump_setup: support new format of Storage field
Storage field of "coredumpctl info" changed at systemd-v248, it added
"(present)" on the end of line when coredump file available.

Fixes #10669

Closes #10714
2022-06-07 02:21:32 +03:00
Takuya ASADA
b6003989f9 scylla_setup: stop using sudo -u, use user/group parameter on subprocess module
To run scylla-housekeeping we currently use "sudo -u scylla <cmd>" to switch
scylla user, but it fails on some environment.
Since recent version of Python 3 supports to switch user on subprocess module,
let's use python native way and drop sudo.

Fixes #10483

Closes #10538
2022-05-19 17:21:35 +03:00
Takuya ASADA
883b97d8b2 dist/common/scripts: generate debug log when exception occurred
Using traceback_with_variables module, generate more detail traceback
with variables into debug log.
This will help fixing bugs which is hard to reproduce.

Closes #10472

[avi: regenerate frozen toolchain]
2022-05-17 13:18:27 +03:00
Takuya ASADA
00ce34c29b scylla_prepare: describe error more correctly
Currently our error message on scylla_prepare says "Exception occurred
while creating perftune.yaml", even perftune.yaml is already generated,
and error occurred after that.
To describe error more correctly, add another error message after
perftune.yaml generated.

see scylladb/scylla-enterprise#2201

Closes #10575
2022-05-16 20:05:58 +03:00
Takuya ASADA
a9dfe5a8f4 scylla_sysconfig_setup: handle >=32CPUs correctly
Seems like 59adf05 has a bug, the regex pattern only handles first
32CPUs cpuset pattern, and ignores rest.
We should extend regex pattern to handle all CPUs.

Fixes #10523

Closes #10524
2022-05-11 14:46:30 +02:00
Takuya ASADA
48b6aec16a scripts: use "out()" function for all capture_output subprocesses
On acaf0bb we applied out() just for perftune.py because we had issue #10390
with this script.
But the issue can happen with other commands too, let's apply it to all
commands which uses capture_output.

related #10390

Closes #10414
2022-04-26 13:56:52 +03:00
Takuya ASADA
acaf0bb88a scripts: print perftune.py error message when capture_output=True
We currently does not able to get any error message from subprocess when we specified capture_output=True on subprocess.run().
This is because CalledProcessError does not print stdout/stderr when it raised, and we don't catch the exception, we just let python to cause Traceback.
Result of that, we only able to know exit status and failed command but
not able to get stdout/stderr.

This is problematic especially working on perftune.py bug, since the
script should caused Traceback but we never able to see it.

To resolve this, add wrapper function "out()" for capture output, and
print stdout/stderr with error message inside the function.

Fixes #10390

Closes #10391
2022-04-18 14:06:51 +03:00
Takuya ASADA
f95a531407 docker: run scylla as root
Previous versions of Docker image runs scylla as root, but cb19048
accidently modified it to scylla user.
To keep compatibility we need to revert this to root.

Fixes #10261

Closes #10325
2022-04-04 17:25:13 +03:00
Takuya ASADA
41edc045d9 docker: revert scylla-server.conf service name change
We changed supervisor service name at cb19048, but this breaks
compatibility with scylla-operator.
To fix the issue we need to revert the service name to previous one.

Fixes #10269

Closes #10323
2022-04-03 19:18:18 +03:00
Alexey Kartashov
d86c3a8061 dist/docker: fix incorrect locale value
Docker build script contains an incorrect locale specification for LC_ALL setting,
this commit fixes that.

Fixes #10310

Closes #10321
2022-04-03 14:24:54 +03:00
Takuya ASADA
59adf05951 scylla_sysconfig_setup: avoid perse error on perftune.py --get-cpu-mask
Currently, we just passes entire output of perftune.py when getting CPU
mask from the script, but it may cause parse error since the script may
also print warning message.

To avoid that, we need to extract CPU mask from the output.

Fixes #10082

Closes #10107
2022-03-28 16:31:14 +03:00
Takuya ASADA
bdefea7c82 docker: enable --log-to-stdout which mistakenly disabled
Since our Docker image moved to Ubuntu, we mistakenly copy
dist/docker/etc/sysconfig/scylla-server to /etc/sysconfig, which is not
used in Ubuntu (it should be /etc/default).
So /etc/default/scylla-server is just default configuration of
scylla-server .deb package, --log-to-stdout is 0, same as normal installation.

We don't want keep the duplicated configuration file anyway,
so let's drop dist/docker/etc/sysconfig/scylla-server and configure
/etc/default/scylla-server in build_docker.sh.

Fixes #10270

Closes #10280
2022-03-27 14:50:10 +03:00
Takuya ASADA
59c72d5d60 scylla_prepare: print Traceback with current user-friendly messages
On e1b15ba, we introduce user-friendly error message when Exception
occured while generating perftune.yaml.
However, it becomes difficult to investigate bugs since we dropped
traceback.
To resolve this problem, let's print both traceback and user-friendly
messages.

Related #10050

Closes #10140
2022-03-20 16:55:18 +02:00
Nadav Har'El
cb6630040d docker: don't repeat "--alternator-address" option twice
If the Docker startup script is passed both "--alternator-port" and
"--alternator-https-port", a combination which is supposed to be
allowed, it passes to Scylla the "--alternator-address" option twice.
This isn't necessary, and worse - not allowed.

So this patch fixes the scyllasetup.py script to only pass this
parameter once.

Fixes #10016.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20220202165814.1700047-1-nyh@scylladb.com>
2022-02-02 23:26:11 +02:00
Takuya ASADA
c2ccdac297 move cloud related code from scylla repository to scylla-machine-image
Currently, cloud related code have cross-dependencies between
scylla and scylla-machine-image.
It is not good way to implement, and single change can break both
package.

To resolve the issue, we need to move all cloud related code to
scylla-machine-image, and remove them from scylla repository.

Change list:
 - move cloud part of scylla_util.py to scylla-machine-image
 - move cloud part of scylla_io_setup to scylla-machine-image
 - move scylla_ec2_check to scylla-machine-image
 - move cloud part of scylla_bootparam_setup to scylla-machine-image

Closes #9957
2022-02-01 11:26:59 +02:00
Takuya ASADA
218dd3851c scylla_swap_setup: add --swap-size-bytes
Currently, --swap-size does not able to specify exact file size because
the option takes parameter only in GB.
To fix the limitation, let's add --swpa-size-bytes to specify swap size
in bytes.
We need this to implement preallocate swapfile while building IaaS
image.

see scylladb/scylla-machine-image#285

Closes #9971
2022-01-31 18:32:32 +02:00
Takuya ASADA
32f2eb63ac scylla_raid_setup: use mdmonitor only when RAID level > 0
We found that monitor mode of mdadm does not work on RAID0, and it is
not a bug, expected behavior according to RHEL developer.
Therefore, we should stop enabling mdmonitor when RAID0 is specified.

Fixes #9540
2022-01-26 22:33:07 +09:00
Takuya ASADA
cd57815fff Revert "scylla_raid_setup: workaround for mdmonitor.service issue on CentOS8"
This reverts commit 0d8f932f0b,
because RHEL developer explains this is not a bug, it's expected behavior.
(mdadm --monitor does not start when RAID level is 0)
see: https://bugzilla.redhat.com/show_bug.cgi?id=2031936

So we should stop downgrade mdadm package and modify our script not to
enable mdmonitor.service on RAID0, use it only for RAID5.
2022-01-26 22:33:06 +09:00
Avi Kivity
fcb8d040e8 treewide: use Software Package Data Exchange (SPDX) license identifiers
Instead of lengthy blurbs, switch to single-line, machine-readable
standardized (https://spdx.dev) license identifiers. The Linux kernel
switched long ago, so there is strong precedent.

Three cases are handled: AGPL-only, Apache-only, and dual licensed.
For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0),
reasoning that our changes are extensive enough to apply our license.

The changes we applied mechanically with a script, except to
licenses/README.md.

Closes #9937
2022-01-18 12:15:18 +01:00
Valerii Ponomarov
12fa68fe67 scylla_util: return boolean calling systemd_unit.available
As of now, 'systemd_unit.available' works ok only when provided
unit is present.
It raises Exception instead of returning boolean
when provided systemd unit is absent.

So, make it return boolean in both cases.

Fixes https://github.com/scylladb/scylla/issues/9848

Closes #9849
2021-12-28 15:14:04 +02:00
Takuya ASADA
6a834261fb scylla_coredump_setup: prevent coredump timeout on systemd-coredump@.service
On newer version of systemd-coredump, coredump handled in
systemd-coredump@.service, and may causes timeout while running the
systemd unit, like this:
  systemd[1]: systemd-coredump@xxxx.service: Service reached runtime time limit. Stopping.
To prevent that, we need to override TimeoutStartSec=infinity.

Fixes #9837

Closes #9841
2021-12-27 13:58:07 +02:00
Takuya ASADA
0d8f932f0b scylla_raid_setup: workaround for mdmonitor.service issue on CentOS8
On CentOS8, mdmonitor.service does not works correctly when using
mdadm-4.1-15.el8.x86_64 and later versions.
Until we find a solution, let's pinning the package version to older one
which does not cause the issue (4.1-14.el8.x86_64).

Fixes #9540

Closes #9782
2021-12-27 12:07:34 +02:00
Takuya ASADA
7064ae3d90 dist: fix scylla-housekeeping uuid file chmod call
Should use chmod() on a file, not fchmod()

Fixes #9683

Closes #9802
2021-12-27 11:47:06 +02:00
Takuya ASADA
6870938842 scylla_raid_setup: fix typo
Closes #9790
2021-12-14 11:15:23 +02:00
Takuya ASADA
ea20f89c56 dist: allow running scylla-housekeeping with strict umask setting
To avoid failing scylla-housekeeping in strict umask environment,
we need to chmod a+r on repository file and housekeeping.uuid.

Fixes #9683

Closes #9739
2021-12-05 20:46:46 +02:00
Takuya ASADA
097a6ee245 dist: add support im4gn/is4gen instance on AWS
Add support next-generation, storage-optimized ARM64 instance types.

Fixes #9711

Closes #9730
2021-12-05 13:20:01 +02:00