scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-18 22:02:08 +00:00

Author	SHA1	Message	Date
artem.penner	9898e5700b	scylla-node-exporter: Add systemd collector to node exporter This PR enables the node_exporter systemd collector and configures the unit whitelist to include scylla-server.service and systemd-coredump services. Motivation: We currently lack visibility into system-level service states, which is critical for diagnosing stability issues. This configuration enables two specific use cases: - Detecting Coredump Loops: We encounter scenarios where ScyllaDB enters a restart loop. To pinpoint SIGSEGV (coredumps) as the root cause, we need to track when the systemd-coredump service becomes active, indicating a dump is being processed. - Identifying Startup Failures: We need to detect when the scylla-server unit enters a failed state. This is essential for catching unrecoverable errors (e.g., corrupted commitlogs or configuration bugs) that prevent the server from starting. example of promql queries: - `node_systemd_unit_state{name=~"systemd-coredump@.*", state="active"} == 1` - `node_systemd_unit_state{name="scylla-server.service", state="failed"} == 1` Closes #28402	2026-03-20 08:39:56 +02:00
Avi Kivity	5ae40caa6d	dist: tune tcp_mem to 3% of total memory in scylla-kernel-conf package tcp_mem defaults to 9% of total memory. ScyllaDB defaults to 93%. The sum is more than 100%. Fix by tuning tcp_mem to 3% of total memory. Fixes https://scylladb.atlassian.net/browse/SCYLLADB-734 Closes scylladb/scylladb#28700	2026-03-05 12:51:04 +03:00
Grzegorz Burzyński	b4f0eb666f	packaging: add systemctl command to dependencies scylladb/scylla container image doesn't include systemctl binary, while it is used by perftune.py script shipped within the same image. Scylla Operator runs this script to tune Scylla nodes/containers, expecting its all dependencies to be available in the container's PATH. Without systemctl, the script fails on systems that run irqbalance (e.g., on EKS nodes) as the script tries to reconfigure irqbalance and restart it via systemctl afterwards. Fixes: scylladb/scylla-operator#3080 Closes scylladb/scylladb#28567	2026-02-25 10:19:32 +02:00
Botond Dénes	0bf4c68af5	Merge 'docs: fix link to docker build readme in the README.MD' from Marcin Szopa Links were pointing to the `debian` subdirectory. However, there docker build was refactored to use `redhat`: `1abf981a73`, see https://github.com/scylladb/scylladb/pull/22910 No backport, just a README link fixes. Closes scylladb/scylladb#28699 * github.com:scylladb/scylladb: docs: fix path to the build_docker.sh which was moved from debian to redhat subdirectory docs: fix link to docker build README.MD	2026-02-20 08:21:46 +02:00
Avi Kivity	58a662b9db	dist: refresh container base image (ubi9-minimal) Using an outdated image can cause problems when `microdnf update` runs, if the distribution doesn't maintain good update hygiene. Although, I suspect that when update failures happen they're really caused by propagation delay of packages to mirrors. Fix by using --pull=always to get a fresh image. Ref https://scylladb.atlassian.net/browse/SCYLLADB-714 Closes scylladb/scylladb#28680	2026-02-19 12:42:43 +03:00
Marcin Szopa	9217f85e99	docs: fix path to the build_docker.sh which was moved from debian to redhat subdirectory	2026-02-18 12:19:27 +01:00
Israel Fruchter	6b3ce5fdcc	dist: scylla_coredump_setup: force unmount /var/lib/systemd/coredump before setup When setting up coredump handling, if there are old mounts in a deleted state (e.g. from an older installation), systemd might fail to activate the new `.mount` unit properly because it assumes the path is already mounted. Explicitly unmount `/var/lib/systemd/coredump` before proceeding with the setup to ensure a clean state. Fix: scylladb/scylla-enterprise#5692 Closes scylladb/scylladb#28300	2026-01-22 14:35:26 +02:00
Yaniv Michael Kaul	af8eaa9ea5	scripts: fixes flagged by CodeQL/PyLens Unused imports, unused variables and such. Initially, there were no functional changes, just to get rid of some standard CodeQL warnings. I've then broken the CI, as apparently there's a install time(!?) Python script creation for the sole purpose of product naming. I changed it - we have it in etcdir, as SCYLLA-PRODUCT-FILE. So added (copied from a different script) a get_product() helper function in scylla_util.py and used it instead. While at it, also fixed the too broad import from scylla_util, which 'forced' me to also fix other specific imports (such as shutil). Improvement - no need to backport. Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com> Closes scylladb/scylladb#27883	2026-01-09 15:13:12 +02:00
Yaron Kaikov	1ee89c9682	Revert "scripts: benign fixes flagged by CodeQL/PyLens" This reverts commit `377c3ac072`. This breaks all artifact tests and cloud image build process Closes scylladb/scylladb#27881	2025-12-28 09:49:49 +02:00
Yaniv Michael Kaul	377c3ac072	scripts: benign fixes flagged by CodeQL/PyLens Unused imports, unused variables and such. No functional changes, just to get rid of some standard CodeQL warnings. Benign - no need to backport. Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com> Closes scylladb/scylladb#27801	2025-12-24 13:08:24 +02:00
Amnon Heiman	a213e41250	scylla-node-exporter: Add ethtool to node exporter AWS suggests following multiple network performance metrics: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/monitoring-network-performance-ena.html#network-performance-metrics This patch enables the ethtool collector with the specific list of metrics. After this patch the relevant metrics look like: $ curl http://localhost:9100/metrics |& grep ethtool node_ethtool_bw_in_allowance_exceeded{device="ens5"} 0 node_ethtool_bw_out_allowance_exceeded{device="ens5"} 0 node_ethtool_conntrack_allowance_available{device="ens5"} 51303 node_ethtool_conntrack_allowance_exceeded{device="ens5"} 0 node_ethtool_info{bus_info="0000:00:05.0",device="ens5",driver="ena",expansion_rom_version="",firmware_version="",version="6.14.0-1015-aws"} 1 node_ethtool_linklocal_allowance_exceeded{device="ens5"} 0 node_scrape_collector_duration_seconds{collector="ethtool"} 0.001091436 node_scrape_collector_success{collector="ethtool"} 1 Signed-off-by: Amnon Heiman <amnon@scylladb.com> Closes scylladb/scylladb#27358	2025-12-08 14:27:10 +02:00
Avi Kivity	b82f92b439	main: replace p11-kit hack for trust paths override with gnutls hack p11-kit has hardcoded paths for the trust paths. Of course, each Linux distribution hardcodes those paths differently. As a result, our relocatable gnutls, which uses p11-kit-trust.so to process the trust paths, needs some overrides to select the right paths. Currently, we use p11_kit_override_system_files(), a p11-kit API intended for testing, but which worked well enough for our purpose, to override the trust module configuration. Unfortunately, starting (presumably [1]) in gnutls 3.8.11, gnutls changed how it works with p11-kit and our override is now ignored. This was likely unintentional, but there appears to be a better way: instead of letting gnutls auto-load the trust module from a hacked configuration, we load the modules ourselves using gnutls_pkcs11_init(GNUTLS_PKCS11_FLAG_MANUAL) and gnutls_pkcs11_add_provider(). These appear to be intended for the purpose. We communicate the paths to the scylla executable using an environment variable. This isn't optimal, but is much easier than adding a command line variable since there are multiple levels of command line parsing due to the subtool mechanism. With this, we unlock the possibility to upgrade gnutls to newer versions. [1] `aa5f15a872` Closes scylladb/scylladb#27348	2025-12-04 11:33:51 +02:00
Avi Kivity	1f6e3301e7	dist: systemd: drop deprecated CPU and I/O shares/weight from scylla-server.slice The BlockIOWeight and CPUShares are deprecated. They are only used on RHEL 7, which has reached end-of-life. Their replacements, IOWeight and CPUWeight, are already set in the file. Remove the deprecated settings to reduce noise in the logs. Closes scylladb/scylladb#27222	2025-11-26 06:42:11 +02:00
Yehuda Lebi	a05ebbbfbb	dist/docker: add configurable blocked-reactor-notify-ms parameter Add --blocked-reactor-notify-ms argument to allow overriding the default blocked reactor notification timeout value of 25 ms. This change provides users the flexibility to customize the reactor notification timeout as needed. Fixes: scylladb/scylla-enterprise#5525 Closes scylladb/scylladb#26892	2025-11-11 12:38:40 +02:00
Avi Kivity	8e480110c2	dist: housekeeping: set python.multiprocessing fork mode to "fork" Python 3.14 changed the multiprocessing fork mode to "forkserver", presumably for good reasons. However, it conflicts with our relocatable Python system. "forkserver" forks and execs a Python process at startup, but it does this without supplying our relocated ld.so. The system ld.so detects a conflict and crashes. Fix this by switching back to "fork", which is sufficient for housekeeping's modest needs. Closes scylladb/scylladb#26831	2025-11-05 15:47:38 +03:00
Takuya ASADA	eb30594a60	dist: detect corrupted NUMA topology information Some environments have corrupted NUMA topology information, such as some instance types on AWS EC2 with specific Linux kernel images. In such environments, we cannot get HW information correctly from hwloc, so we cannot proceed with optimization in perftune. To avoid a script error, check the NUMA topology information and skip running perftune if the information is corrupted. Related scylladb/seastar#2925 Closes scylladb/scylladb#26344	2025-10-22 01:11:14 +03:00
Avi Kivity	bb02295695	setup: add the lazytime XFS mount option In `f828fe0d59` ("setup: add the lazytime XFS version") we added the lazytime mount option to /var/lib/scylla, but it was quickly reverted (`8f5e80e61a`) as it caused a regression on CentOS 7. We reinstate it now with a kernel version check. This will avoid the lazytime mount option on CentOS 7, which is unsupported anyway. The lazytime option avoids marking the inode as dirty if it's only for the purpose of updating mtime/ctime. This won't help much while writing sstables (since the write also updates extent information), but may help a little with commitlog writes, since those are pure overwrites. It likely won't help with the RWF_NOWAIT violations seen in [1], since those are likely due to in-memory locking, not flushing dirty inodes to disk. Tested with an install to Ubuntu 24.04 LTS followed by a scylla_setup run. The lazytime option was added to the .mount file and showed up in the live mount. [1] https://github.com/scylladb/seastar/issues/2974 Closes scylladb/scylladb#26436 Fixes #26002	2025-10-09 15:55:58 +03:00
Robert Bindar	2c74a6981b	Make scylla_io_setup detect request size for best write IOPS We noticed during work on scylladb/seastar#2802 that on i7i family (later proved that it's valid for i4i family as well), the disks are reporting the physical sector sizes incorrectly as 512bytes, whilst we proved we can render much better write IOPS with 4096bytes. This is not the case on AWS i3en family where the reported 512bytes physical sector size is also the size we can achieve the best write IOPS. This patch works around this issue by changing `scylla_io_setup` to parse the instance type out of `/sys/devices/virtual/dmi/id/product_name` and run iotune with the correct request size based on the instance type. Signed-off-by: Robert Bindar <robert.bindar@scylladb.com> Closes scylladb/scylladb#25315	2025-10-08 14:30:52 +03:00
Avi Kivity	5d1846d783	dist: scylla_raid_setup: don't override XFS block size on modern kernels In `6977064693` ("dist: scylla_raid_setup: reduce xfs block size to 1k"), we reduced the XFS block size to 1k when possible. This is because commitlog wants to write the smallest amount of padding it can, and older Linux could only write a multiple of the block size. Modern Linux [1] can O_DIRECT overwrite a range smaller than a filesystem block. However, this doesn't play well with some SSDs that have 512 byte logical sector size and 4096 byte physical sector size - it causes them to issue read-modify-writes. To improve the situation, if we detect that the kernel is recent enough, format the filesystem with its default block size, which should be optimal. Note that commitlog will still issue sub-4k writes, which can translate to RMW. There, we believe that the amplification is reduced since sequential sub-physical-sector writes can be merged, and that the overhead from commitlog space amplification is worse than the RMW overhead. Tested on AWS i4i.large. 
fsqual report: ``` memory DMA alignment: 512 disk DMA alignment: 512 filesystem block size: 4096 context switch per write io (size-changing, append, blocksize 4096, iodepth 1): 0.0003 (GOOD) context switch per write io (size-changing, append, blocksize 4096, iodepth 3): 0.7961 (BAD) context switch per write io (size-unchanging, append, blocksize 4096, iodepth 3): 0 (GOOD) context switch per write io (size-unchanging, append, blocksize 4096, iodepth 7): 0.0001 (GOOD) context switch per write io (size-unchanging, append, blocksize 512, iodepth 1): 0.125 (BAD) context switch per write io (size-unchanging, overwrite, blocksize 512, iodepth 1): 0 (GOOD) context switch per write io (size-unchanging, overwrite, blocksize 512, O_DSYNC, iodepth 1): 0 (GOOD) context switch per write io (size-unchanging, overwrite, blocksize 512, O_DSYNC, iodepth 3): 0 (GOOD) context switch per write io (size-unchanging, overwrite, blocksize 512, O_DSYNC, iodepth 3): 0 (GOOD) context switch per write io (size-changing, append, blocksize 4096, iodepth 1): 0 (GOOD) context switch per write io (size-changing, append, blocksize 4096, iodepth 3): 0.8006 (BAD) context switch per write io (size-unchanging, append, blocksize 4096, iodepth 3): 0.0001 (GOOD) context switch per write io (size-unchanging, append, blocksize 4096, iodepth 7): 0 (GOOD) context switch per write io (size-unchanging, append, blocksize 512, iodepth 1): 0.125 (BAD) context switch per write io (size-unchanging, overwrite, blocksize 512, iodepth 1): 0 (GOOD) context switch per write io (size-unchanging, overwrite, blocksize 512, O_DSYNC, iodepth 1): 0 (GOOD) context switch per write io (size-unchanging, overwrite, blocksize 512, O_DSYNC, iodepth 3): 0 (GOOD) context switch per read io (size-changing, append, blocksize 512, iodepth 30): 0 (GOOD) ``` The sub-block overwrite cases are GOOD. 
In comparison, the fsqual report for 1k (similar): ``` memory DMA alignment: 512 disk DMA alignment: 512 filesystem block size: 1024 context switch per write io (size-changing, append, blocksize 1024, iodepth 1): 0.0005 (GOOD) context switch per write io (size-changing, append, blocksize 1024, iodepth 3): 0.7948 (BAD) context switch per write io (size-unchanging, append, blocksize 1024, iodepth 3): 0.0015 (GOOD) context switch per write io (size-unchanging, append, blocksize 1024, iodepth 7): 0.0022 (GOOD) context switch per write io (size-unchanging, append, blocksize 512, iodepth 1): 0.4999 (BAD) context switch per write io (size-unchanging, overwrite, blocksize 512, iodepth 1): 0 (GOOD) context switch per write io (size-unchanging, overwrite, blocksize 512, O_DSYNC, iodepth 1): 0 (GOOD) context switch per write io (size-unchanging, overwrite, blocksize 512, O_DSYNC, iodepth 3): 0 (GOOD) context switch per write io (size-unchanging, overwrite, blocksize 512, O_DSYNC, iodepth 3): 0 (GOOD) context switch per write io (size-changing, append, blocksize 1024, iodepth 1): 0 (GOOD) context switch per write io (size-changing, append, blocksize 1024, iodepth 3): 0.798 (BAD) context switch per write io (size-unchanging, append, blocksize 1024, iodepth 3): 0.0012 (GOOD) context switch per write io (size-unchanging, append, blocksize 1024, iodepth 7): 0.0019 (GOOD) context switch per write io (size-unchanging, append, blocksize 512, iodepth 1): 0.5 (BAD) context switch per write io (size-unchanging, overwrite, blocksize 512, iodepth 1): 0 (GOOD) context switch per write io (size-unchanging, overwrite, blocksize 512, O_DSYNC, iodepth 1): 0 (GOOD) context switch per write io (size-unchanging, overwrite, blocksize 512, O_DSYNC, iodepth 3): 0 (GOOD) context switch per read io (size-changing, append, blocksize 512, iodepth 30): 0 (GOOD) ``` Fixes #25441. [1] `ed1128c2d0` Closes scylladb/scylladb#25445	2025-09-30 17:14:36 +03:00
Yaron Kaikov	0a025d121f	packaging: Add `adduser` as dependency As `adduser` command is being used by `/var/lib/dpkg/info/scylla-server.postinst` and similar during rpm post-install. Fixes: https://github.com/scylladb/scylladb/issues/23722 Closes scylladb/scylladb#25928	2025-09-10 21:51:25 +03:00
Yaron Kaikov	d57741edc2	build_docker.sh: enable debug symbols installation Add the latest scylla.repo location to our docker container; this allows installing the scylla-debuginfo package in case it's needed Fixes: https://github.com/scylladb/scylladb/issues/24271 Closes scylladb/scylladb#25646	2025-09-08 18:39:27 +03:00
Michael Litvak	25fb3b49fa	dist/docker: add dc and rack arguments add --dc and --rack commandline arguments to the scylla docker image, to allow starting a node with a specified dc and rack names in a simple way. This is useful mostly for small examples and demonstrations of starting multiple nodes with different racks, when we prefer not to bother with editing configuration files. The ability to assign nodes to different racks is especially important with RF=Rack enforcing. The previous method to achieve this is to set the snitch to GossipingPropertyFileSnitch and provide a configuration file in /etc/scylla/cassandra-rackdc.properties with the name of the dc and rack. The new dc and rack parameters are implemented similarly by using the snitch GossipingPropertyFileSnitch and writing the dc and rack values to the rackdc properties file. We don't support passing the parameters together with a different snitch, or when mounting a properties file from the host, because we don't want to overwrite it. Example: docker run -d --name scylla1 scylladb/scylla --dc my_dc1 --rack my_rack1 Fixes scylladb/scylladb#23423 Closes scylladb/scylladb#25607	2025-08-24 17:48:07 +03:00
Taras Veretilnyk	15e3980693	docs: Add documentation for the nodetool dropquarantinedsstables command Fixes scylladb/scylladb#19061	2025-08-01 11:46:33 +02:00
Yaron Kaikov	fdcaa9a7e7	dist/common/scripts/scylla_sysconfig_setup: fix `SyntaxWarning: invalid escape sequence` There are invalid escape sequence warnings where raw strings should be used for the regex patterns Fixes: https://github.com/scylladb/scylladb/issues/24915 Closes scylladb/scylladb#24916	2025-07-14 11:20:41 +02:00
Yaron Kaikov	66ff6ab6f9	packaging: add `ps` command to dependencies ScyllaDB container image doesn't have ps command installed, while this command is used by perftune.py script shipped within the same image. This breaks node and container tuning in Scylla Operator. Fixes: #24827 Closes scylladb/scylladb#24830	2025-07-13 17:09:05 +03:00
Avi Kivity	07c5edcc30	tools: add patchelf utility We use patchelf to rewrite the dynamic loader (known as the interpreter) of the binaries we ship, so we can point to our shipped dynamic loader, which is compatible with our binaries, rather than rely on the distribution's dynamic loader, which is likely to be incompatible. Upstream patchelf is losing compatibility [1] with Linux 5.17 and below. This change was also picked up by Fedora 42, so we cannot update the toolchain to that distribution until we have an alternative. Here we add a minimal patchelf alternative. It was mostly written by Claude. It is minimal in that it only supports --set-interpreter and --print-interpreter, and works well enough for our needs. We still use the original patchelf for --remove-rpath; this reduces our maintenance needs. [1] `43b75fbc9f` [2] `4b015255d1` Closes scylladb/scylladb#24695	2025-06-30 07:24:05 +03:00
Yaniv Michael Kaul	198ecd8039	Do not perform blkdiscard by default on the disks during RAID setup. This is not needed on clean disks, which is often the case with cloud instances, but can be useful on bare metal servers with disks that were used before. Therefore, the default is to skip blkdiscard operation, which makes overall installation faster. If the user wishes to run it anyway, use the newly introduced --blkdiscard option of scylla_raid_setup to perform it. Note: since we either perform online discard or schedule fstrim, the (previously used) space will gradually get trimmed, this way or another. Fixes: https://github.com/scylladb/scylladb/issues/24470 Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com> Closes scylladb/scylladb#24579	2025-06-26 12:25:38 +02:00
Avi Kivity	37f9cf6de6	dist: rpm: override %_sbindir for Fedora 42 Fedora 42 merged /usr/sbin into /usr/bin [1]. As part of that change the rpm macro %_sbindir was redefined from /usr/sbin to /usr/bin. As a result RPM build on Fedora 42 fails: install.sh places some files into /usr/sbin, while rpmbuild looks for them in /usr/bin. We could resolve this either by following the change and moving the files to /usr/bin as well, or fixing the spec to place the files in /usr/sbin. The former is more difficult: - what about Debian/Ubuntu? - what about older RPM-based distributions (like all RHEL distributions)? - what about scripts that hard-code /usr/sbin/<scylla utility>? So we pick the latter, and redefine %_sbindir to /usr/sbin. Since that directory still exists (as a symlink), installation on systems with merged /usr/bin and /usr/sbin will work. We'll have to address the problem later (likely by installing to either /usr/bin or /usr/sbin depending on context), but for now, this is a simple solution that works everywhere. [1] https://fedoraproject.org/wiki/Changes/Unify_bin_and_sbin Closes scylladb/scylladb#24101	2025-05-16 12:05:29 +02:00
Avi Kivity	092a88c9b9	dist: drop the scylla-env package scylla-env was used to glue together support for older distributions. It hasn't been used for many years. Remove it. Closes scylladb/scylladb#23985	2025-05-09 14:10:00 +03:00
Yaniv Michael Kaul	b374f94b15	pip installation: use --no-cache-dir There are two reasons we may want NOT to use caching of pip deps: 1. When building a container, unless we specifically clean it up, it'll remain, even when we squash the image layers later. 2. When building a container, that cache is not useful, as we squash our containers later (so that layer is not cached really). And our CI cleans up the layers repo anyway. 3. Caching sometimes isn't great, and doesn't ensure we pick up the exact version (or latest) that we wish to... This PR changes two locations in Scylla, both of which (also) build containers, so certainly relevant for 1, 2 above and possibly 3. No real need to backport. Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com> Closes scylladb/scylladb#23822	2025-04-21 13:46:57 +03:00
Avi Kivity	5e1cf90a51	build: replace tools/java submodule with packaged cassandra-stress We no longer use tools/java (scylladb/scylla-tools-java.git) for nodetool or cqlsh; only cassandra-stress. Since that is available in package form install that and excise the tools/java submodule from the source tree. pgo/ is adjusted to use the packaged cassandra-stress (and the cqlsh submodule). A few jmx references are dropped as well. Frozen toolchain regenerated. Optimized clang from https://devpkg.scylladb.com/clang/clang-19.1.7-Fedora-41-aarch64.tar.gz https://devpkg.scylladb.com/clang/clang-19.1.7-Fedora-41-x86_64.tar.gz Closes scylladb/scylladb#23698	2025-04-15 10:11:28 +03:00
Takuya ASADA	781dec5852	dist/docker: run the container as non-root user Since it is requirement for Red Hat OpenShift Certification, we need to run the container as non-root user. Related scylladb/scylla-pkg#4858 Signed-off-by: Takuya ASADA <syuu@scylladb.com>	2025-03-05 23:39:56 +09:00
Takuya ASADA	1abf981a73	dist/docker: switch to UBI9 Switch container base image to UBI9, to prepare for Red Hat OpenShift Certification. Fixes scylladb/scylla-pkg#4858 Signed-off-by: Takuya ASADA <syuu@scylladb.com>	2025-03-05 23:39:56 +09:00
Takuya ASADA	f2a8ae101b	dist/docker: drop hostname package, use Python API We currently depends on hostname command to get local IP, but we can do this on Python API. After the change, we can drop the package. Closes scylladb/scylladb#22909	2025-02-24 15:03:44 +02:00
Takuya ASADA	b5e306047f	dist: fix upgrade error from 2024.1 We need to allow replacing nodetool from scylla-enterprise-tools < 2024.2, just like we did for scylla-tools < 5.5. This is required to make packages able to upgrade from 2024.1. Fixes #22820 Closes scylladb/scylladb#22821	2025-02-13 12:36:24 +02:00
Avi Kivity	5c647408c7	systemd: map libraries close to the executable The Intel Optimization Manual states that branches with relative offsets greater than 2GB suffer a penalty. They cite a 6% improvement when this is avoided. Our code doesn't rely heavily on dynamically linked libraries, so I don't expect a similar win, but it's still better to do it than not. Eliminate long branches by asking the dynamic linker to restrict itself to the lower 4GB of the address space. I saw that it maps libraries at 1GB+ addresses, so this satisfies the limitation. Fix is from the Intel Optimization Manual as well. This change was ported from ScyllaDB Enterprise. Closes scylladb/scylladb#22498	2025-02-11 22:16:09 +02:00
Avi Kivity	cf72c31617	treewide: improve bash error reporting bash error handling and reporting is atrocious. Without -e it will just ignore errors. With -e it will stop on errors, but not report where the error happened (apart from exiting itself with an error code). Improve that with the `trap ERR` command. Note that this won't be invoked on intentional error exit with `exit 1`. We apply this on every bash script that contains -e or that it appears trivial to set it in. Non-trivial scripts without -e are left unmodified, since they might intentionally invoke failing scripts. Closes scylladb/scylladb#22747	2025-02-10 18:28:52 +03:00
Yaron Kaikov	93f53f4eb8	dist: support smooth upgrade from enterprise to source available When upgrading for example from `2024.1` to `2025.1` the package name is not identical, causing the upgrade command to fail: ``` Command: 'sudo DEBIAN_FRONTEND=noninteractive apt-get dist-upgrade scylla -y -o Dpkg::Options::="--force-confdef" -o Dpkg::Options::="--force-confold"' Exit code: 100 Stdout: Selecting previously unselected package scylla. Preparing to unpack .../6-scylla_2025.1.0~dev-0.20250118.1ef2d9d07692-1_amd64.deb ... Unpacking scylla (2025.1.0~dev-0.20250118.1ef2d9d07692-1) ... Errors were encountered while processing: /tmp/apt-dpkg-install-JbOMav/0-scylla-conf_2025.1.0~dev-0.20250118.1ef2d9d07692-1_amd64.deb /tmp/apt-dpkg-install-JbOMav/1-scylla-python3_2025.1.0~dev-0.20250118.1ef2d9d07692-1_amd64.deb /tmp/apt-dpkg-install-JbOMav/2-scylla-server_2025.1.0~dev-0.20250118.1ef2d9d07692-1_amd64.deb /tmp/apt-dpkg-install-JbOMav/3-scylla-kernel-conf_2025.1.0~dev-0.20250118.1ef2d9d07692-1_amd64.deb /tmp/apt-dpkg-install-JbOMav/4-scylla-node-exporter_2025.1.0~dev-0.20250118.1ef2d9d07692-1_amd64.deb /tmp/apt-dpkg-install-JbOMav/5-scylla-cqlsh_2025.1.0~dev-0.20250118.1ef2d9d07692-1_amd64.deb Stderr: E: Sub-process /usr/bin/dpkg returned an error code (1) ``` Adding `Obsoletes` (for rpm) and `Replaces` (for deb) Fixes: https://github.com/scylladb/scylladb/issues/22420 Closes scylladb/scylladb#22457	2025-02-08 21:56:09 +02:00
Takuya ASADA	fb4c7dc3d8	dist: Support FIPS mode - To make Scylla able to run in FIPS-compliant system, add .hmac files for crypto libraries on relocatable/rpm/deb packages. - Currently we just write hmac value on *.hmac files, but there is new .hmac file format something like this: ``` [global] format-version = 1 [lib.xxx.so.yy] path = /lib64/libxxx.so.yy hmac = <hmac> ``` Seems like GnuTLS rejects fips selftest on .libgnutls.so.30.hmac when file format is older one. Since we need to absolute path on "path" directive, we need to generate .libgnutls.so.30.hmac in older format on create-relocatable-script.py, Signed-off-by: Takuya ASADA <syuu@scylladb.com> Closes scylladb/scylladb#22384	2025-01-26 22:49:21 +02:00
Takuya ASADA	f2a53d6a2c	dist: make p11-kit-trust.so able to work in relocatable package Currently, our relocatable package doesn't contain p11-kit-trust.so since it is dynamically loaded and does not show up in "ldd" results (the relocatable packaging script finds dependent libraries by "ldd"). So we need to add it in create-relocatable-package.py. Also, we have two more problems: 1. the p11 module load path is defined as "/usr/lib64/pkcs11", not referencing /opt/scylladb/libreloc (and RedHat variants use a different path than Debian variants) 2. the ca-trust-source path is configured at build time (on Fedora); it is compatible with RedHat variants but not with Debian variants To solve these problems, we need to override the default p11-kit configuration. To do so, we need to add a configuration file at /opt/scylladb/share/pkcs11/modules/p11-kit-trust.module. Also, since p11-kit of course doesn't reference /opt/scylladb by default, we need to override the load path with p11_kit_override_system_files(). In the configuration file, we can specify the module load path with "modules: <path>", and we can specify the ca-trust-source path with "x-init-reserved: paths=<path>". Fixes scylladb/scylladb#13904 Closes scylladb/scylladb#22302	2025-01-15 10:09:17 +02:00
Kefu Chai	f8885a4afd	dist/docker,docs: replace "--experimental" with "--experimental-features" The "--experimental" option was removed in commit `f6cca741ea`. Using this deprecated option now causes Scylla to fail with the error: ``` error: the argument ('on') for option '--experimental-features' is invalid ``` So, in this change, let's update the docker entry point script to use `--experimental-features` command line option instead. The related document is updated accordingly. Fixes scylladb/scylladb#22207 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#22283	2025-01-14 07:56:38 -05:00
Yaron Kaikov	b74565e83f	dist/common/scripts/scylla_raid_setup: reduce XFS metadata overhead The block size of 1k is significantly increasing metadata overhead with xfs since it reserves space upfront for btree expansion. With CRC disabled, this reservation doesn't happen. Smaller btree blocks reduce the fanout factor, increasing btree height and the reservation size. So block size implies a trade-off between write amplification and metadata size. Bigger blocks, smaller metadata, more write ampl. Smaller blocks, more metadata, and less write ampl. Let's disable both `rmapbt` and `reflink` since we replicate data, and we can afford to rebuild a replica on local corruption. Fixes: https://github.com/scylladb/scylladb/issues/22028 Closes scylladb/scylladb#22072	2025-01-07 13:18:21 +02:00
Yaron Kaikov	74c5aabd23	build_docker: add option for building container based on Ubuntu Pro Today our container is based on ubuntu:22.04, we need to build another container based on Ubuntu Pro for FIPS support (currently the latest one is 20.04) The default docker build process doesn't change, if FIPS is required I have added `--type pro` to build a supported container. To enable FIPS there is a need to attach an Ubuntu Pro subscription (it will be done as part of https://github.com/scylladb/scylla-pkg/issues/4186) Closes scylladb/scylladb#21974	2024-12-20 13:09:24 +02:00
Avi Kivity	f3eade2f62	treewide: relicense to ScyllaDB-Source-Available-1.0 Drop the AGPL license in favor of a source-available license. See the blog post [1] for details. [1] https://www.scylladb.com/2024/12/18/why-were-moving-to-a-source-available-license/	2024-12-18 17:45:13 +02:00
Yaron Kaikov	3a00ffd2eb	build_docker.sh: remove rsyslog installation and conf It seems that no one is using rsyslog, so there is no point having it inside our container (see https://github.com/scylladb/scylladb/issues/21923#issuecomment-2545191667) Refs: https://github.com/scylladb/scylladb/issues/21923 Closes scylladb/scylladb#21953	2024-12-17 17:34:35 +02:00
Kefu Chai	a9c244ddf7	dist: scylla_io_setup: use raw string to avoid invalid escape sequence Use raw string literals to prevent syntax warnings when using regular expressions with backslash-based patterns. The original code triggered a SyntaxWarning in developer mode (`python3 -Xdev`) due to unescaped backslash characters in regex patterns like '\s'. While CPython typically interprets these silently, strict Python parsing modes raise warnings about potentially unintended escape sequences. This change adds the `r` prefix to string literals containing regex patterns, ensuring consistent behavior across different Python runtime configurations and eliminating unnecessary syntax warning like: ``` /opt/scylladb/scripts/libexec/scylla_io_setup:41: SyntaxWarning: invalid escape sequence '\s' pattern = re.compile(_nocomment + r"CPUSET=\s\"" + _reopt(_cpuset) + _reopt(_smp) + "\s\"") ``` Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#21839	2024-12-09 19:18:39 +03:00
Takuya ASADA	2b3115ac79	scylla-server.service: drop scylla-jmx.service Since we dropped scylla-jmx at `3cd2a61`, Wants=scylla-jmx.service is not needed anymore. Also we have issue on nonroot mode installation with this line (#21720), we need to drop this now. Fixes #21720 Closes scylladb/scylladb#21721	2024-12-01 14:14:33 +02:00
Takuya ASADA	6fe09a5a16	scylla_raid_setup: support installing semanage on Amazon Linux 2 Since Amazon Linux 2 has a different package name for semanage, we need to adjust the package name. Fixes #21351	2024-11-11 17:27:24 +09:00
Takuya ASADA	7ad5e69c54	scylla_raid_setup: fix failure on SELinux package installation After merging `5a470b2`, we found that scylla_raid_setup fails on offline mode installation. This is because pkg_install() just prints an error and exits the script in offline mode instead of installing packages, since offline mode is not supposed to be able to connect to the internet. It seems to occur because of the missing "policycoreutils-python-utils" package, which is the package for the "semanage" command. So we need to implement the relabeling patch without using the command. Fixes #21441	2024-11-11 17:27:24 +09:00
Kefu Chai	961a53f716	dist: systemd: use default KillMode before this change, we specify the KillMode of the scylla-service service unit explicitly to "process". according to https://www.freedesktop.org/software/systemd/man/latest/systemd.kill.html, > If set to process, only the main process itself is killed (not recommended!). and the document suggests using "control-group" over "process". but scylla server is not a multi-process server, it is a multi-threaded server. so it should not make any difference even if we switch to the recommended "control-group". in the light that we've been seeing "defunct" scylla process after stopping the scylla service using systemd. we are wondering if we should try to change the `KillMode` to "control-group", which is the default value of this setting. in this change, we just drop the setting so that systemd stops the service by stopping all processes in the control group of this unit. Refs scylladb/scylladb#21507 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#21508	2024-11-09 20:07:11 +02:00
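
Two entries in this log (`fdcaa9a7e7` and `a9c244ddf7`) switch regex patterns to raw string literals to silence `invalid escape sequence` warnings. The following is a minimal Python sketch of that technique, not the code from `scylla_sysconfig_setup` or `scylla_io_setup`: the sample line, the pattern, and the helper names `parse_cpuset` and `compiles_cleanly` are hypothetical and exist only for illustration.

```python
import re
import warnings

# Hypothetical config line, loosely modelled on a sysconfig-style CPUSET entry.
SAMPLE = 'CPUSET="--cpuset 0-3"'

# Raw string literal: the backslash in \s reaches the regex engine untouched,
# so the Python compiler has no escape sequence to complain about.
PATTERN = re.compile(r'^CPUSET=\s*"(?P<value>[^"]*)"')


def parse_cpuset(line):
    """Return the quoted CPUSET value, or None if the line does not match."""
    match = PATTERN.match(line)
    return match.group("value") if match else None


def compiles_cleanly(source):
    """Compile source text and report whether it produced no warnings."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        compile(source, "<sketch>", "exec")
    return not caught


if __name__ == "__main__":
    print(parse_cpuset(SAMPLE))
    print(compiles_cleanly(r'p = r"\s"'))  # raw literal
    print(compiles_cleanly(r'p = "\s"'))   # non-raw literal with the same text
```

Both literals produce the same two-character pattern at runtime, which is why the change is behavior-preserving; only the non-raw form makes the compiler emit a warning (a `DeprecationWarning` on older interpreters, a `SyntaxWarning` on newer ones).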

1642 Commits