This change allows yielding during hashing computations to prevent
stalls.
The performance of this solution was compared with the previous
implementation that used one alien thread and the implementation
after the alien thread was reverted. The results (median) of
`perf-cql-raw` with `--connection-per-request 1 --smp 10` parameters
are as follows:
- Alien thread: 41.5 new connections/s per shard
- Reverted alien thread: 244.1 new connections/s per shard
- This commit (yielding in hashing): 198.4 new connections/s per shard
The alien thread is limited by a single-core hashing throughput,
which is roughly 400-500 hashes/s in the test environment. Therefore,
with smp=10, the throughput is below 50 hashes/s, and the difference
between the alien thread and other solutions further increases with
higer smp.
The roughly 20% performance deterioration compared to
the old implementation without the alien thread comes from the fact
that the new hashing algorithm implemented in `utils/crypt_sha512.cc`
performs an expensive self-verification and stack cleanup.
On the other hand, with smp=10 the current implementation achieves
roughly 5x higher throughput than the alien thread. In addition,
due to yielding added in this commit, the algorithm is expected
to provide similar protection from stalls as the alien thread did.
In a test that in parallel started a cassandra-stress workload and
created thousands of new connections using python-driver, the values of
`scylla_reactor_stalls_count` metric were as follows:
- Alien thread: 109 stalls/shard total
- Reverted alien thread: 13186 stalls/shard total
- This commit (yielding in hashing): 149 stalls/shard total
Similarly, the `scylla_scheduler_time_spent_on_task_quota_violations_ms`
values were:
- Alien thread: 1087 ms/shard total
- Reverted alien thread: 72839 ms/shard total
- This commit (yielding in hashing): 1623 ms/shard total
To summarize, yielding during hashing computations achieves similar
throughput to the old solution without the alien thread but also
prevents stalls similarly to the alien thread.
Fixes: scylladb/scylladb#26859
Refs: scylladb/scylla-enterprise#5711
Change `sha512crypt` and `__crypt_sha512` to coroutines to allow
yielding during hash computations later in this patch series.
Refs: scylladb/scylladb#26859
This patch imports the `crypt_sha512.c` file from the musl library.
We need it to incorporate yielding in the `crypt_r` function to avoid
reactor stalls during long hashing computations.
Before this patch series, ScyllaDB used SHA-512 hashing provided
by the `crypt_r` function, which in our case meant using the implementation
from the `libxcrypt` library. Adding yielding to this `libxcrypt`
implementation is problematic, both due to licensing (LGPL) and because the
implementation is split into many functions across multiple files. In
contrast, the SHA-512 implementation from `musl libc` has a more
permissive license and is concise, which makes it easier to incorporate
into the ScyllaDB codebase.
Both `crypt_sha512.c` and musl license are obtained from
git.musl-libc.org:
- https://git.musl-libc.org/cgit/musl/tree/src/crypt/crypt_sha512.c
- https://git.musl-libc.org/cgit/musl/tree/COPYRIGHT
Import commit:
commit 1b76ff0767d01df72f692806ee5adee13c67ef88
Author: Alex Rønne Petersen <alex@alexrp.com>
Date: Sun Oct 12 05:35:19 2025 +0200
s390x: shuffle register usage in __tls_get_offset to avoid r0 as address
Refs: scylladb/scylladb#26859