Maintenance socket path used for PGO is in the node workdir.
When the node workdir path is too long, the maintenance socket path
(workdir/cql.m) can exceed the Unix domain socket sun_path limit
and failing the PGO training pipeline.
To prevent this:
- pass an explicit --maintenance-socket override
pointing to a short determinitic path in /tmp derived from the MD5
hash of the workdir maintenance socket path
- update maintenance_socket_path to return the matching short path
so that exec_cql.py connects to the right socket
The short path socket files are cleaned up after the cluster stops.
The path is using MD5 hash of the workdir path, so it is deterministic.
Fixes SCYLLADB-1070
Closesscylladb/scylladb#29149
The default 'cassandra' superuser was removed from ScyllaDB, which
broke PGO training. exec_cql.py relied on username/password auth
('cassandra'/'cassandra') to execute setup CQL scripts like auth.cql
and counters.cql.
Switch exec_cql.py to connect via the Unix domain maintenance socket
instead. The maintenance socket bypasses authentication, no credentials
are needed. Additionally, create the 'cassandra' superuser via the
maintenance socket during the populate phase, so that cassandra-stress
keeps working. cassandra-stress hardcodes user=cassandra password=cassandra.
Changes:
- exec_cql.py: replace host/port/username/password arguments with a
single --socket argument; add connect_maintenance_socket() with
wait ready logic
- pgo.py: add maintenance_socket_path() helper; update
populate_auth_conns() and populate_counters() to pass the socket
path to exec_cql.py
Fixes SCYLLADB-1070
Closesscylladb/scylladb#29081
It is considered a dangerous practice with possible unintended
side-effects, affecting later calls to the same function.
Found by CodeQL "Modification of parameter with default".
It was not enabled due to some cqlsh dependency missing.
After 3 years it's hard to say if the thing is fixed or not,
but anyway we don't need another big dependecy while we already
have python driver used exstensively in tests. We use simple
wrapper file exec_cql.py, shared with auth_conns workload to
conveniently read needed preparation statements from the file.
Additionally we switch tablets off as counters don't support
it yet.
It uses some derived roles and permissions
to exercise auth code paths and also creates new
connection with each stress request to exercise
also transport/server.cc connection handling code.
Since a1d7722 tablet keyspaces are not allowed to be repaired via the
old /storage_service/repair_async/{keyspace} API, instead the new
/storage_service/tablets/repair API has to be used. Adjust the repair
code and also add await_completion=true: the script just waits
for the repair to finish immediately after starting it.
Closesscylladb/scylladb#25455
Since 5e1cf90a51
("build: replace tools/java submodule with packaged cassandra-stress")
we run pre-packaged cassandra-stress. As such, we don't need to look for
a Java runtime (which is missing on the frozen toolchain) and can
rely on the cassandra-stress package finding its own Java runtime.
Fix by just dropping all the Java-finding stuff.
Note: Java 11 is in fact present on the frozen toolchain, just
not in a way that pgo.py can find it.
Fixes#24176.
Closesscylladb/scylladb#24178
we updated tools/java/build.xml recently to only build for java-11. so
if
- the `java` executable in `$PATH` points to a java which is neither
java-8 nor java-11.
- java-8 is installed
java-8 is used to execute the cassandra-stress tool. and we would have
following failure:
```
Error: A JNI error has occurred, please check your installation and try again
Exception in thread "main" java.lang.UnsupportedClassVersionError: org/apache/cassandra/stress/Stress has been compiled by a more recent version of the Java Runtime (class file version 55.0), this version of the Java Runtime only recogniz
es class file versions up to 52.0
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:756)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:473)
at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:621)
```
in order to be compatible with the bytecode targeting java-11, let's run
cassandra-stress with java-11. we do not need to support java-8, because
the new tools/java is now building cassandra-stress targeting java-11 jre.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#22142
This patch adds a set of alternator workloads to pgo training
script.
To confirm that added workloads are indeed affecting profile we can compare:
⤖ llvm-profdata show ./build/release-pgo/profiles/workdirs/clustering/prof.profdata
Instrumentation level: IR entry_first = 0
Total functions: 105075
Maximum function count: 1079870885
Maximum internal block count: 2197851358
and
⤖ llvm-profdata show ./build/release-pgo/profiles/workdirs/alternator/prof.profdata
Instrumentation level: IR entry_first = 0
Total functions: 105075
Maximum function count: 5240506052
Maximum internal block count: 9112894084
to see that function counters are on similar levels, they are around 5x higher for alternator
but that's because it combines 5 specific sub-workloads.
To confirm that final profile contains alterantor functions we can inspect:
⤖ llvm-profdata show --counts --function=alternator --value-cutoff 100000 ./build/release-pgo/profiles/merged.profdata
(...)
Instrumentation level: IR entry_first = 0
Functions shown: 356
Total functions: 105075
Number of functions with maximum count (< 100000): 97275
Number of functions with maximum count (>= 100000): 7800
Maximum function count: 7248370728
Maximum internal block count: 13722347326
we can see that 356 functions which symbol name contains word alternator were identified as 'hot' (with max count grater than 100'000). Running:
⤖ llvm-profdata show --counts --function=alternator --value-cutoff 1 ./build/release-pgo/profiles/merged.profdata
(...)
Instrumentation level: IR entry_first = 0
Functions shown: 806
Total functions: 105075
Number of functions with maximum count (< 1): 67036
Number of functions with maximum count (>= 1): 38039
Maximum function count: 7248370728
Maximum internal block count: 13722347326
we can see that 806 alternator functions were executed at least once during training.
And finally to confirm that alternator specific PGO brings any speedups we run:
for workload in read scan write write_gsi write_rmw
do
./build/release/scylla perf-alternator-workloads --smp 4 --cpuset "10,12,14,16" --workload $workload --duration 1 --remote-host 127.0.0.1 2> /dev/null | grep median
done
results BEFORE:
median 258137.51910849303
median absolute deviation: 786.06
median 547.2578202937141
median absolute deviation: 6.33
median 145718.19856685458
median absolute deviation: 5689.79
median 89024.67095807113
median absolute deviation: 1302.56
median 43708.101729598646
median absolute deviation: 294.47
results AFTER:
median 303968.55333940056
median absolute deviation: 1152.19
median 622.4757636209254
median absolute deviation: 8.42
median 198566.0403745328
median absolute deviation: 1689.96
median 91696.44912842038
median absolute deviation: 1891.84
median 51445.356525664996
median absolute deviation: 1780.15
We can see that single node cluster tps increase is typically 13% - 17% with notable exceptions,
improvement for write_gsi is 3% and for write workload whopping 36%.
The increase is on top of CQL PGO.
Write workload is executed more often because it's involved also as data preparation for read and scan.
Some further improvement could be to separate preparation from training as it's done for CQL but it would
be a bit odd if ~3x higher counters for one flow have so big impact.
Additional disclaimers:
- tests are performing exactly the same workloads as in training so there might be some bias
- tests are running single node cluster, more realistic setup will likely show lower improvement
Fixes https://github.com/scylladb/scylla-enterprise/issues/4066
This workload is added to teach PGO about repair.
Tests are inconclusive about its alignment with existing workloads,
because repair doesn't seem utilize 100% of the reactor.
This workload is added to teach PGO about counters.
Tests seem to show it's mostly aligned with existing CQL workloads.
The config YAML is based on the default cassandra-stress schema.
This workload is added to teach PGO about secondary indexes.
Tests seem to show that it's mostly aligned with existing CQL workloads.
The config YAML was copied from one of scylla-cluster-test test cases.
This workload is added to teach PGO about LWT codepaths.
Tests seem to show that it's mostly aligned with existing CQL workloads.
The config YAML was copied from one of scylla-cluster-tests test cases.
This workload is added to teach PGO about streaming.
Tests show that this workload is mostly orthogonal to CQL workloads
(where "orthogonal" means that training on workload A doesn't improve workload
B much, while training on workload A doesn't improve workload B much),
so adding it to the training is quite important.
In contrast to the basic workload, this workload uses clustering
keys, CK range queries, RF=1, logged batches, and more CQL types.
Tests seem to show that this workload is mostly aligned with the existing basic
workload (where "aligned" means that training on workload A improves workload B
about as much as training on workload B).
The config YAML is based on the example YAML attached to cassandra-stress
sources.
Profile-guided optimization consists of the following steps:
1. Build the program as usual, but with with special options (instrumentation
or just some supplementary info tables, depending on the exact flavor of PGO
in use).
2. Collect an execution profile from the special binary by running a
training workload on it.
3. Rebuild the program again, using the collected profile.
This commit introduces a script automating step 2: running PGO training workloads
on Scylla. The contents of training workloads will be added in future commits.
The changes in configure.py responsible for steps 1. and 3. will also appear
in future commits.
As input, the script takes a path to the instrumented binary, a path to a
the output file, and a directory with (optionally) prepopulated datasets for use
in training. The output profile file can be then passed to the compiler to
perform a PGO build.
The script current supports two kinds of PGO instrumentation: LLVM instrumentation
(binary instrumented with -fprofile-generate and -fcs-profile-generate passed to
clang during compilation) and BOLT instrumentation (binary instrumented with
`llvm-bolt -instrument`, with logs from this operation saved to
$binary_path.boltlog)
The actual training workloads for generating the profile will be added in later
commits.