Commit Graph

31 Commits

Author SHA1 Message Date
Avi Kivity
5e1cf90a51 build: replace tools/java submodule with packaged cassandra-stress
We no longer use tools/java (scylladb/scylla-tools-java.git) for
nodetool or cqlsh; only cassandra-stress. Since that is available
in package form install that and excise the tools/java submodule
from the source tree.

pgo/ is adjusted to use the packaged cassandra-stress (and the cqlsh
submodule).

A few jmx references are dropped as well.

Frozen toolchain regenerated.

Optimized clang from

  https://devpkg.scylladb.com/clang/clang-19.1.7-Fedora-41-aarch64.tar.gz
  https://devpkg.scylladb.com/clang/clang-19.1.7-Fedora-41-x86_64.tar.gz

Closes scylladb/scylladb#23698
2025-04-15 10:11:28 +03:00
Jenkins Promoter
9699c3ded4 Update pgo profiles - aarch64 2025-04-15 04:45:34 +03:00
Jenkins Promoter
8472aa9e53 Update pgo profiles - x86_64 2025-04-15 04:29:24 +03:00
Jenkins Promoter
6c528f5027 Update pgo profiles - aarch64 2025-04-01 04:45:44 +03:00
Jenkins Promoter
3c12029584 Update pgo profiles - x86_64 2025-04-01 04:27:11 +03:00
Jenkins Promoter
d84da3dc11 Update pgo profiles - x86_64 2025-03-15 04:57:28 +02:00
Jenkins Promoter
6e8e2ae333 Update pgo profiles - aarch64 2025-03-15 04:48:49 +02:00
Jenkins Promoter
7b50fbafb3 Update pgo profiles - aarch64 2025-03-01 04:58:49 +02:00
Jenkins Promoter
84e1514152 Update pgo profiles - x86_64 2025-03-01 04:26:11 +02:00
Jenkins Promoter
0d5f5e6c9d Update pgo profiles - x86_64 2025-02-15 20:32:23 +02:00
Jenkins Promoter
9daf50d424 Update pgo profiles - aarch64 2025-02-15 20:32:22 +02:00
Avi Kivity
cf72c31617 treewide: improve bash error reporting
bash error handling and reporting is atrocious. Without -e it will
just ignore errors. With -e it will stop on errors, but not report
where the error happened (apart from exiting itself with an error code).

Improve that with the `trap ERR` command. Note that this won't be invoked
on intentional error exit with `exit 1`.

We apply this on every bash script that contains -e or that it appears
trivial to set it in. Non-trivial scripts without -e are left unmodified,
since they might intentionally invoke failing scripts.

Closes scylladb/scylladb#22747
2025-02-10 18:28:52 +03:00
Jenkins Promoter
9add2ccc41 Update pgo profiles - x86_64 2025-02-05 08:44:54 +02:00
Jenkins Promoter
c7660e5962 Update pgo profiles - aarch64 2025-02-05 07:51:46 +02:00
Michał Chojnowski
bea434f417 pgo: disable tablets for training with secondary index, lwt and counters
As of right now, materialized views (and consequently secondary
indexes), lwt and counters are unsupported or experimental with tablets.
Since by defaults tablets are enabled, training cases using those
features are currently broken.

The right thing to do here is to disable tablets in those cases.

Fixes https://github.com/scylladb/scylladb/issues/22638

Closes scylladb/scylladb#22661
2025-02-04 15:38:53 +02:00
Jenkins Promoter
f5f15c6d07 Update pgo profiles - aarch64 2025-01-15 04:49:45 +02:00
Jenkins Promoter
f021e16d0c Update pgo profiles - x86_64 2025-01-15 04:26:42 +02:00
Jenkins Promoter
a7d8d21e86 Update pgo profiles - aarch64 2025-01-12 15:27:50 +02:00
Jenkins Promoter
b4ca9489c4 Update pgo profiles - x86_64 2025-01-12 15:05:40 +02:00
Kefu Chai
de42dce4c4 pgo: use java-11 when running cassandra-stress
we updated tools/java/build.xml recently to only build for java-11. so
if

- the `java` executable in `$PATH` points to a java which is neither
  java-8 nor java-11.
- java-8 is installed

java-8 is used to execute the cassandra-stress tool. and we would have
following failure:

```
Error: A JNI error has occurred, please check your installation and try again
Exception in thread "main" java.lang.UnsupportedClassVersionError: org/apache/cassandra/stress/Stress has been compiled by a more recent version of the Java Runtime (class file version 55.0), this version of the Java Runtime only recogniz
es class file versions up to 52.0
        at java.lang.ClassLoader.defineClass1(Native Method)
        at java.lang.ClassLoader.defineClass(ClassLoader.java:756)
        at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
        at java.net.URLClassLoader.defineClass(URLClassLoader.java:473)
        at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
        at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:621)
```

in order to be compatible with the bytecode targeting java-11, let's run
cassandra-stress with java-11. we do not need to support java-8, because
the new tools/java is now building cassandra-stress targeting java-11 jre.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#22142
2025-01-02 16:56:29 +02:00
Kefu Chai
6adf70ec03 build: cmake: add CMake options for PGO support
- "Scylla_BUILD_INSTRUMENTED" option

  Scylla_BUILD_INSTRUMENTED allows us to instrument the code at
  different level, namely, IR, and CSIR. this option mirrors
  "--pgo" and "--cspgo" options in `configure.py` . please note,
  the instrumentation at the frontend is not supported, as the IR
  based instrumentation is better when it comes to the use case of
  optimization for performance.
  see https://lists.llvm.org/pipermail/llvm-dev/2015-August/089044.html
  for the rationales.

- "Scylla_PROFDATA_FILE" option

  this option allows us to specify the profile data previous generated
  with the "Scylla_BUILD_INSTRUMENTED" option. this option mirrors
  the `--use-profile` option in `configure.py`, but it does not
  take the empty option as a special case and consider it as a file
  fetched from Git LFS. that will be handled by another option in a
  follow-up change. please note, one cannot use
  -DScylla_BUILD_INSTRUMENTED=PGO and -DScylla_PROFDATA_FILE=...
  at the same time. clang just does not allow this. but CSPGO is fine.

- "Scylla_PROFDATA_COMPRESSED_FILE" option

  this option allows us to specify the compressed profile data previouly
  generated with the "Scylla_BUILD_INSTRUMENTED" option. along with
  "Scylla_PROFDATA_FILE", this option mirros the functionality of
  `--use-profile` in `configure.py`. the goal is to ensure user always
  gets the result with the specified options. if anything goes wrong,
  we just error out.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-12-27 16:16:04 +08:00
Michał Chojnowski
a868b44ad8 configure.py: introduce profile-guided optimization
This commit enables profile-guided optimizations (PGO) in the Scylla build.

A full LLVM PGO requires 3 builds:
1. With -fprofile-generate to generate context-free (pre-inlining) profile. This
profile influences inlining, indirect-call promotion and call graph
simplifications.
2. With -fprofile-use=results_of_build_1 -fcs-profile-generate to generate
context-sensitive (post-inlining) profile. This profile influences post-inline
and codegen optimizations.
3. With -fprofile-use=merged_results_of_builds_1_2 to build the final binary
with both profiles.

We do all three in one ninja call by adding release-pgo and release-cs-pgo
"stages" to release. They are a copy of regular release mode, just with the
flags described above added. With the full course, release objects depend on the
profile file produced by build/release-cs-pgo/scylla, while release-cs-pgo
depends on the profile file generated by build/release-pgo/scylla.

The stages are orthogonal and enabled with separate options. It's recommended
to run them both for full performance, but unfortunately each one adds a full
build of scylla to the compile time, so maybe we can drop one of them in the
future if it turns out e.g. that regular PGO doesn't have a big effect.

It's strongly recommended to combine PGO with LTO. The latter enables the entire
class of binary layout optimizations, which for us is probably the most
important part of the entire thing.
2024-12-27 16:16:04 +08:00
Marcin Maliszkiewicz
80989556ac pgo: add alternator workloads training
This patch adds a set of alternator workloads to pgo training
script.

To confirm that added workloads are indeed affecting profile we can compare:

⤖ llvm-profdata show ./build/release-pgo/profiles/workdirs/clustering/prof.profdata

Instrumentation level: IR  entry_first = 0
Total functions: 105075
Maximum function count: 1079870885
Maximum internal block count: 2197851358

and

⤖ llvm-profdata show ./build/release-pgo/profiles/workdirs/alternator/prof.profdata

Instrumentation level: IR  entry_first = 0
Total functions: 105075
Maximum function count: 5240506052
Maximum internal block count: 9112894084

to see that function counters are on similar levels, they are around 5x higher for alternator
but that's because it combines 5 specific sub-workloads.

To confirm that final profile contains alterantor functions we can inspect:

⤖ llvm-profdata show --counts --function=alternator --value-cutoff 100000 ./build/release-pgo/profiles/merged.profdata
(...)
Instrumentation level: IR  entry_first = 0
Functions shown: 356
Total functions: 105075
Number of functions with maximum count (< 100000): 97275
Number of functions with maximum count (>= 100000): 7800
Maximum function count: 7248370728
Maximum internal block count: 13722347326

we can see that 356 functions which symbol name contains word alternator were identified as 'hot' (with max count grater than 100'000). Running:

⤖ llvm-profdata show --counts --function=alternator --value-cutoff 1 ./build/release-pgo/profiles/merged.profdata
(...)
Instrumentation level: IR  entry_first = 0
Functions shown: 806
Total functions: 105075
Number of functions with maximum count (< 1): 67036
Number of functions with maximum count (>= 1): 38039
Maximum function count: 7248370728
Maximum internal block count: 13722347326

we can see that 806 alternator functions were executed at least once during training.

And finally to confirm that alternator specific PGO brings any speedups we run:

for workload in read scan write write_gsi write_rmw
do
./build/release/scylla perf-alternator-workloads --smp 4 --cpuset "10,12,14,16" --workload $workload --duration 1 --remote-host 127.0.0.1 2> /dev/null | grep median
done

results BEFORE:

median 258137.51910849303
median absolute deviation: 786.06
median 547.2578202937141
median absolute deviation: 6.33
median 145718.19856685458
median absolute deviation: 5689.79
median 89024.67095807113
median absolute deviation: 1302.56
median 43708.101729598646
median absolute deviation: 294.47

results AFTER:

median 303968.55333940056
median absolute deviation: 1152.19
median 622.4757636209254
median absolute deviation: 8.42
median 198566.0403745328
median absolute deviation: 1689.96
median 91696.44912842038
median absolute deviation: 1891.84
median 51445.356525664996
median absolute deviation: 1780.15

We can see that single node cluster tps increase is typically 13% - 17% with notable exceptions,
improvement for write_gsi is 3% and for write workload whopping 36%.
The increase is on top of CQL PGO.

Write workload is executed more often because it's involved also as data preparation for read and scan.
Some further improvement could be to separate preparation from training as it's done for CQL but it would
be a bit odd if ~3x higher counters for one flow have so big impact.

Additional disclaimers:
 - tests are performing exactly the same workloads as in training so there might be some bias
 - tests are running single node cluster, more realistic setup will likely show lower improvement

Fixes https://github.com/scylladb/scylla-enterprise/issues/4066
2024-12-27 16:16:04 +08:00
Michał Chojnowski
95c8d88b96 pgo: add a repair workload
This workload is added to teach PGO about repair.
Tests are inconclusive about its alignment with existing workloads,
because repair doesn't seem utilize 100% of the reactor.
2024-12-27 16:16:04 +08:00
Michał Chojnowski
1c9ce0a9ee pgo: add a counters workload
This workload is added to teach PGO about counters.
Tests seem to show it's mostly aligned with existing CQL workloads.

The config YAML is based on the default cassandra-stress schema.
2024-12-27 16:16:04 +08:00
Michał Chojnowski
47dc0399cb pgo: add a secondary index workload
This workload is added to teach PGO about secondary indexes.
Tests seem to show that it's mostly aligned with existing CQL workloads.

The config YAML was copied from one of scylla-cluster-test test cases.
2024-12-27 16:16:04 +08:00
Michał Chojnowski
e67f4a5c51 pgo: add a LWT workload
This workload is added to teach PGO about LWT codepaths.
Tests seem to show that it's mostly aligned with existing CQL workloads.

The config YAML was copied from one of scylla-cluster-tests test cases.
2024-12-27 16:16:04 +08:00
Michał Chojnowski
e217c124a6 pgo: add a decommission workload
This workload is added to teach PGO about streaming.
Tests show that this workload is mostly orthogonal to CQL workloads
(where "orthogonal" means that training on workload A doesn't improve workload
B much, while training on workload A doesn't improve workload B much),
so adding it to the training is quite important.
2024-12-27 16:16:04 +08:00
Michał Chojnowski
65abecaede pgo: add a clustering workload
In contrast to the basic workload, this workload uses clustering
keys, CK range queries, RF=1, logged batches, and more CQL types.
Tests seem to show that this workload is mostly aligned with the existing basic
workload (where "aligned" means that training on workload A improves workload B
about as much as training on workload B).

The config YAML is based on the example YAML attached to cassandra-stress
sources.
2024-12-27 16:16:04 +08:00
Michał Chojnowski
c1297dbcd2 pgo: add a basic workload
This commit adds the default cassandra-stress workload to the PGO training
suite.
2024-12-27 16:16:04 +08:00
Michał Chojnowski
f73b122de3 pgo: introduce a PGO training script
Profile-guided optimization consists of the following steps:
1. Build the program as usual, but with with special options (instrumentation
or just some supplementary info tables, depending on the exact flavor of PGO
in use).
2. Collect an execution profile from the special binary by running a
training workload on it.
3. Rebuild the program again, using the collected profile.

This commit introduces a script automating step 2: running PGO training workloads
on Scylla. The contents of training workloads will be added in future commits.
The changes in configure.py responsible for steps 1. and 3. will also appear
in future commits.

As input, the script takes a path to the instrumented binary, a path to a
the output file, and a directory with (optionally) prepopulated datasets for use
in training. The output profile file can be then passed to the compiler to
perform a PGO build.

The script current supports two kinds of PGO instrumentation: LLVM instrumentation
(binary instrumented with -fprofile-generate and -fcs-profile-generate passed to
clang during compilation) and BOLT instrumentation (binary instrumented with
`llvm-bolt -instrument`, with logs from this operation saved to
$binary_path.boltlog)

The actual training workloads for generating the profile will be added in later
commits.
2024-12-27 16:16:04 +08:00