
7 Commits

Author SHA1 Message Date
Michał Chojnowski
bea434f417 pgo: disable tablets for training with secondary index, lwt and counters
As of right now, materialized views (and consequently secondary
indexes), LWT and counters are unsupported or experimental with tablets.
Since tablets are enabled by default, training cases using those
features are currently broken.

The right thing to do here is to disable tablets in those cases.

Fixes https://github.com/scylladb/scylladb/issues/22638

Closes scylladb/scylladb#22661
2025-02-04 15:38:53 +02:00
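
A minimal sketch of what disabling tablets can look like on the schema side. It assumes the training workload creates its keyspace through CQL and uses ScyllaDB's `tablets = {'enabled': false}` keyspace option. The helper name `create_keyspace_cql` and the feature labels are hypothetical and are not taken from the actual pgo scripts.

```python
# Hypothetical feature labels for the features the commit above lists as
# unsupported or experimental with tablets.
TABLET_INCOMPATIBLE = {"secondary_index", "materialized_view", "lwt", "counters"}


def create_keyspace_cql(name: str, rf: int, features: set[str]) -> str:
    """Build a CREATE KEYSPACE statement, opting out of tablets when needed."""
    stmt = (
        f"CREATE KEYSPACE {name} WITH replication = "
        f"{{'class': 'NetworkTopologyStrategy', 'replication_factor': {rf}}}"
    )
    if features & TABLET_INCOMPATIBLE:
        # Keyspace-level opt-out: fall back to vnode-based replication.
        stmt += " AND tablets = {'enabled': false}"
    return stmt + ";"


print(create_keyspace_cql("ks", 1, {"lwt"}))
```

Deciding per keyspace keeps tablets enabled for the workloads that do support them, so only the affected training cases change.
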
Michał Chojnowski
95c8d88b96 pgo: add a repair workload
This workload is added to teach PGO about repair.
Tests are inconclusive about its alignment with existing workloads,
because repair doesn't seem to utilize 100% of the reactor.
2024-12-27 16:16:04 +08:00
Michał Chojnowski
1c9ce0a9ee pgo: add a counters workload
This workload is added to teach PGO about counters.
Tests seem to show it's mostly aligned with existing CQL workloads.

The config YAML is based on the default cassandra-stress schema.
2024-12-27 16:16:04 +08:00
Michał Chojnowski
47dc0399cb pgo: add a secondary index workload
This workload is added to teach PGO about secondary indexes.
Tests seem to show that it's mostly aligned with existing CQL workloads.

The config YAML was copied from one of scylla-cluster-tests test cases.
2024-12-27 16:16:04 +08:00
Michał Chojnowski
e67f4a5c51 pgo: add a LWT workload
This workload is added to teach PGO about LWT codepaths.
Tests seem to show that it's mostly aligned with existing CQL workloads.

The config YAML was copied from one of scylla-cluster-tests test cases.
2024-12-27 16:16:04 +08:00
Michał Chojnowski
65abecaede pgo: add a clustering workload
In contrast to the basic workload, this workload uses clustering
keys, CK range queries, RF=1, logged batches, and more CQL types.
Tests seem to show that this workload is mostly aligned with the existing basic
workload (where "aligned" means that training on workload A improves workload B
about as much as training on workload B).

The config YAML is based on the example YAML attached to cassandra-stress
sources.
2024-12-27 16:16:04 +08:00
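
The notion of "aligned" above can be turned into a single number. The following is one possible way to quantify it, assuming throughput on workload B is measured three times: without PGO, with a profile trained on A, and with a profile trained on B. The function name and the sample figures are illustrative assumptions, not measurements from these commits.

```python
def alignment(baseline: float, trained_on_a: float, trained_on_b: float) -> float:
    """Fraction of workload B's self-trained PGO gain recovered by training on A.

    All arguments are throughputs (e.g. ops/s) measured while running
    workload B. A value near 1.0 means A and B are "aligned": training on A
    improves B about as much as training on B does.
    """
    self_gain = trained_on_b - baseline
    if self_gain <= 0:
        raise ValueError("training on B did not improve B; alignment is undefined")
    return (trained_on_a - baseline) / self_gain


# Hypothetical numbers: B runs at 100 ops/s without PGO, 118 with an A-trained
# profile and 120 with a B-trained profile, so A recovers 90% of the gain.
print(alignment(100.0, 118.0, 120.0))
```
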
Michał Chojnowski
f73b122de3 pgo: introduce a PGO training script
Profile-guided optimization consists of the following steps:
1. Build the program as usual, but with special options (instrumentation
or just some supplementary info tables, depending on the exact flavor of PGO
in use).
2. Collect an execution profile from the special binary by running a
training workload on it.
3. Rebuild the program, using the collected profile.
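
The three steps can be sketched for the clang instrumentation flavor as a list of commands. In this sketch a single-file `clang++` build stands in for Scylla's real build, whose configure.py changes arrive in later commits. The file names (`main.cc`, `app`, `/tmp/pgo`) and the helper `pgo_pipeline` are hypothetical. The flags `-fprofile-generate` and `-fprofile-use=`, the `LLVM_PROFILE_FILE` environment variable, and `llvm-profdata merge` are standard LLVM tooling.

```python
from pathlib import Path


def pgo_pipeline(src: str, binary: str, work_dir: str) -> list[tuple[dict[str, str], list[str]]]:
    """Return (extra_env, argv) pairs for the three PGO steps, clang flavor."""
    work = Path(work_dir)
    raw = str(work / "train.profraw")      # raw profile written by the instrumented run
    merged = str(work / "train.profdata")  # indexed profile that clang reads
    instrumented = str(work / (binary + ".instrumented"))
    return [
        # Step 1: build with instrumentation.
        ({}, ["clang++", "-O2", "-fprofile-generate", src, "-o", instrumented]),
        # Step 2: run a training workload; LLVM_PROFILE_FILE sets the output path.
        ({"LLVM_PROFILE_FILE": raw}, [instrumented]),
        # Step 2 (cont.): convert the raw profile into the indexed .profdata format.
        ({}, ["llvm-profdata", "merge", "-o", merged, raw]),
        # Step 3: rebuild the program, optimizing with the collected profile.
        ({}, ["clang++", "-O2", f"-fprofile-use={merged}", src, "-o", binary]),
    ]


for env, argv in pgo_pipeline("main.cc", "app", "/tmp/pgo"):
    print(env, " ".join(argv))
```

To actually execute a step, each pair could be passed to `subprocess.run(argv, env={**os.environ, **env}, check=True)`.
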

This commit introduces a script automating step 2: running PGO training workloads
on Scylla. The contents of training workloads will be added in future commits.
The changes in configure.py responsible for steps 1 and 3 will also appear
in future commits.

As input, the script takes a path to the instrumented binary, a path to
the output file, and a directory with (optionally) prepopulated datasets for use
in training. The output profile file can then be passed to the compiler to
perform a PGO build.
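
A command-line interface taking these three inputs could look like the following `argparse` sketch. The positional argument names and the example paths are assumptions made for illustration. The real script's options are not described here.

```python
import argparse


def parse_args(argv=None) -> argparse.Namespace:
    """Parse the three inputs of a (hypothetical) PGO training entry point."""
    p = argparse.ArgumentParser(description="Run PGO training workloads (sketch).")
    p.add_argument("binary", help="path to the instrumented Scylla binary")
    p.add_argument("output", help="path where the collected profile is written")
    p.add_argument("dataset_dir",
                   help="directory with (optionally) prepopulated training datasets")
    return p.parse_args(argv)


# Example invocation with hypothetical paths:
args = parse_args(["build/scylla.instrumented", "build/scylla.profdata", "pgo-datasets"])
print(args.binary, args.output, args.dataset_dir)
```
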

The script currently supports two kinds of PGO instrumentation: LLVM instrumentation
(binary instrumented with -fprofile-generate and -fcs-profile-generate passed to
clang during compilation) and BOLT instrumentation (binary instrumented with
`llvm-bolt -instrument`, with logs from this operation saved to
$binary_path.boltlog).
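
The `.boltlog` side file means that a wrapper can tell the two flavors apart from the filesystem alone. The sketch below assumes that convention is the deciding signal. The function `instrumentation_kind` and its return labels are hypothetical and do not necessarily match how the script itself makes the choice.

```python
import os


def instrumentation_kind(binary_path: str) -> str:
    """Guess which PGO flavor a binary was instrumented with.

    Assumption: a `<binary_path>.boltlog` file (the saved `llvm-bolt -instrument`
    logs) marks BOLT instrumentation; otherwise the binary is treated as
    clang-instrumented (-fprofile-generate / -fcs-profile-generate).
    """
    if os.path.exists(binary_path + ".boltlog"):
        return "bolt"
    return "llvm"
```

Keying off the log file avoids inspecting the binary itself, at the cost of trusting that the file is kept next to the binary it describes.
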

The actual training workloads for generating the profile will be added in later
commits.
2024-12-27 16:16:04 +08:00