524 Commits

Author SHA1 Message Date
Botond Dénes
555cfbcd38 Merge 'treewide: replace deprecated smp::count and smp::all_cpus() with new APIs' from Avi Kivity
Replace all uses of the deprecated seastar::smp::count with this_smp_shard_count() and smp::all_cpus() with this_smp_all_shards() across the ScyllaDB codebase (seastar submodule untouched).

Both replacement functions require a reactor thread context. All call sites were verified to run on reactor threads.

Notable cases:
- dht/token-sharding.hh: this_smp_shard_count() is used as a default parameter value. This is safe since all callers are on reactor threads, but the expression is now evaluated at each call site rather than being a reference to a global variable.
- service/storage_service.hh, locator/abstract_replication_strategy.hh, ent/encryption/encryption.cc: used in default member initializers and constructor member-init-lists. Objects are always constructed on reactor threads.
- schema_builder: sometimes called from BOOST_AUTO_TEST_CASE without a reactor. Added pre-patch that makes the implicit shard count parameter implicit and pass 1 in those cases.

Not changed:
- scylla-gdb.py: reads smp::count as a GDB symbol (no reactor context).
- Python test files: only reference smp::count in comments/strings.

No backport: the Seastar commit that deprecated these function hasn't (and won't) make its way into any release branches (and the warnings are cosmetic anyway)

Closes scylladb/scylladb#29990

* github.com:scylladb/scylladb:
  treewide: replace deprecated smp::count and smp::all_cpus() with new APIs
  scylla-gdb: read shard count from smp::_this_smp instead of smp::count
  schema_builder: make shard_count an explicit constructor parameter
2026-05-27 09:42:06 +03:00
Avi Kivity
8010e408a2 treewide: replace deprecated smp::count and smp::all_cpus() with new APIs
Replace all uses of the deprecated seastar::smp::count with
this_smp_shard_count() and smp::all_cpus() with this_smp_all_shards()
across the ScyllaDB codebase (seastar submodule untouched).

Both replacement functions require a reactor thread context. All call
sites were verified to run on reactor threads.

Notable cases:
- dht/token-sharding.hh: this_smp_shard_count() is used as a default
  parameter value. This is safe since all callers are on reactor threads,
  but the expression is now evaluated at each call site rather than being
  a reference to a global variable.
- service/storage_service.hh, locator/abstract_replication_strategy.hh,
  ent/encryption/encryption.cc: used in default member initializers and
  constructor member-init-lists. Objects are always constructed on reactor
  threads.

Not changed:
- scylla-gdb.py: reads smp::count as a GDB symbol (no reactor context).
- Python test files: only reference smp::count in comments/strings.
2026-05-26 17:35:20 +03:00
Avi Kivity
f165b396fd schema_builder: make shard_count an explicit constructor parameter
A recent Seastar update deprecated smp::count and introduced
this_smp_shard_count() as a replacement. One difference is that
this_smp_shard_count() wants to run on a reactor thread.

This poses a problem for non-reactor tests (BOOST_AUTO_TEST_CASE)
that nevertheless use a schema, as the schema_builder constructor
references smp::count. If we replace it with this_smp_shard_count()
then it will crash when running without a reactor.

To fix, remove the implicit this_smp_shard_count() call from raw_schema's
constructor and require callers to pass shard_count explicitly to
schema_builder. This allows tests that don't run on a reactor thread
to construct schemas without crashing.

Production code and reactor-based tests pass this_smp_shard_count().
Non-reactor test files (expr_test, keys_test, nonwrapping_interval_test,
wrapping_interval_test, bti_key_translation_test, range_tombstone_list_test)
pass a fixed shard count of 1.

Note: sstable_test.cc is a Seastar test file (SEASTAR_THREAD_TEST_CASE)
but also contains one plain BOOST_AUTO_TEST_CASE
(test_empty_key_view_comparison) that constructs a schema_builder without
a reactor context. This test also receives a fixed shard count of 1.
2026-05-26 11:55:56 +03:00
Botond Dénes
597d4252dc types: abstract_type::from_string() switch to fragmented buffers (interface)
Change input: str::string_view -> utils::chunked_string_view.
Change return value: bytes -> managed_bytes.

This patch only changes the interface, with some to_bytes() sprinkled in
the internals to deal with recursive calls.
Internals will be updated in the next patch, to keep the churn of
updating callers separate from the actually important changes.
2026-05-26 09:08:06 +03:00
Andrzej Jackowski
f8156702de tree: add missing -present to copyright headers
~2076 files used "Copyright (C) YYYY-present ScyllaDB" while
~88 files used "Copyright (C) YYYY ScyllaDB". This
inconsistency leads to unnecessary code review discussions
and gradual spread of the less common format.

Standardize all ScyllaDB copyright headers to use -present.

Fixes SCYLLADB-1984

Closes scylladb/scylladb#29876
2026-05-21 10:57:42 +02:00
Marcin Maliszkiewicz
83823149e9 Merge 'audit: implement audit_rules config' from Andrzej Jackowski
This patch series adds `audit_rules`, a new audit configuration option for fine-grained, role-aware audit filtering with per-rule sink routing. Rules can be configured in `scylla.yaml` or updated live through `system.config` without restarting the node. Each rule specifies target sinks (`table`, `syslog`), statement categories, qualified table name patterns, and role patterns. Table and role patterns use POSIX `fnmatch` with extended glob syntax. For table-scoped categories (`DML`, `DDL`, `QUERY`), a rule matches only when the category, role, and qualified table name all match. For table-independent categories (`AUTH`, `ADMIN`, `DCL`), the table filter is ignored. Empty category or role lists match nothing; an empty table list matches nothing only for table-scoped categories. The new rules are additive with the existing `audit_categories`, `audit_keyspaces`, and `audit_tables` settings: both mechanisms are evaluated for each audit event, and the final sink set is the union of all matches.

To avoid evaluating glob patterns on every audit event, audit rules use a preprocessed cache of known roles and tables. The cache is kept in sync through group0 role/table snapshots, role-change notifications, and schema migration notifications. For known entities, rule matching uses precomputed role/table rule sets; unknown entities fall back to direct rule evaluation. When `audit_rules` is empty, per-event rule matching returns immediately and does not evaluate glob patterns. Audit still keeps known role/table metadata in sync while audit is enabled, so rules can be enabled later through live configuration updates without restarting the node.

**Performance**
Measured with `perf-simple-query --smp 1 --duration 100` against a null syslog socket. Results show no regression when audit is disabled, and audit-rules performance has at most 1% more instructions than legacy config for equivalent workloads:

```
===============================================================================================================================================================================
Configuration                                     | Binary     |         throughput (tps) | insns/op                 | cpu_cycles/op            | alloc/op | logal/op | task/op
===============================================================================================================================================================================
audit=none [1]                                    | baseline   |                 206922.4 |                  36591.6 |                  15348.3 |     58.1 |      0.0 |    14.1
audit=none [1]                                    | this PR    |        207856.4  (+0.5%) |         36544.9  (-0.1%) |         15274.0  (-0.5%) |     58.1 |      0.0 |    14.1
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
audit=syslog keyspaces=ks [2]                     | baseline   |                  94871.8 |                  54163.0 |                  27172.4 |     72.0 |      0.0 |    24.0
audit=syslog keyspaces=ks [2]                     | this PR    |         96138.4  (+1.3%) |         54072.3  (-0.2%) |         26699.3  (-1.7%) |     72.0 |      0.0 |    24.0
audit=syslog audit-rules=ks [3]                   | this PR    |         95142.1  (+0.3%) |         54457.8  (+0.5%) |         26953.8  (-0.8%) |     72.0 |      0.0 |    24.0
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
audit=syslog keyspaces=ks-non-existent [4]        | baseline   |                 213997.8 |                  36735.6 |                  14848.1 |     58.1 |      0.0 |    14.1
audit=syslog keyspaces=ks-non-existent [4]        | this PR    |        219297.2  (+2.5%) |         36667.3  (-0.2%) |         14500.1  (-2.3%) |     58.1 |      0.0 |    14.1
audit=syslog audit-rules=ks-non-existent [5]      | this PR    |        211038.7  (-1.4%) |         36999.7  (+0.7%) |         15048.6  (+1.4%) |     58.1 |      0.0 |    14.1
===============================================================================================================================================================================

[1] ./scylla perf-simple-query --smp 1 --duration 100 --audit "none"
[2] ./scylla perf-simple-query --smp 1 --duration 100 --audit "syslog" --audit-keyspaces "ks" --audit-categories "DCL,DDL,AUTH,DML,QUERY" --audit-unix-socket-path "/tmp/audit-null.sock"
[3] ./scylla perf-simple-query --smp 1 --duration 100 --audit "syslog" --audit-rules '[{"sinks":["syslog"],"categories":["DCL","DDL","AUTH","DML","QUERY"],"qualified_table_names":["ks.*"],"roles":["*"]}]' --audit-unix-socket-path "/tmp/audit-null.sock"
[4] ./scylla perf-simple-query --smp 1 --duration 100 --audit "syslog" --audit-keyspaces "ks-non-existent" --audit-categories "DCL,DDL,AUTH,DML,QUERY" --audit-unix-socket-path "/tmp/audit-null.sock"
[5] ./scylla perf-simple-query --smp 1 --duration 100 --audit "syslog" --audit-rules '[{"sinks":["syslog"],"categories":["DCL","DDL","AUTH","DML","QUERY"],"qualified_table_names":["ks-non-existent.*"],"roles":["*"]}]' --audit-unix-socket-path "/tmp/audit-null.sock"

audit-null.sock was created with `socat -u UNIX-RECV:/tmp/audit-null.sock,type=2 OPEN:/dev/null`
```

Fixes: SCYLLADB-1430
No backport: new feature

Closes scylladb/scylladb#29267

* github.com:scylladb/scylladb:
  test: alternator: audit: rules filtering and batch bypass
  test: perf: add --audit-rules option to perf-simple-query
  docs: add audit rules section to the auditing guide
  test: audit: cover role and schema cache notifications
  test: audit: cover audit rules cluster behavior
  audit: rebuild rule caches on group0 snapshot and role changes
  audit: refresh rule caches on schema, role, and config changes
  audit: route matching rules to configured sinks
  test: cover preprocessed audit rule cache
  audit: add preprocessed rule matching cache
  audit: pass sink targets to storage helpers
  test: audit: cover rule matching semantics
  audit: add rule matching and sink helpers
  test: audit: cover audit_rules configuration
  config: add live audit_rules option
  test: cover audit rule parsing and validation
  audit: define audit_rule type with parsing and validation
2026-05-20 14:10:45 +02:00
Avi Kivity
6df04c9e5b Update seastar submodule
Changed seastar::http::experimental to seastar::http to reflect
graduation of the seastar http API.

Changed call to seastar::rename_file() (in sstables/storage.cc,
sstables/sstable_directory.cc, sstable/sstables.cc and
db/hints/internal/hint_storage.cc) to reflect new default parameter.

Updated scylla_gdb test helper get_task() to work with updated
accept loop in Seatar. This is just test code (attempts to find
a task to operate on), not used in real scylla-gdb.py work, but
nevertheless the adjustment keeps backward compatibility.

Fixes https://scylladb.atlassian.net/browse/SCYLLADB-1798
Fixes https://scylladb.atlassian.net/browse/SCYLLADB-2043

* seastar 485a62b2...510f3148 (43):
  > reactor_backend: fix iocb double-free and shutdown hang during AIO teardown
  > file: fix default DMA alignment
  > http: add to_reply() to redirect_exception with extra-header support
  > core: propagate syscall errors via `coroutine::exception`
  > file: assert dma alignments are powers of two
  > doc: Document undocumented io_tester features and fix output example
  > backtrace: print the build_id along with the backtrace
  > reactor: default to oneline backtraces
  > Merge 'json: formatter: support types with user-defined conversion to sstring' from Benny Halevy
    tests: json_formatter: test formatter::write with string types
    json: formatter: support types with user-defined conversion to sstring
  > httpd_test: fix build failure with Seastar_SSTRING=OFF
  > net/tls: introduce ssl_call wrapper for SSL I/O
  > build: disable unused command line argument error for C++ module
  > coroutine/generator: fix setup of generator's waiting task
  > tests/tls: set 1000-day validity for self-signed CA cert
  > net: tls: openssl: disable certificate compression
  > reactor: reduce steady_clock::now() calls per scheduling quantum
  > fair_queue: remove notify_request_finished()
  > loop: use small_vector for parallel_for_each_state incomplete futures
  > dodge false sharing in spinlock
  > Merge 'Handle nowait support for reads and writes independently' from Pavel Emelyanov
    file: Change nowait_works mode detection
    file: Introduce read-only nowait_mode
    filesystem: Make nowait_works bit a enum class too
    file: Make nowait_works bit a enum class
  > Merge 'net/tls: improve OpenSSL error queue hygiene' from Gellért Peresztegi-Nagy
    net/tls: assert clean error queue before SSL operations
    net/tls: clear error queue after successful SSL operations
    net/tls: clear error queue after successful SSL_CTX_new
    net/tls: drain error queue on unexpected error codes
    net/tls: use make_openssl_error for BIO creation failure
  > vla.hh: add missing includes
  > Merge 'smp: make smp::count non-static' from Avi Kivity
    smp: convert all smp::count usages to instance-aware alternatives
    smp: add per-instance shard_count and this_smp() infrastructure
    disk_params: document pre-init smp::count access with explicit 0
    reactor_backend: document pre-init smp::count access with explicit 0
    tests: alien_test: pass shard count to alien thread explicitly
  > build: fix cmake missing ninja on Ubuntu 26.04
  > rpc: Fix uint64 wraparound of expired timeout in send_entry()
  > Merge 'Generalize some RPC tests' from Pavel Emelyanov
    tests: Generalize async connection-based scheduling RPC tests
    tests: Generalize sync connection-based scheduling RPC tests
    tests: Remove redundant variadic/nonvariadic RPC tuple tests
    tests: Generalize max timeout RPC tests
  > net: tls: openssl: Share BIO ptrs across shards
  > http: fix compilation on clang 22 with c++26
  > build: openssl tools needed for test cert generation
  > reactor: support rename2
  > future: fix forwarding of reference types
  > Merge 'Zero-copy http chunked data sink' from Pavel Emelyanov
    http: Make chunked data sink zero-copy
    tests/prometheus_http: Rewrite on top of http::client
    tests/httpd: Rewrite content_length_limit on top of http::client
  > tests: Replace ad-hoc http_consumer with production HTTP parser
  > Merge 'co_return to accept same expressions and types as return' from Alexey Bashtanov
    tests/unit/{coroutines,futures}: strict types on co_return and set_value
    api: introduce version 10:
    core/{coroutine,future}: make `co_return` more strict with types
    core/{coroutine,future}: preparations to fix `co_return` type semantics
  > Merge 'Perftune.py: add special handling for mlx5 rss queues number calculation' from Vladislav Zolotarov
    perftune.py: NetPerfTuner: enhance RSS (a.k.a. "Rx") queues accounting for mlx5 devices
    perftune.py: update docstring of NetPerfTuner.__get_rps_cpus() method
    perftune.py: add a method that parses and models the output of the 'ethtool -l' command for a given interface
  > httpd: rewrite do_accepts/do_accept_one as coroutines
  > file: add mmap support to file
  > http: Move client code out of experimental namespace
  > file: add hugetlbfs support to file system detection
  > tests: Replace test_source_impl with util::as_input_stream
  > tests: Replace buf_source_impl with util::as_input_stream
  > Merge 'rpc_tester: expose throuput for rpc tester' from Marcin Szopa
    rpc_tester: remove unused payload size variable from job_rpc_streaming class
    rpc_tester: add start time tracking for throughput calculation, print throughput and msg/s for job_rpc
    rpc_tester: refactor result emission to use dedicated functions for messages and throughput
  > iostream: cast first argument of `std::min` to `size_t`

Closes scylladb/scylladb#29952
2026-05-20 13:47:12 +03:00
Andrzej Jackowski
f39c48648a test: perf: add --audit-rules option to perf-simple-query
Allow perf-simple-query to compare audit-rule matching
with the category/keyspace/table audit filters under
the same workload.

Register a hardcoded "tester" role with the audit cache
so rules targeting that role exercise the preprocessed
fast path.

The new option was used to measure audit-rules
performance against the category/keyspace/table audit
config. The results are as follows:

===============================================================================================================================================================================
Configuration                                     | Binary     |         throughput (tps) | insns/op                 | cpu_cycles/op            | alloc/op | logal/op | task/op
===============================================================================================================================================================================
audit=none [1]                                    | baseline   |                 206922.4 |                  36591.6 |                  15348.3 |     58.1 |      0.0 |    14.1
audit=none [1]                                    | this PR    |        207856.4  (+0.5%) |         36544.9  (-0.1%) |         15274.0  (-0.5%) |     58.1 |      0.0 |    14.1
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
audit=syslog keyspaces=ks [2]                     | baseline   |                  94871.8 |                  54163.0 |                  27172.4 |     72.0 |      0.0 |    24.0
audit=syslog keyspaces=ks [2]                     | this PR    |         96138.4  (+1.3%) |         54072.3  (-0.2%) |         26699.3  (-1.7%) |     72.0 |      0.0 |    24.0
audit=syslog audit-rules=ks [3]                   | this PR    |         95142.1  (+0.3%) |         54457.8  (+0.5%) |         26953.8  (-0.8%) |     72.0 |      0.0 |    24.0
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
audit=syslog keyspaces=ks-non-existent [4]        | baseline   |                 213997.8 |                  36735.6 |                  14848.1 |     58.1 |      0.0 |    14.1
audit=syslog keyspaces=ks-non-existent [4]        | this PR    |        219297.2  (+2.5%) |         36667.3  (-0.2%) |         14500.1  (-2.3%) |     58.1 |      0.0 |    14.1
audit=syslog audit-rules=ks-non-existent [5]      | this PR    |        211038.7  (-1.4%) |         36999.7  (+0.7%) |         15048.6  (+1.4%) |     58.1 |      0.0 |    14.1
===============================================================================================================================================================================

[1] ./scylla perf-simple-query --smp 1 --duration 100 --audit "none"
[2] ./scylla perf-simple-query --smp 1 --duration 100 --audit "syslog" --audit-keyspaces "ks" --audit-categories "DCL...
[3] ./scylla perf-simple-query --smp 1 --duration 100 --audit "syslog" --audit-rules '[{"sinks":["syslog"],"categorie...
[4] ./scylla perf-simple-query --smp 1 --duration 100 --audit "syslog" --audit-keyspaces "ks-non-existent" --audit-ca...
[5] ./scylla perf-simple-query --smp 1 --duration 100 --audit "syslog" --audit-rules '[{"sinks":["syslog"],"categorie...

audit-null.sock was created with `socat -u UNIX-RECV:/tmp/audit-null.sock,type=2 OPEN:/dev/null`

Refs SCYLLADB-1430
2026-05-20 06:55:15 +02:00
Botond Dénes
c3daa6379c sstables: refactor parse_path() to return std::expected<> instead of throwing
make_entry_descriptor() and the two overloads of parse_path() used to signal
parse failures by throwing malformed_sstable_exception, which made parse_path()
expensive to use as a probe (e.g. to classify directory entries).

Change make_entry_descriptor() and both parse_path() overloads to return
std::expected<T, sstring>, where the sstring carries the error message on
failure, eliminating the exception overhead at probe call sites.

Call sites that previously caught malformed_sstable_exception to treat the
path as a non-SSTable file (utils/directories.cc, db/snapshot/backup_task.cc,
tools/scylla-sstable.cc) now check the expected result directly.

Call sites where a parse failure is a genuine error (sstable_directory.cc,
sstables.cc, tools/schema_loader.cc, tools/scylla-sstable.cc) re-throw
explicitly as malformed_sstable_exception using the error string, preserving
the existing error propagation behaviour.
2026-05-11 11:58:14 +03:00
Nadav Har'El
e277f747bd Merge 'Make collection unfreezing more efficient' from Botond Dénes
Introduce `read_from_collection_cell_view()` which reads a `collection_mutation` directly from the IDL representation of a collection (`ser::collection_cell_view`). This cuts down the number of allocations required drastically compared to the current method of:

    IDL -> collection_mutatio_description -> collection_mutation

Reduces the number of allocations to unfreeze a collection from O(collection_cell_count) -> O(1) (actually, due to buffer fragmentation, it is O(collection_size)).
The new method is used when unfreezing frozen mutations and frozen mutation fragments. This is on the hot path: all writes with collections benefit.

Add a `--collection` flag to `perf-simple-query` to allow measuring the performance improvement of this PR.
With  `dbuild -it -- build/release/scylla perf-simple-query --collection=16 -c1 -m2G --default-log-level=error --write`  the number of allocations drop from ~123 to 102, which is a significant amount of allocations shaved off.

Refs: https://github.com/scylladb/scylladb/issues/3602 (solves one use-case out of the many listed therein)
Fixes: SCYLLADB-1046
Fixes: SCYLLADB-1077

Backport: this is an optimization so normally not a backport candidate, but we may have to backport to relieve certain customers

Closes scylladb/scylladb#29033

* github.com:scylladb/scylladb:
  test/perf/perf_simple_query: add --collection=N
  test/boost/frozen_mutation_test: add freeze/unfreeze test for large collections
  mutation/mutation_partition_view: use read_from_collection_cell_view() to read collections
  mutation/collection_mutation: introduce read_from_collection_cell_view()
  mutation/atomic_cell: atomic_cell_type: add write*() and *serialized_size()
  mutation/collection_mutation: generalize serialize_collection_mutation
  mutation/mutation_partition_view: avoid copying collection
  mutation/mutation_partition_view: accept collection_mutation in the consume API
  partition_builder: add move variant of accept_*_cell() collection overloads
2026-05-10 20:39:08 +03:00
Andrzej Jackowski
bc67dd0b82 audit: split startup into construction and storage phases
The table-based audit backend needs Raft to create its keyspace,
but the audit service must exist earlier so that CQL paths don't
silently skip auditing.

Split startup into two phases: construction and storage
initialization.  Queries arriving between the two phases are
logged as errors.

This is a refactoring commit and the split sections will be
moved later in this patch series.

Refs SCYLLADB-1615
2026-04-28 18:58:42 +02:00
Aleksandra Martyniuk
bcdab2e012 service: extend tablet_migration_info to handle rebuilds
Make tablet_migration_info::{src,dst} optional, so that it can be
reused by rebuild, for respectively leaving and pending replica.
2026-04-17 09:58:07 +02:00
Botond Dénes
9999e7b642 test/perf/perf_simple_query: add --collection=N
Defaults to 0. When N > 0, adds a map<blob, blob> collection column to
the schema. Each row will have a collection cell with N elements.
Allows benchmarking collection handling.
2026-04-15 09:46:54 +03:00
Avi Kivity
0ae22a09d4 LICENSE: Update to version 1.1
Updated terms of non-commercial use (must be a never-customer).
2026-04-12 19:46:33 +03:00
Andrzej Jackowski
23c386a27f test: perf: add audit-unix-socket-path to perf-simple-query
To allow performance benchmarking with custom syslog sinks.

Example use case:

-- Audit + default syslog: ~100k tps
taskset -c 0,2,4 ./build/release/scylla perf-simple-query --smp 3 --write --duration 30 --audit "syslog" --audit-keyspace "ks" --audit-categories "DCL,DDL,AUTH,DML,QUERY"

```
110263.72 tps ( 66.1 allocs/op,  16.0 logallocs/op,  25.7 tasks/op,  254900 insns/op,  144796 cycles/op,        0 errors)
throughput:
	mean=   107137.48 standard-deviation=3142.98
	median= 106665.00 median-absolute-deviation=1786.03
	maximum=111435.19 minimum=97620.79
instructions_per_op:
	mean=   256311.36 standard-deviation=5037.13
	median= 256288.09 median-absolute-deviation=2223.08
	maximum=274220.89 minimum=248141.40
cpu_cycles_per_op:
	mean=   146443.47 standard-deviation=2844.19
	median= 146001.85 median-absolute-deviation=1514.82
	maximum=157177.54 minimum=142981.03
```

-- Audit + custom syslog: ~400k tps
socat -u UNIX-RECV:/tmp/audit-null.sock,type=2 OPEN:/dev/null
taskset -c 0,2,4 ./build/release/scylla perf-simple-query --smp 3 --write --duration 30 --audit "syslog" --audit-keyspace "ks" --audit-categories "DCL,DDL,AUTH,DML,QUERY" --audit-unix-socket-path /tmp/audit-null.sock

```
404929.62 tps ( 65.9 allocs/op,  16.0 logallocs/op,  25.5 tasks/op,   77406 insns/op,   35559 cycles/op,        0 errors)
throughput:
	mean=   399868.39 standard-deviation=6232.88
	median= 401770.65 median-absolute-deviation=3859.09
	maximum=406126.79 minimum=383434.84
instructions_per_op:
	mean=   77481.26 standard-deviation=168.31
	median= 77405.54 median-absolute-deviation=84.33
	maximum=78081.46 minimum=77332.84
cpu_cycles_per_op:
	mean=   35871.32 standard-deviation=516.83
	median= 35699.70 median-absolute-deviation=251.15
	maximum=37454.86 minimum=35432.60
```

-- No audit: ~800k tps
taskset -c 0,2,4 ./build/release/scylla perf-simple-query --smp 3 --write --duration 30

```
808970.95 tps ( 53.3 allocs/op,  16.0 logallocs/op,  14.9 tasks/op,   49904 insns/op,   20471 cycles/op,        0 errors)
throughput:
	mean=   809065.31 standard-deviation=6222.39
	median= 810507.10 median-absolute-deviation=1827.99
	maximum=815213.41 minimum=782104.84
instructions_per_op:
	mean=   49905.50 standard-deviation=21.81
	median= 49900.12 median-absolute-deviation=7.72
	maximum=50010.97 minimum=49892.57
cpu_cycles_per_op:
	mean=   20429.00 standard-deviation=41.40
	median= 20425.18 median-absolute-deviation=29.11
	maximum=20530.74 minimum=20355.42
```

Closes scylladb/scylladb#29396
2026-04-09 16:00:41 +03:00
Tomasz Grabiec
3775593e53 test: perf_simple_query: Add 'sstable-format' command-line option 2026-03-18 16:25:20 +01:00
Tomasz Grabiec
6ee9bc63eb test: perf_simple_query: Add 'sstable-summary-ratio' command-line option 2026-03-18 16:25:20 +01:00
Tomasz Grabiec
38d130d9d0 test: perf-simple-query: Add option to disable index cache 2026-03-18 16:25:20 +01:00
Marcin Maliszkiewicz
9318c80203 perf: add abort_source support to wait-for-port loops
Check abort_source on each retry iteration in
wait_for_alternator and wait_for_cql so the
wait can be interrupted on shutdown.

Didn't use sleep_abortable as the sleep is very short
anyway.
2026-03-16 16:14:10 +01:00
Marcin Maliszkiewicz
edf0148bee perf-alternator: wait for alternator port before running workload
This patch is mostly for the purpose of running pgo CI job.

We may receive connection error if asyncio.sleep(5) in
pgo.py is not sufficient waiting time.

In pgo.py we do wait for port but only for cql,
anyway it's better to have high level check than
trying to wait for alternator port there.
2026-03-16 16:07:52 +01:00
Botond Dénes
6004e84f18 test: move away from tombstone_gc_state(nullptr) ctor
Use for_tests() instead (or no_gc() where approriate).
2026-03-03 14:09:28 +02:00
Avi Kivity
27a5502f14 Merge 'Reapply "main: test: add future and abort_source to after_init_func"' from Marcin Maliszkiewicz
The patchset fixes abort_source implementation for perf-alternator and perf-cql-raw. It moves
run_standalone function to common code in perf.hh with necessary templating.

We also add extensive testing so that it's more difficult to break the tooling in the future.

Fixes SCYLLADB-560
Backport: no, internal tooling improvement

Closes scylladb/scylladb#28541

* github.com:scylladb/scylladb:
  test: cluster: add tests for perf tools
  test: perf: fix port race condition on startup in connect workload
  test: perf: prepare benchmarks to bind to custom host
  test: perf: make perf-alterantor remote port configurable
  test: perf: fix ASAN leak warnings in perf-alternator
  Reapply "main: test: add future and abort_source to after_init_func"
2026-02-19 19:12:46 +02:00
Ernest Zaslavsky
45d824e0fe s3_perf: fix upload operation flow
Correct the upload operation logic. The previous flow incorrectly
checked for the test file on S3 even when performing operations that do
not download the file, such as uploads.
2026-02-19 11:14:59 +02:00
Marcin Maliszkiewicz
c69534504c test: perf: fix port race condition on startup in connect workload
Other workloads at startup call prepopulate() which connects
with retry loop therefore it waits until cql port is open.

This commit adds a single place where we will wait for port
for all workloads.

Timeout is set to 5 minutes so that even slowest machines
are able to start.
2026-02-19 09:33:10 +01:00
Marcin Maliszkiewicz
828f2fbdb1 test: perf: prepare benchmarks to bind to custom host
This is usefull for tests where we use local networks
like 127.5.5.5 to avoid port and host collisions.
2026-02-19 09:33:10 +01:00
Marcin Maliszkiewicz
9f2b97bef4 test: perf: make perf-alterantor remote port configurable
It could be a usefull option to have.
2026-02-19 09:33:10 +01:00
Marcin Maliszkiewicz
f5a212e91e test: perf: fix ASAN leak warnings in perf-alternator
Those were intentional as test process is short lived
but when we add automated tests in the following commits
we expect clean exit, with 0 exit code.
2026-02-19 09:33:10 +01:00
Marcin Maliszkiewicz
0c76c73e34 Reapply "main: test: add future and abort_source to after_init_func"
This reverts commit ceec703bb7.

The commit was fixed with abort source handling for alternator
standalone path so it's safe to reapply.
2026-02-19 09:33:10 +01:00
Ernest Zaslavsky
4026b54a5e s3_perf: fix the CMake build
Fix the CMake build of the perf_s3_client by adding the
necessary linkage with the jsoncpp library.
2026-02-18 17:12:08 +02:00
Ferenc Szili
d7cfaf3f84 test, simulator: compute load based on tablet size instead of count
This patch changes the load balancing simulator so that it computes
table load based on tablet sizes instead of tablet count.

best_shard_overcommit measured minimal allowed overcommit in cases
where the number of tablets can not be evenly distributed across
all the available shards. This is still the case, but instead of
computing it as an integer div_ceil() of the average shard load,
it is now computed by allocating the tablet sizes using the
largest-tablet-first method. From these, we can get the lowest
overcommit for the given set of nodes, shards and tablet sizes.
2026-02-12 12:54:55 +01:00
Ferenc Szili
216443c050 test, simulator: generate tablet sizes and update load_stats
This change adds a random tablet size generator. The tablet sizes are
created in load_stats.

Further changes to the load balance simulator:

- apply_plan() updates the load_stats after a migration plan is issued by the
load balancer,

- adds the option to set a command line option which controls the tablet size
deviation factor.
2026-02-12 12:54:55 +01:00
Ferenc Szili
e31870a02d test, simulator: postpone creation of load_stats_ptr
With size based load balancing, we will have to move the tablet size in
load_stats after each internode migration issued by balance_tablets().
This will be done in a subsequent commit in apply_plan() which is
called from rebalance_tablets().

Currently, rebalance_tablets() is passed a load_stats_ptr which is
defined as:

using load_stats_ptr = lw_shared_ptr<const load_stats>;

Because this is a pointer to const, apply_plan() can't modify it.

So, we pass a reference to load_stats to rebalance_tablets() and create
a load_stats_ptr from it for each call to balance_tablets().
2026-02-12 12:54:55 +01:00
Avi Kivity
ceec703bb7 Revert "main: test: add future and abort_source to after_init_func"
This reverts commit 7bf7ff785a. The commit
tried to add clean shutdown to `scylla perf` paths, but forgot at least
`scylla perf-alternator --workload wr` which now crashes on uninitialized
`c.as`.

Fixes #28473

Closes scylladb/scylladb#28478
2026-02-02 09:22:24 +01:00
Avi Kivity
6676953555 Merge 'test: perf: add option to write results to json in perf-cql-raw and perf-alternator' from Marcin Maliszkiewicz
Adds --json-result option to perf-cql-raw and perf-alternator, the same as perf-simple-query has.
It is useful for automating test runs.

Related: https://scylladb.atlassian.net/browse/SCYLLADB-434

Bacport: no, original benchmark is not backported

Closes scylladb/scylladb#28451

* github.com:scylladb/scylladb:
  test: perf: add example commands to perf-alternator and perf-cql-raw
  test: perf: add option to write results to json in perf-cql-raw
  test: perf: add option to write results to json in perf-alternator
  test: perf: move write_json_result to a common file
2026-02-01 13:57:10 +02:00
Marcin Maliszkiewicz
80e627c64b test: perf: add example commands to perf-alternator and perf-cql-raw 2026-01-30 08:48:19 +01:00
Marcin Maliszkiewicz
ea29e4963e test: perf: add option to write results to json in perf-cql-raw 2026-01-29 10:56:03 +01:00
Marcin Maliszkiewicz
d974ee1e21 test: perf: add option to write results to json in perf-alternator 2026-01-29 10:55:52 +01:00
Marcin Maliszkiewicz
a74b442c65 test: perf: move write_json_result to a common file
The implementation is going to be shared with
perf-alternator and perf-cql-raw.
2026-01-29 10:54:11 +01:00
Tomasz Grabiec
0d090aa47b tablets: Cache pointer to stats during plan-making
Saves on lookup cost, esp. for candidate evaluation. This showed up in
perf profile in the past.

Also, lays the ground for splitting stats per rack.
2026-01-28 01:32:00 +01:00
Marcin Maliszkiewicz
32543625fc test: perf: reuse stream id
When one request is super slow and req/s high
in theory we have a collision on id, this patch
avoids that by reusing id and aborting when there
is no free one (unlikely).
2026-01-22 12:26:50 +01:00
Marcin Maliszkiewicz
7bf7ff785a main: test: add future and abort_source to after_init_func
This commit avoids leaking seastar::async future from two benchmark
tools: perf-alternator and perf-cql-raw. Additionally it adds
abort_source for fast and clean shutdown.
2026-01-22 12:26:50 +01:00
Marcin Maliszkiewicz
0d20300313 test: perf: add option to stress multiple tables in perf-cql-raw 2026-01-22 12:26:50 +01:00
Marcin Maliszkiewicz
a033b70704 test: perf: add perf-cql-raw benchmarking tool
The tool supports:
- auth or no auth modes
- simple read and write workloads
- connection pool or connection per request modes
- in-process or remote modes, remote may be usefull
to assess tool's overhead or use it as bigger scale benchmark
- uses prepared statements by default
- connection only mode, for testing storms

It could support in the future:
- TLS mode
- different workloads
- shard awareness

Example usage:
> build/release/scylla perf-cql-raw --workdir /tmp/scylla-data --smp 2
--cpus 0,1 \
--developer-mode 1 --workload read --duration 5 2> /dev/null

Running test with config: {workload=read, partitions=10000, concurrency=100, duration=5, ops_per_shard=0}
Pre-populated 10000 partitions
97438.42 tps (269.2 allocs/op,   1.1 logallocs/op,  35.2 tasks/op,  113325 insns/op,   80572 cycles/op,        0 errors)
102460.77 tps (261.1 allocs/op,   0.0 logallocs/op,  31.7 tasks/op,  108222 insns/op,   75447 cycles/op,        0 errors)
95707.93 tps (261.0 allocs/op,   0.0 logallocs/op,  31.7 tasks/op,  108443 insns/op,   75320 cycles/op,        0 errors)
102487.87 tps (261.0 allocs/op,   0.0 logallocs/op,  31.7 tasks/op,  107956 insns/op,   74320 cycles/op,        0 errors)
100409.60 tps (261.0 allocs/op,   0.0 logallocs/op,  31.7 tasks/op,  108337 insns/op,   75262 cycles/op,        0 errors)
throughput:
	mean=   99700.92 standard-deviation=3039.28
	median= 100409.60 median-absolute-deviation=2759.85
	maximum=102487.87 minimum=95707.93
instructions_per_op:
	mean=   109256.53 standard-deviation=2281.39
	median= 108337.37 median-absolute-deviation=1034.83
	maximum=113324.69 minimum=107955.97
cpu_cycles_per_op:
	mean=   76184.36 standard-deviation=2493.46
	median= 75320.20 median-absolute-deviation=922.09
	maximum=80572.19 minimum=74320.00
2026-01-22 12:26:50 +01:00
Marcin Maliszkiewicz
1318ff5a0d test: perf: move cut_arg helper func to common code
It will be reused later.
2026-01-19 14:33:10 +01:00
Ferenc Szili
621cb19045 load_sketch: use tablet sizes in load computation
This commit changes load_sketch so that it computes node and shard load
based on tablet sizes instead of tablet count.
2025-12-27 10:37:23 +01:00
Aleksandra Martyniuk
d66a36058b service: pass topology and system_keyspace to load_balancer ctor
Pass a pointer to service::topology and db::system_keyspace to load
balancer. It will be used in the following patches to create
rack_list_colocation plan.
2025-12-16 13:25:38 +01:00
Tomasz Grabiec
721434054b perf-row-cache-update: Add scenario with large tombstone covering many rows
Fills memtable with rows and a tombstone which deletes all rows which
are already in cache.

Similar to raft log workload, but more extreme.

With -c1 -m4G, observed really bad performance:

update: 1711.976196 [ms], preemption: {count: 22603, 99%: 0.943127 [ms], max: 1494.571776 [ms]}, cache: 2148/2906 [MB], alloc/comp: 1334/869 [MB] (amp: 0.651), pr/me/dr 1062186/0/1062187
cache: 2148/2906 [MB], memtable: 738/1024 [MB], alloc/comp: 993/0 [MB] (amp: 0.000)

Which means that max reactor stall during cache update was 1.5 [s]
0.7 GB memtables. 2.1 GB in cache.
2025-12-06 01:03:09 +01:00
Botond Dénes
6ee0f1f3a7 Merge 'replica/table: add a metric for hypothetical total file size without compression' from Michał Chojnowski
This patch adds a metric for pre-compression size of sstable files.

This patch adds a per-table metric
`scylla_column_family_total_disk_space_before_compression`,
which measures the hypothetical total size of sstables on disk,
if Data.db was replaced with an uncompressed equivalent.

As for the implementation:
Before the patch, tables and sstable sets are already tracking their total physical file size.
Whenever sstables are added or removed, the size delta is propagated from the sstable up through sstable sets into table_stats.
To implement the new metric, we turn the size delta that is getting passed around from a one-dimensional to a two-dimensional value, which includes both the physical and the pre-compression size.

New functionality, no backport needed.

Closes scylladb/scylladb#26996

* github.com:scylladb/scylladb:
  replica/table: add a metric for hypothetical total file size without compression
  replica/table: keep track of total pre-compression file size
2025-11-20 09:10:38 +02:00
Michał Chojnowski
1cfce430f1 replica/table: keep track of total pre-compression file size
Every table and sstable set keeps track of the total file size
of contained sstables.

Due to a feature request, we also want to keep track of the hypothetical
file size if Data files were uncompressed, to add a metric that
shows the compression ratio of sstables.

We achieve this by replacing the relevant `uint_64 bytes_on_disk`
counters everywhere with a struct that contains both the actual
(post-compression) size and the hypothetical pre-compression size.

This patch isn't supposed to change any observable behavior.
In the next patch, we will use these changes to add a new metric.
2025-11-13 00:49:57 +01:00
Dario Mirovic
549e6307ec audit: unify create_audit and start_audit
There is no need to have `create_audit` separate from `start_audit`.
`create_audit` just stores the passed parameters, while `start_audit`
does the actual initialization and startup work.

Refs #26022
2025-11-06 03:05:06 +01:00