Commit Graph

4995 Commits

Author SHA1 Message Date
Nadav Har'El
21ecc12fc6 Merge 'index: fix local vector index locality detection after schema reload' from Michał Hudobski
After schema reload, `target_parser::is_local()` did not recognize the
vector-index local target format `{"pk": [...], "tc": "..."}`, causing
local vector indexes to be treated as global. This broke duplicate
detection when both a global and a local vector index existed on the same
column. Fix by introducing `vector_index::is_local()` and dispatching
to it from `create_index_from_index_row()` based on the index class.
Also adds tests for local/global vector index coexistence.

Fixes: SCYLLADB-987

backport reasoning: we added local vector index support in 2026.1

Closes scylladb/scylladb#29492

* github.com:scylladb/scylladb:
  test/cqlpy: add tests for global and local vector index coexistence
  index: fix local vector index locality detection after schema reload
2026-05-27 15:34:57 +03:00
Botond Dénes
555cfbcd38 Merge 'treewide: replace deprecated smp::count and smp::all_cpus() with new APIs' from Avi Kivity
Replace all uses of the deprecated seastar::smp::count with this_smp_shard_count() and smp::all_cpus() with this_smp_all_shards() across the ScyllaDB codebase (seastar submodule untouched).

Both replacement functions require a reactor thread context. All call sites were verified to run on reactor threads.

Notable cases:
- dht/token-sharding.hh: this_smp_shard_count() is used as a default parameter value. This is safe since all callers are on reactor threads, but the expression is now evaluated at each call site rather than being a reference to a global variable.
- service/storage_service.hh, locator/abstract_replication_strategy.hh, ent/encryption/encryption.cc: used in default member initializers and constructor member-init-lists. Objects are always constructed on reactor threads.
- schema_builder: sometimes called from BOOST_AUTO_TEST_CASE without a reactor. Added pre-patch that makes the implicit shard count parameter implicit and pass 1 in those cases.

Not changed:
- scylla-gdb.py: reads smp::count as a GDB symbol (no reactor context).
- Python test files: only reference smp::count in comments/strings.

No backport: the Seastar commit that deprecated these function hasn't (and won't) make its way into any release branches (and the warnings are cosmetic anyway)

Closes scylladb/scylladb#29990

* github.com:scylladb/scylladb:
  treewide: replace deprecated smp::count and smp::all_cpus() with new APIs
  scylla-gdb: read shard count from smp::_this_smp instead of smp::count
  schema_builder: make shard_count an explicit constructor parameter
2026-05-27 09:42:06 +03:00
Avi Kivity
8010e408a2 treewide: replace deprecated smp::count and smp::all_cpus() with new APIs
Replace all uses of the deprecated seastar::smp::count with
this_smp_shard_count() and smp::all_cpus() with this_smp_all_shards()
across the ScyllaDB codebase (seastar submodule untouched).

Both replacement functions require a reactor thread context. All call
sites were verified to run on reactor threads.

Notable cases:
- dht/token-sharding.hh: this_smp_shard_count() is used as a default
  parameter value. This is safe since all callers are on reactor threads,
  but the expression is now evaluated at each call site rather than being
  a reference to a global variable.
- service/storage_service.hh, locator/abstract_replication_strategy.hh,
  ent/encryption/encryption.cc: used in default member initializers and
  constructor member-init-lists. Objects are always constructed on reactor
  threads.

Not changed:
- scylla-gdb.py: reads smp::count as a GDB symbol (no reactor context).
- Python test files: only reference smp::count in comments/strings.
2026-05-26 17:35:20 +03:00
Wojciech Mitros
ae0d77257f mv: fix view_update_builder losing fragments across batch boundaries
When a mutation generates more view updates than max_rows_for_view_updates
(100), view_update_builder::build_some() splits the work into multiple
batches. There was a bug in how fragments were read between batches:

When should_stop_updates() returned true, the old code called stop()
which returned stop_iteration::yes without reading the next fragments.
On the next build_some() call, read_both_next_fragments() was called
at the start, which advanced BOTH readers - skipping any fragment that
was already read but not yet consumed. A row could be not consumed if
either:
- the 100th (last in the batch) update was a row insertion and we still
  had insertions/updates remaining
- the 100th (last in the batch) update was a row deletion and we still
  had deletions/updates remaining
For the most common case where work is split in batches, i.e. range
deletions, we couldn't hit this because range delete generates only
view row deletions.
On tables with a single materialized view, we also couldn't get this
for any batches with less than 50 statements (unless the batch also
contained range deletions), because one non-range-delete update can
generate up to 2 view updates.
Howeveer, for a range of scenarios outside these 2, we could lose
view updates, resulting in persistent inconsistencies.

The fix:
- read_*_next_fragment() now accept a stop_iteration parameter, so the
  next fragments are always read after consuming (even when stopping),
  but stop_iteration::yes is correctly propagated to break the loop.
- build_some() no longer re-reads fragments at the start. Instead, an
  initialize() method performs the initial read once at construction.
- because now we only advance readers after consuming, we won't advance
  readers after end_of_partition, so we extend the break condition to
  accept either readers evaluating to `false` or them being at the
  end_of_partition. We also handle the optimization with
  _skip_row_updates

Fixes: scylladb/scylladb#29155

Closes scylladb/scylladb#29498
2026-05-26 14:15:12 +02:00
Avi Kivity
f165b396fd schema_builder: make shard_count an explicit constructor parameter
A recent Seastar update deprecated smp::count and introduced
this_smp_shard_count() as a replacement. One difference is that
this_smp_shard_count() wants to run on a reactor thread.

This poses a problem for non-reactor tests (BOOST_AUTO_TEST_CASE)
that nevertheless use a schema, as the schema_builder constructor
references smp::count. If we replace it with this_smp_shard_count()
then it will crash when running without a reactor.

To fix, remove the implicit this_smp_shard_count() call from raw_schema's
constructor and require callers to pass shard_count explicitly to
schema_builder. This allows tests that don't run on a reactor thread
to construct schemas without crashing.

Production code and reactor-based tests pass this_smp_shard_count().
Non-reactor test files (expr_test, keys_test, nonwrapping_interval_test,
wrapping_interval_test, bti_key_translation_test, range_tombstone_list_test)
pass a fixed shard count of 1.

Note: sstable_test.cc is a Seastar test file (SEASTAR_THREAD_TEST_CASE)
but also contains one plain BOOST_AUTO_TEST_CASE
(test_empty_key_view_comparison) that constructs a schema_builder without
a reactor context. This test also receives a fixed shard count of 1.
2026-05-26 11:55:56 +03:00
Botond Dénes
597d4252dc types: abstract_type::from_string() switch to fragmented buffers (interface)
Change input: str::string_view -> utils::chunked_string_view.
Change return value: bytes -> managed_bytes.

This patch only changes the interface, with some to_bytes() sprinkled in
the internals to deal with recursive calls.
Internals will be updated in the next patch, to keep the churn of
updating callers separate from the actually important changes.
2026-05-26 09:08:06 +03:00
Nadav Har'El
96dd3121e7 Merge 'cql: rewrite CassIO SAI metadata index to regular secondary index' from Szymon Wasik
CassIO (the library backing LangChain's `langchain_community.vectorstores.Cassandra` integration) issues the following DDL during schema setup to create a metadata index:

```sql
CREATE CUSTOM INDEX IF NOT EXISTS eidx_metadata_s_<table>
ON <keyspace>.<table> (ENTRIES(metadata_s))
USING 'org.apache.cassandra.index.sai.StorageAttachedIndex';
```

ScyllaDB does not support Cassandra's StorageAttachedIndex (SAI) for non-vector columns and previously rejected this statement with:

```
StorageAttachedIndex (SAI) is only supported on vector columns; use a secondary index for non-vector columns
```

This blocks seamless migration of existing LangChain/CassIO applications from Cassandra to ScyllaDB — applications fail during initialization before any application-level workaround can run, even when metadata filtering is not used (`metadata_indexing="none"`).

CassIO is no longer actively maintained but remains the only official LangChain integration path for Apache Cassandra over CQL, meaning existing applications will continue using this setup pattern.

Instead of rejecting the CassIO metadata-map SAI DDL, detect the pattern and rewrite it to a standard ScyllaDB secondary index on collection entries:

- **Detection**: SAI class name + single `ENTRIES` target on a non-frozen `map` column
- **Rewrite**: Clear the custom class so the index is created through the standard secondary index path (which already fully supports indexing map entries)
- **Warning**: Emit a CQL warning informing the user that SAI is not supported by ScyllaDB, a regular secondary index was created instead, and metadata filtering behavior may differ from Cassandra SAI

The rewrite is placed early in `validate_while_executing()`, before the rf-rack-validity check, so the standard secondary index code path handles all subsequent validation naturally — no code duplication.

After this change, the CassIO schema setup succeeds on ScyllaDB:
- `CREATE CUSTOM INDEX ... USING 'sai'` on `ENTRIES(metadata_s)` creates a real secondary index
- The index is functional and can accelerate metadata filtering queries
- A CQL warning makes the rewrite transparent to operators
- SAI on non-vector, non-map-entries columns is still rejected as before
- Vector SAI indexes continue to be rewritten to `vector_index` as before

- `test_sai_entries_on_map_creates_regular_index` — verifies the index is created and the warning is emitted (fully-qualified SAI class name)
- `test_sai_entries_on_map_short_name` — same with the `'sai'` short alias
- `test_sai_on_regular_column_rejected` — confirms SAI on regular scalar columns is still rejected

All 148 tests in `test_vector_index.py` and `test_secondary_index.py` pass with no regressions (125 passed, 22 xfailed, 1 skipped).

Fixes: SCYLLADB-2113
Backport: 2026.2 as this is the version where the support for SAI class needed by LangChain was added.

Closes scylladb/scylladb#29981

* github.com:scylladb/scylladb:
  cql: rewrite CassIO SAI metadata index to regular secondary index
  db/config: add enable_cassio_compatibility flag
2026-05-26 00:19:03 +03:00
Avi Kivity
305346a3ec Merge 'Don't materialize collections into intermediate representations' from Botond Dénes
Collections have an age-old problem in ScyllaDB: they had to be unserialized into an intermediate representation for any access or manipulation. The intermediate representation needs effort to produce and also requires additional memory to store. Both can be significant for large collections. This intermediate representation is then either discarded immediately after use, or re-serialized again.
This problem was significant enough for us to consider the use of collections as somewhat of an anti-pattern. But our customers keep using it. Alternator is also a heavy user of collections.

This PR aims to solve this problem once and for all.  The plan is as follows:
* Promote direct use of the serialized collection format:
    - Add accessor methods to `collection_mutation_view` which read from the serialized format directly: `tomb()`, `size()` and `begin()`/`end()`.
    - Add a `collection_mutation_writer` which provides container semantics for generating a serialized `collection_mutation` directly on the go (`push_back()`).
* Replace all usage of `collection_mutation_description`, `collection_mutation_view_description` and friends with use of the new infrastructure.
* Drop the old infrastructure, to avoid accidental regressions.

Continues the work started by https://github.com/scylladb/scylladb/pull/29033 and takes it to its conclusion.

To help focus review, here is a summary of the patches:
* [1, 2] preparatory refactoring: drop some unused abstract_type params
* [3, 6] introduce new infrastructure to write and read serialized collections directly; this is the meat of the PR
* [6, -1) replace all usage of old materializing infrastructure with usage of the new one
* [-1] drop old infrastructure

**Command:**
```
dbuild -it -- build/release/scylla perf-simple-query --collection=16 -c1 -m2G --default-log-level=error
```

| Metric                   |  Before |   After | Change     |
|--------------------------|--------:|--------:|------------|
| Throughput (median tps)  | 315,760 | 332,021 | **+5.1%**  |
| Instructions/op (median) |  53,776 |  48,681 | **-9.5%**  |
| CPU cycles/op (median)   |  17,365 |  16,471 | **-5.1%**  |
| Allocations/op           |    85.1 |    82.1 | **-3.5%**  |

**Significant improvement.** Throughput is up ~5%, and both instruction count and cycle count are meaningfully reduced.

---

**Command:**
```
dbuild -it -- build/release/scylla perf-simple-query --collection=16 -c1 -m2G --default-log-level=error --write
```

| Metric                   |    Before |    After | Change    |
|--------------------------|----------:|---------:|-----------|
| Throughput (median tps)  |   150,823 |  149,678 | **-0.8%** |
| Instructions/op (median) |   108,388 |  103,858 | **-4.2%** |
| CPU cycles/op (median)   |    34,860 |   35,371 | **+1.5%** |
| Allocations/op           | ~105–108  | ~102–103 | **-3.0%** |

**Mixed, mostly neutral.** Throughput is essentially flat (within noise). Instructions/op improved by ~4%, allocations dropped slightly, but cycles/op edged up marginally.

---

**Command:**
```
dbuild -it -- build/release/scylla perf-alternator --workload write --developer-mode=1 --alternator-port=8000 --alternator-write-isolation=unsafe -c1 -m2G --default-log-level=error
```

| Metric                   |  Before |  After | Change    |
|--------------------------|--------:|-------:|-----------|
| Throughput (median tps)  |  55,777 | 56,051 | **+0.5%** |
| Instructions/op (median) | 246,215 |246,610 | **+0.2%** |
| CPU cycles/op (median)   |  77,641 | 77,020 | **-0.8%** |
| Allocations/op           |   340.4 |  335.4 | **-1.5%** |

**Essentially neutral.** All metrics are within noise margins. Slight reduction in allocations and cycles, negligible otherwise.

---

The change has a **clear, substantial positive effect on reads** (~5% throughput gain, ~9.5% fewer instructions per op).
The write and alternator paths are **unaffected in practice** — changes there are within measurement noise. No regressions are apparent.
This is expected: https://github.com/scylladb/scylladb/pull/29033 did the heavy lifting when it comes to the write path, this PR finishes the job, mostly improving reads.

Fixes: #3602

Improvement, no backport.

Closes scylladb/scylladb#29127

* github.com:scylladb/scylladb:
  mutation/collection_mutation: make collection_mutation::_data private
  mutation_collection: drop collection_mutation_description and friends
  test: move away from collection_mutation_description
  tree: move away from collection_mutation_description
  test: move away from collection_mutation_view::with_deserialized()
  tree: move away from collection_mutation_view::with_deserialized()
  types: fix indendation, left broken by previous commit
  types: move away from collection_mutation_view::with_deserialized()
  types: serialize_for_cql(): use throwing_assert() instead of SCYLLA_ASSERT()
  schema: column_computation: move away from collection_mutation_view::with_deserialized()
  mutation: move away from collection_mutation_view::with_deserialized()
  alternator: move away from collection_mutation_view::with_deserialized()
  cdc: move away from collection_mutation_view::with_deserialized()
  mutation/collection_mutation: printer: don't deserialize collections
  mutation/collection_mutation: difference(): don't deserialize collections
  mutation/collection_mutation: merge(): don't deserialize collections
  mutation/collection_mutation: extract compact_and_expire() to free function
  mutation/collection_mutation: refactor empty(), is_any_live() and last_update()
  compaction_garbage_collector: pass collection_mutation to collect()
  test/boost/mutation_test: add tests for collection_mutation_{view,writer}
  mutation/collaction_mutation: collection_mutation_view: add methods to inspect content
  mutation/collection_mutation: add collection_mutation_writer
  mutation/collection_mutation: collection_mutation(): generate valid collection
  mutation/collection_mutation: collection_mutation(): remove unused abstract_type param
  mutation/atomic_cell: drop unused type param from from_bytes()
2026-05-21 17:10:40 +03:00
Piotr Dulikowski
6148316f66 Merge 'db/view/view_building_coordinator: add flag to mark if any remote work was finished' from Michał Jadwiszczak
There is small windows just after view building coordinator releases
group0 guard and before it waits on view_building_state_machine's CV,
when the coordinator may miss CV broadcast triggered by finished remote
work.

To fix it, this patch adds a boolean flag, which is set to true before
broadcasting the CV and is checked before awaiting on the CV.

Fixes SCYLLADB-2029

The problem is not critical but it should be backported to 2025.4 and newer version, all of them contains view building coordinator.

Closes scylladb/scylladb#27313

* github.com:scylladb/scylladb:
  test/cluster/test_view_building_coordinator: add reproducer
  db/view/view_building_coordinator: add flag to mark if any remote work was finished
2026-05-21 15:11:58 +02:00
Szymon Wasik
242eb96b16 db/config: add enable_cassio_compatibility flag
Add a new live-updatable boolean configuration option
'enable_cassio_compatibility' (default: false).

When enabled, it allows ScyllaDB to rewrite CassIO's SAI index DDL
on map entries to a regular secondary index, so that LangChain/CassIO
applications can run without DDL errors.

The flag is disabled by default to avoid affecting users who don't
need CassIO compatibility.
2026-05-21 13:26:02 +02:00
Pavel Emelyanov
4b13b24695 Merge 's3: make S3 connection pool size configurable per scheduling group' from Ernest Zaslavsky
The S3 client creates a separate HTTP connection pool per scheduling group. Previously, the pool size was hardcoded as shares/100, yielding 1-10 connections. This was not tunable and could under-provision connections for groups with low share counts.
**Changes**
- A missing include (short_streams.hh) in sstables_loader.cc is added first to fix CMake builds where the header is not transitively included.
- The hardcoded per-share divisor is replaced with a per-shard connection budget. The new `object_storage_connections_per_shard` config option (default 128) specifies the total number of connections available on each shard. Connections are distributed proportionally across scheduling groups based on their shares: `max_connections = budget * group_shares / total_shares`. Remainder connections are assigned to the group with the most shares. When a new scheduling group client is created, all existing groups are rebalanced via `set_maximum_connections`. Creation and rebalance are serialized with a semaphore to prevent concurrent rebalances from racing.
- The config option is made live-updateable: a `storage_manager` observer propagates changes to all existing S3 clients, triggering rebalance under the same semaphore.
Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-1704
No backport needed since this change affects KS on object storage which is not operational yet.

Closes scylladb/scylladb#29719

* github.com:scylladb/scylladb:
  s3: make connections_per_shard live-updateable
  s3: distribute connection pool proportionally across scheduling groups
2026-05-21 12:12:36 +03:00
Andrzej Jackowski
f8156702de tree: add missing -present to copyright headers
~2076 files used "Copyright (C) YYYY-present ScyllaDB" while
~88 files used "Copyright (C) YYYY ScyllaDB". This
inconsistency leads to unnecessary code review discussions
and gradual spread of the less common format.

Standardize all ScyllaDB copyright headers to use -present.

Fixes SCYLLADB-1984

Closes scylladb/scylladb#29876
2026-05-21 10:57:42 +02:00
Michał Hudobski
cf372ba87b index: fix local vector index locality detection after schema reload
When index metadata was deserialized from system tables during schema
reload, target_parser::is_local() failed to recognize local vector
indexes. It only handled the non-vector JSON format {"pk": [...],
"ck": [...]}, but vector indexes serialize their targets as
{"pk": [...], "tc": "..."}. As a result, every local vector index was
incorrectly marked as global after a schema reload.

Fix this by introducing vector_index::is_local() that recognizes the
vector-specific target format, and dispatching to it from the schema
deserialization code based on the index class name. This keeps
target_parser as secondary-index-specific and follows the same dispatch
pattern already used for target serialization.

Also remove the now-unused has_vector_index_on_column() helper (its
callers were removed by #29407).
2026-05-21 10:35:48 +02:00
Botond Dénes
636e2877e2 tree: move away from collection_mutation_description
Use collection_mutation_writer instead.

Add to_managed_bytes() to cql3::raw_value to help avoid some copies.

A special note for sstables/kl/reader.cc: this conversion is not
straighforward, so we accumulate a list of cells and feed to the writer
at the end. This is sub-optimal but this code is rarely used, best to be
conservative.
2026-05-21 10:23:29 +03:00
Botond Dénes
35a776d043 tree: move away from collection_mutation_view::with_deserialized()
Use the collection_mutation_view directly.

This is the remainder after the previous patches collecting larger
changes by module.
2026-05-21 10:23:29 +03:00
Botond Dénes
24fdfa34dd mutation/collection_mutation: collection_mutation(): remove unused abstract_type param 2026-05-21 08:34:21 +03:00
Ernest Zaslavsky
86f678a592 s3: make connections_per_shard live-updateable
Wire the object_storage_connections_per_shard config option as
LiveUpdate so it can be changed at runtime without restart. When
the value changes, the storage_manager observer propagates it to
all existing S3 clients, which rebalance their connection pools
under the rebalance semaphore.
2026-05-20 20:45:14 +03:00
Ernest Zaslavsky
b9e1dcc0fe s3: distribute connection pool proportionally across scheduling groups
The S3 client creates a separate HTTP connection pool per scheduling
group. Previously, the pool size was computed per-group using a
per-share multiplier (connections = shares * multiplier), which did
not account for the total number of groups sharing the shard's
connection budget.

Replace the per-share multiplier with a per-shard connection budget:
the new object_storage_connections_per_shard config option (default
100) specifies the total number of connections available on each
shard. When a new scheduling group's client is created, connections
are distributed proportionally across all groups based on their
shares (connections = budget * group_shares / total_shares), and
existing groups are rebalanced via set_maximum_connections.

When the endpoint_config has an explicit max_connections override,
it is used directly without proportional distribution.
2026-05-20 20:45:04 +03:00
Marcin Maliszkiewicz
83823149e9 Merge 'audit: implement audit_rules config' from Andrzej Jackowski
This patch series adds `audit_rules`, a new audit configuration option for fine-grained, role-aware audit filtering with per-rule sink routing. Rules can be configured in `scylla.yaml` or updated live through `system.config` without restarting the node. Each rule specifies target sinks (`table`, `syslog`), statement categories, qualified table name patterns, and role patterns. Table and role patterns use POSIX `fnmatch` with extended glob syntax. For table-scoped categories (`DML`, `DDL`, `QUERY`), a rule matches only when the category, role, and qualified table name all match. For table-independent categories (`AUTH`, `ADMIN`, `DCL`), the table filter is ignored. Empty category or role lists match nothing; an empty table list matches nothing only for table-scoped categories. The new rules are additive with the existing `audit_categories`, `audit_keyspaces`, and `audit_tables` settings: both mechanisms are evaluated for each audit event, and the final sink set is the union of all matches.

To avoid evaluating glob patterns on every audit event, audit rules use a preprocessed cache of known roles and tables. The cache is kept in sync through group0 role/table snapshots, role-change notifications, and schema migration notifications. For known entities, rule matching uses precomputed role/table rule sets; unknown entities fall back to direct rule evaluation. When `audit_rules` is empty, per-event rule matching returns immediately and does not evaluate glob patterns. Audit still keeps known role/table metadata in sync while audit is enabled, so rules can be enabled later through live configuration updates without restarting the node.

**Performance**
Measured with `perf-simple-query --smp 1 --duration 100` against a null syslog socket. Results show no regression when audit is disabled, and audit-rules performance has at most 1% more instructions than legacy config for equivalent workloads:

```
===============================================================================================================================================================================
Configuration                                     | Binary     |         throughput (tps) | insns/op                 | cpu_cycles/op            | alloc/op | logal/op | task/op
===============================================================================================================================================================================
audit=none [1]                                    | baseline   |                 206922.4 |                  36591.6 |                  15348.3 |     58.1 |      0.0 |    14.1
audit=none [1]                                    | this PR    |        207856.4  (+0.5%) |         36544.9  (-0.1%) |         15274.0  (-0.5%) |     58.1 |      0.0 |    14.1
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
audit=syslog keyspaces=ks [2]                     | baseline   |                  94871.8 |                  54163.0 |                  27172.4 |     72.0 |      0.0 |    24.0
audit=syslog keyspaces=ks [2]                     | this PR    |         96138.4  (+1.3%) |         54072.3  (-0.2%) |         26699.3  (-1.7%) |     72.0 |      0.0 |    24.0
audit=syslog audit-rules=ks [3]                   | this PR    |         95142.1  (+0.3%) |         54457.8  (+0.5%) |         26953.8  (-0.8%) |     72.0 |      0.0 |    24.0
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
audit=syslog keyspaces=ks-non-existent [4]        | baseline   |                 213997.8 |                  36735.6 |                  14848.1 |     58.1 |      0.0 |    14.1
audit=syslog keyspaces=ks-non-existent [4]        | this PR    |        219297.2  (+2.5%) |         36667.3  (-0.2%) |         14500.1  (-2.3%) |     58.1 |      0.0 |    14.1
audit=syslog audit-rules=ks-non-existent [5]      | this PR    |        211038.7  (-1.4%) |         36999.7  (+0.7%) |         15048.6  (+1.4%) |     58.1 |      0.0 |    14.1
===============================================================================================================================================================================

[1] ./scylla perf-simple-query --smp 1 --duration 100 --audit "none"
[2] ./scylla perf-simple-query --smp 1 --duration 100 --audit "syslog" --audit-keyspaces "ks" --audit-categories "DCL,DDL,AUTH,DML,QUERY" --audit-unix-socket-path "/tmp/audit-null.sock"
[3] ./scylla perf-simple-query --smp 1 --duration 100 --audit "syslog" --audit-rules '[{"sinks":["syslog"],"categories":["DCL","DDL","AUTH","DML","QUERY"],"qualified_table_names":["ks.*"],"roles":["*"]}]' --audit-unix-socket-path "/tmp/audit-null.sock"
[4] ./scylla perf-simple-query --smp 1 --duration 100 --audit "syslog" --audit-keyspaces "ks-non-existent" --audit-categories "DCL,DDL,AUTH,DML,QUERY" --audit-unix-socket-path "/tmp/audit-null.sock"
[5] ./scylla perf-simple-query --smp 1 --duration 100 --audit "syslog" --audit-rules '[{"sinks":["syslog"],"categories":["DCL","DDL","AUTH","DML","QUERY"],"qualified_table_names":["ks-non-existent.*"],"roles":["*"]}]' --audit-unix-socket-path "/tmp/audit-null.sock"

audit-null.sock was created with `socat -u UNIX-RECV:/tmp/audit-null.sock,type=2 OPEN:/dev/null`
```

Fixes: SCYLLADB-1430
No backport: new feature

Closes scylladb/scylladb#29267

* github.com:scylladb/scylladb:
  test: alternator: audit: rules filtering and batch bypass
  test: perf: add --audit-rules option to perf-simple-query
  docs: add audit rules section to the auditing guide
  test: audit: cover role and schema cache notifications
  test: audit: cover audit rules cluster behavior
  audit: rebuild rule caches on group0 snapshot and role changes
  audit: refresh rule caches on schema, role, and config changes
  audit: route matching rules to configured sinks
  test: cover preprocessed audit rule cache
  audit: add preprocessed rule matching cache
  audit: pass sink targets to storage helpers
  test: audit: cover rule matching semantics
  audit: add rule matching and sink helpers
  test: audit: cover audit_rules configuration
  config: add live audit_rules option
  test: cover audit rule parsing and validation
  audit: define audit_rule type with parsing and validation
2026-05-20 14:10:45 +02:00
Avi Kivity
6df04c9e5b Update seastar submodule
Changed seastar::http::experimental to seastar::http to reflect
graduation of the seastar http API.

Changed call to seastar::rename_file() (in sstables/storage.cc,
sstables/sstable_directory.cc, sstable/sstables.cc and
db/hints/internal/hint_storage.cc) to reflect new default parameter.

Updated scylla_gdb test helper get_task() to work with updated
accept loop in Seatar. This is just test code (attempts to find
a task to operate on), not used in real scylla-gdb.py work, but
nevertheless the adjustment keeps backward compatibility.

Fixes https://scylladb.atlassian.net/browse/SCYLLADB-1798
Fixes https://scylladb.atlassian.net/browse/SCYLLADB-2043

* seastar 485a62b2...510f3148 (43):
  > reactor_backend: fix iocb double-free and shutdown hang during AIO teardown
  > file: fix default DMA alignment
  > http: add to_reply() to redirect_exception with extra-header support
  > core: propagate syscall errors via `coroutine::exception`
  > file: assert dma alignments are powers of two
  > doc: Document undocumented io_tester features and fix output example
  > backtrace: print the build_id along with the backtrace
  > reactor: default to oneline backtraces
  > Merge 'json: formatter: support types with user-defined conversion to sstring' from Benny Halevy
    tests: json_formatter: test formatter::write with string types
    json: formatter: support types with user-defined conversion to sstring
  > httpd_test: fix build failure with Seastar_SSTRING=OFF
  > net/tls: introduce ssl_call wrapper for SSL I/O
  > build: disable unused command line argument error for C++ module
  > coroutine/generator: fix setup of generator's waiting task
  > tests/tls: set 1000-day validity for self-signed CA cert
  > net: tls: openssl: disable certificate compression
  > reactor: reduce steady_clock::now() calls per scheduling quantum
  > fair_queue: remove notify_request_finished()
  > loop: use small_vector for parallel_for_each_state incomplete futures
  > dodge false sharing in spinlock
  > Merge 'Handle nowait support for reads and writes independently' from Pavel Emelyanov
    file: Change nowait_works mode detection
    file: Introduce read-only nowait_mode
    filesystem: Make nowait_works bit a enum class too
    file: Make nowait_works bit a enum class
  > Merge 'net/tls: improve OpenSSL error queue hygiene' from Gellért Peresztegi-Nagy
    net/tls: assert clean error queue before SSL operations
    net/tls: clear error queue after successful SSL operations
    net/tls: clear error queue after successful SSL_CTX_new
    net/tls: drain error queue on unexpected error codes
    net/tls: use make_openssl_error for BIO creation failure
  > vla.hh: add missing includes
  > Merge 'smp: make smp::count non-static' from Avi Kivity
    smp: convert all smp::count usages to instance-aware alternatives
    smp: add per-instance shard_count and this_smp() infrastructure
    disk_params: document pre-init smp::count access with explicit 0
    reactor_backend: document pre-init smp::count access with explicit 0
    tests: alien_test: pass shard count to alien thread explicitly
  > build: fix cmake missing ninja on Ubuntu 26.04
  > rpc: Fix uint64 wraparound of expired timeout in send_entry()
  > Merge 'Generalize some RPC tests' from Pavel Emelyanov
    tests: Generalize async connection-based scheduling RPC tests
    tests: Generalize sync connection-based scheduling RPC tests
    tests: Remove redundant variadic/nonvariadic RPC tuple tests
    tests: Generalize max timeout RPC tests
  > net: tls: openssl: Share BIO ptrs across shards
  > http: fix compilation on clang 22 with c++26
  > build: openssl tools needed for test cert generation
  > reactor: support rename2
  > future: fix forwarding of reference types
  > Merge 'Zero-copy http chunked data sink' from Pavel Emelyanov
    http: Make chunked data sink zero-copy
    tests/prometheus_http: Rewrite on top of http::client
    tests/httpd: Rewrite content_length_limit on top of http::client
  > tests: Replace ad-hoc http_consumer with production HTTP parser
  > Merge 'co_return to accept same expressions and types as return' from Alexey Bashtanov
    tests/unit/{coroutines,futures}: strict types on co_return and set_value
    api: introduce version 10:
    core/{coroutine,future}: make `co_return` more strict with types
    core/{coroutine,future}: preparations to fix `co_return` type semantics
  > Merge 'Perftune.py: add special handling for mlx5 rss queues number calculation' from Vladislav Zolotarov
    perftune.py: NetPerfTuner: enhance RSS (a.k.a. "Rx") queues accounting for mlx5 devices
    perftune.py: update docstring of NetPerfTuner.__get_rps_cpus() method
    perftune.py: add a method that parses and models the output of the 'ethtool -l' command for a given interface
  > httpd: rewrite do_accepts/do_accept_one as coroutines
  > file: add mmap support to file
  > http: Move client code out of experimental namespace
  > file: add hugetlbfs support to file system detection
  > tests: Replace test_source_impl with util::as_input_stream
  > tests: Replace buf_source_impl with util::as_input_stream
  > Merge 'rpc_tester: expose throuput for rpc tester' from Marcin Szopa
    rpc_tester: remove unused payload size variable from job_rpc_streaming class
    rpc_tester: add start time tracking for throughput calculation, print throughput and msg/s for job_rpc
    rpc_tester: refactor result emission to use dedicated functions for messages and throughput
  > iostream: cast first argument of `std::min` to `size_t`

Closes scylladb/scylladb#29952
2026-05-20 13:47:12 +03:00
Andrzej Jackowski
f3a7e2e3dc config: add live audit_rules option
Operators need to configure audit rules through YAML, CQL,
and CLI with live-update support so routing can be
reconfigured without restart.

Add audit_rules as a LiveUpdate config option with YAML
decoding, JSON parsing for CQL updates, CLI --audit-rules
flag, and a custom serializer that avoids double-quoting
the JSON array.

Refs SCYLLADB-1430
2026-05-20 06:55:14 +02:00
Szymon Malewski
6b2fce03f9 alternator: optional stripping of http response headers
In Alternator's HTTP API, response headers can dominate bandwidth for
small payloads. The Server, Date, and Content-Type headers were sent on
every response but many clients never use them.

This patch introduces three Alternator config options:
  - alternator_http_response_server_header,
  - alternator_http_response_disable_date_header,
  - alternator_http_response_disable_content_type_header,
which allow customizing or suppressing the respective HTTP response
headers. All three options support live update (no restart needed).
The Server header is no longer sent by default; the Date and
Content-Type defaults preserve the existing behavior.

The Server and Date header suppression uses Seastar's
set_server_header() and set_generate_date_header() APIs added in
https://github.com/scylladb/seastar/pull/3217. This patch also
fixes deprecation warnings from older Seastar HTTP APIs.

Tests are in test/alternator/test_http_headers.py.

Fixes https://scylladb.atlassian.net/browse/SCYLLADB-70

Closes scylladb/scylladb#28288
2026-05-19 10:47:13 +03:00
Piotr Dulikowski
26671d4d5f Merge 'Refactor view_update_builder' from Wojciech Mitros
This series improves the readability and structure of
view_update_builder, the component that generates materialized view
updates from base-table mutations.

The first four patches are pure renames and refactoring with no
semantic changes:

  1. Document that the builder operates on a single base partition.
  2. Rename member fields to clearly distinguish readers (the
     mutation_reader streams) from the cached fragments (the last
     mutation_fragment_v2 read from each stream).
  3. Rename advance/on_results methods to names that describe what
     they actually do: read the next fragment, or generate view
     updates.
  4. Extract partition-start handling into its own method.

The next two patches are minor optimizations:

  5. Simplify clustering-row handling by moving the row out of the
     fragment before applying the tombstone, avoiding an unnecessary
     memory-usage recalculation in the reader permit.
  6. Replace deep copies with moves in the existing-only tail path,
     matching the pattern used everywhere else.

Finally, patch 7 deduplicates the fragment-consuming logic by
extracting the three repeated blocks into consume_both_fragments(),
consume_update_fragment(), and consume_existing_fragment().

Code reorganization - no backport needed

Closes scylladb/scylladb#29497

* github.com:scylladb/scylladb:
  mv: deduplicate code for consuming fragments in view_update_builder
  mv: avoid unnecessary copies of existing rows in generate_updates()
  mv: simplify clustering row handling in generate_updates()
  mv: rename methods in view_update_builder for clarity
  mv: rename view_update_builder readers and cached fragments
  mv: drop redundant std::move from partition key extraction
  mv: document single-partition builder scope
2026-05-18 15:52:26 +02:00
Piotr Dulikowski
5efb43195e Merge 'db/schema_tables: don't emit empty view_building_tasks mutation on ALTER TABLE' from Michał Jadwiszczak
After recent change (1a32ccd) `make_update_indices_mutations()` is unconditionally adding a mutation for `system.view_building_tasks`, even when no indices were being dropped.

In a mixed-version cluster, the older node may not have this table, causing the Raft schema applier to fail with 'Can't find a column family with UUID ...'.

This patch fixes the bug by emitting the mutation when indices are actually dropped (i.e., when the view building cleanup code path was entered).

Fixes: SCYLLADB-2026
Refs: scylladb#26557

scylladb#26557 wasn't backported, so this patch also doesn't need to be.

Closes scylladb/scylladb#29908

* github.com:scylladb/scylladb:
  db/schema_tables: don't emit empty view_building_tasks mutation on ALTER TABLE
  db/view_building_task_mutation_builder: add `empty()` method
2026-05-18 15:37:02 +02:00
Michał Jadwiszczak
a9b2baf36b db/schema_tables: don't emit empty view_building_tasks mutation on ALTER TABLE
After recent change (1a32ccd) `make_update_indices_mutations()` is unconditionally
adding a mutation for `system.view_building_tasks`, even when no indices were being dropped.

In a mixed-version cluster, the older node may not have this table,
causing the Raft schema applier to fail with 'Can't find a column
family with UUID ...'.

This patch fixes the bug by emitting the mutation when indices are actually
dropped (i.e., when the view building cleanup code path was entered).

Fixes: SCYLLADB-2026
Refs: scylladb#26557
2026-05-18 10:01:21 +02:00
Michał Jadwiszczak
82eb5611ab db/view_building_task_mutation_builder: add empty() method
The method allows to check if the builder contains any changes,
so it will allow to skip emitting empty mutation.
2026-05-18 09:54:26 +02:00
Michał Jadwiszczak
c767ac7ef3 test/cluster/test_view_building_coordinator: add reproducer
Add test which reproduces scylladb/scylladb#27298
2026-05-18 09:18:56 +02:00
Michał Jadwiszczak
c7f65131bf db/view/view_building_coordinator: add flag to mark if any remote work was finished
In the main coordinator loop (`view_building_coordinator::run()`),
there is small windows just after view building coordinator releases
group0 guard and before it waits on view_building_state_machine's CV,
when the coordinator may miss CV broadcast triggered by finished remote
work (`view_building_coordinator::work_on_tasks()`).

To fix it, this patch adds a boolean flag, which is set to true before
broadcasting the CV by finished/failed RPC call
and is checked before awaiting on the CV.

Fixes scylladb/scylladb#27298
2026-05-18 09:18:07 +02:00
Petr Gusev
9e3209e4a3 cql: refactor add_tablet_info to take tablet_routing_info directly
Change add_tablet_info() to accept locator::tablet_routing_info instead
of destructured (tablet_replica_set, token_range) pair. This simplifies
all three call sites.

Remove the empty-replicas guard inside add_tablet_info(): the only
producer of tablet_routing_info is tablet ERM's check_locality(), which
returns either nullopt (correctly routed) or info with replicas copied
from tablet_info — a tablet always has replicas. All callers already
check for nullopt before calling add_tablet_info(), so by the time we
enter the function replicas are guaranteed non-empty.
2026-05-15 12:28:33 +02:00
Michał Jadwiszczak
b175f5b97d db/view/view_building_worker: add more logs when flushing base table
Add debug logs around flushing the base table to see how long does it
take in case of some stalls in view building.

Refs SCYLLADB-1261
2026-05-14 10:23:42 +02:00
Piotr Dulikowski
3c2c814215 Merge 'db/view/view_building: replace system keyspace functions with mutation builder' from Michał Jadwiszczak
`system.view_building_tasks` is a single partition table, so it makes more sense to use a mutation builder and generate 1 mutation per group0 command instead of generating multiple mutations.

This PR removes all `make_..._mutation()` system keyspace functions related to view building tasks and replaces them with mutation builder.

Refs https://github.com/scylladb/scylladb/issues/25929

This patch doesn't fix any bug, it only reduces number of generated mutations, no need to backport it.

Closes scylladb/scylladb#26557

* github.com:scylladb/scylladb:
  db/system_keyspace: replace `make_remove_view_building_task_mutation()` with mutation builder
  db/view/view_building_task_mutation_builder: make uuid generator optional
  db/system_keyspace: replace `make_view_building_task_mutation()` with mutation builder
  db/view/view_building_task_mutation_builder: add helper method
2026-05-13 16:10:55 +02:00
Michał Jadwiszczak
1a32ccd8f6 db/system_keyspace: replace make_remove_view_building_task_mutation() with mutation builder
Again, get rid of system keyspace method in favor of mutation builder,
because `system.view_building_tasks` is a single parition table.
2026-05-13 10:06:18 +02:00
Michał Jadwiszczak
2561cc1546 db/view/view_building_task_mutation_builder: make uuid generator optional
After scylladb/scylladb#28929 `task_uuid_generator` became necassary
dependency of `view_building_task_mutation_builder`.

However to create the generator we need `view_building_state`, which in
some parts of the code (schema_tables.cc, migration_manager.cc) requires
remote proxy to be obtained.

But sometimes we need the mutation builder to just remove some view
building task. In those cases, we don't need the uuid generator and the
remote proxy requirement is not necassary.
2026-05-13 09:58:27 +02:00
Michał Jadwiszczak
e002665aa7 db/system_keyspace: replace make_view_building_task_mutation() with mutation builder
`system.view_building_tasks` is a single partition table, so it makes
more sense to use a mutation builder and generate 1 mutation per group0
command instead of generating multiple mutations.
2026-05-12 21:49:18 +02:00
Michał Jadwiszczak
4227cab5cb db/view/view_building_task_mutation_builder: add helper method
Add a method to set all task's fields.
2026-05-12 21:28:06 +02:00
Botond Dénes
e95eb21a16 Merge 'Tablet-aware restore' from Pavel Emelyanov
The mechanics of the restore is like this

- A /storage_service/tablets/restore API is called with (keyspace, table, endpoint, bucket, manifests) parameters
  - First, it populates the system_distributed.snapshot_sstables table with the data read from the manifests
  - Then it emplaces a bunch of tablet transitions (of a new "restore" kind), one for each tablet
- The topology coordinator handles the "restore" transition by calling a new RESTORE_TABLET RPC against all the current tablet replicas
- Each replica handles the RPC verb by
  - Reading the snapshot_sstables table
  - Filtering the read sstable infos against current node and tablet being handled
  - Downloading and attaching the filtered sstables

This PR includes system_distributed.snapshot_sstables table from @robertbindar and preparation work from @kreuzerkrieg that extracts raw sstables downloading and attaching from existing generic sstables loading code.

This is first step towards SCYLLADB-197 and lacks many things. In particular
- the API only works for single-DC cluster
- the caller needs to "lock" tablet boundaries with min/max tablet count
- not abortable
- no progress tracking
- sub-optimal (re-kicking API on restore will re-download everything again)
- not re-attacheable (if API node dies, restoration proceeds, but the caller cannot "wait" for it to complete via other node)
- nodes download sstables in maintenance/streaming sched gorup (should be moved to maintenance/backup)

Other follow-up items:
- have an actual swagger object specification for `backup_location`

Closes #28436
Closes #28657
Closes #28773

Closes scylladb/scylladb#28763

* github.com:scylladb/scylladb:
  docs: Update topology_over_raft.md with `restore` transition kind
  test: Add test for backup vs migration race
  test: Restore resilience test
  sstables_loader: Fail tablet-restore task if not all sstables were downloaded
  sstables_loader: mark sstables as downloaded after attaching
  sstables_loader: return shared_sstable from attach_sstable
  db: add update_sstable_download_status method
  db: add downloaded column to snapshot_sstables
  db: extract snapshot_sstables TTL into class constant
  test: Add a test for tablet-aware restore
  tablets: Implement tablet-aware cluster-wide restore
  messaging: Add RESTORE_TABLET RPC verb
  sstables_loader: Add method to download and attach sstables for a tablet
  tablets: Add restore_config to tablet_transition_info
  sstables_loader: Add restore_tablets task skeleton
  test: Add rest_client helper to kick newly introduced API endpoint
  api: Add /storage_service/tablets/restore endpoint skeleton
  sstables_loader: Add keyspace and table arguments to manfiest loading helper
  sstables_loader_helpers: just reformat the code
  sstables_loader_helpers: generalize argument and variable names
  sstables_loader_helpers: generalize get_sstables_for_tablet
  sstables_loader_helpers: add token getters for tablet filtering
  sstables_loader_helpers: remove underscores from struct members
  sstables_loader: move download_sstable and get_sstables_for_tablet
  sstables_loader: extract single-tablet SST filtering
  sstables_loader: make download_sstable static
  sstables_loader: fix formating of the new `download_sstable` function
  sstables_loader: extract single SST download into a function
  sstables_loader: add shard_id to minimal_sst_info
  sstables_loader: add function for parsing backup manifests
  split utility functions for creating test data from database_test
  export make_storage_options_config from lib/test_services
  rjson: Add helpers for conversions to dht::token and sstable_id
  Add system_distributed_keyspace.snapshot_sstables
  add get_system_distributed_keyspace to cql_test_env
  code: Add system_distributed_keyspace dependency to sstables_loader
  storage_service: Export export handle_raft_rpc() helper
  storage_service: Export do_tablet_operation()
  storage_service: Split transit_tablet() into two
  tablets: Add braces around tablet_transition_kind::repair switch
2026-05-12 16:24:13 +03:00
Piotr Dulikowski
7c2b1ea0b5 Merge 'view_building: fix tombstone_warn_threshold warnings' from Michał Jadwiszczak
`system.view_building_tasks` is a single-partition Raft group0 table (pk = `"view_building"`, CK = timeuuid). When `clean_finished_tasks()` deletes hundreds of finished tasks, the physical rows remain in SSTables until compaction. Any subsequent read of the partition counts every column of every tombstoned row
  as a dead cell, triggering `tombstone_warn_threshold` warnings in large clusters.

Two-part fix:

**1. Range tombstones instead of row tombstones (commits 2–3)**

Instead of one row tombstone per finished task, find the minimum alive task UUID (`min_alive_uuid`) and emit a single range tombstone `[before_all, min_alive_uuid)` covering all tasks below that boundary. This reduces the tombstone count significantly and also benefits future compaction.

**2. Bounded scan with `min_task_id` (commits 4–6)**

Even with range tombstones, physical rows remain until compaction and still count as dead cells during reads. The only way to avoid them is to not read them at all.

   - Add a `min_task_id timeuuid` static column to `system.view_building_tasks`.
   - On every GC, write `min_task_id = min_alive_uuid` atomically with the range tombstone (same Raft batch).
   - On reload, read `min_task_id` first using a **static-only partition slice** (empty `_row_ranges` + `always_return_static_content`): the SSTable reader stops immediately after the static row before processing any clustering tombstones — zero dead cells counted.
   - Use `AND id >= min_task_id` as a lower bound for the main task scan, skipping all tombstoned rows.

The static-only read and the bounded scan are gated on the `VIEW_BUILDING_TASKS_MIN_TASK_ID` cluster feature so mixed-version clusters fall back to the full scan.

The issue is not critical, so the fix shouldn't be backported.

Fixes SCYLLADB-657

Closes scylladb/scylladb#28929

* github.com:scylladb/scylladb:
  test/cluster/test_view_building_coordinator: add reproducer for tombstone threshold warning
  docs: document tombstone avoidance in view_building_tasks
  view_building: add `task_uuid_generator` to `view_building_task_mutation_builder`
  view_building: introduce `task_uuid_generator`
  view_building: store `min_alive_uuid` in view building state
  view_building: set min_task_id when GC-ing finished tasks
  view_building: add min_task_id support to view_building_task_mutation_builder
  view_building: add min_task_id static column and bounded scan to system_keyspace
  view_building: use range tombstone when GC-ing finished tasks
  view_building: add range tombstone support to view_building_task_mutation_builder
  view_building: introduce VIEW_BUILDING_TASKS_MIN_TASK_ID cluster feature
2026-05-12 12:38:25 +03:00
Pavel Emelyanov
1c0f8ab66e Merge 'sstables: introduce --abort-on-malformed-sstable-error' from Botond Dénes
When a malformed sstable error occurs, it is usually caused by actual sstable corruption — a cosmic ray, a bad disk write, etc. However, it can also be caused by memory corruption, where a data structure in memory happens to be read as sstable data. In the latter case, having a coredump of the process at the moment of the error is invaluable for post-mortem debugging, since the exception throwing/catching machinery destroys the stack frames that would point to the corruption site.

This patch series introduces `--abort-on-malformed-sstable-error`, a new command-line option (with `LiveUpdate` support) that, when set, causes the server to call `std::abort()` instead of throwing an exception whenever any sstable parse error is detected. This covers all code paths:

- Direct `throw malformed_sstable_exception(...)` sites (migrated to `throw_malformed_sstable_exception()`)
- Direct `throw bufsize_mismatch_exception(...)` sites (migrated to `throw_bufsize_mismatch_exception()`)
- `parse_assert()` failures (via `on_parse_error()`)
- BTI parse errors (via `on_bti_parse_error()`)

The implementation places the flag and helper functions in `sstables/sstables.cc`, next to the existing `on_parse_error()` / `on_bti_parse_error()` infrastructure.

The flag defaults to `false`, preserving current behaviour. It is intended to be enabled temporarily when investigating suspected memory corruption.

**Commit breakdown:**
1. Infrastructure: flag, getter/setter, and throw helpers in `sstables/sstables.cc`; config option wired up in `main.cc`
2. `on_parse_error()` and `on_bti_parse_error()` check the new flag
3. All ~50 `throw malformed_sstable_exception(...)` sites migrated
4. Both `throw bufsize_mismatch_exception(...)` sites migrated

Refs: SCYLLADB-1087
Backport: new feature, no backport

Closes scylladb/scylladb#29324

* github.com:scylladb/scylladb:
  sstables: migrate all bufsize_mismatch_exception throw sites to throw_bufsize_mismatch_exception()
  sstables: migrate all malformed_sstable_exception throw sites to throw_malformed_sstable_exception()
  sstables: make on_parse_error() and on_bti_parse_error() respect --abort-on-malformed-sstable-error
  sstables: disable abort-on-malformed-sstable-error in tests that corrupt sstables on purpose
  sstables: introduce --abort-on-malformed-sstable-error infrastructure
  sstables: refactor parse_path() to return std::expected<> instead of throwing
2026-05-12 12:38:25 +03:00
Ernest Zaslavsky
7eb921a142 db: add update_sstable_download_status method
Add a method to update the downloaded status of a specific SSTable
entry in system_distributed.snapshot_sstables. This will be used
by the tablet restore process to mark SSTables as downloaded after
they have been successfully attached to the local table.
2026-05-12 10:40:23 +03:00
Ernest Zaslavsky
83ec7e22b9 db: add downloaded column to snapshot_sstables
Add a 'downloaded' boolean column to the snapshot_sstables table
schema and the corresponding field to the snapshot_sstable_entry
struct. Update insert_snapshot_sstable() and get_snapshot_sstables()
to write and read this column.
This column will be used to track which SSTables have been
successfully downloaded during a tablet restore operation.

Co-authored-by: Pavel Emelyanov <xemul@scylladb.com>
2026-05-12 10:40:23 +03:00
Ernest Zaslavsky
61c627a7c0 db: extract snapshot_sstables TTL into class constant
Move the TTL value used for snapshot_sstables rows from a local
variable in insert_snapshot_sstable() to a class-level constant
SNAPSHOT_SSTABLES_TTL_SECONDS, making it reusable by other methods.
2026-05-12 10:40:23 +03:00
Robert Bindar
2f19d84ad7 Add system_distributed_keyspace.snapshot_sstables
This patch adds the snapshot_sstables table with the following
schema:
```cql
CREATE TABLE system_distributed.snapshot_sstables (
    snapshot_name text,
    keyspace text, table text,
    datacenter text, rack text,
    id uuid,
    first_token bigint, last_token bigint,
    toc_name text, prefix text)
  PRIMARY KEY ((snapshot_name, keyspace, table, datacenter, rack), first_token, id);
```
The table will be populated by the coordinator node during the restore
phase (and later on during the backup phase to accomodate live-restore).
The content of this table is meant to be consumed by the restore worker nodes
which will use this data to filter and file-based download sstables.

Fixes SCYLLADB-263

Signed-off-by: Robert Bindar <robert.bindar@scylladb.com>
2026-05-12 10:40:21 +03:00
Taras Veretilnyk
881776b441 db: add large data metrics for rows, cells, and collections
Previously only large_partition_exceeding_threshold was exposed as a
metric. Add three new counters to large_data_handler::stats and register
corresponding Prometheus metrics:
- large_rows_exceeding_threshold
- large_cell_exceeding_threshold
- large_collection_exceeding_threshold

The counters are incremented in maybe_record_large_rows() and
maybe_record_large_cells() following the same pattern used by the
existing partition metric.
2026-05-11 23:11:17 +02:00
Botond Dénes
ad7ac62835 Merge ' Add a node_owner column (locator::host_id) to system.sstables and make it part of the partition key' from Dimitrios Symonidis
Add a node_owner column (locator::host_id) to system.sstables and make it part of the partition key, so the primary key becomesv PRIMARY KEY ((table_id, node_owner), generation).

This is the first step toward moving the sstables registry into system_distributed: once distributed, each node's startup scan  must read only the rows it owns, which requires the owning node to be part of the partition key. Partitioning by (table_id, node_owner) turns that scan into a single-partition read of exactly the local node's rows.

Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-1562
No need to backport this, keyspace over object storage is experimental feature

Closes scylladb/scylladb#29659

* github.com:scylladb/scylladb:
  db, sstables: add node_owner to sstables registry primary key
  db, sstables: rename sstables registry column owner to table_id
2026-05-11 14:08:19 +03:00
Botond Dénes
f6dc2cb5f8 sstables: introduce --abort-on-malformed-sstable-error infrastructure
Add the --abort-on-malformed-sstable-error command-line option and the
supporting infrastructure. When set, any malformed sstable error will
abort the process and generate a coredump instead of throwing an
exception. This is useful for debugging memory corruption that may
manifest as apparent sstable corruption.

The implementation introduces:
- throw_malformed_sstable_exception() and throw_bufsize_mismatch_exception()
  helper functions in sstables/sstables.cc, which check the new flag and
  either abort (with logging) or throw the appropriate exception.
- set_abort_on_malformed_sstable_error() / abort_on_malformed_sstable_error()
  to control the per-process atomic flag.
- abort_on_malformed_sstable_error config option (LiveUpdate, default false)
  wired up in main.cc alongside abort_on_internal_error.

Call-site migration will follow in subsequent commits.
2026-05-11 11:58:14 +03:00
Botond Dénes
c3daa6379c sstables: refactor parse_path() to return std::expected<> instead of throwing
make_entry_descriptor() and the two overloads of parse_path() used to signal
parse failures by throwing malformed_sstable_exception, which made parse_path()
expensive to use as a probe (e.g. to classify directory entries).

Change make_entry_descriptor() and both parse_path() overloads to return
std::expected<T, sstring>, where the sstring carries the error message on
failure, eliminating the exception overhead at probe call sites.

Call sites that previously caught malformed_sstable_exception to treat the
path as a non-SSTable file (utils/directories.cc, db/snapshot/backup_task.cc,
tools/scylla-sstable.cc) now check the expected result directly.

Call sites where a parse failure is a genuine error (sstable_directory.cc,
sstables.cc, tools/schema_loader.cc, tools/scylla-sstable.cc) re-throw
explicitly as malformed_sstable_exception using the error string, preserving
the existing error propagation behaviour.
2026-05-11 11:58:14 +03:00
Botond Dénes
9b2dfab2e5 Merge 'Don't use database.get_config() to fetch calculate_view_update_throttling_delay option' from Pavel Emelyanov
This option is used in two places -- proxy and view-update-generator both need it to calculate the calculate_view_update_throttling_delay() value. This PR moves the option onto view_update_backlog top-level service, makes the calculating helper be method of that class and patches the callers to use it. This eliminates more places that abuse database as db::config accessor.

Code dependencies refactoring, not backporting

Closes scylladb/scylladb#29635

* github.com:scylladb/scylladb:
  view: Turn calculate_view_update_throttling_delay into node_update_backlog member
  view: Place view_flow_control_delay_limit_in_ms on node_update_backlog
  view: Add node_update_backlog reference to view_update_generator
2026-05-11 10:30:24 +03:00
Botond Dénes
3f72852d8c Merge 'Fix missing format string placeholders across the codebase (33 bugs across 14 modules )' from Yaniv Kaul
Fix 28 format string bugs plus 5 related format argument bugs across 14 modules
where `{}` placeholders were missing or arguments were wrong, causing arguments to
be silently dropped or misleading output from the `{fmt}` library.

Inspired by https://github.com/scylladb/scylladb/pull/29143 (which fixed a single
instance in `replica/table.cc`), a comprehensive audit of the entire codebase was
performed to find all similar issues.

- **Missing `{}` placeholder** (21 instances): format string simply lacks `{}` for a
  passed argument, e.g. `format("msg for table {}", group_id, table_id)` -- `group_id`
  is silently dropped
- **Spurious comma breaking C++ string literal concatenation** (2 instances): a comma
  after a string literal prevents adjacent-literal concatenation, turning the
  continuation into a format argument instead of part of the format string
- **Printf-style `%s` in fmtlib context** (4 instances): `%s` has no meaning in fmtlib
  and appears as literal text while the argument is silently ignored
- **Extra spurious argument** (1 instance): an extraneous `t.tomb()` argument inserted
  between correct arguments, causing wrong values in the wrong slots

- **Wrong variable in error message** (4 instances in `types/map.hh`): error messages
  for oversized map keys/values reported `map_size` (total entry count) instead of the
  actual `elem.first.size()` or `elem.second.size()` that exceeded the limit
- **Swapped argument order** (1 instance in `data_dictionary/data_dictionary.cc`):
  format string says `"Extraneous options for {type}: {values}"` but the values and
  type arguments were passed in reverse order

| Module | Bugs Fixed | Files |
|--------|:---------:|-------|
| `replica/` | 1 | `table.cc` |
| `service/` | 4 | `raft_group0.cc`, `storage_service.cc` |
| `db/` | 6 | `heat_load_balance.cc`, `commitlog_replayer.cc`, `view_update_generator.cc`, `view_building_worker.cc`, `row_locking.cc` |
| `cql3/` | 2 | `prepare_expr.cc`, `statement_restrictions.cc` |
| `transport/` | 4 | `event_notifier.cc` |
| `sstables/` | 3 | `partition_reversing_data_source.cc`, `reader.cc` |
| `alternator/` | 1 | `conditions.cc` |
| `cdc/` | 1 | `split.cc` |
| `raft/` | 1 | `server.cc` |
| `utils/` | 2 | `gcp/object_storage.cc`, `s3/client.cc` |
| `mutation/` | 1 | `mutation_partition.hh` |
| `ent/` | 2 | `kmip_host.cc`, `kms_host.cc` |
| `types/` | 4 | `map.hh` |
| `data_dictionary/` | 1 | `data_dictionary.cc` |

The `{fmt}` library's compile-time checker validates that each `{}` placeholder
references a valid argument, but does **not** verify the reverse -- that every
argument has a corresponding placeholder. Extra arguments are silently ignored
at both compile time and runtime.

Build verified with `dbuild ninja build/dev/scylla` -- compiles cleanly.

---

**Note:** Commits were amended to fix the author name from "Yaniv Michael Kaul" to "Yaniv Kaul".

Closes scylladb/scylladb#29448

* github.com:scylladb/scylladb:
  data_dictionary: fix swapped arguments in extraneous options error
  types: fix wrong variable in map key/value size error messages
  ent: fix missing format placeholders in encryption error/log messages
  mutation: fix spurious argument in shadowable_tombstone formatter
  utils: fix missing format placeholders in object storage log messages
  raft: fix missing format placeholder in server ostream operator
  cdc: fix missing format placeholder in error message
  alternator: fix missing format placeholder in error message
  sstables: fix missing format placeholders in error messages
  transport: fix printf-style format specifiers in fmtlib log calls
  cql3: fix missing format placeholders in error messages
  db: fix missing format placeholders in log and error messages
  service: fix missing format placeholders in log messages
  replica: fix missing format placeholder in cleanup log message
2026-05-11 07:04:42 +03:00
Nadav Har'El
df8c9b17b8 Merge 'alternator: Graduate Alternator Streams from experimental' from Piotr Szymaniak
As a final step for https://scylladb.atlassian.net/browse/SCYLLADB-461 we need to graduate Alternator Streams from experimental.
So let's remove `--experimental-features=alternator-streams` and map the obsolete config string to `UNUSED` for backward compatibility. Also, remove the related gating of the feature.
Finally, stop providing the config flag in test configs.

Fixes SCYLLADB-1680
Fixes #16367

To documentation tracked by https://scylladb.atlassian.net/browse/SCYLLADB-462 still remains.

This PR needs to hit 2026.2, so (only) if it branches before the PR is merged to `master`, we'd need to backport.

Closes scylladb/scylladb#29604

* github.com:scylladb/scylladb:
  test: Stop providing alternator-streams experimental flag
  alternator: Graduate Alternator Streams from experimental
2026-05-10 22:10:03 +03:00
Yaniv Kaul
fdebed5746 db: fix missing format placeholders in log and error messages
Fix six format string bugs where arguments were silently dropped:

- heat_load_balance.cc: pp value was passed but had no {} placeholder.
- commitlog_replayer.cc: column_family_id was passed but table= had
  no {} placeholder.
- view_update_generator.cc: _sstables_with_tables.size() was passed
  but had no {} placeholder.
- view_building_worker.cc: exception pointer was passed but the
  trailing colon had no {} placeholder.
- row_locking.cc: partition key and clustering key were passed in
  error messages but had no {} placeholders.

Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>
2026-05-10 17:49:50 +03:00