Commit Graph

54228 Commits

Author SHA1 Message Date
Marcin Maliszkiewicz
1dc975c491 Merge 'table_helper: observe detached setup_table() future' from Andrzej Jackowski
During shutdown, group0 may be torn down while
cache_table_info() has a detached setup_table() future
in flight. This causes raft_group_not_found to propagate
as an abandoned failed future.

Add .handle_exception() to log the failure at debug level
instead of leaving the future unobserved.
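
The idea of observing a detached future can be illustrated outside Seastar. The following is a minimal sketch using Python's asyncio as a swapped-in analogy, not the actual `table_helper` code; `setup_table()` and `observe()` here are hypothetical stand-ins.

```python
import asyncio
import logging

log = logging.getLogger("table_helper_sketch")

async def setup_table() -> None:
    # Hypothetical stand-in: fails the way a torn-down group0 might.
    raise RuntimeError("raft_group_not_found")

def observe(task: asyncio.Task) -> None:
    # Retrieving the exception marks it as observed, so the event loop
    # does not report "Task exception was never retrieved".
    if task.cancelled():
        return
    exc = task.exception()
    if exc is not None:
        log.debug("detached setup_table() failed: %s", exc)

async def main() -> bool:
    task = asyncio.create_task(setup_table())  # detached: nobody awaits it
    task.add_done_callback(observe)
    await asyncio.sleep(0)  # let the task run and fail
    await asyncio.sleep(0)  # let the done callback run
    return task.done()

observed = asyncio.run(main())
```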

Fixes: SCYLLADB-2224

Backport to 2026.2 and 2026.1, because the test failed on 2026.1

Closes scylladb/scylladb#30093

* github.com:scylladb/scylladb:
  test: table_helper: verify detached setup failure is consumed
  table_helper: observe detached setup_table() future
2026-06-01 19:32:34 +02:00
Aleksandra Martyniuk
33af16d808 test/cluster/test_tablets: increase timeout for test_multi_rf_of_many_keyspaces_0_N
Multi-RF change handles multiple keyspaces concurrently, but tablet
rebuilds are not all started at once — the load balancer considers
machine load when scheduling them. With 3 keyspaces each having a base
table and materialized view, the total operation time approaches the
default 200s CQL timeout on slow/busy CI machines (observed at ~191s).

Double the timeout to 400s to provide sufficient margin.

Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-2042.

Closes scylladb/scylladb#30018
2026-06-01 20:07:03 +03:00
Michael Litvak
a7a7f02392 test: test_cdc_with_tablets: add read barrier
Add a group0 read barrier in test_cdc_with_tablets whenever we observe a
condition such as a tablet count change or a cdc stream change, and we want
to proceed to check that the cdc tables are consistent with the change. For
example, when we wait for a tablet count change and then check that the cdc
streams changed as well.

The problem is that when we observe the tablet count change, for
example, even though the cdc streams are changed in the same group0
operation, we may observe it during the group0 apply, when the operation
is only partially applied. The read barrier ensures that the change we
observed is fully applied.

Fixes SCYLLADB-2352

Closes scylladb/scylladb#30177
2026-06-01 13:56:01 +02:00
Pavel Emelyanov
d299554a00 sstables: demote verbose I/O logging from debug to trace
Demote the following sstable operation logs from DEBUG to TRACE level:
- Reading component files
- Writing component files
- Touching temp directories
- Removing temp directories

These are low-level, per-file operations that generate excessive log
volume with minimal diagnostic value. They don't deserve debug level
and should only appear when sstable subsystem is explicitly set to
trace for detailed I/O troubleshooting.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Closes scylladb/scylladb#30108
2026-06-01 14:14:53 +03:00
Avi Kivity
57ad4fda70 build: drop python3-pytest-sugar from install-dependencies.sh --future
The python3-pytest-sugar package was orphaned from Fedora [1] and
isn't provided by Fedora 45.

Drop it from the future toolchain so it can build. Arguably it doesn't
belong in the current toolchain either (it's not necessary, just nice)
but I don't want to regenerate the toolchain just for that.

[1] 094443596a

Closes scylladb/scylladb#30141
2026-06-01 13:13:41 +02:00
Yaron Kaikov
aa32d2c425 ci: replace trigger_ci with scylla-ci-route workflow
Replace the old trigger_ci.yaml workflow with a new scylla-ci-route.yaml
that adds smarter CI routing logic including docs-only PR detection,
conflict checking, Jenkins job triggering, and label-based CI options.

Fixes: https://scylladb.atlassian.net/browse/RELENG-52

Closes scylladb/scylladb#29026
2026-06-01 13:54:56 +03:00
Botond Dénes
bb81dbf65e Merge 'guardrails: Add replica-side large data guardrails' from Taras Veretilnyk
Adds write-path guardrails that reject or warn on mutations targeting partitions, rows, or collections that already exceed configured size thresholds, based on SSTable `large_data_record` metadata.
ScyllaDB already detects and records large partitions/rows/cells in `system.large_data_records` after compaction, but takes no preventive action on the write path. Once a partition grows past operational limits it causes latency spikes, OOM, and repair failures. These guardrails let operators set hard and soft thresholds so that writes to already-oversized data are rejected (hard) or logged as warnings (soft) before they make the problem worse.
- **Intrusive index over SSTable metadata**: A per-table `large_data_record_index` maintains three `boost::intrusive::multiset`s (partitions, rows, cells) using `auto_unlink` hooks directly on `large_data_record`. SSTable destruction automatically removes records from the index — no explicit deregistration needed.
- **Virtual dispatch for zero-cost disabled path**: `large_data_guardrail_base` → `noop_large_data_guardrail` / `large_data_guardrail`. Tables without guardrails enabled pay only a virtual call to a no-op. No index is built or maintained for disabled tables.
- **Schema storage**: The per-table flag is stored as a scylla_tables column, following the tablets pattern: only write a live cell when enabled, omit entirely when disabled. The CQL feature gate prevents enabling until all nodes are upgraded.
- **Write-path integration**: The guardrail check runs in `do_apply` after the frozen mutation is deserialized but before it is applied to the memtable. Hint replay and Paxos learn skip the check via `skip_large_data_guardrails`.
Uses existing `large_*_warn_threshold` config options as soft limits and new `large_*_fail_threshold` options as hard limits. Checked dimensions:
- Partition size (bytes)
- Partition row count
- Row size (bytes)
- Collection element count
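
The soft/hard threshold behavior described above can be sketched as follows. This is an illustration under assumptions, not the merged implementation: `Limits`, `check()` and `LargeDataError` are hypothetical names, the strict `>` comparison is a guess, and treating 0 as "disabled" follows the "disabled-when-zero" wording of the test commits.

```python
from dataclasses import dataclass

class LargeDataError(Exception):
    """Hypothetical stand-in for the error returned on a hard-limit hit."""

@dataclass(frozen=True)
class Limits:
    warn: int  # soft limit, 0 = disabled
    fail: int  # hard limit, 0 = disabled

def check(dimension: str, recorded: int, limits: Limits, warned: list) -> None:
    """Reject or warn based on the size already recorded for the target."""
    if limits.fail and recorded > limits.fail:
        raise LargeDataError(f"{dimension}: {recorded} exceeds hard limit {limits.fail}")
    if limits.warn and recorded > limits.warn:
        warned.append(f"{dimension}: {recorded} exceeds soft limit {limits.warn}")

warned: list = []
limits = Limits(warn=100, fail=1000)
check("partition rows", 50, limits, warned)    # below both limits: silent
check("partition rows", 500, limits, warned)   # above the soft limit: warning
try:
    check("partition rows", 5000, limits, warned)  # above the hard limit
    rejected = False
except LargeDataError:
    rejected = True
check("partition rows", 5000, Limits(warn=0, fail=0), warned)  # disabled: silent
```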

Backport is not required

Fixes https://scylladb.atlassian.net/browse/SCYLLADB-180

Closes scylladb/scylladb#29733

* github.com:scylladb/scylladb:
  test/cqlpy: add per-table toggle, LWT exemption, and multi-category tests
  test/cqlpy: add large collection guardrail tests
  test/cqlpy: add large row guardrail tests
  test/cqlpy: add large partition guardrail tests
  test/boost: add large_data_guardrail unit tests
  test/cluster: add large data guardrails rolling upgrade test
  replica: wire large_data_guardrail into the write path
  schema: add per-table large_data_guardrails_enabled flag
  db: implement large_data_guardrail
  db: implement large_data_record_index
  sstables: add intrusive index hook to large_data_record
  db: add large_collection_elements_fail_threshold config option
  db: add large_row_fail_threshold_mb config option
  db: add rows_count_fail_threshold config option
  db: add large_partition_fail_threshold_mb config option
  replica: introduce large_data_exception
2026-06-01 13:26:00 +03:00
Nadav Har'El
b254a9826a test/cluster: add pylib-style nodetool.py
Tests in test/cqlpy use a tiny nodetool-like library, where calls to
nodetool.flush() are translated to the parallel REST API request on
Scylla - but use an external "nodetool" command when running the test
against Cassandra.

Some test/cluster tests also began using test/cqlpy/nodetool.py, but it is
NOT a good fit for test/cluster tests, because:

1. It falls back to using the external "nodetool" when it thinks the
   REST API is not available. In cluster tests, no such fallback is
   needed (these tests can't be run on Cassandra). If the REST API is
   down, the test should fail - not fall back to an irrelevant method.

2. The nodetool.flush() et al. functions are not async, and cluster
   tests are supposed (by design...) to only use async APIs.

3. test/cqlpy/nodetool.py was not written in the "style" defined for
   the test/cluster codebase - specifically, its functions don't have
   docstrings or strong typing.

This patch introduces test/pylib/nodetool.py, based on
test/cqlpy/nodetool.py but fixing all the above problems - there are
no Cassandra fallbacks, there are docstrings and type hints, and
all the functions are async.

We also fix the test/cluster tests that used test/cqlpy/nodetool.py to
switch to test/pylib/nodetool.py. Of course it means the newly async
functions need to be "await"ed, not just called, so this patch changes
that too.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#30129
2026-06-01 13:03:29 +03:00
Jenkins Promoter
194435279e Update pgo profiles - aarch64
2026-06-01 05:26:51 +03:00
Piotr Szymaniak
21f1380df1 test/pylib: fix starting server cleanup race
test_localnodes_joining_nodes stops a server while manager.server_add() is still waiting for that server to finish startup. Stopping the process can make the background add_server() fail and run its cleanup path first, removing the server from ScyllaCluster.starting. When the stop request later resumes, its own self.starting.pop(server_id) raises KeyError, which the manager returns as HTTP 500.

The opposite ordering is possible as well: server_stop() can remove the entry before add_server() reaches its finally block.

Make cleanup of ScyllaCluster.starting idempotent in both paths. add_server() remains the normal cleanup path, while server_stop() provides fallback cleanup when it wins the race.
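
A minimal sketch of why idempotent removal fixes both orderings, assuming `starting` is a plain dict keyed by server id (the helper names below are hypothetical, not the actual test/pylib functions):

```python
def cleanup_strict(starting: dict, server_id: int) -> None:
    # Raises KeyError if the other path already removed the entry.
    starting.pop(server_id)

def cleanup_idempotent(starting: dict, server_id: int) -> None:
    # Safe to call from either path, in either order, any number of times.
    starting.pop(server_id, None)

# Strict cleanup: whichever path runs second fails.
starting = {1: "server-1"}
cleanup_strict(starting, 1)
try:
    cleanup_strict(starting, 1)
    strict_failed = False
except KeyError:
    strict_failed = True

# Idempotent cleanup: the second call is a harmless no-op.
starting = {1: "server-1"}
cleanup_idempotent(starting, 1)
cleanup_idempotent(starting, 1)
```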

Fixes SCYLLADB-2314

Closes scylladb/scylladb#30128
2026-05-31 23:22:44 +03:00
Nadav Har'El
33dce2b7fc Merge 'cql3: statement_restrictions: continue exploitation of predicate work' from Avi Kivity
In 6165124fcc, we changed analysis of expressions in the WHERE clause
to use predicates, an annotated form of an expression that constrains a column
when the expression is set to true.

Here, we exploit this work to simplify the analysis further, reusing already computed
attributes rather than re-analyzing the expression.

Not backporting, this is a refactor with no functional change and no bugs fixed.

Closes scylladb/scylladb#30049

* github.com:scylladb/scylladb:
  cql3: statement_restrictions: simplify find_idx to return only the index
  cql3: statement_restrictions: replace has_only_eq_binops with tracked booleans
  cql3: statement_restrictions: use index-selection predicates for value_for_index_partition_key
  cql3: statement_restrictions: replace find_clustering_order with predicate order field
  cql3: statement_restrictions: replace has_partition_token with variant check
  cql3: statement_restrictions: replace has_slice with predicate is_slice check
  cql3: statement_restrictions: replace contains_multi_column_restriction filter with _has_multi_column
  cql3: statement_restrictions: remove unused find_needs_filtering and has_slice_or_needs_filtering
  cql3: statement_restrictions: replace has_slice_or_needs_filtering with tracked bool
  cql3: statement_restrictions: replace contains_multi_column_restriction with _has_multi_column
  cql3: statement_restrictions: replace find_needs_filtering with predicate op check
  cql3: statement_restrictions: replace find_binop is_on_collection with tracked bool
  cql3: statement_restrictions: replace find_binop column extraction with predicate on field
  cql3: statement_restrictions: set op on all binary-operator-derived predicates
2026-05-31 23:22:43 +03:00
Yaniv Michael Kaul
a396a6b664 utils/histogram: remove redundant std::function from meter_timer
meter_timer stored the callback in a std::function member (32 bytes) and
then passed it to the timer constructor, which stored its own copy as a
noncopyable_function (another 32 bytes). This double-storage is wasteful.

Remove the std::function member and pass the callback directly to the
timer as a noncopyable_function. This saves 32 bytes per meter_timer
instance.

meter_timer is embedded in timed_rate_moving_average (used 4x in
row_cache::stats, 3x in table_stats CAS histograms, etc.), so this
saves 128 bytes per row_cache and 96 bytes per table_stats with CAS
histograms allocated.
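
The savings quoted above follow directly from the 32-byte member size stated in the message; a quick check of the arithmetic:

```python
CALLBACK_BYTES = 32  # size of the removed std::function member, per the message

saved_per_row_cache = 4 * CALLBACK_BYTES    # 4 meter_timer instances in row_cache::stats
saved_per_table_stats = 3 * CALLBACK_BYTES  # 3 instances in the table_stats CAS histograms
```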

Benchmark: neutral (no hot-path change, only reduces struct size).

Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>
AI-assisted: Yes
Backport: no, improvement

Closes scylladb/scylladb#30145
2026-05-31 19:10:03 +03:00
Benny Halevy
d4d43213f6 cql3/statements/describe_statement: use chunked_vector to prevent oversized allocations
While running the 5000-table scenario with tablets, the following Scylla warnings appeared:
```
2026-02-23T23:18:31.903 schema-scale-tablets-5000t-2026-1-db-node-77930459-4 !WARNING | scylla[5208]  [shard  1:sl:d] seastar_memory - oversized allocation: 655360 bytes. This is non-fatal, but could lead to latency and/or fragmentation issues. Please report: at 0x320cf9f 0x320cba0 0x1826a28 0x2fb8f97 0x180340e 0x447855e 0x4461c5a 0x161c3c6 0x161c4b3 0x161e9b7 0x551f43c 0x54df6ca /opt/scylladb/libreloc/libc.so.6+0x72463 /opt/scylladb/libreloc/libc.so.6+0xf55ab
seastar::current_backtrace_tasklocal() at ./build/release/seastar/./seastar/include/seastar/util/backtrace.hh:85
seastar::current_tasktrace() at ./build/release/seastar/./build/release/seastar/./seastar/src/util/backtrace.cc:136
seastar::current_backtrace() at ./build/release/seastar/./build/release/seastar/./seastar/src/util/backtrace.cc:169
seastar::memory::cpu_pages::warn_large_allocation(unsigned long) at ./build/release/seastar/./build/release/seastar/./seastar/src/core/memory.cc:865
seastar::memory::allocate_slowpath(unsigned long) at ./build/release/seastar/./build/release/seastar/./seastar/src/core/memory.cc:928
cql3::statements::(anonymous namespace)::tables(data_dictionary::database const&, seastar::lw_shared_ptr<data_dictionary::keyspace_metadata> const&, std::optional<bool>) [clone .resume] at ././seastar/src/core/memory.cc:1727
std::__n4861::coroutine_handle<seastar::internal::coroutine_traits_base<std::vector<cql3::description, std::allocator<cql3::description> > >::promise_type>::resume() const at /usr/lib/gcc/x86_64-redhat-linux/15/../../../../include/c++/15/coroutine:247
```

This patch replaces the use of `std::vector<description>` with `utils::chunked_vector`
to prevent the large allocation.

Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-852

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#30146
2026-05-31 14:50:18 +03:00
Avi Kivity
503add224d cql3: statement_restrictions: simplify find_idx to return only the index
The expression returned as the second element of find_idx()'s pair was
stored in view_indexed_table_select_statement::_used_index_restrictions
but never read — dead code. Simplify find_idx() to return just the
optional<index>, and remove the dead member and constructor parameter
from view_indexed_table_select_statement.

The now unused _idx_restrictions is also removed.
2026-05-29 17:18:21 +03:00
Avi Kivity
23d6f458ec cql3: statement_restrictions: replace has_only_eq_binops with tracked booleans
Replace has_only_eq_binops() (which uses find_in_expression to search
the expression tree for non-EQ binary operators) with precomputed
_pk_is_all_eq and _ck_is_all_eq booleans tracked incrementally during
predicate construction. Each predicate's equality field is checked as
it is processed, covering single-column PK/CK predicates, multi-column
CK predicates, and token predicates.

This removes the last find_in_expression call in
statement_restrictions.cc, and eliminates has_only_eq_binops entirely.
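
The general technique (tracking a boolean while predicates are built instead of searching afterwards) can be sketched like this. It is a simplified illustration, not the C++ code: `Predicate`, `Restrictions` and `has_only_eq_by_search()` are hypothetical, and the real expression tree is reduced to a flat list.

```python
from dataclasses import dataclass, field

@dataclass
class Predicate:
    column: str
    op: str         # e.g. "=", "<", "IN"
    equality: bool  # precomputed when the predicate is built

@dataclass
class Restrictions:
    predicates: list = field(default_factory=list)
    pk_is_all_eq: bool = True  # tracked incrementally

    def add(self, p: Predicate) -> None:
        self.predicates.append(p)
        # Fold each predicate's equality field into the flag as it arrives.
        self.pk_is_all_eq = self.pk_is_all_eq and p.equality

def has_only_eq_by_search(r: Restrictions) -> bool:
    # The "old" style: walk everything again to answer the question.
    return all(p.op == "=" for p in r.predicates)

r = Restrictions()
r.add(Predicate("pk1", "=", True))
all_eq_after_first = r.pk_is_all_eq
r.add(Predicate("pk2", "<", False))
```

Reading `r.pk_is_all_eq` is then a constant-time lookup, while the search-based helper stays linear in the number of predicates.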
2026-05-29 17:13:40 +03:00
Avi Kivity
9e70771600 cql3: statement_restrictions: use index-selection predicates for value_for_index_partition_key
Instead of rebuilding predicates from the expression tree at
build_value_for_index_partition_key_fn() time via to_predicate_on_column,
capture the indexed column's predicates directly in do_find_idx() when
the index is chosen. Store them as _idx_column_predicates and use them
to build the value function without any redundant expression analysis.

This eliminates build_value_for_fn and its call to to_predicate_on_column
from this code path.
2026-05-29 17:13:37 +03:00
Avi Kivity
a402fe1a65 cql3: statement_restrictions: replace find_clustering_order with predicate order field
In build_range_from_raw_bounds_fn(), replace find_clustering_order()
(which uses find_binop to search for a binary_operator with clustering
comparison order) with a direct check of the predicate's order field.
Since each multi-column predicate's filter is a single binary_operator,
extract it directly with as<binary_operator>() instead of searching.
2026-05-29 16:50:02 +03:00
Avi Kivity
b3c1ee230b cql3: statement_restrictions: replace has_partition_token with variant check
Replace has_token_restrictions()'s call to has_partition_token() (which
uses find_binop to search the expression tree for token function calls)
with a direct check on the _partition_range_restrictions variant, which
already records whether token restrictions exist.
2026-05-29 16:50:02 +03:00
Avi Kivity
e10c124cd3 cql3: statement_restrictions: replace has_slice with predicate is_slice check
In the clustering prefix construction loop, replace the has_slice()
call (which uses find_binop to search the merged predicate's expression
tree for slice operators) with a direct check on the individual
predicate vector's is_slice field.
2026-05-29 16:50:02 +03:00
Avi Kivity
4c282f588a cql3: statement_restrictions: replace contains_multi_column_restriction filter with _has_multi_column
In calculate_column_defs_for_filtering_and_erase_restrictions_used_for_index(),
the code extracted multi-column boolean factors from
_clustering_columns_restrictions. Since multi-column and single-column
CK restrictions cannot be mixed (the constructor enforces this), when
_has_multi_column is true, ALL factors are multi-column. Simplify to
just adding _clustering_columns_restrictions directly when
_has_multi_column is set.

This removes the last caller of contains_multi_column_restriction(),
allowing the function (and its find_binop call) to be removed.
2026-05-29 16:50:02 +03:00
Avi Kivity
dca2cc512e cql3: statement_restrictions: remove unused find_needs_filtering and has_slice_or_needs_filtering
These helper functions were wrappers around find_binop that are no
longer called, since their call sites have been replaced by
predicate-based checks.
2026-05-29 16:50:02 +03:00
Avi Kivity
eb98aea466 cql3: statement_restrictions: replace has_slice_or_needs_filtering with tracked bool
Replace the has_slice_or_needs_filtering() call on
_partition_key_restrictions (which uses find_binop to walk the
expression tree) with a precomputed _pk_has_slice_or_needs_filtering
boolean tracked incrementally during predicate construction in the
partition key branch.
2026-05-29 16:50:02 +03:00
Avi Kivity
6e27c3a185 cql3: statement_restrictions: replace contains_multi_column_restriction with _has_multi_column
In clustering_key_restrictions_need_filtering(), replace the
contains_multi_column_restriction() call (which uses find_binop to
search for a tuple_constructor LHS in the expression tree) with the
precomputed _has_multi_column boolean that is already tracked
incrementally during predicate construction.
2026-05-29 16:50:01 +03:00
Avi Kivity
ae7eb860a5 cql3: statement_restrictions: replace find_needs_filtering with predicate op check
In the clustering prefix construction loop, replace the
find_needs_filtering() call (which walks the merged predicate's
expression tree looking for needs-filtering binary operators) with a
check on the individual predicate vector. This uses the per-predicate
op field directly instead of searching the expression tree.
2026-05-29 16:50:01 +03:00
Avi Kivity
556262a165 cql3: statement_restrictions: replace find_binop is_on_collection with tracked bool
Replace the two find_binop(_clustering_columns_restrictions, is_on_collection)
calls with a precomputed _ck_is_on_collection boolean that is tracked
incrementally during predicate construction. This avoids walking the
expression tree at each call site.

The is_on_collection check detects CONTAINS/CONTAINS_KEY operators,
which indicate collection restrictions on a clustering key column.
2026-05-29 16:50:01 +03:00
Avi Kivity
569c85032e cql3: statement_restrictions: replace find_binop column extraction with predicate on field
In add_clustering_restrictions_to_idx_ck_prefix(), find_binop was used
to locate any binary_operator in the predicate's filter just to extract
the column from its LHS. Since the predicate already stores this
information in its 'on' field (as on_column for single-column
predicates), use it directly instead of searching the expression tree.
2026-05-29 16:50:01 +03:00
Avi Kivity
240d9be5e2 cql3: statement_restrictions: set op on all binary-operator-derived predicates
The to_predicates() function had fallthrough paths for operators like
LIKE and NOT_IN that created predicates without setting the op field.
This meant predicate-based checks like 'p.op && needs_filtering(*p.op)'
would miss these operators.

Fix by inlining the predicate construction at the fallthrough points
(instead of using cannot_solve_on_column) and setting .op = oper.op.
This ensures all predicates derived from binary operators carry their
operator type, enabling reliable predicate-based analysis.

The cannot_solve_on_column helper is now unused and removed.
2026-05-29 16:50:01 +03:00
Taras Veretilnyk
9abf594397 test/cqlpy: add per-table toggle, LWT exemption, and multi-category tests
Per-table toggle: disabled-at-create, alter-disable, alter-reenable.
LWT exemption: Paxos learn must bypass the guardrail.
Multi-category independence: all three guardrails warn/reject
independently when SSTable records span partition, row, and collection
categories.
2026-05-29 12:51:43 +02:00
Dimitrios Symonidis
4c0a991017 test/cluster: fix proxy resource leak in internode compression test
The test_internode_compression_between_datacenters test was flaky due to
proxy servers and leased host IPs not being cleaned up on failure paths.
If any exception occurred after proxies were started (e.g. during
server_start or driver_connect), the asyncio.Server listeners remained
bound and leased hosts were never released back to HostRegistry. On
subsequent test runs, this caused EADDRINUSE (errno 98) when trying to
bind the same address:port.

Wrap the proxy/server lifecycle in try/finally to ensure proxies are
always stopped and hosts are always released, regardless of whether
the test succeeds or fails.
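
The cleanup pattern can be sketched as below. `FakeProxy` and `FakeRegistry` are hypothetical stand-ins for the real proxy servers and HostRegistry, used only to show that try/finally releases resources on the failure path too.

```python
import asyncio

class FakeRegistry:
    """Hypothetical stand-in for HostRegistry: tracks leased addresses."""
    def __init__(self) -> None:
        self.leased: set = set()

    def lease(self) -> str:
        ip = f"127.0.0.{len(self.leased) + 2}"
        self.leased.add(ip)
        return ip

    def release(self, ip: str) -> None:
        self.leased.discard(ip)

class FakeProxy:
    """Hypothetical stand-in for a proxy listener."""
    def __init__(self, ip: str) -> None:
        self.ip = ip
        self.running = False

    async def start(self) -> None:
        self.running = True

    async def stop(self) -> None:
        self.running = False

async def run_test(registry: FakeRegistry, fail: bool) -> FakeProxy:
    ip = registry.lease()
    proxy = FakeProxy(ip)
    await proxy.start()
    try:
        if fail:
            raise RuntimeError("simulated server_start failure")
    finally:
        # Runs on success and on failure alike.
        await proxy.stop()
        registry.release(ip)
    return proxy

registry = FakeRegistry()
try:
    asyncio.run(run_test(registry, fail=True))
    failed = False
except RuntimeError:
    failed = True
```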

Fixes: SCYLLADB-2183

Closes scylladb/scylladb#30127
2026-05-29 13:51:43 +03:00
Taras Veretilnyk
7d365844a3 test/cqlpy: add large collection guardrail tests
Tests for collection element-count guardrail: hard-limit rejection,
disabled-when-zero, soft-limit log warning, and no-warning below
threshold.
2026-05-29 12:51:43 +02:00
Taras Veretilnyk
19a9e45da8 test/cqlpy: add large row guardrail tests
Tests for row-size guardrail: hard-limit rejection, disabled-when-zero,
soft-limit log warning, and no-warning below threshold.
2026-05-29 12:51:42 +02:00
Taras Veretilnyk
67b659e2bf test/cqlpy: add large partition guardrail tests
Tests for partition size and row-count guardrails: hard-limit rejection,
disabled-when-zero, soft-limit log warnings, and no-warning below
threshold.  Includes shared helpers and log assertion utilities used by
subsequent commits.
2026-05-29 12:51:42 +02:00
Taras Veretilnyk
ff84b1dbc4 test/boost: add large_data_guardrail unit tests
8 tests covering the record_compare template comparator,
intrusive multiset equal_range grouping with heterogeneous
lookup_key, and auto_unlink on record destruction.
2026-05-29 12:51:42 +02:00
Taras Veretilnyk
0201c1530e test/cluster: add large data guardrails rolling upgrade test
Simulated rolling upgrade: start a 2-node cluster where one node
suppresses the LARGE_DATA_GUARDRAILS feature, verify that enabling
guardrails is rejected, then upgrade the old node and verify that
enabling guardrails succeeds.
2026-05-29 12:51:31 +02:00
Anna Stuchlik
e162728537 docs: fix Confluent CLI commands and add LDAP auth link
- Fix confluent local stop/start commands to use 'services' subcommand
  (confluent local services stop/start) in kafka-connector.rst
- Add LDAP Authentication link to Additional Resources in authentication.rst

Fixes https://github.com/scylladb/scylladb/issues/29121
Fixes https://github.com/scylladb/scylladb/issues/23029

Closes scylladb/scylladb#30116
2026-05-29 13:47:02 +03:00
Pavel Emelyanov
5d0371620d test/backup: Reduce s3 logging from trace to debug
Change s3 log level from TRACE to DEBUG in backup tests.

TRACE level generates excessive log volume with too much low-level
detail about S3 operations. While it was useful in the early days
of the S3 client, nowadays DEBUG level likely provides sufficient
diagnostic information for backup test troubleshooting.

The reduced log volume significantly improves test performance, which
is the main outcome of this change:
- Less I/O time writing logs during test execution
- Faster teardown: each test scans all server logs for errors, and
  smaller logs mean faster grep operations (23.3s → 9.97s for 8-node
  cluster teardown)

Impact on test_restore_with_streaming_scopes[topology4] (8 nodes):
- Log volume: 49 MB → 23 MB (reduced by half)
- Test runtime: 82.55s → 57.53s (30% faster)
- Teardown time: 23.3s → 9.97s (57% faster)

Tests that start smaller clusters also have notable timing improvements.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Closes scylladb/scylladb#30109
2026-05-29 13:46:10 +03:00
Anna Stuchlik
44b4995e01 doc: remove i3en and i4i from AWS instance recommendations
Remove i3en and i4i instance types from the Cloud Instance
Recommendations page, as i7 and i8 instances are better alternatives.

Fixes https://github.com/scylladb/scylladb/issues/29666

Closes scylladb/scylladb#30103
2026-05-29 13:44:38 +03:00
Pavel Emelyanov
24c0ea6b19 sstables_loader: Prevent table destruction during tablet restore download
Similar to e5e6608f20 ("sstables_loader: prevent use-after-free on
table drop during streaming") which fixed the same class of race for
load_and_stream, the tablet restore path also holds a replica::table&
reference across the download_sstable() coroutine without preventing
concurrent table destruction.

If DROP KEYSPACE is applied while download_sstable() is writing SSTable
components to the table's data directory, the directory is removed
mid-write causing ENOENT → abort (with --abort-on-internal-error).

Fix by acquiring a stream_in_progress() phaser guard after
find_column_family() and before download_sstable(). table::stop()
calls _pending_streams_phaser.close() which blocks until all
outstanding guards are released, keeping the table alive for the
duration of the download.
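
The guard-then-close ordering can be illustrated with a toy phaser. This is an asyncio analogy under assumptions, not Seastar's phaser or the sstables_loader code; `Phaser`, `download()` and `stop_table()` are hypothetical names.

```python
import asyncio
import contextlib

class Phaser:
    """Toy stand-in for a phaser/gate: close() waits for all guards."""
    def __init__(self) -> None:
        self._guards = 0
        self._closed = False
        self._idle = asyncio.Event()
        self._idle.set()

    @contextlib.contextmanager
    def guard(self):
        if self._closed:
            raise RuntimeError("table is stopping")
        self._guards += 1
        self._idle.clear()
        try:
            yield
        finally:
            self._guards -= 1
            if self._guards == 0:
                self._idle.set()

    async def close(self) -> None:
        self._closed = True
        await self._idle.wait()  # blocks until every guard is released

events: list = []

async def download(phaser: Phaser) -> None:
    with phaser.guard():
        events.append("download started")
        await asyncio.sleep(0.01)  # simulated component writes
        events.append("download finished")

async def stop_table(phaser: Phaser) -> None:
    await phaser.close()
    events.append("table destroyed")

async def main() -> None:
    phaser = Phaser()
    task = asyncio.create_task(download(phaser))
    await asyncio.sleep(0)  # let the download take its guard first
    await stop_table(phaser)
    await task

asyncio.run(main())
```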

Fixes: SCYLLADB-2187

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#30094
2026-05-29 13:43:37 +03:00
Pavel Emelyanov
8b2ff16cae schema: Move grace_period from schema_ctxt to schema_registry
The schema_registry_grace_period field on schema_ctxt was only used by
schema_registry itself for eviction timing. Move it to be a direct member
of schema_registry, passed at init() time. This removes one db::config
dependency from schema_ctxt.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Closes scylladb/scylladb#30038
2026-05-29 13:42:23 +03:00
Botond Dénes
1384c9523e Merge 'Simplify handler injection call sites to use appropriate existing API' from Pavel Emelyanov
Several error injection call sites use the verbose handler-lambda API when simpler alternatives already exist in the framework. This series converts them to use the appropriate overloads, reducing boilerplate and making the injection intent immediately obvious from the call site.

Cleaning up in-code debugging facilities, no need to backport

Closes scylladb/scylladb#29962

* github.com:scylladb/scylladb:
  error_injection: Convert handler-style breakpoints to wait_for_message sugar
  error_injection: Convert no-op handler injections to enter()/is_enabled()
  error_injection: Convert handler-throw injections to lambda-throw style
  utils: Add share_messages parameter to breakpoint injection API
2026-05-29 13:41:09 +03:00
Botond Dénes
3ae88e31bd Merge 'test/pylib: stop using random ports for MinIO and JMX' from Piotr Smaron
Replace random port selection in MinIO and JMX test helpers with fixed
ports on unique per-test loopback IPs, eliminating TOCTOU races.

Commits:
- kmip_wrapper: default hostname to 127.0.0.1
- nodetool: bind JMX to the per-module loopback IP with fixed port 7199
- minio: use fixed service and console ports on a unique HostRegistry IP
  instead of probing the ephemeral range; raise on start failure

Fixes: SCYLLADB-1817

Minor improvement, no need to backport.

Closes scylladb/scylladb#29741

* github.com:scylladb/scylladb:
  test/pylib: use fixed MinIO ports on unique loopback IPs
  test/nodetool: bind JMX to per-module loopback IP
  test/pylib: default KMIP wrapper to loopback
2026-05-29 13:40:24 +03:00
Taras Veretilnyk
23881db289 replica: wire large_data_guardrail into the write path
Thread the per-table large_data_guardrail through the write path so
that mutations exceeding configured thresholds are rejected before
being applied to the memtable.

The guardrail is selected in database::do_apply — either the table's
own guardrail or a static noop when skip_large_data_guardrails is set.
It flows through apply_in_memory → table::apply →
memtable::apply, where the check runs after partition_builder
deserializes the frozen mutation. For large mutations (>128KB), the
check runs after unfreeze_gently instead.
2026-05-29 12:18:33 +02:00
Taras Veretilnyk
5a0974e781 schema: add per-table large_data_guardrails_enabled flag
Add a per-table large_data_guardrails_enabled flag controlled via the CQL
table property WITH large_data_guardrails_enabled = true|false.

Store the flag as a boolean column in system_schema_ext.scylla_tables.
Only write a live cell when enabled; when disabled (the default), omit
the cell entirely so that old nodes that don't know this column can
still read the SSTable during rolling upgrade or rollback.  When the
property transitions from true to false via ALTER TABLE, a tombstone is
written in make_update_table_mutations to override the previous live
cell — this is safe because the CQL feature gate ensures all nodes are
upgraded before the property can be set to true.

Gate the CQL property behind the LARGE_DATA_GUARDRAILS cluster feature:
attempting to set large_data_guardrails_enabled = true before all nodes
advertise the feature raises a ConfigurationException.
2026-05-29 12:18:33 +02:00
Botond Dénes
46631692cd mutation_fragment_stream_validator: use legacy byte order for same-token partition key comparison
When two partition keys share the same token, their relative order is
determined by their raw serialized bytes (legacy_tri_compare), which
matches the physical on-disk order in SSTables.  The validator was
using partition_key::tri_compare instead — a type-aware comparator
that can disagree with byte order for types like timeuuid.

The result was a false-positive "out-of-order partition key" error
for any two same-token partitions whose timeuuid (or other type-aware)
order is the reverse of their byte order.  In scrub mode this caused
the second partition to be silently dropped.
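
A small illustration of how a time-aware order and a raw byte order can disagree for version 1 UUIDs. It uses Python's `uuid` module only to show the general effect; ScyllaDB's actual timeuuid comparator and partition key serialization may differ in detail, and `make_timeuuid()` is a hypothetical helper.

```python
import uuid

def make_timeuuid(time_low: int, time_mid: int) -> uuid.UUID:
    # Version 1 layout: time_low comes first in the bytes, although it holds
    # the least significant part of the timestamp.
    return uuid.UUID(fields=(time_low, time_mid, 0x1000, 0x80, 0x00, 0x000000000001))

a = make_timeuuid(0x00000001, 0x0002)
b = make_timeuuid(0xFFFFFFFF, 0x0001)

# Three-way comparisons: -1, 0 or 1.
byte_order = (a.bytes > b.bytes) - (a.bytes < b.bytes)
time_order = (a.time > b.time) - (a.time < b.time)
```

Here `a` sorts before `b` by raw bytes but after `b` by timestamp, so a validator using one order on data written in the other would see the pair as out of order.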

Fixes: SCYLLADB-2304

Closes scylladb/scylladb#30120
2026-05-29 11:54:20 +02:00
Tomasz Grabiec
5ceabcbcc5 Merge 'tablets: fix update_tablet_metadata failures during bootstrap' from Aleksandra Martyniuk
When partition_split_builder splits a tablet metadata partition into
multiple mutations, the first mutation gets the partition tombstone
and/or static row while subsequent mutations contain only clustered
rows. The hint logic would correctly clear tokens (marking a full
partition read) upon seeing the tombstone in the first mutation, but
then re-add tokens when processing the subsequent row-only mutations.
This caused update_tablet_metadata to attempt a point update via
mutate_tablet_map_async on a tablet map that doesn't exist yet during
bootstrap, throwing no_such_tablet_map and failing the snapshot transfer.

Fix by adding a full_read flag to table_hint. Once a full partition read
is decided (due to partition tombstone, range tombstone, static row, or
row deletion), the flag prevents subsequent mutations for the same table
from re-adding tokens. Additionally, fall back to a full partition read
when the tablet map is missing locally, which happens when the joining
node receives tablet metadata for a table it has never seen before.
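
A minimal Python model of the hint logic with the new flag is sketched below. The `TableHint` and `Mutation` classes and the `update_hint` function are simplified, hypothetical stand-ins for the C++ types; only the partition tombstone and static row triggers are modeled.

```python
from dataclasses import dataclass, field


@dataclass
class TableHint:
    tokens: set = field(default_factory=set)  # empty set means "read the full partition"
    full_read: bool = False                   # sticky once a full read is decided


@dataclass
class Mutation:
    # Simplified stand-in: one piece of a split tablet metadata partition.
    row_tokens: tuple = ()
    has_partition_tombstone: bool = False
    has_static_row: bool = False


def update_hint(hint, mutation):
    """Fold one mutation into the hint for its table."""
    if hint.full_read:
        # A full partition read was already decided; do not re-add tokens.
        return
    if mutation.has_partition_tombstone or mutation.has_static_row:
        hint.tokens.clear()
        hint.full_read = True
        return
    hint.tokens.update(mutation.row_tokens)
```

Without the flag, an empty token set is indistinguishable from "nothing seen yet", so a later row-only mutation would silently turn the full read back into a point update.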

Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-2303.

Needs backports to 2026.1+; 2026.1 introduced the regression with b17a36c071.

Closes scylladb/scylladb#30115

* github.com:scylladb/scylladb:
  tablets: fall back to full partition read when tablet map is missing
  tablets: fix hint re-adding tokens after full partition read decision
2026-05-29 11:53:36 +02:00
Raphael S. Carvalho
ea3615de1e compaction: fail resharding when out of space prevention is activated
When out-of-space prevention is activated, the compaction manager is
drained and disabled. This caused resharding to silently succeed without
actually processing any SSTables, because:

1. run_custom_job() calls start_compaction() which returns nullopt when
   is_disabled() is true, and run_custom_job() would just return
   immediately — appearing as a successful no-op.

2. reshard() used throw_if_stopping::no, so even within the compaction
   task executor, stopping would be silently swallowed rather than
   propagated as an exception.

The SSTable loader interprets a successful return from resharding as
"all SSTables processed", so it proceeds without error, leaving
the unprocessed SSTables orphaned and their data missing from the table.

Fix this with two changes:

- run_custom_job(): when start_compaction() returns nullopt, check
  is_disabled() and throw via make_disabled_exception() rather than
  returning silently. This ensures callers are always informed when
  a job was skipped because compaction is disabled (e.g. due to disk
  space pressure), as opposed to a benign skip (e.g. table removed).

- reshard(): change throw_if_stopping::no to throw_if_stopping::yes.
  Resharding is mandatory for correct SSTable loading — unlike reshape
  which is optional and can be safely skipped, resharding failure must
  be propagated to the caller so the loader does not proceed with
  incomplete data.
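
The first change can be modeled roughly as below. This is a Python sketch of the control flow only: the class, its fields, and `CompactionDisabledError` are hypothetical, and the real code is C++ with futures.

```python
class CompactionDisabledError(RuntimeError):
    """Hypothetical stand-in for what make_disabled_exception() produces."""


class CompactionManager:
    def __init__(self, tables):
        self.tables = set(tables)
        self.disabled = False

    def is_disabled(self):
        return self.disabled

    def start_compaction(self, table):
        # Returns None when the job cannot start, for either reason.
        if self.disabled or table not in self.tables:
            return None
        return object()

    def run_custom_job(self, table, job):
        state = self.start_compaction(table)
        if state is None:
            if self.is_disabled():
                # Skipped because compaction is disabled: tell the caller.
                raise CompactionDisabledError("compaction is disabled")
            return False  # benign skip, e.g. the table was removed
        job()
        return True
```

The point of the distinction is that a caller such as the SSTable loader can no longer mistake a skipped job for a completed one.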

Fixes https://scylladb.atlassian.net/browse/SCYLLADB-2085.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#30041
2026-05-29 12:48:16 +03:00
Botond Dénes
091e3f5191 Merge 'test.py: reduce resource metrics gathering overhead' from Evgeniy Naydanov
Enable only the memory controller in the cgroup subtree_control instead of all available controllers. In cgroup v2, cpu.stat is available without enabling the cpu controller (base accounting). Enabling the io, pids, and cpu controllers adds unnecessary per-operation kernel overhead to Scylla processes, particularly the memory controller's per-page-cache-operation accounting combined with io controller overhead during heavy I/O.

Additionally, restrict SystemResourceMonitor to the master process only. System-wide metrics (CPU%, memory) are identical from any process, so running a monitoring thread in each xdist worker was redundant and added unnecessary SQLite write contention and thread scheduling noise.

Replace cpu_percent(interval=0.1) with a non-blocking cpu_percent() call
that returns the CPU% since the previous call. Use
stop_event.wait(timeout=2.0) as the loop control, both to space out
iterations and to let the monitor shut down promptly.
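
A rough sketch of such a loop is shown below. To stay self-contained it takes the CPU reader as a parameter; in the real monitor this would presumably be psutil's `cpu_percent()` called without an interval. The function name, the `record` callback, and the priming call are assumptions of this sketch, not taken from the change itself.

```python
import threading


def monitor(read_cpu_percent, record, stop_event, period=2.0):
    """Sample CPU% until stop_event is set.

    read_cpu_percent: non-blocking callable returning CPU% since its previous call.
    record: callable that stores one sample (hypothetical sink).
    """
    # Prime the reader: the first non-blocking reading has no previous
    # call to measure against, so it is discarded here (an assumption).
    read_cpu_percent()
    # Event.wait() returns False on timeout and True once the event is set,
    # so it paces the loop and also ends it promptly on shutdown.
    while not stop_event.wait(timeout=period):
        record(read_cpu_percent())
```

Compared with sleeping inside `cpu_percent(interval=...)`, the thread spends its idle time in `wait()`, where a `stop_event.set()` wakes it immediately.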

Fixes SCYLLADB-2141

Closes scylladb/scylladb#29987

* github.com:scylladb/scylladb:
  test: use non-blocking cpu_percent in SystemResourceMonitor
  test.py: reduce cgroup overhead in resource metrics gathering
2026-05-29 10:52:17 +03:00
Yaniv Michael Kaul
f90b066405 cql3: lazily allocate _idx_opt behind unique_ptr
Motivation:
The secondary_index::index object stored in statement_restrictions is
approximately 128 bytes (containing index_metadata with its sstring name,
UUID id, and unordered_map options, plus a target_column sstring). This
field is only populated for queries that use secondary indexing, yet every
prepared statement's restrictions object pays the full inline cost.

Replace std::optional<secondary_index::index> with
std::unique_ptr<secondary_index::index>. This reduces the inline size
from 136 bytes to 8 bytes, saving 128 bytes per non-index-using
prepared statement cached in the prepared statement cache.

The semantics are preserved: null unique_ptr is equivalent to
std::nullopt, and the dereference patterns (-> and *) work identically.
The find_idx() method that returns a copy constructs an optional from
the dereferenced pointer when non-null.

Tests:
- statement_restrictions_test builds and passes
- Full release build compiles cleanly

Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>
AI-assisted: Yes
Backport: no, improvement

Closes scylladb/scylladb#30046
2026-05-28 21:35:25 +03:00
Taras Veretilnyk
f7ffc64703 db: implement large_data_guardrail
Checks partition size, row count, row size, and collection element count
against config thresholds using large_data_record_index lookups.
Warns on soft limit, throws large_data_exception on hard limit.
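
The soft/hard behavior can be sketched as follows. This is a hedged Python illustration: `check_guardrail`, its parameters, and `LargeDataError` are hypothetical, and whether the real comparison is strict or inclusive is not stated in the commit message.

```python
import logging


class LargeDataError(Exception):
    """Hypothetical stand-in for large_data_exception."""


def check_guardrail(name, value, soft_limit, hard_limit,
                    log=logging.getLogger("large_data")):
    """Return True if a warning was logged, False if the value is within limits.

    Raises LargeDataError when the hard limit is exceeded.
    Strict '>' comparisons are an assumption of this sketch.
    """
    if value > hard_limit:
        raise LargeDataError(f"{name}={value} exceeds hard limit {hard_limit}")
    if value > soft_limit:
        log.warning("%s=%d exceeds soft limit %d", name, value, soft_limit)
        return True
    return False
```

The same check would apply to each of the four measured quantities (partition size, row count, row size, collection element count), each with its own pair of thresholds.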
2026-05-28 18:29:32 +02:00
Aleksandra Martyniuk
491db28fbf tablets: fall back to full partition read when tablet map is missing
When update_tablet_metadata receives a hint with non-empty tokens for a
table whose tablet map doesn't exist locally yet, it would call
mutate_tablet_map_async which throws no_such_tablet_map. This happens
during bootstrap when the joining node receives tablet metadata for a
table it has never seen before.

Fix by checking has_tablet_map() before attempting the point update. If
the map is missing, fall back to do_update_tablet_metadata_partition
which reads the full partition from system.tablets and creates the map.
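
The fallback can be modeled in Python roughly as below. The class and its methods are simplified, hypothetical stand-ins (`reload_partition` plays the role of do_update_tablet_metadata_partition, and `source` stands for the contents of system.tablets for one table).

```python
class TabletMetadata:
    def __init__(self):
        self.maps = {}  # table -> {token: replica info}

    def has_tablet_map(self, table):
        return table in self.maps

    def mutate_tablet_map(self, table, tokens, source):
        # Point update; fails if the map does not exist (cf. no_such_tablet_map).
        if table not in self.maps:
            raise KeyError(table)
        for token in tokens:
            self.maps[table][token] = source[token]

    def reload_partition(self, table, source):
        # Full partition read: (re)creates the whole map.
        self.maps[table] = dict(source)


def update_tablet_metadata(meta, table, hint_tokens, source):
    """Apply a hint; return which path was taken ("point" or "full")."""
    if hint_tokens and meta.has_tablet_map(table):
        meta.mutate_tablet_map(table, hint_tokens, source)
        return "point"
    meta.reload_partition(table, source)
    return "full"
```

On a joining node the first update for a table takes the "full" path and creates the map; later hints with tokens can then use the cheaper point update.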
2026-05-28 16:23:45 +02:00