Demote the following sstable operation logs from DEBUG to TRACE level:
- Reading component files
- Writing component files
- Touching temp directories
- Removing temp directories
These are low-level, per-file operations that generate excessive log
volume with minimal diagnostic value. They don't deserve debug level
and should only appear when the sstable subsystem is explicitly set to
trace for detailed I/O troubleshooting.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Closes scylladb/scylladb#30108
The python3-pytest-sugar package was orphaned from Fedora [1] and
isn't provided by Fedora 45.
Drop it from the future toolchain so it can build. Arguably it doesn't
belong in the current toolchain either (it's not necessary, just nice)
but I don't want to regenerate the toolchain just for that.
[1] 094443596a
Closes scylladb/scylladb#30141
Replace the old trigger_ci.yaml workflow with a new scylla-ci-route.yaml
that adds smarter CI routing logic including docs-only PR detection,
conflict checking, Jenkins job triggering, and label-based CI options.
Fixes: https://scylladb.atlassian.net/browse/RELENG-52
Closes scylladb/scylladb#29026
Adds write-path guardrails that reject or warn on mutations targeting partitions, rows, or collections that already exceed configured size thresholds, based on SSTable `large_data_record` metadata.
ScyllaDB already detects and records large partitions/rows/cells in `system.large_data_records` after compaction, but takes no preventive action on the write path. Once a partition grows past operational limits it causes latency spikes, OOM, and repair failures. These guardrails let operators set hard and soft thresholds so that writes to already-oversized data are rejected (hard) or logged as warnings (soft) before they make the problem worse.
- **Intrusive index over SSTable metadata**: A per-table `large_data_record_index` maintains three `boost::intrusive::multiset`s (partitions, rows, cells) using `auto_unlink` hooks directly on `large_data_record`. SSTable destruction automatically removes records from the index — no explicit deregistration needed.
- **Virtual dispatch for zero-cost disabled path**: `large_data_guardrail_base` → `noop_large_data_guardrail` / `large_data_guardrail`. Tables without guardrails enabled pay only a virtual call to a no-op. No index is built or maintained for disabled tables.
- **Schema storage**: The per-table flag is stored as a scylla_tables column, following the tablets pattern: only write a live cell when enabled, omit entirely when disabled. The CQL feature gate prevents enabling until all nodes are upgraded.
- **Write-path integration**: The guardrail check runs in `do_apply` after the frozen mutation is deserialized but before it is applied to the memtable. Hint replay and Paxos learn skip the check via `skip_large_data_guardrails`.
Uses existing `large_*_warn_threshold` config options as soft limits and new `large_*_fail_threshold` options as hard limits. Checked dimensions:
- Partition size (bytes)
- Partition row count
- Row size (bytes)
- Collection element count
Backport is not required
Fixes https://scylladb.atlassian.net/browse/SCYLLADB-180
Closes scylladb/scylladb#29733
* github.com:scylladb/scylladb:
test/cqlpy: add per-table toggle, LWT exemption, and multi-category tests
test/cqlpy: add large collection guardrail tests
test/cqlpy: add large row guardrail tests
test/cqlpy: add large partition guardrail tests
test/boost: add large_data_guardrail unit tests
test/cluster: add large data guardrails rolling upgrade test
replica: wire large_data_guardrail into the write path
schema: add per-table large_data_guardrails_enabled flag
db: implement large_data_guardrail
db: implement large_data_record_index
sstables: add intrusive index hook to large_data_record
db: add large_collection_elements_fail_threshold config option
db: add large_row_fail_threshold_mb config option
db: add rows_count_fail_threshold config option
db: add large_partition_fail_threshold_mb config option
replica: introduce large_data_exception
Tests in test/cqlpy use a tiny nodetool-like library, where calls to
nodetool.flush() are translated to the parallel REST API request on
Scylla - but use an external "nodetool" command when running the test
against Cassandra.
Some tests/cluster also began using test/cqlpy/nodetool.py, but it is
NOT a good fit for test/cluster tests, because:
1. It falls back to using the external "nodetool" when it thinks the
REST API is not available. In cluster tests, no such fallback is
needed (these tests can't be run on Cassandra). If the REST API is
down, the test should fail - not fall back to an irrelevant method.
2. The nodetool.flush() et al. functions are not async, and cluster
tests are supposed (by design...) to only use async APIs.
3. test/cqlpy/nodetool.py was not written in the "style" defined for
the test/cluster codebase - specifically, its functions lack docstrings
or strong typing.
This patch introduces test/pylib/nodetool.py, based on
test/cqlpy/nodetool.py but fixing all the above problems - there are
no Cassandra fallbacks, there are docstrings and type hints, and
all the functions are async.
We also fix the test/cluster tests that used test/cqlpy/nodetool.py to
switch to test/pylib/nodetool.py. Of course it means the newly async
functions need to be "await"ed, not just called, so this patch changes
that too.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#30129
test_localnodes_joining_nodes stops a server while manager.server_add() is still waiting for that server to finish startup. Stopping the process can make the background add_server() fail and run its cleanup path first, removing the server from ScyllaCluster.starting. When the stop request later resumes, its own self.starting.pop(server_id) raises KeyError, which the manager returns as HTTP 500.
The opposite ordering is possible as well: server_stop() can remove the entry before add_server() reaches its finally block.
Make cleanup of ScyllaCluster.starting idempotent in both paths. add_server() remains the normal cleanup path, while server_stop() provides fallback cleanup when it wins the race.
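The idempotent cleanup can be sketched as follows. This is a minimal illustration, not the actual ScyllaCluster code: `StartingRegistry`, `begin_add`, `finish_add` and `stop` are hypothetical names, and the only point is that `dict.pop()` with a default never raises, whichever side runs first.

```python
class StartingRegistry:
    """Hypothetical stand-in for the ScyllaCluster.starting dict."""

    def __init__(self) -> None:
        self.starting: dict[int, str] = {}

    def begin_add(self, server_id: int, name: str) -> None:
        self.starting[server_id] = name

    def finish_add(self, server_id: int) -> None:
        # Normal cleanup path (models the add_server() finally block).
        # pop() with a default is a no-op if stop() already removed the entry.
        self.starting.pop(server_id, None)

    def stop(self, server_id: int) -> None:
        # Fallback cleanup when the stop request wins the race.
        self.starting.pop(server_id, None)


registry = StartingRegistry()
registry.begin_add(1, "server-1")
registry.finish_add(1)  # the add path fails and cleans up first
registry.stop(1)        # the later stop no longer raises KeyError
```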
Fixes SCYLLADB-2314
Closes scylladb/scylladb#30128
In 6165124fcc, we changed analysis of expressions in the WHERE clause
to use predicates, an annotated form of an expression that constrains a column
when the expression is set to true.
Here, we exploit this work to simplify the analysis further, reusing already computed
attributes rather than re-analyzing the expression.
Not backporting, this is a refactor with no functional change and no bugs fixed.
Closes scylladb/scylladb#30049
* github.com:scylladb/scylladb:
cql3: statement_restrictions: simplify find_idx to return only the index
cql3: statement_restrictions: replace has_only_eq_binops with tracked booleans
cql3: statement_restrictions: use index-selection predicates for value_for_index_partition_key
cql3: statement_restrictions: replace find_clustering_order with predicate order field
cql3: statement_restrictions: replace has_partition_token with variant check
cql3: statement_restrictions: replace has_slice with predicate is_slice check
cql3: statement_restrictions: replace contains_multi_column_restriction filter with _has_multi_column
cql3: statement_restrictions: remove unused find_needs_filtering and has_slice_or_needs_filtering
cql3: statement_restrictions: replace has_slice_or_needs_filtering with tracked bool
cql3: statement_restrictions: replace contains_multi_column_restriction with _has_multi_column
cql3: statement_restrictions: replace find_needs_filtering with predicate op check
cql3: statement_restrictions: replace find_binop is_on_collection with tracked bool
cql3: statement_restrictions: replace find_binop column extraction with predicate on field
cql3: statement_restrictions: set op on all binary-operator-derived predicates
meter_timer stored the callback in a std::function member (32 bytes) and
then passed it to the timer constructor, which stored its own copy as a
noncopyable_function (another 32 bytes). This double-storage is wasteful.
Remove the std::function member and pass the callback directly to the
timer as a noncopyable_function. This saves 32 bytes per meter_timer
instance.
meter_timer is embedded in timed_rate_moving_average (used 4x in
row_cache::stats, 3x in table_stats CAS histograms, etc.), so this
saves 128 bytes per row_cache and 96 bytes per table_stats with CAS
histograms allocated.
Benchmark: neutral (no hot-path change, only reduces struct size).
Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>
AI-assisted: Yes
Backport: no, improvement
Closes scylladb/scylladb#30145
When running the 5000-table scenario with tablets, the following Scylla warnings appeared:
```
2026-02-23T23:18:31.903 schema-scale-tablets-5000t-2026-1-db-node-77930459-4 !WARNING | scylla[5208] [shard 1:sl:d] seastar_memory - oversized allocation: 655360 bytes. This is non-fatal, but could lead to latency and/or fragmentation issues. Please report: at 0x320cf9f 0x320cba0 0x1826a28 0x2fb8f97 0x180340e 0x447855e 0x4461c5a 0x161c3c6 0x161c4b3 0x161e9b7 0x551f43c 0x54df6ca /opt/scylladb/libreloc/libc.so.6+0x72463 /opt/scylladb/libreloc/libc.so.6+0xf55ab
seastar::current_backtrace_tasklocal() at ./build/release/seastar/./seastar/include/seastar/util/backtrace.hh:85
seastar::current_tasktrace() at ./build/release/seastar/./build/release/seastar/./seastar/src/util/backtrace.cc:136
seastar::current_backtrace() at ./build/release/seastar/./build/release/seastar/./seastar/src/util/backtrace.cc:169
seastar::memory::cpu_pages::warn_large_allocation(unsigned long) at ./build/release/seastar/./build/release/seastar/./seastar/src/core/memory.cc:865
seastar::memory::allocate_slowpath(unsigned long) at ./build/release/seastar/./build/release/seastar/./seastar/src/core/memory.cc:928
cql3::statements::(anonymous namespace)::tables(data_dictionary::database const&, seastar::lw_shared_ptr<data_dictionary::keyspace_metadata> const&, std::optional<bool>) [clone .resume] at ././seastar/src/core/memory.cc:1727
std::__n4861::coroutine_handle<seastar::internal::coroutine_traits_base<std::vector<cql3::description, std::allocator<cql3::description> > >::promise_type>::resume() const at /usr/lib/gcc/x86_64-redhat-linux/15/../../../../include/c++/15/coroutine:247
```
This patch replaces the use of `std::vector<description>` with `utils::chunked_vector`
to prevent the large allocation.
Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-852
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closes scylladb/scylladb#30146
The expression returned as the second element of find_idx()'s pair was
stored in view_indexed_table_select_statement::_used_index_restrictions
but never read — dead code. Simplify find_idx() to return just the
optional<index>, and remove the dead member and constructor parameter
from view_indexed_table_select_statement.
The now unused _idx_restrictions is also removed.
Replace has_only_eq_binops() (which uses find_in_expression to search
the expression tree for non-EQ binary operators) with precomputed
_pk_is_all_eq and _ck_is_all_eq booleans tracked incrementally during
predicate construction. Each predicate's equality field is checked as
it is processed, covering single-column PK/CK predicates, multi-column
CK predicates, and token predicates.
This removes the last find_in_expression call in
statement_restrictions.cc, and eliminates has_only_eq_binops entirely.
Instead of rebuilding predicates from the expression tree at
build_value_for_index_partition_key_fn() time via to_predicate_on_column,
capture the indexed column's predicates directly in do_find_idx() when
the index is chosen. Store them as _idx_column_predicates and use them
to build the value function without any redundant expression analysis.
This eliminates build_value_for_fn and its call to to_predicate_on_column
from this code path.
In build_range_from_raw_bounds_fn(), replace find_clustering_order()
(which uses find_binop to search for a binary_operator with clustering
comparison order) with a direct check of the predicate's order field.
Since each multi-column predicate's filter is a single binary_operator,
extract it directly with as<binary_operator>() instead of searching.
Replace has_token_restrictions()'s call to has_partition_token() (which
uses find_binop to search the expression tree for token function calls)
with a direct check on the _partition_range_restrictions variant, which
already records whether token restrictions exist.
In the clustering prefix construction loop, replace the has_slice()
call (which uses find_binop to search the merged predicate's expression
tree for slice operators) with a direct check on the individual
predicate vector's is_slice field.
In calculate_column_defs_for_filtering_and_erase_restrictions_used_for_index(),
the code extracted multi-column boolean factors from
_clustering_columns_restrictions. Since multi-column and single-column
CK restrictions cannot be mixed (the constructor enforces this), when
_has_multi_column is true, ALL factors are multi-column. Simplify to
just adding _clustering_columns_restrictions directly when
_has_multi_column is set.
This removes the last caller of contains_multi_column_restriction(),
allowing the function (and its find_binop call) to be removed.
Replace the has_slice_or_needs_filtering() call on
_partition_key_restrictions (which uses find_binop to walk the
expression tree) with a precomputed _pk_has_slice_or_needs_filtering
boolean tracked incrementally during predicate construction in the
partition key branch.
In clustering_key_restrictions_need_filtering(), replace the
contains_multi_column_restriction() call (which uses find_binop to
search for a tuple_constructor LHS in the expression tree) with the
precomputed _has_multi_column boolean that is already tracked
incrementally during predicate construction.
In the clustering prefix construction loop, replace the
find_needs_filtering() call (which walks the merged predicate's
expression tree looking for needs-filtering binary operators) with a
check on the individual predicate vector. This uses the per-predicate
op field directly instead of searching the expression tree.
Replace the two find_binop(_clustering_columns_restrictions, is_on_collection)
calls with a precomputed _ck_is_on_collection boolean that is tracked
incrementally during predicate construction. This avoids walking the
expression tree at each call site.
The is_on_collection check detects CONTAINS/CONTAINS_KEY operators,
which indicate collection restrictions on a clustering key column.
In add_clustering_restrictions_to_idx_ck_prefix(), find_binop was used
to locate any binary_operator in the predicate's filter just to extract
the column from its LHS. Since the predicate already stores this
information in its 'on' field (as on_column for single-column
predicates), use it directly instead of searching the expression tree.
The to_predicates() function had fallthrough paths for operators like
LIKE and NOT_IN that created predicates without setting the op field.
This meant predicate-based checks like 'p.op && needs_filtering(*p.op)'
would miss these operators.
Fix by inlining the predicate construction at the fallthrough points
(instead of using cannot_solve_on_column) and setting .op = oper.op.
This ensures all predicates derived from binary operators carry their
operator type, enabling reliable predicate-based analysis.
The cannot_solve_on_column helper is now unused and removed.
Per-table toggle: disabled-at-create, alter-disable, alter-reenable.
LWT exemption: Paxos learn must bypass the guardrail.
Multi-category independence: all three guardrails warn/reject
independently when SSTable records span partition, row, and collection
categories.
The test_internode_compression_between_datacenters test was flaky due to
proxy servers and leased host IPs not being cleaned up on failure paths.
If any exception occurred after proxies were started (e.g. during
server_start or driver_connect), the asyncio.Server listeners remained
bound and leased hosts were never released back to HostRegistry. On
subsequent test runs, this caused EADDRINUSE (errno 98) when trying to
bind the same address:port.
Wrap the proxy/server lifecycle in try/finally to ensure proxies are
always stopped and hosts are always released, regardless of whether
the test succeeds or fails.
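The shape of the fix can be sketched like this. It is an assumption-laden model, not the test's code: `FakeProxy`, `FakeHostRegistry` and `run_with_proxies` are hypothetical, and the real test manages asyncio.Server listeners and HostRegistry leases rather than these stand-ins.

```python
import asyncio
from typing import Awaitable, Callable


class FakeProxy:
    """Hypothetical proxy; stands in for an asyncio.Server listener."""

    def __init__(self) -> None:
        self.running = False

    async def start(self) -> None:
        self.running = True

    async def stop(self) -> None:
        self.running = False


class FakeHostRegistry:
    """Hypothetical registry that leases and releases loopback IPs."""

    def __init__(self) -> None:
        self.leased: set[str] = set()
        self._next = 0

    def lease_host(self) -> str:
        self._next += 1
        host = f"127.0.0.{self._next}"
        self.leased.add(host)
        return host

    def release_host(self, host: str) -> None:
        self.leased.discard(host)


async def run_with_proxies(registry: FakeHostRegistry,
                           proxies: list[FakeProxy],
                           body: Callable[[], Awaitable[None]]) -> None:
    hosts: list[str] = []
    try:
        for proxy in proxies:
            hosts.append(registry.lease_host())
            await proxy.start()
        await body()
    finally:
        # Runs on success and on any exception raised above.
        for proxy in proxies:
            await proxy.stop()
        for host in hosts:
            registry.release_host(host)


async def failing_body() -> None:
    raise RuntimeError("simulated server_start failure")


registry = FakeHostRegistry()
proxies = [FakeProxy(), FakeProxy()]
try:
    asyncio.run(run_with_proxies(registry, proxies, failing_body))
except RuntimeError:
    pass  # the failure still propagates; cleanup has already run
```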
Fixes: SCYLLADB-2183
Closes scylladb/scylladb#30127
Tests for partition size and row-count guardrails: hard-limit rejection,
disabled-when-zero, soft-limit log warnings, and no-warning below
threshold. Includes shared helpers and log assertion utilities used by
subsequent commits.
8 tests covering the record_compare template comparator,
intrusive multiset equal_range grouping with heterogeneous
lookup_key, and auto_unlink on record destruction.
Simulated rolling upgrade: start a 2-node cluster where one node
suppresses the LARGE_DATA_GUARDRAILS feature, verify that enabling
guardrails is rejected, then upgrade the old node and verify that
enabling guardrails succeeds.
Change s3 log level from TRACE to DEBUG in backup tests.
TRACE level generates excessive log volume with too much low-level
detail about S3 operations. While it was useful in the early days
of the S3 client, nowadays DEBUG level likely provides sufficient
diagnostic information for backup test troubleshooting.
The reduced log volume significantly improves test performance, which
is the main outcome of this change:
- Less I/O time writing logs during test execution
- Faster teardown: each test scans all server logs for errors, and
smaller logs mean faster grep operations (23.3s → 9.97s for 8-node
cluster teardown)
Impact on test_restore_with_streaming_scopes[topology4] (8 nodes):
- Log volume: 49 MB → 23 MB (reduced by half)
- Test runtime: 82.55s → 57.53s (30% faster)
- Teardown time: 23.3s → 9.97s (57% faster)
Tests that start smaller clusters also have notable timing improvements.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Closes scylladb/scylladb#30109
Similar to e5e6608f20 ("sstables_loader: prevent use-after-free on
table drop during streaming") which fixed the same class of race for
load_and_stream, the tablet restore path also holds a replica::table&
reference across the download_sstable() coroutine without preventing
concurrent table destruction.
If DROP KEYSPACE is applied while download_sstable() is writing SSTable
components to the table's data directory, the directory is removed
mid-write causing ENOENT → abort (with --abort-on-internal-error).
Fix by acquiring a stream_in_progress() phaser guard after
find_column_family() and before download_sstable(). table::stop()
calls _pending_streams_phaser.close() which blocks until all
outstanding guards are released, keeping the table alive for the
duration of the download.
Fixes: SCYLLADB-2187
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#30094
The schema_registry_grace_period field on schema_ctxt was only used by
schema_registry itself for eviction timing. Move it to be a direct member
of schema_registry, passed at init() time. This removes one db::config
dependency from schema_ctxt.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Closes scylladb/scylladb#30038
Several error injection call sites use the verbose handler-lambda API when simpler alternatives already exist in the framework. This series converts them to use the appropriate overloads, reducing boilerplate and making the injection intent immediately obvious from the call site.
Cleaning up in-code debugging facilities, no need to backport
Closes scylladb/scylladb#29962
* github.com:scylladb/scylladb:
error_injection: Convert handler-style breakpoints to wait_for_message sugar
error_injection: Convert no-op handler injections to enter()/is_enabled()
error_injection: Convert handler-throw injections to lambda-throw style
utils: Add share_messages parameter to breakpoint injection API
Replace random port selection in MinIO and JMX test helpers with fixed
ports on unique per-test loopback IPs, eliminating TOCTOU races.
Commits:
- kmip_wrapper: default hostname to 127.0.0.1
- nodetool: bind JMX to the per-module loopback IP with fixed port 7199
- minio: use fixed service and console ports on a unique HostRegistry IP
instead of probing the ephemeral range; raise on start failure
Fixes: SCYLLADB-1817
Minor improvement, no need to backport.
Closes scylladb/scylladb#29741
* github.com:scylladb/scylladb:
test/pylib: use fixed MinIO ports on unique loopback IPs
test/nodetool: bind JMX to per-module loopback IP
test/pylib: default KMIP wrapper to loopback
Thread the per-table large_data_guardrail through the write path so
that mutations exceeding configured thresholds are rejected before
being applied to the memtable.
The guardrail is selected in database::do_apply — either the table's
own guardrail or a static noop when skip_large_data_guardrails is set.
It flows through apply_in_memory → table::apply →
memtable::apply, where the check runs after partition_builder
deserializes the frozen mutation. For large mutations (>128KB), the
check runs after unfreeze_gently instead.
Add a per-table large_data_guardrails_enabled flag controlled via the CQL
table property WITH large_data_guardrails_enabled = true|false.
Store the flag as a boolean column in system_schema_ext.scylla_tables.
Only write a live cell when enabled; when disabled (the default), omit
the cell entirely so that old nodes that don't know this column can
still read the SSTable during rolling upgrade or rollback. When the
property transitions from true to false via ALTER TABLE, a tombstone is
written in make_update_table_mutations to override the previous live
cell — this is safe because the CQL feature gate ensures all nodes are
upgraded before the property can be set to true.
Gate the CQL property behind the LARGE_DATA_GUARDRAILS cluster feature:
attempting to set large_data_guardrails_enabled = true before all nodes
advertise the feature raises a ConfigurationException.
When two partition keys share the same token, their relative order is
determined by their raw serialized bytes (legacy_tri_compare), which
matches the physical on-disk order in SSTables. The validator was
using partition_key::tri_compare instead — a type-aware comparator
that can disagree with byte order for types like timeuuid.
The result was a false-positive "out-of-order partition key" error
for any two same-token partitions whose timeuuid (or other type-aware)
order is the reverse of their byte order. In scrub mode this caused
the second partition to be silently dropped.
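The disagreement between the two orders is easy to reproduce outside Scylla. The sketch below is only an illustration: it uses Python's `uuid` module, treats "timestamp first" as the type-aware order (an assumption, since Scylla's actual timeuuid comparator is not reproduced here), and does not model the same-token condition. `make_timeuuid` is a hypothetical helper.

```python
import uuid


def make_timeuuid(time_low: int, time_mid: int) -> uuid.UUID:
    # fields: time_low, time_mid, time_hi_version,
    #         clock_seq_hi_variant, clock_seq_low, node
    # 0x1000 sets version 1 with time_hi = 0; 0x80 sets the RFC 4122 variant.
    return uuid.UUID(fields=(time_low, time_mid, 0x1000, 0x80, 0x00, 0x000000000001))


a = make_timeuuid(0xFFFFFFFF, 0x0000)  # earlier timestamp, high leading bytes
b = make_timeuuid(0x00000000, 0x0001)  # later timestamp, low leading bytes

# Type-aware view: compare the embedded 60-bit timestamps.
a_first_by_time = a.time < b.time

# Raw view: compare the serialized bytes, where time_low comes first.
a_first_by_bytes = a.bytes < b.bytes
```

Here `a` sorts before `b` by timestamp but after `b` by raw bytes, which is the kind of pair that triggered the false positive.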
Fixes: SCYLLADB-2304
Closes scylladb/scylladb#30120
When partition_split_builder splits a tablet metadata partition into
multiple mutations, the first mutation gets the partition tombstone
and/or static row while subsequent mutations contain only clustered
rows. The hint logic would correctly clear tokens (marking a full
partition read) upon seeing the tombstone in the first mutation, but
then re-add tokens when processing the subsequent row-only mutations.
This caused update_tablet_metadata to attempt a point update via
mutate_tablet_map_async on a tablet map that doesn't exist yet during
bootstrap, throwing no_such_tablet_map and failing the snapshot transfer.
Fix by adding a full_read flag to table_hint. Once a full partition read
is decided (due to partition tombstone, range tombstone, static row, or
row deletion), the flag prevents subsequent mutations for the same table
from re-adding tokens. Additionally, fall back to a full partition read
when the tablet map is missing locally, which happens when the joining
node receives tablet metadata for a table it has never seen before.
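The effect of the flag can be modelled in a few lines. This is a hypothetical Python model, not the C++ table_hint: `TableHint`, `note_full_read` and `note_row` are illustrative names, and only the "sticky" behaviour of the flag is taken from the description above.

```python
from dataclasses import dataclass, field


@dataclass
class TableHint:
    """Hypothetical model: either a set of tokens to update or a full read."""
    tokens: set[int] = field(default_factory=set)
    full_read: bool = False

    def note_full_read(self) -> None:
        # Partition tombstone, range tombstone, static row or row deletion.
        self.tokens.clear()
        self.full_read = True

    def note_row(self, token: int) -> None:
        # Once a full read is decided, later mutations must not re-add tokens.
        if not self.full_read:
            self.tokens.add(token)


hint = TableHint()
hint.note_full_read()  # first split mutation carries the partition tombstone
hint.note_row(42)      # subsequent row-only mutation for the same table
```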
Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-2303.
Needs backports to 2026.1+. 2026.1 introduces the regression with b17a36c071
Closes scylladb/scylladb#30115
* github.com:scylladb/scylladb:
tablets: fall back to full partition read when tablet map is missing
tablets: fix hint re-adding tokens after full partition read decision
When out-of-space prevention is activated, the compaction manager is
drained and disabled. This caused resharding to silently succeed without
actually processing any SSTables, because:
1. run_custom_job() calls start_compaction() which returns nullopt when
is_disabled() is true, and run_custom_job() would just return
immediately — appearing as a successful no-op.
2. reshard() used throw_if_stopping::no, so even within the compaction
task executor, stopping would be silently swallowed rather than
propagated as an exception.
The SSTable loader interprets a successful return from resharding as
"all SSTables processed", so it proceeds without error, leaving
the unprocessed SSTables orphaned and their data missing from the table.
Fix this with two changes:
- run_custom_job(): when start_compaction() returns nullopt, check
is_disabled() and throw via make_disabled_exception() rather than
returning silently. This ensures callers are always informed when
a job was skipped because compaction is disabled (e.g. due to disk
space pressure), as opposed to a benign skip (e.g. table removed).
- reshard(): change throw_if_stopping::no to throw_if_stopping::yes.
Resharding is mandatory for correct SSTable loading — unlike reshape
which is optional and can be safely skipped, resharding failure must
be propagated to the caller so the loader does not proceed with
incomplete data.
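The control flow of the first change can be sketched as below. This is a simplified Python model of the behaviour described, not the compaction manager's code: `FakeCompactionManager` and `CompactionDisabledError` are hypothetical, and a `None` return stands in for nullopt.

```python
from typing import Callable, Optional


class CompactionDisabledError(RuntimeError):
    """Hypothetical stand-in for the exception from make_disabled_exception()."""


class FakeCompactionManager:
    def __init__(self, disabled: bool, table_exists: bool = True) -> None:
        self.disabled = disabled
        self.table_exists = table_exists

    def is_disabled(self) -> bool:
        return self.disabled

    def start_compaction(self) -> Optional[object]:
        # None models nullopt: either compaction is disabled or the table is gone.
        if self.disabled or not self.table_exists:
            return None
        return object()


def run_custom_job(manager: FakeCompactionManager, job: Callable[[], None]) -> bool:
    task = manager.start_compaction()
    if task is None:
        if manager.is_disabled():
            # Skipped because compaction is disabled: tell the caller.
            raise CompactionDisabledError("compaction is disabled")
        return False  # benign skip, e.g. the table was removed
    job()
    return True
```

With this shape a caller such as the SSTable loader can no longer mistake a skipped job for a completed one.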
Fixes https://scylladb.atlassian.net/browse/SCYLLADB-2085.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closes scylladb/scylladb#30041
Only enable the memory controller in cgroup subtree_control instead of all available controllers. cpu.stat is available in cgroup v2 without enabling the cpu controller (base accounting), and enabling io/pids/cpu controllers adds unnecessary per-operation kernel overhead to Scylla processes - particularly the memory controller's per-page-cache-operation accounting combined with io controller overhead during heavy I/O.
Additionally, restrict SystemResourceMonitor to the master process only. System-wide metrics (CPU%, memory) are identical from any process, so running a monitoring thread in each xdist worker was redundant and added unnecessary SQLite write contention and thread scheduling noise.
Replace cpu_percent(interval=0.1) with a non-blocking cpu_percent()
that returns CPU% since the previous call. Use stop_event.wait(timeout=2.0) as the
loop control to both space out iterations and allow immediate shutdown responsiveness.
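A minimal sketch of that loop shape follows. It is not the SystemResourceMonitor code: `monitor_loop`, `sample` and `record` are hypothetical names, and the sampler is injected so the example has no psutil dependency. In real use `sample` would be something like `psutil.cpu_percent` called without an interval, whose first reading is meaningless and is therefore discarded here.

```python
import threading
from typing import Callable


def monitor_loop(sample: Callable[[], float],
                 record: Callable[[float], None],
                 stop_event: threading.Event,
                 period: float = 2.0) -> None:
    sample()  # prime the sampler; a "since previous call" value needs a baseline
    # wait() returns False on timeout (take a sample) and True once the
    # event is set (exit immediately, without sleeping out the period).
    while not stop_event.wait(timeout=period):
        record(sample())


stop_event = threading.Event()
readings: list[float] = []
worker = threading.Thread(target=monitor_loop,
                          args=(lambda: 0.0, readings.append, stop_event))
worker.start()
stop_event.set()  # shutdown does not have to wait for the 2 s period
worker.join()
```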
Fixes SCYLLADB-2141
Closes scylladb/scylladb#29987
* github.com:scylladb/scylladb:
test: use non-blocking cpu_percent in SystemResourceMonitor
test.py: reduce cgroup overhead in resource metrics gathering
Motivation:
The secondary_index::index object stored in statement_restrictions is
approximately 128 bytes (containing index_metadata with its sstring name,
UUID id, and unordered_map options, plus a target_column sstring). This
field is only populated for queries that use secondary indexing, yet every
prepared statement's restrictions object pays the full inline cost.
Replace std::optional<secondary_index::index> with
std::unique_ptr<secondary_index::index>. This reduces the inline size
from 136 bytes to 8 bytes, saving 128 bytes per non-index-using
prepared statement cached in the prepared statement cache.
The semantics are preserved: null unique_ptr is equivalent to
std::nullopt, and the dereference patterns (-> and *) work identically.
The find_idx() method that returns a copy constructs an optional from
the dereferenced pointer when non-null.
Tests:
- statement_restrictions_test builds and passes
- Full release build compiles cleanly
Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>
AI-assisted: Yes
Backport: no, improvement
Closes scylladb/scylladb#30046
Checks partition size, row count, row size, and collection element count
against config thresholds using large_data_record_index lookups.
Warns on soft limit, throws large_data_exception on hard limit.
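The soft/hard decision for a single dimension can be sketched as follows. This is a hedged Python model, not the C++ implementation: `check_dimension` and `LargeDataError` are hypothetical names, and two details are assumptions made for the sketch: that a threshold of 0 disables the limit, and that the comparison is strictly greater-than.

```python
import logging

logger = logging.getLogger("large_data_guardrail_sketch")


class LargeDataError(Exception):
    """Hypothetical Python stand-in for large_data_exception."""


def check_dimension(name: str, value: int,
                    warn_threshold: int, fail_threshold: int) -> str:
    # Assumption: a threshold of 0 means the limit is disabled.
    if fail_threshold > 0 and value > fail_threshold:
        raise LargeDataError(f"{name} {value} exceeds hard limit {fail_threshold}")
    if warn_threshold > 0 and value > warn_threshold:
        logger.warning("%s %d exceeds soft limit %d", name, value, warn_threshold)
        return "warn"
    return "ok"
```

The same function would be applied once per checked dimension (partition size, row count, row size, collection element count), each with its own pair of thresholds.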
When update_tablet_metadata receives a hint with non-empty tokens for a
table whose tablet map doesn't exist locally yet, it would call
mutate_tablet_map_async which throws no_such_tablet_map. This happens
during bootstrap when the joining node receives tablet metadata for a
table it has never seen before.
Fix by checking has_tablet_map() before attempting the point update. If
the map is missing, fall back to do_update_tablet_metadata_partition
which reads the full partition from system.tablets and creates the map.
Per-table index over large_data_records from all live SSTables.
Uses three intrusive multisets (partitions, rows, cells) with member
hooks directly on large_data_record. Auto-unlink handles cleanup
when SSTables are destroyed. Aggregation (max across SSTables for
the same key) happens at lookup time via equal_range.
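Lookup-time aggregation can be illustrated with a sorted list standing in for the multiset. This is a Python analogue under stated assumptions, not the boost::intrusive code: `RecordIndex` is hypothetical, keys are plain strings, and auto-unlink on SSTable destruction is not modelled.

```python
import bisect


class RecordIndex:
    """Sorted list of (key, size) pairs; duplicates of a key are allowed."""

    def __init__(self) -> None:
        self._records: list[tuple[str, int]] = []

    def add(self, key: str, size: int) -> None:
        # One record per SSTable that reported this key as large.
        bisect.insort(self._records, (key, size))

    def max_size(self, key: str) -> int:
        # Equivalent of equal_range: the half-open run of entries with this key.
        # (key,) sorts before every (key, size); (key, inf) sorts after them.
        lo = bisect.bisect_left(self._records, (key,))
        hi = bisect.bisect_right(self._records, (key, float("inf")))
        # Aggregate at lookup time: the largest size seen across SSTables.
        return max((size for _, size in self._records[lo:hi]), default=0)


index = RecordIndex()
index.add("pk1", 100)  # record from one SSTable
index.add("pk1", 250)  # record for the same key from another SSTable
index.add("pk2", 40)
largest_pk1 = index.max_size("pk1")
```

Keeping one entry per SSTable and aggregating on lookup means that dropping an SSTable's records needs no recomputation of a stored maximum.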
Add a set_member_hook<auto_unlink> to large_data_record so it can
participate directly in a boost::intrusive::multiset without wrapper
nodes or side maps. The hook is not listed in describe_type() and
is therefore not serialized.
Also add a non-const get_large_data_records() overload to sstable
so that register_sstable can obtain mutable references to records
for hook manipulation.
Clang 22 and below ignore method accessibility when checking concepts. Clang 23 now [1] checks
accessibility. Make relevant methods public so concepts that check them have access.
The problem was that the concepts were evaluated at the use-site, which was a friend, but should have
been evaluated in some friendless global context. After the clang fix, the problems in our code were exposed.
[1] ac3c588739
Preparing for a new toolchain, so not backporting.
Closes scylladb/scylladb#30053
* github.com:scylladb/scylladb:
compacting_reader: make consume() methods public
mutation_fragment_v1_stream: make consume() methods public