scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-02 21:17:01 +00:00

Author	SHA1	Message	Date
Marcin Maliszkiewicz	1dc975c491	Merge 'table_helper: observe detached setup_table() future' from Andrzej Jackowski During shutdown, group0 may be torn down while cache_table_info() has a detached setup_table() future in flight. This causes raft_group_not_found to propagate as an abandoned failed future. Add .handle_exception() to log the failure at debug level instead of leaving the future unobserved. Fixes: SCYLLADB-2224 Backport to 2026.2 and 2026.1, because the test failed on 2026.1 Closes scylladb/scylladb#30093 * github.com:scylladb/scylladb: test: table_helper: verify detached setup failure is consumed table_helper: observe detached setup_table() future	2026-06-01 19:32:34 +02:00
Aleksandra Martyniuk	33af16d808	test/cluster/test_tablets: increase timeout for test_multi_rf_of_many_keyspaces_0_N Multi-RF change handles multiple keyspaces concurrently, but tablet rebuilds are not all started at once — the load balancer considers machine load when scheduling them. With 3 keyspaces each having a base table and materialized view, the total operation time approaches the default 200s CQL timeout on slow/busy CI machines (observed at ~191s). Double the timeout to 400s to provide sufficient margin. Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-2042. Closes scylladb/scylladb#30018	2026-06-01 20:07:03 +03:00
Michael Litvak	a7a7f02392	test: test_cdc_with_tablets: add read barrier Add group0 read barrier in test_cdc_with_tablets whenever we observe a condition such as tablet count change or cdc stream change, and we want to proceed to check that cdc tables are consistent with the change. For example, when we wait for tablet count change and then check that the cdc streams changed as well. The problem is that when we observe the tablet count change, for example, even though the cdc streams are changed in the same group0 operation, we may observe it during the group0 apply, when the operation is only partially applied. The read barrier ensures that the change we observed is fully applied. Fixes SCYLLADB-2352 Closes scylladb/scylladb#30177	2026-06-01 13:56:01 +02:00
Pavel Emelyanov	d299554a00	sstables: demote verbose I/O logging from debug to trace Demote the following sstable operation logs from DEBUG to TRACE level: - Reading component files - Writing component files - Touching temp directories - Removing temp directories These are low-level, per-file operations that generate excessive log volume with minimal diagnostic value. They don't deserve debug level and should only appear when sstable subsystem is explicitly set to trace for detailed I/O troubleshooting. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Closes scylladb/scylladb#30108	2026-06-01 14:14:53 +03:00
Avi Kivity	57ad4fda70	build: drop python3-pytest-sugar from install-dependencies.sh --future The python3-pytest-sugar package was orphaned from Fedora [1] and isn't provided by Fedora 45. Drop it from the future toolchain so it can build. Arguably it doesn't belong in the current toolchain either (it's not necessary, just nice) but I don't want to regenerate the toolchain just for that. [1] `094443596a` Closes scylladb/scylladb#30141	2026-06-01 13:13:41 +02:00
Yaron Kaikov	aa32d2c425	ci: replace trigger_ci with scylla-ci-route workflow Replace the old trigger_ci.yaml workflow with a new scylla-ci-route.yaml that adds smarter CI routing logic including docs-only PR detection, conflict checking, Jenkins job triggering, and label-based CI options. Fixes: https://scylladb.atlassian.net/browse/RELENG-52 Closes scylladb/scylladb#29026	2026-06-01 13:54:56 +03:00
Botond Dénes	bb81dbf65e	Merge 'guardrails: Add replica-side large data guardrails' from Taras Veretilnyk Adds write-path guardrails that reject or warn on mutations targeting partitions, rows, or collections that already exceed configured size thresholds, based on SSTable `large_data_record` metadata. ScyllaDB already detects and records large partitions/rows/cells in `system.large_data_records` after compaction, but takes no preventive action on the write path. Once a partition grows past operational limits it causes latency spikes, OOM, and repair failures. These guardrails let operators set hard and soft thresholds so that writes to already-oversized data are rejected (hard) or logged as warnings (soft) before they make the problem worse. - Intrusive index over SSTable metadata: A per-table `large_data_record_index` maintains three `boost::intrusive::multiset`s (partitions, rows, cells) using `auto_unlink` hooks directly on `large_data_record`. SSTable destruction automatically removes records from the index — no explicit deregistration needed. - Virtual dispatch for zero-cost disabled path: `large_data_guardrail_base` → `noop_large_data_guardrail` / `large_data_guardrail`. Tables without guardrails enabled pay only a virtual call to a no-op. No index is built or maintained for disabled tables. - Schema storage: The per-table flag is stored as a scylla_tables column, following the tablets pattern: only write a live cell when enabled, omit entirely when disabled. The CQL feature gate prevents enabling until all nodes are upgraded. - Write-path integration: The guardrail check runs in `do_apply` after the frozen mutation is deserialized but before it is applied to the memtable. Hint replay and Paxos learn skip the check via `skip_large_data_guardrails`. Uses existing `large__warn_threshold` config options as soft limits and new `large__fail_threshold` options as hard limits. 
Checked dimensions: - Partition size (bytes) - Partition row count - Row size (bytes) - Collection element count Backport is not required Fixes https://scylladb.atlassian.net/browse/SCYLLADB-180 Closes scylladb/scylladb#29733 * github.com:scylladb/scylladb: test/cqlpy: add per-table toggle, LWT exemption, and multi-category tests test/cqlpy: add large collection guardrail tests test/cqlpy: add large row guardrail tests test/cqlpy: add large partition guardrail tests test/boost: add large_data_guardrail unit tests test/cluster: add large data guardrails rolling upgrade test replica: wire large_data_guardrail into the write path schema: add per-table large_data_guardrails_enabled flag db: implement large_data_guardrail db: implement large_data_record_index sstables: add intrusive index hook to large_data_record db: add large_collection_elements_fail_threshold config option db: add large_row_fail_threshold_mb config option db: add rows_count_fail_threshold config option db: add large_partition_fail_threshold_mb config option replica: introduce large_data_exception	2026-06-01 13:26:00 +03:00
Nadav Har'El	b254a9826a	test/cluster: add pylib-style nodetool.py Tests in test/cqlpy use a tiny nodetool-like library, where calls to nodetool.flush() are translated to the parallel REST API request on Scylla - but use an external "nodetool" command when running the test against Cassandra. Some test/cluster tests also began using test/cqlpy/nodetool.py, but it is NOT a good fit for test/cluster tests, because: 1. It falls back to using the external "nodetool" when it thinks the REST API is not available. In cluster tests, no such fallback is needed (these tests can't be run on Cassandra). If the REST API is down, the test should fail - not fall back to an irrelevant method. 2. The nodetool.flush() et al. functions are not async, and cluster tests are supposed (by design...) to only use async APIs. 3. test/cqlpy/nodetool.py was not written in the "style" defined for the test/cluster codebase - specifically, its functions don't have docstrings or strong typing. This patch introduces test/pylib/nodetool.py, based on test/cqlpy/nodetool.py but fixing all the above problems - there are no Cassandra fallbacks, there are docstrings and type hints, and all the functions are async. We also fix the test/cluster tests that used test/cqlpy/nodetool.py to switch to test/pylib/nodetool.py. Of course it means the newly async functions need to be "await"ed, not just called, so this patch changes that too. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#30129	2026-06-01 13:03:29 +03:00
Jenkins Promoter	194435279e	Update pgo profiles - aarch64	2026-06-01 05:26:51 +03:00
Piotr Szymaniak	21f1380df1	test/pylib: fix starting server cleanup race test_localnodes_joining_nodes stops a server while manager.server_add() is still waiting for that server to finish startup. Stopping the process can make the background add_server() fail and run its cleanup path first, removing the server from ScyllaCluster.starting. When the stop request later resumes, its own self.starting.pop(server_id) raises KeyError, which the manager returns as HTTP 500. The opposite ordering is possible as well: server_stop() can remove the entry before add_server() reaches its finally block. Make cleanup of ScyllaCluster.starting idempotent in both paths. add_server() remains the normal cleanup path, while server_stop() provides fallback cleanup when it wins the race. Fixes SCYLLADB-2314 Closes scylladb/scylladb#30128	2026-05-31 23:22:44 +03:00
Nadav Har'El	33dce2b7fc	Merge 'cql3: statement_restrictions: continue exploitation of predicate work' from Avi Kivity In `6165124fcc`, we changed analysis of expressions in the WHERE clause to use predicates, an annotated form of an expression that constrains a column when the expression is set to true. Here, we exploit this work to simplify the analysis further, reusing already computed attributes rather than re-analyzing the expression. Not backporting, this is a refactor with no functional change and no bugs fixed. Closes scylladb/scylladb#30049 * github.com:scylladb/scylladb: cql3: statement_restrictions: simplify find_idx to return only the index cql3: statement_restrictions: replace has_only_eq_binops with tracked booleans cql3: statement_restrictions: use index-selection predicates for value_for_index_partition_key cql3: statement_restrictions: replace find_clustering_order with predicate order field cql3: statement_restrictions: replace has_partition_token with variant check cql3: statement_restrictions: replace has_slice with predicate is_slice check cql3: statement_restrictions: replace contains_multi_column_restriction filter with _has_multi_column cql3: statement_restrictions: remove unused find_needs_filtering and has_slice_or_needs_filtering cql3: statement_restrictions: replace has_slice_or_needs_filtering with tracked bool cql3: statement_restrictions: replace contains_multi_column_restriction with _has_multi_column cql3: statement_restrictions: replace find_needs_filtering with predicate op check cql3: statement_restrictions: replace find_binop is_on_collection with tracked bool cql3: statement_restrictions: replace find_binop column extraction with predicate on field cql3: statement_restrictions: set op on all binary-operator-derived predicates	2026-05-31 23:22:43 +03:00
Yaniv Michael Kaul	a396a6b664	utils/histogram: remove redundant std::function from meter_timer meter_timer stored the callback in a std::function member (32 bytes) and then passed it to the timer constructor, which stored its own copy as a noncopyable_function (another 32 bytes). This double-storage is wasteful. Remove the std::function member and pass the callback directly to the timer as a noncopyable_function. This saves 32 bytes per meter_timer instance. meter_timer is embedded in timed_rate_moving_average (used 4x in row_cache::stats, 3x in table_stats CAS histograms, etc.), so this saves 128 bytes per row_cache and 96 bytes per table_stats with CAS histograms allocated. Benchmark: neutral (no hot-path change, only reduces struct size). Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com> AI-assisted: Yes Backport: no, improvement Closes scylladb/scylladb#30145	2026-05-31 19:10:03 +03:00
Benny Halevy	d4d43213f6	cql3/statements/describe_statement: use chunked_vector to prevent oversized allocations When running the 5000-table scenario with tablets, the following Scylla warnings appeared: ``` 2026-02-23T23:18:31.903 schema-scale-tablets-5000t-2026-1-db-node-77930459-4 !WARNING | scylla[5208] [shard 1:sl:d] seastar_memory - oversized allocation: 655360 bytes. This is non-fatal, but could lead to latency and/or fragmentation issues. Please report: at 0x320cf9f 0x320cba0 0x1826a28 0x2fb8f97 0x180340e 0x447855e 0x4461c5a 0x161c3c6 0x161c4b3 0x161e9b7 0x551f43c 0x54df6ca /opt/scylladb/libreloc/libc.so.6+0x72463 /opt/scylladb/libreloc/libc.so.6+0xf55ab seastar::current_backtrace_tasklocal() at ./build/release/seastar/./seastar/include/seastar/util/backtrace.hh:85 seastar::current_tasktrace() at ./build/release/seastar/./build/release/seastar/./seastar/src/util/backtrace.cc:136 seastar::current_backtrace() at ./build/release/seastar/./build/release/seastar/./seastar/src/util/backtrace.cc:169 seastar::memory::cpu_pages::warn_large_allocation(unsigned long) at ./build/release/seastar/./build/release/seastar/./seastar/src/core/memory.cc:865 seastar::memory::allocate_slowpath(unsigned long) at ./build/release/seastar/./build/release/seastar/./seastar/src/core/memory.cc:928 cql3::statements::(anonymous namespace)::tables(data_dictionary::database const&, seastar::lw_shared_ptr<data_dictionary::keyspace_metadata> const&, std::optional<bool>) [clone .resume] at ././seastar/src/core/memory.cc:1727 std::__n4861::coroutine_handle<seastar::internal::coroutine_traits_base<std::vector<cql3::description, std::allocator<cql3::description> > >::promise_type>::resume() const at /usr/lib/gcc/x86_64-redhat-linux/15/../../../../include/c++/15/coroutine:247 ``` This patch replaces the use of `std::vector<description>` with `utils::chunked_vector` to prevent the large allocation. 
Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-852 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes scylladb/scylladb#30146	2026-05-31 14:50:18 +03:00
Avi Kivity	503add224d	cql3: statement_restrictions: simplify find_idx to return only the index The expression returned as the second element of find_idx()'s pair was stored in view_indexed_table_select_statement::_used_index_restrictions but never read — dead code. Simplify find_idx() to return just the optional<index>, and remove the dead member and constructor parameter from view_indexed_table_select_statement. The now unused _idx_restrictions is also removed.	2026-05-29 17:18:21 +03:00
Avi Kivity	23d6f458ec	cql3: statement_restrictions: replace has_only_eq_binops with tracked booleans Replace has_only_eq_binops() (which uses find_in_expression to search the expression tree for non-EQ binary operators) with precomputed _pk_is_all_eq and _ck_is_all_eq booleans tracked incrementally during predicate construction. Each predicate's equality field is checked as it is processed, covering single-column PK/CK predicates, multi-column CK predicates, and token predicates. This removes the last find_in_expression call in statement_restrictions.cc, and eliminates has_only_eq_binops entirely.	2026-05-29 17:13:40 +03:00
Avi Kivity	9e70771600	cql3: statement_restrictions: use index-selection predicates for value_for_index_partition_key Instead of rebuilding predicates from the expression tree at build_value_for_index_partition_key_fn() time via to_predicate_on_column, capture the indexed column's predicates directly in do_find_idx() when the index is chosen. Store them as _idx_column_predicates and use them to build the value function without any redundant expression analysis. This eliminates build_value_for_fn and its call to to_predicate_on_column from this code path.	2026-05-29 17:13:37 +03:00
Avi Kivity	a402fe1a65	cql3: statement_restrictions: replace find_clustering_order with predicate order field In build_range_from_raw_bounds_fn(), replace find_clustering_order() (which uses find_binop to search for a binary_operator with clustering comparison order) with a direct check of the predicate's order field. Since each multi-column predicate's filter is a single binary_operator, extract it directly with as<binary_operator>() instead of searching.	2026-05-29 16:50:02 +03:00
Avi Kivity	b3c1ee230b	cql3: statement_restrictions: replace has_partition_token with variant check Replace has_token_restrictions()'s call to has_partition_token() (which uses find_binop to search the expression tree for token function calls) with a direct check on the _partition_range_restrictions variant, which already records whether token restrictions exist.	2026-05-29 16:50:02 +03:00
Avi Kivity	e10c124cd3	cql3: statement_restrictions: replace has_slice with predicate is_slice check In the clustering prefix construction loop, replace the has_slice() call (which uses find_binop to search the merged predicate's expression tree for slice operators) with a direct check on the individual predicate vector's is_slice field.	2026-05-29 16:50:02 +03:00
Avi Kivity	4c282f588a	cql3: statement_restrictions: replace contains_multi_column_restriction filter with _has_multi_column In calculate_column_defs_for_filtering_and_erase_restrictions_used_for_index(), the code extracted multi-column boolean factors from _clustering_columns_restrictions. Since multi-column and single-column CK restrictions cannot be mixed (the constructor enforces this), when _has_multi_column is true, ALL factors are multi-column. Simplify to just adding _clustering_columns_restrictions directly when _has_multi_column is set. This removes the last caller of contains_multi_column_restriction(), allowing the function (and its find_binop call) to be removed.	2026-05-29 16:50:02 +03:00
Avi Kivity	dca2cc512e	cql3: statement_restrictions: remove unused find_needs_filtering and has_slice_or_needs_filtering These helper functions were wrappers around find_binop that are no longer called, since their call sites have been replaced by predicate-based checks.	2026-05-29 16:50:02 +03:00
Avi Kivity	eb98aea466	cql3: statement_restrictions: replace has_slice_or_needs_filtering with tracked bool Replace the has_slice_or_needs_filtering() call on _partition_key_restrictions (which uses find_binop to walk the expression tree) with a precomputed _pk_has_slice_or_needs_filtering boolean tracked incrementally during predicate construction in the partition key branch.	2026-05-29 16:50:02 +03:00
Avi Kivity	6e27c3a185	cql3: statement_restrictions: replace contains_multi_column_restriction with _has_multi_column In clustering_key_restrictions_need_filtering(), replace the contains_multi_column_restriction() call (which uses find_binop to search for a tuple_constructor LHS in the expression tree) with the precomputed _has_multi_column boolean that is already tracked incrementally during predicate construction.	2026-05-29 16:50:01 +03:00
Avi Kivity	ae7eb860a5	cql3: statement_restrictions: replace find_needs_filtering with predicate op check In the clustering prefix construction loop, replace the find_needs_filtering() call (which walks the merged predicate's expression tree looking for needs-filtering binary operators) with a check on the individual predicate vector. This uses the per-predicate op field directly instead of searching the expression tree.	2026-05-29 16:50:01 +03:00
Avi Kivity	556262a165	cql3: statement_restrictions: replace find_binop is_on_collection with tracked bool Replace the two find_binop(_clustering_columns_restrictions, is_on_collection) calls with a precomputed _ck_is_on_collection boolean that is tracked incrementally during predicate construction. This avoids walking the expression tree at each call site. The is_on_collection check detects CONTAINS/CONTAINS_KEY operators, which indicate collection restrictions on a clustering key column.	2026-05-29 16:50:01 +03:00
Avi Kivity	569c85032e	cql3: statement_restrictions: replace find_binop column extraction with predicate on field In add_clustering_restrictions_to_idx_ck_prefix(), find_binop was used to locate any binary_operator in the predicate's filter just to extract the column from its LHS. Since the predicate already stores this information in its 'on' field (as on_column for single-column predicates), use it directly instead of searching the expression tree.	2026-05-29 16:50:01 +03:00
Avi Kivity	240d9be5e2	cql3: statement_restrictions: set op on all binary-operator-derived predicates The to_predicates() function had fallthrough paths for operators like LIKE and NOT_IN that created predicates without setting the op field. This meant predicate-based checks like 'p.op && needs_filtering(*p.op)' would miss these operators. Fix by inlining the predicate construction at the fallthrough points (instead of using cannot_solve_on_column) and setting .op = oper.op. This ensures all predicates derived from binary operators carry their operator type, enabling reliable predicate-based analysis. The cannot_solve_on_column helper is now unused and removed.	2026-05-29 16:50:01 +03:00
Taras Veretilnyk	9abf594397	test/cqlpy: add per-table toggle, LWT exemption, and multi-category tests Per-table toggle: disabled-at-create, alter-disable, alter-reenable. LWT exemption: Paxos learn must bypass the guardrail. Multi-category independence: all three guardrails warn/reject independently when SSTable records span partition, row, and collection categories.	2026-05-29 12:51:43 +02:00
Dimitrios Symonidis	4c0a991017	test/cluster: fix proxy resource leak in internode compression test The test_internode_compression_between_datacenters test was flaky due to proxy servers and leased host IPs not being cleaned up on failure paths. If any exception occurred after proxies were started (e.g. during server_start or driver_connect), the asyncio.Server listeners remained bound and leased hosts were never released back to HostRegistry. On subsequent test runs, this caused EADDRINUSE (errno 98) when trying to bind the same address:port. Wrap the proxy/server lifecycle in try/finally to ensure proxies are always stopped and hosts are always released, regardless of whether the test succeeds or fails. Fixes: SCYLLADB-2183 Closes scylladb/scylladb#30127	2026-05-29 13:51:43 +03:00
Taras Veretilnyk	7d365844a3	test/cqlpy: add large collection guardrail tests Tests for collection element-count guardrail: hard-limit rejection, disabled-when-zero, soft-limit log warning, and no-warning below threshold.	2026-05-29 12:51:43 +02:00
Taras Veretilnyk	19a9e45da8	test/cqlpy: add large row guardrail tests Tests for row-size guardrail: hard-limit rejection, disabled-when-zero, soft-limit log warning, and no-warning below threshold.	2026-05-29 12:51:42 +02:00
Taras Veretilnyk	67b659e2bf	test/cqlpy: add large partition guardrail tests Tests for partition size and row-count guardrails: hard-limit rejection, disabled-when-zero, soft-limit log warnings, and no-warning below threshold. Includes shared helpers and log assertion utilities used by subsequent commits.	2026-05-29 12:51:42 +02:00
Taras Veretilnyk	ff84b1dbc4	test/boost: add large_data_guardrail unit tests 8 tests covering the record_compare template comparator, intrusive multiset equal_range grouping with heterogeneous lookup_key, and auto_unlink on record destruction.	2026-05-29 12:51:42 +02:00
Taras Veretilnyk	0201c1530e	test/cluster: add large data guardrails rolling upgrade test Simulated rolling upgrade: start a 2-node cluster where one node suppresses the LARGE_DATA_GUARDRAILS feature, verify that enabling guardrails is rejected, then upgrade the old node and verify that enabling guardrails succeeds.	2026-05-29 12:51:31 +02:00
Anna Stuchlik	e162728537	docs: fix Confluent CLI commands and add LDAP auth link - Fix confluent local stop/start commands to use 'services' subcommand (confluent local services stop/start) in kafka-connector.rst - Add LDAP Authentication link to Additional Resources in authentication.rst Fixes https://github.com/scylladb/scylladb/issues/29121 Fixes https://github.com/scylladb/scylladb/issues/23029 Closes scylladb/scylladb#30116	2026-05-29 13:47:02 +03:00
Pavel Emelyanov	5d0371620d	test/backup: Reduce s3 logging from trace to debug Change s3 log level from TRACE to DEBUG in backup tests. TRACE level generates excessive log volume with too much low-level detail about S3 operations. While it was useful in the early days of S3 client, nowadays DEBUG level likely provides sufficient diagnostic information for backup test troubleshooting. The reduced log volume significantly improves test performance, which is the main outcome of this change: - Less I/O time writing logs during test execution - Faster teardown: each test scans all server logs for errors, and smaller logs mean faster grep operations (23.3s → 9.97s for 8-node cluster teardown) Impact on test_restore_with_streaming_scopes[topology4] (8 nodes): - Log volume: 49 MB → 23 MB (reduced by half) - Test runtime: 82.55s → 57.53s (30% faster) - Teardown time: 23.3s → 9.97s (57% faster) Tests that start smaller clusters also have notable timing improvements. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Closes scylladb/scylladb#30109	2026-05-29 13:46:10 +03:00
Anna Stuchlik	44b4995e01	doc: remove i3en and i4i from AWS instance recommendations Remove i3en and i4i instance types from the Cloud Instance Recommendations page, as i7 and i8 instances are better alternatives. Fixes https://github.com/scylladb/scylladb/issues/29666 Closes scylladb/scylladb#30103	2026-05-29 13:44:38 +03:00
Pavel Emelyanov	24c0ea6b19	sstables_loader: Prevent table destruction during tablet restore download Similar to `e5e6608f20` ("sstables_loader: prevent use-after-free on table drop during streaming") which fixed the same class of race for load_and_stream, the tablet restore path also holds a replica::table& reference across the download_sstable() coroutine without preventing concurrent table destruction. If DROP KEYSPACE is applied while download_sstable() is writing SSTable components to the table's data directory, the directory is removed mid-write causing ENOENT → abort (with --abort-on-internal-error). Fix by acquiring a stream_in_progress() phaser guard after find_column_family() and before download_sstable(). table::stop() calls _pending_streams_phaser.close() which blocks until all outstanding guards are released, keeping the table alive for the duration of the download. Fixes: SCYLLADB-2187 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#30094	2026-05-29 13:43:37 +03:00
Pavel Emelyanov	8b2ff16cae	schema: Move grace_period from schema_ctxt to schema_registry The schema_registry_grace_period field on schema_ctxt was only used by schema_registry itself for eviction timing. Move it to be a direct member of schema_registry, passed at init() time. This removes one db::config dependency from schema_ctxt. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Closes scylladb/scylladb#30038	2026-05-29 13:42:23 +03:00
Botond Dénes	1384c9523e	Merge 'Simplify handler injection call sites to use appropriate existing API' from Pavel Emelyanov Several error injection call sites use the verbose handler-lambda API when simpler alternatives already exist in the framework. This series converts them to use the appropriate overloads, reducing boilerplate and making the injection intent immediately obvious from the call site. Cleaning up in-code debugging facilities, no need to backport Closes scylladb/scylladb#29962 * github.com:scylladb/scylladb: error_injection: Convert handler-style breakpoints to wait_for_message sugar error_injection: Convert no-op handler injections to enter()/is_enabled() error_injection: Convert handler-throw injections to lambda-throw style utils: Add share_messages parameter to breakpoint injection API	2026-05-29 13:41:09 +03:00
Botond Dénes	3ae88e31bd	Merge 'test/pylib: stop using random ports for MinIO and JMX' from Piotr Smaron Replace random port selection in MinIO and JMX test helpers with fixed ports on unique per-test loopback IPs, eliminating TOCTOU races. Commits: - kmip_wrapper: default hostname to 127.0.0.1 - nodetool: bind JMX to the per-module loopback IP with fixed port 7199 - minio: use fixed service and console ports on a unique HostRegistry IP instead of probing the ephemeral range; raise on start failure Fixes: SCYLLADB-1817 Minor improvement, no need to backport. Closes scylladb/scylladb#29741 * github.com:scylladb/scylladb: test/pylib: use fixed MinIO ports on unique loopback IPs test/nodetool: bind JMX to per-module loopback IP test/pylib: default KMIP wrapper to loopback	2026-05-29 13:40:24 +03:00
Taras Veretilnyk	23881db289	replica: wire large_data_guardrail into the write path Thread the per-table large_data_guardrail through the write path so that mutations exceeding configured thresholds are rejected before being applied to the memtable. The guardrail is selected in database::do_apply — either the table's own guardrail or a static noop when skip_large_data_guardrails is set. It flows through apply_in_memory → table::apply → memtable::apply, where the check runs after partition_builder deserializes the frozen mutation. For large mutations (>128KB), the check runs after unfreeze_gently instead.	2026-05-29 12:18:33 +02:00
Taras Veretilnyk	5a0974e781	schema: add per-table large_data_guardrails_enabled flag Add a per-table large_data_guardrails_enabled flag controlled via the CQL table property WITH large_data_guardrails_enabled = true|false. Store the flag as a boolean column in system_schema_ext.scylla_tables. Only write a live cell when enabled; when disabled (the default), omit the cell entirely so that old nodes that don't know this column can still read the SSTable during rolling upgrade or rollback. When the property transitions from true to false via ALTER TABLE, a tombstone is written in make_update_table_mutations to override the previous live cell; this is safe because the CQL feature gate ensures all nodes are upgraded before the property can be set to true. Gate the CQL property behind the LARGE_DATA_GUARDRAILS cluster feature: attempting to set large_data_guardrails_enabled = true before all nodes advertise the feature raises a ConfigurationException.	2026-05-29 12:18:33 +02:00
Botond Dénes	46631692cd	mutation_fragment_stream_validator: use legacy byte order for same-token partition key comparison When two partition keys share the same token, their relative order is determined by their raw serialized bytes (legacy_tri_compare), which matches the physical on-disk order in SSTables. The validator was using partition_key::tri_compare instead — a type-aware comparator that can disagree with byte order for types like timeuuid. The result was a false-positive "out-of-order partition key" error for any two same-token partitions whose timeuuid (or other type-aware) order is the reverse of their byte order. In scrub mode this caused the second partition to be silently dropped. Fixes: SCYLLADB-2304 Closes scylladb/scylladb#30120	2026-05-29 11:54:20 +02:00
Tomasz Grabiec	5ceabcbcc5	Merge 'tablets: fix update_tablet_metadata failures during bootstrap' from Aleksandra Martyniuk When partition_split_builder splits a tablet metadata partition into multiple mutations, the first mutation gets the partition tombstone and/or static row while subsequent mutations contain only clustered rows. The hint logic would correctly clear tokens (marking a full partition read) upon seeing the tombstone in the first mutation, but then re-add tokens when processing the subsequent row-only mutations. This caused update_tablet_metadata to attempt a point update via mutate_tablet_map_async on a tablet map that doesn't exist yet during bootstrap, throwing no_such_tablet_map and failing the snapshot transfer. Fix by adding a full_read flag to table_hint. Once a full partition read is decided (due to partition tombstone, range tombstone, static row, or row deletion), the flag prevents subsequent mutations for the same table from re-adding tokens. Additionally, fall back to a full partition read when the tablet map is missing locally, which happens when the joining node receives tablet metadata for a table it has never seen before. Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-2303. Needs backports to 2026.1+. 2026.1 introduces the regression with `b17a36c071` Closes scylladb/scylladb#30115 * github.com:scylladb/scylladb: tablets: fall back to full partition read when tablet map is missing tablets: fix hint re-adding tokens after full partition read decision	2026-05-29 11:53:36 +02:00
Raphael S. Carvalho	ea3615de1e	compaction: fail resharding when out of space prevention is activated When out-of-space prevention is activated, the compaction manager is drained and disabled. This caused resharding to silently succeed without actually processing any SSTables, because: 1. run_custom_job() calls start_compaction() which returns nullopt when is_disabled() is true, and run_custom_job() would just return immediately — appearing as a successful no-op. 2. reshard() used throw_if_stopping::no, so even within the compaction task executor, stopping would be silently swallowed rather than propagated as an exception. The SSTable loader interprets a successful return from resharding as "all SSTables processed", so it proceeds without error, leaving the unprocessed SSTables orphaned and their data missing from the table. Fix this with two changes: - run_custom_job(): when start_compaction() returns nullopt, check is_disabled() and throw via make_disabled_exception() rather than returning silently. This ensures callers are always informed when a job was skipped because compaction is disabled (e.g. due to disk space pressure), as opposed to a benign skip (e.g. table removed). - reshard(): change throw_if_stopping::no to throw_if_stopping::yes. Resharding is mandatory for correct SSTable loading — unlike reshape which is optional and can be safely skipped, resharding failure must be propagated to the caller so the loader does not proceed with incomplete data. Fixes https://scylladb.atlassian.net/browse/SCYLLADB-2085. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes scylladb/scylladb#30041	2026-05-29 12:48:16 +03:00
Botond Dénes	091e3f5191	Merge 'test.py: reduce resource metrics gathering overhead' from Evgeniy Naydanov Only enable the memory controller in cgroup subtree_control instead of all available controllers. cpu.stat is available in cgroup v2 without enabling the cpu controller (base accounting), and enabling io/pids/cpu controllers adds unnecessary per-operation kernel overhead to Scylla processes - particularly the memory controller's per-page-cache-operation accounting combined with io controller overhead during heavy I/O. Additionally, restrict SystemResourceMonitor to the master process only. System-wide metrics (CPU%, memory) are identical from any process, so running a monitoring thread in each xdist worker was redundant and added unnecessary SQLite write contention and thread scheduling noise. Replace cpu_percent(interval=0.1) with a non-blocking cpu_percent() that returns CPU% since the previous call. Use stop_event.wait(timeout=2.0) as the loop control to both space out iterations and allow immediate shutdown responsiveness. Fixes SCYLLADB-2141 Closes scylladb/scylladb#29987 * github.com:scylladb/scylladb: test: use non-blocking cpu_percent in SystemResourceMonitor test.py: reduce cgroup overhead in resource metrics gathering	2026-05-29 10:52:17 +03:00
Yaniv Michael Kaul	f90b066405	cql3: lazily allocate _idx_opt behind unique_ptr Motivation: The secondary_index::index object stored in statement_restrictions is approximately 128 bytes (containing index_metadata with its sstring name, UUID id, and unordered_map options, plus a target_column sstring). This field is only populated for queries that use secondary indexing, yet every prepared statement's restrictions object pays the full inline cost. Replace std::optional<secondary_index::index> with std::unique_ptr<secondary_index::index>. This reduces the inline size from 136 bytes to 8 bytes, saving 128 bytes per non-index-using prepared statement cached in the prepared statement cache. The semantics are preserved: null unique_ptr is equivalent to std::nullopt, and the dereference patterns (-> and *) work identically. The find_idx() method that returns a copy constructs an optional from the dereferenced pointer when non-null. Tests: - statement_restrictions_test builds and passes - Full release build compiles cleanly Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com> AI-assisted: Yes Backport: no, improvement Closes scylladb/scylladb#30046	2026-05-28 21:35:25 +03:00
Taras Veretilnyk	f7ffc64703	db: implement large_data_guardrail Checks partition size, row count, row size, and collection element count against config thresholds using large_data_record_index lookups. Warns on soft limit, throws large_data_exception on hard limit.	2026-05-28 18:29:32 +02:00
Aleksandra Martyniuk	491db28fbf	tablets: fall back to full partition read when tablet map is missing When update_tablet_metadata receives a hint with non-empty tokens for a table whose tablet map doesn't exist locally yet, it would call mutate_tablet_map_async which throws no_such_tablet_map. This happens during bootstrap when the joining node receives tablet metadata for a table it has never seen before. Fix by checking has_tablet_map() before attempting the point update. If the map is missing, fall back to do_update_tablet_metadata_partition which reads the full partition from system.tablets and creates the map.	2026-05-28 16:23:45 +02:00
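
The fallback described in 491db28fbf can be illustrated with a minimal sketch. This is a toy Python model, not ScyllaDB's C++ code. `TabletMetadata`, `NoSuchTabletMap`, `mutate_tablet_map`, `update_partition`, and the `system_tablets` dict are hypothetical stand-ins for the names in the commit message. Treating an empty token hint as "read the full partition" is taken from the description in 5ceabcbcc5.

```python
from dataclasses import dataclass, field


class NoSuchTabletMap(Exception):
    """Hypothetical stand-in for the no_such_tablet_map error in the commit."""


@dataclass
class TabletMetadata:
    # table name -> {token: replica}; a toy stand-in for per-table tablet maps
    maps: dict = field(default_factory=dict)

    def has_tablet_map(self, table):
        return table in self.maps

    def mutate_tablet_map(self, table, tokens, rows):
        # Point update: only valid when the map already exists locally.
        if table not in self.maps:
            raise NoSuchTabletMap(table)
        for token in tokens:
            self.maps[table][token] = rows[token]

    def update_partition(self, table, rows):
        # Full partition read: rebuilds (or creates) the whole map.
        self.maps[table] = dict(rows)


def update_tablet_metadata(meta, table, hint_tokens, system_tablets):
    """Apply a hinted update, falling back to a full read if the map is missing."""
    rows = system_tablets[table]
    # Empty hint tokens mean "full partition read" in this sketch.
    if hint_tokens and meta.has_tablet_map(table):
        meta.mutate_tablet_map(table, hint_tokens, rows)
        return "point"
    meta.update_partition(table, rows)
    return "full"
```

In this model, a joining node that has never seen the table takes the full-read path on the first update and only uses the point update afterwards, so the missing-map error is never raised.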

54228 Commits