scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-22 07:42:16 +00:00

Author	SHA1	Message	Date
Anna Stuchlik	1f7d20f701	doc: label Migration from Vnodes to Tablets as experimental The procedure to migrate a vnodes-based keyspace to tablets-based keyspace has been labeled as experimental. Fixes SCYLLADB-1932 Closes scylladb/scylladb#29834	2026-05-11 17:07:39 +03:00
Yaniv Michael Kaul	377bbeb076	docs: fix invalid UUID characters in examples Replace UUIDs containing non-hexadecimal characters (like 'g', 'n', 'y') with valid UUIDs in documentation examples. Fixes #26797 Closes scylladb/scylladb#29674	2026-05-11 17:05:30 +03:00
Calle Wilund	2cc1a2c406	storage_service: Disable snapshots after raft decommission Fixes: SCYLLADB-1693 In case we abort a decommission operation, the snapshot/backup mechanism needs to remain open. This change moves it to after raft_decommission. In the case of a cluster snapshot, our nodes' ownership or not of tables will be serialized by raft anyway, so should remain consistent. In that case we at worst coordinate from a node in "leave" status. In the case of a local snapshot, ownership matters less, only sstables on disk, which should not change. In the case of backup, this operates on a snapshot, state of which is not affected. Adds an injection point for testing. v2: - Added injection point to ensure test can abort decommission Closes scylladb/scylladb#29667	2026-05-11 17:04:09 +03:00
Anna Stuchlik	4c01556f79	doc: mark Vector Search in Alternator as Cloud-only This commit adds the information missing from the Alternator docs that Vector Search is only available in ScyllaDB Cloud. Fixes https://github.com/scylladb/scylladb/issues/29661 Closes scylladb/scylladb#29664	2026-05-11 17:03:20 +03:00
Avi Kivity	f5ffbd3c3e	cql3: restrictions: reindent statement_restrictions.cc `6165124fcc` has left statement_restrictions.cc scarred and deformed. Restore it to standard 4-space indentation. This patch contains only whitespace changes. Closes scylladb/scylladb#29598	2026-05-11 17:02:14 +03:00
Yaniv Michael Kaul	3cba27d25f	topology: propagate error messages through raft_topology_cmd_result When a topology command (e.g., rebuild) fails on a target node, the exception message was being swallowed at multiple levels: 1. raft_topology_cmd_handler caught exceptions and returned a bare fail status with no error details. 2. exec_direct_command_helper saw the fail status and threw a generic "failed status returned from {id}" message. 3. The rebuilding handler caught that and stored a hardcoded "streaming failed" message. This meant users only saw "rebuild failed: streaming failed" instead of the actionable error from the safety check (e.g., "it is unsafe to use source_dc=dc2 to rebuild keyspace=..."). Fix by: - Adding an error_message field to raft_topology_cmd_result (with [[version 2026.2]] for wire compatibility). - Populating error_message with the exception text in the handler's catch blocks. - Including error_message in the exception thrown by exec_direct_command_helper. - Passing the actual error through to rtbuilder.done() instead of the hardcoded "streaming failed". A follow-up test is in https://github.com/scylladb/scylladb/pull/29363 Fixes: SCYLLADB-1404 Closes scylladb/scylladb#29362	2026-05-11 17:01:15 +03:00
Yaniv Michael Kaul	cf9cde664c	.github/workflows/call_sync_milestone_to_jira.yml: add missing workflow permissions Add explicit empty permissions block (permissions: {}) since this workflow only syncs milestones to Jira using its own secrets and needs no GITHUB_TOKEN permissions. Fixes code scanning alert #171. Closes scylladb/scylladb#29184	2026-05-11 17:00:10 +03:00
Raphael S. Carvalho	20fe1e6f68	replica: Improve diagnostics when tablet split fails due to non-empty split-unready groups When finalizing a tablet split, all data must have been moved into split-ready compaction groups before the storage groups can be remapped to the new tablet count. If split-unready groups still hold data at that point, handle_tablet_split_completion() calls on_internal_error(), which previously only reported the tablet and table IDs — giving no insight into why the split-unready groups were not empty. Add fmt::formatter specializations for compaction_group and storage_group so the full state of the offending storage_group is included in the error message. The storage_group formatter emits: main=<cg>, merging=[<cg>...], split_ready=[<cg>...] Each compaction_group formatter emits: [sstables=[<sstable_desc>...], memtable_empty=<bool>, sstable_add_gate=<count>] where sstable_desc includes filename, origin, identifier and originating host, memtable_empty reflects whether all memtables have been flushed, and sstable_add_gate count reveals whether an in-flight sstable add is holding data in the group. Supporting changes: - compaction_group: add memtable_empty() const noexcept (delegates to memtable_list::empty()) and a const overload of sstable_add_gate() so both are accessible from a const compaction_group reference inside the formatter. - Promote sstable_desc from a local lambda in compaction_group_for_sstable to a static free function so it is reusable by the formatter. Refs https://scylladb.atlassian.net/browse/SCYLLADB-1019. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Closes scylladb/scylladb#29178	2026-05-11 16:59:05 +03:00
Yaniv Michael Kaul	3674deea54	scylla-gdb: display ms-format sstable summary from partitions db footer For ms-format (trie-based) sstables, the traditional summary structure is not populated. Instead, read equivalent metadata from the _partitions_db_footer field: first_key, last_key, partition_count, and trie_root_position. This is a follow-up to the crash fix for SCYLLADB-1180, replacing the informational-only message with actual useful output. Refs: SCYLLADB-1180 Closes scylladb/scylladb#29164	2026-05-11 16:58:22 +03:00
Calle Wilund	db1b92c185	service::load_balancer: Add metrics for repair and rebuild count Fixes #21115 Adds cluster counter for repairs, and dc counter for rebuilds Closes scylladb/scylladb#28985	2026-05-11 16:57:46 +03:00
Piotr Smaron	71542206bc	cql: return InvalidRequest for oversized partition/clustering keys When a partition key or clustering key value exceeds the 64 KiB limit (65535 bytes serialized), Scylla used to raise a generic std::runtime_error "Key size too large: N > M" from the low-level compound-key serializer. That error surfaced to clients as a CQL server error (code 0x0000, "NoHostAvailable"-looking), which is both ugly and incompatible with Cassandra - Cassandra returns a clean InvalidRequest with the message "Key length of N is longer than maximum of M". Fix this at the single chokepoint: compound_type::serialize_value in keys/compound.hh. The serializer is on every path that materializes a key - INSERT/UPDATE/DELETE/BATCH build mutations through it, and SELECT builds partition and clustering ranges through it - so a single throw replacement produces a clean InvalidRequest consistently across all paths and all key shapes (single, compound PK, composite CK). The previous approach on this PR branch patched three call sites in cql3/restrictions/statement_restrictions.cc, which only covered SELECT, duplicated the check, and placed it mid-restrictions code (flagged in review). Dropping those changes in favour of the root-cause fix here. Un-xfail the tests this fixes: - test/cqlpy/test_key_length.py: test_insert_65k_pk, test_insert_65k_ck, test_where_65k_pk, test_where_65k_ck, test_insert_65k_ck_composite, test_insert_total_compound_pk_err, test_insert_total_composite_ck_err. - test/cqlpy/cassandra_tests/.../insert_test.py: testPKInsertWithValueOver64K, testCKInsertWithValueOver64K. - test/cqlpy/cassandra_tests/.../select_test.py: testPKQueryWithValueOver64K. test_insert_65k_pk_compound stays xfail: its oversized value gets rejected by the Python driver's CQL wire-protocol encoder (see CASSANDRA-19270) before reaching the server, so the fix can't apply. Updated its reason. 
testCKQueryWithValueOver64K stays xfail with an updated reason: Cassandra silently returns empty for an oversized clustering key in WHERE, while Scylla now throws InvalidRequest - a deliberate choice mirroring the partition-key case, documented in the discussion on #10366. Add three tight-boundary tests (addressing review feedback on the previous revision) that pin MAX+1 behaviour for SELECT and INSERT of both partition and clustering keys. Update test/cluster/dtest/limits_test.py to match the new message ("Key length of \\d+ is longer than maximum of 65535"). fixes #10366 fixes #12247 Co-authored-by: Alexander Turetskiy <someone.tur@gmail.com> Closes scylladb/scylladb#23433	2026-05-11 16:56:35 +03:00
Piotr Smaron	959f67b345	cql: verify tuples length in multi-column IN restriction When a multi-column IN restriction contains tuples with a different number of elements than the number of restricted columns (e.g. `(b, c, d) IN ((1, 2), (2, 1, 4))`), Scylla would either produce an inconsistent error message or, for over-sized tuples, an internal type-mismatch error referencing the list literal representation. Validate each tuple's arity against the number of restricted columns while building the IN restriction and raise a clear "Expected N elements in value tuple, but got M" error in both the under- and over-sized cases. Fixes #13241 Co-authored-by: Alexander Turetskiy <someone.tur@gmail.com> Closes scylladb/scylladb#18407	2026-05-11 16:55:09 +03:00
Anna Stuchlik	a7b7019f90	doc: update the node size limit This commit increases the node size limit from 256 to 4096 CPUs based on `be1f566488` Fixes SCYLLADB-1676 Closes scylladb/scylladb#29602	2026-05-11 16:38:53 +03:00
Nadav Har'El	f1b2b9bd52	Merge 'Register `fulltext_index` custom index type' from Dawid Pawlik This PR adds the `fulltext_index` custom index class, laying the groundwork for full-text search in ScyllaDB. It focuses on the CQL-facing layer - schema validation, option parsing, and metadata - without implementing the search backend itself. Users can now write: ```cql CREATE CUSTOM INDEX ON t(content) USING 'fulltext_index' WITH OPTIONS = {'analyzer': 'english', 'positions': 'false'}; ``` The implementation follows the same custom index pattern established by vector search: a `custom_index` subclass registered in the factory map, with no backing materialized view. This keeps the door open for a CDC-based indexing pipeline similar to the one vector search uses. As part of this work, the option validation helpers (`validate_enumerated_option`, `validate_positive_option`, `validate_factor_option`) were extracted from `vector_index.cc` into a shared header so both index types can reuse them. The `custom_index` base class also gained a virtual `index_type_name()` method, giving each subclass a self-describing name for error messages without hardcoding strings in shared code. The PR is split into three commits: 1. Extract shared validation utilities and add `index_type_name()` to `custom_index` 2. Implement `fulltext_index` with column type and option validation 3. Integration tests covering creation, validation, describe, and metadata Fixes: SCYLLADB-1517 Fixes: SCYLLADB-1510 References: SCYLLADB-1516 Closes scylladb/scylladb#29658 * github.com:scylladb/scylladb: test/cqlpy: add integration tests for `fulltext_index` index: unify custom index description index: add `fulltext_index` custom index implementation index: extract option validation helpers	2026-05-11 16:16:58 +03:00
Nadav Har'El	fcfad51284	Merge 'cql3/selection: require EXECUTE on UDA REDUCEFUNC at SELECT time' from Marcin Maliszkiewicz selection::used_functions() pushed the UDA, its SFUNC and its FINALFUNC, but never the REDUCEFUNC. The reducefunc is invoked by the distributed aggregation path in service::mapreduce_service, so a user could cause it to run server-side without holding EXECUTE on it as long as the query took the mapreduce path. Also push agg.state_reduction_function so select_statement::check_access requires EXECUTE on it too. Fixes https://scylladb.atlassian.net/browse/SCYLLADB-1756 Backport: no, it's a minor fix and UDFs are experimental feature in Scylla Closes scylladb/scylladb#29717 * github.com:scylladb/scylladb: test/cqlpy: add test for EXECUTE permission on UDA sub-functions cql3/selection: require EXECUTE on UDA REDUCEFUNC at SELECT time	2026-05-11 16:14:38 +03:00
Gleb Natapov	c3d2f0bde9	raft_group0: remove finish_setup_after_join function The only thing it does is change a bootstrapping node to become a voter in case the cluster does not support the limited voters feature. But the feature was introduced in 2025.2, and direct upgrade from 2025.1 to a version newer than 2026.1 is not supported. Even if such an upgrade is done, the removed code has an effect only during bootstrap, not during regular boot. Also remove the upgrade test, since after the patch suppressing the feature on the first boot will no longer behave correctly.	2026-05-11 15:38:36 +03:00
Botond Dénes	cf37f541a0	Merge ' sstables_loader: ensure upload directory is empty when load_and_stream returns' from Taras Veretilnyk After `load_and_stream` (e.g. via `nodetool refresh --load-and-stream`) returns success, source sstable files in the `upload/` directory may still be on disk. `mark_for_deletion()` only sets an in-memory flag; the actual file deletion runs lazily when the last `shared_sstable` reference drops. This leaves a window between API success and physical deletion where a follow-up scan of the upload directory can detect sstables that will be deleted soon. This might cause a failure because the sstable may already be wiped during processing. The fix: force unlink to complete before `stream()` returns, so the upload directory is in a consistent state by the time the API reports success. For tablet streaming, partially-contained sstables participate in multiple per-tablet batches; eagerly unlinking after each batch would break the next batch that still needs to read the file. A `defer_unlinking` flag on the streamer postpones the explicit unlink until after all batches complete (called once at the end of `tablet_sstable_streamer::stream()`). Vnode streaming unlinks eagerly at the end of `stream_sstable_mutations`. Fixes https://scylladb.atlassian.net/browse/SCYLLADB-1647 Backport is required, as it is a fix for a bug that was introduced in `517a4dc4df`. Closes scylladb/scylladb#29599 * github.com:scylladb/scylladb: sstables_loader: synchronously unlink streamed sstables before returning sstables: make sstable::unlink() idempotent	2026-05-11 14:43:46 +03:00
Asias He	0204372156	repair: Reject repair requests where start and end tokens are equal When a user calls the repair API with identical startToken and endToken values, the code creates a wrapping interval (T, T]. This causes unwrap() to split it into (-inf, T] and (T, +inf), covering the entire token ring and triggering a full repair. Reject such requests early with an error message matching Cassandra's behavior: "Start and end tokens must be different." Fixes: https://scylladb.atlassian.net/browse/CUSTOMER-358 Closes scylladb/scylladb#29821	2026-05-11 14:08:20 +03:00
Botond Dénes	ad7ac62835	Merge ' Add a node_owner column (locator::host_id) to system.sstables and make it part of the partition key' from Dimitrios Symonidis Add a node_owner column (locator::host_id) to system.sstables and make it part of the partition key, so the primary key becomes PRIMARY KEY ((table_id, node_owner), generation). This is the first step toward moving the sstables registry into system_distributed: once distributed, each node's startup scan must read only the rows it owns, which requires the owning node to be part of the partition key. Partitioning by (table_id, node_owner) turns that scan into a single-partition read of exactly the local node's rows. Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-1562 No need to backport this, keyspace over object storage is an experimental feature Closes scylladb/scylladb#29659 * github.com:scylladb/scylladb: db, sstables: add node_owner to sstables registry primary key db, sstables: rename sstables registry column owner to table_id	2026-05-11 14:08:19 +03:00
Botond Dénes	2edfb91070	sstables: migrate all bufsize_mismatch_exception throw sites to throw_bufsize_mismatch_exception() Replace the two remaining direct 'throw bufsize_mismatch_exception(...)' call sites with the new throw_bufsize_mismatch_exception() helper, which routes through throw_malformed_sstable_exception() and thus also respects the --abort-on-malformed-sstable-error flag. Affected files: - sstables/sstables.cc (1 site, in check_buf_size()) - sstables/m_format_read_helpers.cc (1 site, in check_buf_size())	2026-05-11 11:58:14 +03:00
Botond Dénes	d65c1523c2	sstables: migrate all malformed_sstable_exception throw sites to throw_malformed_sstable_exception() Replace all direct 'throw malformed_sstable_exception(...)' call sites with the new throw_malformed_sstable_exception() helper, which respects the --abort-on-malformed-sstable-error flag.	2026-05-11 11:58:14 +03:00
Botond Dénes	84c27658d9	sstables: make on_parse_error() and on_bti_parse_error() respect --abort-on-malformed-sstable-error Both functions now check abort_on_malformed_sstable_error() first. If set, they log the error and call std::abort() directly, generating a coredump. Otherwise they fall through to the existing on_internal_error() path, which is in turn controlled by --abort-on-internal-error.	2026-05-11 11:58:14 +03:00
Botond Dénes	4ebcc002d6	sstables: disable abort-on-malformed-sstable-error in tests that corrupt sstables on purpose Add scoped_no_abort_on_malformed_sstable_error RAII guard (modeled after seastar::testing::scoped_no_abort_on_internal_error) and use it in all tests that intentionally corrupt sstables and expect malformed_sstable_exception to be thrown rather than the process aborting.	2026-05-11 11:58:14 +03:00
Botond Dénes	f6dc2cb5f8	sstables: introduce --abort-on-malformed-sstable-error infrastructure Add the --abort-on-malformed-sstable-error command-line option and the supporting infrastructure. When set, any malformed sstable error will abort the process and generate a coredump instead of throwing an exception. This is useful for debugging memory corruption that may manifest as apparent sstable corruption. The implementation introduces: - throw_malformed_sstable_exception() and throw_bufsize_mismatch_exception() helper functions in sstables/sstables.cc, which check the new flag and either abort (with logging) or throw the appropriate exception. - set_abort_on_malformed_sstable_error() / abort_on_malformed_sstable_error() to control the per-process atomic flag. - abort_on_malformed_sstable_error config option (LiveUpdate, default false) wired up in main.cc alongside abort_on_internal_error. Call-site migration will follow in subsequent commits.	2026-05-11 11:58:14 +03:00
Botond Dénes	c3daa6379c	sstables: refactor parse_path() to return std::expected<> instead of throwing make_entry_descriptor() and the two overloads of parse_path() used to signal parse failures by throwing malformed_sstable_exception, which made parse_path() expensive to use as a probe (e.g. to classify directory entries). Change make_entry_descriptor() and both parse_path() overloads to return std::expected<T, sstring>, where the sstring carries the error message on failure, eliminating the exception overhead at probe call sites. Call sites that previously caught malformed_sstable_exception to treat the path as a non-SSTable file (utils/directories.cc, db/snapshot/backup_task.cc, tools/scylla-sstable.cc) now check the expected result directly. Call sites where a parse failure is a genuine error (sstable_directory.cc, sstables.cc, tools/schema_loader.cc, tools/scylla-sstable.cc) re-throw explicitly as malformed_sstable_exception using the error string, preserving the existing error propagation behaviour.	2026-05-11 11:58:14 +03:00
Gleb Natapov	5213aee99f	raft_group0: fix indentation after the last change	2026-05-11 11:56:26 +03:00
Gleb Natapov	5f7f72fa50	raft_group: drop unneeded checks	2026-05-11 11:55:39 +03:00
Marcin Maliszkiewicz	fa9d15d31a	test/cqlpy: add test for EXECUTE permission on UDA sub-functions Verify that SELECT of a UDA requires EXECUTE on its SFUNC, FINALFUNC, and REDUCEFUNC individually. If any one permission is missing, the query must be rejected at planning time (even on an empty table). The test is parameterized over the three sub-functions and uses Lua on Scylla or Java on Cassandra, so it runs on both backends. The REDUCEFUNC case is skipped on Cassandra since REDUCEFUNC is a Scylla extension. Refs SCYLLADB-1756	2026-05-11 10:23:39 +02:00
copilot-swe-agent[bot]	9e7d67612c	docs: fix typo in materialized views docs - "columns are" instead of "is" The MV Select Statement description was missing the word "columns" and used incorrect verb agreement, making the sentence grammatically broken and ambiguous. docs/cql/mv.rst: "which of the base table is included" → "which of the base table columns are included" Fixes #29662 Closes #29663 Co-authored-by: annastuchlik <37244380+annastuchlik@users.noreply.github.com>	2026-05-11 11:15:25 +03:00
Botond Dénes	eae15f4fdd	Merge 'Share timeout_config between services' from Pavel Emelyanov The timeout_config (more exactly -- updateable_timeout_config) is used by alternator/controller and transport/controller. Both create a local copy of that object by constructing one out of db::config. Also some options from this config are needed by storage_proxy, but since it doesn't have access to any timeout_config-s, it just uses db::config by getting it from the database. This PR introduces top-level sharded<updateable_timeout_config>, initializes it from db::config values and makes existing users plus storage_proxy use it where required. Motivation -- remove more replica::database::get_config() users. A side effect -- timeout_config is not duplicated by transport and alternator controllers. Components' dependencies cleanup, not backporting. Closes scylladb/scylladb#29636 * github.com:scylladb/scylladb: storage_proxy: Use shared updateable_timeout_config for CAS contention timeout alternator: Use shared updateable_timeout_config by reference cql_transport: Use shared updateable_timeout_config by reference storage_proxy: Use shared updateable_timeout_config by reference main: Introduce sharded<updateable_timeout_config> storage_proxy: Keep own updateable_timeout_config	2026-05-11 11:12:01 +03:00
Botond Dénes	9b2dfab2e5	Merge 'Don't use database.get_config() to fetch calculate_view_update_throttling_delay option' from Pavel Emelyanov This option is used in two places -- proxy and view-update-generator both need it to calculate the calculate_view_update_throttling_delay() value. This PR moves the option onto view_update_backlog top-level service, makes the calculating helper be method of that class and patches the callers to use it. This eliminates more places that abuse database as db::config accessor. Code dependencies refactoring, not backporting Closes scylladb/scylladb#29635 * github.com:scylladb/scylladb: view: Turn calculate_view_update_throttling_delay into node_update_backlog member view: Place view_flow_control_delay_limit_in_ms on node_update_backlog view: Add node_update_backlog reference to view_update_generator	2026-05-11 10:30:24 +03:00
Pavel Emelyanov	f39cbb1ec6	storage_proxy: Move maintenance_mode onto storage_proxy::config Stop reading maintenance_mode through replica::database's db::config. Add a properly typed maintenance_mode_enabled field to storage_proxy::config, populate it in main.cc from cfg->maintenance_mode() (same as messaging_service::config), and use a cached member in storage_proxy instead of db.local().get_config().maintenance_mode(). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Closes scylladb/scylladb#29637	2026-05-11 10:11:20 +03:00
Yaniv Michael Kaul	631f1e1654	compaction: set_skip_when_empty() for validation_errors metric Add .set_skip_when_empty() to compaction_manager::validation_errors. This metric only increments when scrubbing encounters out-of-order or invalid mutation fragments in SSTables, indicating data corruption. It is almost always zero and creates unnecessary reporting overhead. AI-Assisted: yes Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com> Closes scylladb/scylladb#29349	2026-05-11 09:12:40 +03:00
Yaniv Michael Kaul	b8a150e22c	build: add -ftime-trace support for compilation profiling Add a --time-trace flag to configure.py and a Scylla_TIME_TRACE CMake option that enable Clang's -ftime-trace on all C++ compilations. When enabled, each .o file produces a companion .json trace that can be analyzed with ClangBuildAnalyzer or loaded in chrome://tracing to identify slow headers and costly template instantiations. This is the first step toward data-driven build speed improvements. Refs #1 Usage: configure.py: ./configure.py --time-trace --mode dev CMake: cmake -DScylla_TIME_TRACE=ON -DCMAKE_BUILD_TYPE=Dev .. Closes scylladb/scylladb#29462	2026-05-11 08:55:33 +03:00
Dmitry Kropachev	85d0011b3c	gitignore: add missing rust build artifacts rust/**/target and Cargo.lock files under rust/inc/ and rust/wasmtime_bindings/ were not ignored, nor was test/resource/wasm/rust/target/. Closes scylladb/scylladb#28943	2026-05-11 07:06:26 +03:00
Botond Dénes	3f72852d8c	Merge 'Fix missing format string placeholders across the codebase (33 bugs across 14 modules )' from Yaniv Kaul Fix 28 format string bugs plus 5 related format argument bugs across 14 modules where `{}` placeholders were missing or arguments were wrong, causing arguments to be silently dropped or misleading output from the `{fmt}` library. Inspired by https://github.com/scylladb/scylladb/pull/29143 (which fixed a single instance in `replica/table.cc`), a comprehensive audit of the entire codebase was performed to find all similar issues. - Missing `{}` placeholder (21 instances): format string simply lacks `{}` for a passed argument, e.g. `format("msg for table {}", group_id, table_id)` -- `group_id` is silently dropped - Spurious comma breaking C++ string literal concatenation (2 instances): a comma after a string literal prevents adjacent-literal concatenation, turning the continuation into a format argument instead of part of the format string - Printf-style `%s` in fmtlib context (4 instances): `%s` has no meaning in fmtlib and appears as literal text while the argument is silently ignored - Extra spurious argument (1 instance): an extraneous `t.tomb()` argument inserted between correct arguments, causing wrong values in the wrong slots - Wrong variable in error message (4 instances in `types/map.hh`): error messages for oversized map keys/values reported `map_size` (total entry count) instead of the actual `elem.first.size()` or `elem.second.size()` that exceeded the limit - Swapped argument order (1 instance in `data_dictionary/data_dictionary.cc`): format string says `"Extraneous options for {type}: {values}"` but the values and type arguments were passed in reverse order \| Module \| Bugs Fixed \| Files \| \|--------\|:---------:\|-------\| \| `replica/` \| 1 \| `table.cc` \| \| `service/` \| 4 \| `raft_group0.cc`, `storage_service.cc` \| \| `db/` \| 6 \| `heat_load_balance.cc`, `commitlog_replayer.cc`, `view_update_generator.cc`, 
`view_building_worker.cc`, `row_locking.cc` \| \| `cql3/` \| 2 \| `prepare_expr.cc`, `statement_restrictions.cc` \| \| `transport/` \| 4 \| `event_notifier.cc` \| \| `sstables/` \| 3 \| `partition_reversing_data_source.cc`, `reader.cc` \| \| `alternator/` \| 1 \| `conditions.cc` \| \| `cdc/` \| 1 \| `split.cc` \| \| `raft/` \| 1 \| `server.cc` \| \| `utils/` \| 2 \| `gcp/object_storage.cc`, `s3/client.cc` \| \| `mutation/` \| 1 \| `mutation_partition.hh` \| \| `ent/` \| 2 \| `kmip_host.cc`, `kms_host.cc` \| \| `types/` \| 4 \| `map.hh` \| \| `data_dictionary/` \| 1 \| `data_dictionary.cc` \| The `{fmt}` library's compile-time checker validates that each `{}` placeholder references a valid argument, but does not verify the reverse -- that every argument has a corresponding placeholder. Extra arguments are silently ignored at both compile time and runtime. Build verified with `dbuild ninja build/dev/scylla` -- compiles cleanly. --- Note: Commits were amended to fix the author name from "Yaniv Michael Kaul" to "Yaniv Kaul". 
Closes scylladb/scylladb#29448 * github.com:scylladb/scylladb: data_dictionary: fix swapped arguments in extraneous options error types: fix wrong variable in map key/value size error messages ent: fix missing format placeholders in encryption error/log messages mutation: fix spurious argument in shadowable_tombstone formatter utils: fix missing format placeholders in object storage log messages raft: fix missing format placeholder in server ostream operator cdc: fix missing format placeholder in error message alternator: fix missing format placeholder in error message sstables: fix missing format placeholders in error messages transport: fix printf-style format specifiers in fmtlib log calls cql3: fix missing format placeholders in error messages db: fix missing format placeholders in log and error messages service: fix missing format placeholders in log messages replica: fix missing format placeholder in cleanup log message	2026-05-11 07:04:42 +03:00
Yaron Kaikov	5694c93c12	build: add collect-dist target to organize build artifacts Build artifacts are currently scattered across build/dist/$mode/redhat/, tools/python3/build/, tools/cqlsh/build/, etc. with unpredictable names. Add a new 'collect-dist' ninja target that gathers all distributable artifacts into a well-known structure: build/$mode/dist/rpm/ -- all binary RPMs (no SRPMs) build/$mode/dist/deb/ -- all .deb packages build/$mode/dist/tar/ -- relocatable tarballs (already here) The collection is done via a reusable 'collect_pkgs' ninja rule defined directly in configure.py, which knows all the source paths. No external script is needed. Fixes: SCYLLADB-75 Closes scylladb/scylladb#29475	2026-05-11 06:54:29 +03:00
Michael Litvak	274024a76b	configure.py: update compile_commands.json if stale configure.py creates compile_commands.json in the root directory as a symbolic link to the file in one of the build directories. If the file already exists it does nothing. However it may happen that the file exists but the target file does not exist. For example, if the build directory is removed and then building with a different mode. Then the file will remain as a stale symbolic link. To address this, when the file exists check also if it's a valid symbolic link. If not, then recreate it with a valid target. Closes scylladb/scylladb#29680	2026-05-10 22:17:16 +03:00
Piotr Szymaniak	459c1dc32f	test/alternator: stop avoiding tablets in Streams tests Alternator Streams now supports tablets, so stop skipping the TTL Streams test in tablet mode and stop forcing vnodes in the Streams audit test. Refs SCYLLADB-463 Closes scylladb/scylladb#29697	2026-05-10 22:13:15 +03:00
Nadav Har'El	df8c9b17b8	Merge 'alternator: Graduate Alternator Streams from experimental' from Piotr Szymaniak As a final step for https://scylladb.atlassian.net/browse/SCYLLADB-461 we need to graduate Alternator Streams from experimental. So let's remove `--experimental-features=alternator-streams` and map the obsolete config string to `UNUSED` for backward compatibility. Also, remove the related gating of the feature. Finally, stop providing the config flag in test configs. Fixes SCYLLADB-1680 Fixes #16367 The documentation work tracked by https://scylladb.atlassian.net/browse/SCYLLADB-462 still remains. This PR needs to hit 2026.2, so (only) if it branches before the PR is merged to `master`, we'd need to backport. Closes scylladb/scylladb#29604 * github.com:scylladb/scylladb: test: Stop providing alternator-streams experimental flag alternator: Graduate Alternator Streams from experimental	2026-05-10 22:10:03 +03:00
Nadav Har'El	34136d3bc2	Merge 'vector_search: test: migrate CQL tests for vector search from C++/Boost to pytest' from Karol Nowacki Migrate vector search (ANN ordered select query) CQL tests from C++/Boost suite to pytest. This migration includes: - New pytest tests in `test/cqlpy/test_vector_search_with_vector_store_mock.py` - VectorStoreMock server as pytest fixture to simulate vector store responses The benefits of this migration are: - Extended test coverage to verify CQL protocol serialization and driver - Reduced overall test time (no compilation required for pytest) Fixes SCYLLADB-695 No backport needed as this is a refactoring. Closes scylladb/scylladb#29593 * github.com:scylladb/scylladb: vector_search: test: migrate paging warnings tests to Python vector_search: test: migrate local_vector_index to Python vector_search: test: migrate vector_index_with_additional_filtering_column to Python vector_search: test: migrate cql_error_contains_http_error_description to Python vector_search: test: migrate pk in restriction test to Python	2026-05-10 22:09:17 +03:00
Nadav Har'El	d4aa528834	Merge 'load_balancer: fix tablet allocator dropped table' from Ferenc Szili - Handle dropped tables gracefully in the tablet load balancer's `get_schema_and_rs()` instead of aborting with `on_internal_error` - The load balancer operates on a token metadata snapshot but accesses the live schema for table lookups. A DROP TABLE applied by another fiber between coroutine yield points can remove a table from the live schema while it still exists in the snapshot, causing an abort. `get_schema_and_rs()` now returns `std::optional` and logs a warning in debug log level instead of aborting when a table is missing. All callers skip dropped tables: - `make_sizing_plan`: skips to next table - `make_resize_plan`: skips to next table (merge suppression is moot) - `check_constraints`: returns `skip_info{}` with empty viable targets - `get_rs`: returns `nullptr`, checked by `check_constraints` The call chain is: `make_plan` → `make_internode_plan` → `check_constraints` → `get_rs` → `get_schema_and_rs`. The `make_internode_plan` coroutine has multiple `co_await` yield points (`maybe_yield`, `pick_candidate`) between building the candidate tablet list and checking replication constraints. A DROP TABLE schema mutation applied during any of these yields removes the table from `_db.get_tables_metadata()` while the candidate list still references it. Added `test_load_balancing_with_dropped_table` which simulates the race by capturing a token metadata snapshot, dropping the table, then calling `balance_tablets` with the stale snapshot. Fixes: SCYLLADB-1664 This fix needs to be backported to versions: 2025.4, 2026.1 Closes scylladb/scylladb#29585 * github.com:scylladb/scylladb: test: verify load balancer handles dropped tables gracefully tablet_allocator: handle dropped tables gracefully in get_schema_and_rs	2026-05-10 22:07:51 +03:00
Nadav Har'El	63927e07ea	Merge 'alternator/streams: keep disabled streams usable and purge on re-enable' from Piotr Szymaniak When an Alternator stream is disabled, the data should continue to be accessible so that consumers can finish reading. When the stream is later re-enabled, a new StreamArn is produced and only then the old data is purged. On disable, the existing CDC options (including preimage and postimage) are preserved so that DescribeStream can still report StreamViewType. All stream APIs continue to work on the disabled stream, with all shards reported as closed (EndingSequenceNumber set). No new CDC records are written; existing data expires via TTL after 24 hours. On re-enable, the old CDC log table is dropped as a separate Raft group0 schema change and a fresh one is created with a new UUID, giving a new StreamArn. This is Alternator-specific — CQL CDC keeps reusing the log table. Re-enabling is the only way to immediately purge old stream data. Old stream data is removed immediately upon re-enable (a discrepancy with DynamoDB, which keeps it readable for 24 hours through the old StreamArn). Tests updated to cover the new disable and re-enable behavior. Fixes #7239 Fixes SCYLLADB-523 Closes scylladb/scylladb#29413 * github.com:scylladb/scylladb: alternator/streams: remove dead next_iter in get_records test/alternator: fix stream wait timeouts to use wall-clock time docs/alternator: document stream disable/re-enable behavior alternator/streams: keep disabled streams usable and purge on re-enable	2026-05-10 22:04:35 +03:00
Nadav Har'El	e277f747bd	Merge 'Make collection unfreezing more efficient' from Botond Dénes Introduce `read_from_collection_cell_view()` which reads a `collection_mutation` directly from the IDL representation of a collection (`ser::collection_cell_view`). This cuts down the number of allocations required drastically compared to the current method of: IDL -> collection_mutation_description -> collection_mutation Reduces the number of allocations to unfreeze a collection from O(collection_cell_count) -> O(1) (actually, due to buffer fragmentation, it is O(collection_size)). The new method is used when unfreezing frozen mutations and frozen mutation fragments. This is on the hot path: all writes with collections benefit. Add a `--collection` flag to `perf-simple-query` to allow measuring the performance improvement of this PR. With `dbuild -it -- build/release/scylla perf-simple-query --collection=16 -c1 -m2G --default-log-level=error --write` the number of allocations drops from ~123 to 102, which is a significant amount of allocations shaved off.
Refs: https://github.com/scylladb/scylladb/issues/3602 (solves one use-case out of the many listed therein) Fixes: SCYLLADB-1046 Fixes: SCYLLADB-1077 Backport: this is an optimization so normally not a backport candidate, but we may have to backport to relieve certain customers Closes scylladb/scylladb#29033 * github.com:scylladb/scylladb: test/perf/perf_simple_query: add --collection=N test/boost/frozen_mutation_test: add freeze/unfreeze test for large collections mutation/mutation_partition_view: use read_from_collection_cell_view() to read collections mutation/collection_mutation: introduce read_from_collection_cell_view() mutation/atomic_cell: atomic_cell_type: add write() and serialized_size() mutation/collection_mutation: generalize serialize_collection_mutation mutation/mutation_partition_view: avoid copying collection mutation/mutation_partition_view: accept collection_mutation in the consume API partition_builder: add move variant of accept_*_cell() collection overloads	2026-05-10 20:39:08 +03:00
Nadav Har'El	2501a22b10	alternator: remove unneeded call to format() Removed a silly call to format() on a constant string without parameters.	2026-05-10 20:34:36 +03:00
Nadav Har'El	b3a62dc9d2	alternator: improve CONTAINS operator's validity checking Copilot, which reviewed the implementation of the CONTAINS operator, complained that in some places we assume without checking that the user-provided parameter to CONTAINS has the expected structure. Not doing all the checks explicitly is actually not terrible in RapidJSON, because its methods like BeginMembers() always validate the type before trying to follow a pointer, throwing an exception if the JSON value doesn't have the right type. But it's still cleaner to do these checks explicitly, and throw a clean SerializationError instead of some internal server error. So this is what this patch does. If the malformed object doesn't come from the query but rather comes from the data, we just silently return false. This is our usual convention - we don't expect malformed data in our database, but if we do have some (see issue #8070) we shouldn't tell the user that there was an error in his completely valid query. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2026-05-10 20:34:36 +03:00
Yaniv Kaul	a6cf45f9e2	data_dictionary: fix swapped arguments in extraneous options error The format string says "Extraneous options for {type}: {values}" but the arguments were passed in the wrong order (values first, type second), producing misleading error messages like "Extraneous options for bucket,endpoint: S3" instead of "Extraneous options for S3: bucket,endpoint". Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>	2026-05-10 17:51:20 +03:00
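The argument-order bug described in this commit can be illustrated with a short Python sketch. The actual fix is in C++ format calls; the function name `extraneous_options_error` and its parameters are hypothetical, and only the message text and the S3 example come from the commit message.

```python
def extraneous_options_error(storage_type, option_names):
    """Build the error text with arguments in placeholder order.

    Hypothetical illustration: the first placeholder takes the storage
    type, the second takes the comma-joined option names.
    """
    template = "Extraneous options for {}: {}"
    # Passing (values, type) here instead would compile and run fine but
    # produce the misleading "Extraneous options for bucket,endpoint: S3".
    return template.format(storage_type, ",".join(option_names))
```

Positional placeholders give no protection against this kind of swap when both arguments are printable, which is why the bug only showed up in the rendered message.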
Yaniv Kaul	a13da94308	types: fix wrong variable in map key/value size error messages Four error messages for oversized map keys/values reported map_size (the total number of entries) instead of the actual key or value size that exceeded the limit. The condition checks elem.first.size() or elem.second.size(), but the error message printed map_size. This affects both the bytes and managed_bytes serialization overloads. Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>	2026-05-10 17:51:20 +03:00
Yaniv Kaul	bf1d59ad95	ent: fix missing format placeholders in encryption error/log messages Fix two format string bugs: - kmip_host.cc: cmd_in was passed as an argument to a trace log but had no {} placeholder, so the command was silently dropped. - kms_host.cc: the XML node name (what) was passed to the error message but had no {} placeholder, so the error never showed which XML node was missing. Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>	2026-05-10 17:51:20 +03:00
Yaniv Kaul	a76774f8f9	mutation: fix spurious argument in shadowable_tombstone formatter The formatter for shadowable_tombstone had a spurious t.tomb() argument between the timestamp and deletion_time arguments. This caused t.tomb() (the whole tombstone) to be formatted into the deletion_time={} slot, while the actual deletion_time count was silently dropped. Remove the extra argument. Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>	2026-05-10 17:51:19 +03:00

53948 Commits