scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-30 11:36:54 +00:00

Author	SHA1	Message	Date
Dimitrios Symonidis	4c0a991017	test/cluster: fix proxy resource leak in internode compression test The test_internode_compression_between_datacenters test was flaky due to proxy servers and leased host IPs not being cleaned up on failure paths. If any exception occurred after proxies were started (e.g. during server_start or driver_connect), the asyncio.Server listeners remained bound and leased hosts were never released back to HostRegistry. On subsequent test runs, this caused EADDRINUSE (errno 98) when trying to bind the same address:port. Wrap the proxy/server lifecycle in try/finally to ensure proxies are always stopped and hosts are always released, regardless of whether the test succeeds or fails. Fixes: SCYLLADB-2183 Closes scylladb/scylladb#30127	2026-05-29 13:51:43 +03:00
Pavel Emelyanov	5d0371620d	test/backup: Reduce s3 logging from trace to debug Change s3 log level from TRACE to DEBUG in backup tests. TRACE level generates excessive log volume with too much low-level detail about S3 operations. While it was useful in the early days of the S3 client, nowadays DEBUG level likely provides sufficient diagnostic information for backup test troubleshooting. The reduced log volume significantly improves test performance, which is the main outcome of this change: - Less I/O time writing logs during test execution - Faster teardown: each test scans all server logs for errors, and smaller logs mean faster grep operations (23.3s → 9.97s for 8-node cluster teardown) Impact on test_restore_with_streaming_scopes[topology4] (8 nodes): - Log volume: 49 MB → 23 MB (reduced by half) - Test runtime: 82.55s → 57.53s (30% faster) - Teardown time: 23.3s → 9.97s (57% faster) Tests that start smaller clusters also have notable timing improvements. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Closes scylladb/scylladb#30109	2026-05-29 13:46:10 +03:00
Pavel Emelyanov	24c0ea6b19	sstables_loader: Prevent table destruction during tablet restore download Similar to `e5e6608f20` ("sstables_loader: prevent use-after-free on table drop during streaming") which fixed the same class of race for load_and_stream, the tablet restore path also holds a replica::table& reference across the download_sstable() coroutine without preventing concurrent table destruction. If DROP KEYSPACE is applied while download_sstable() is writing SSTable components to the table's data directory, the directory is removed mid-write causing ENOENT → abort (with --abort-on-internal-error). Fix by acquiring a stream_in_progress() phaser guard after find_column_family() and before download_sstable(). table::stop() calls _pending_streams_phaser.close() which blocks until all outstanding guards are released, keeping the table alive for the duration of the download. Fixes: SCYLLADB-2187 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#30094	2026-05-29 13:43:37 +03:00
Pavel Emelyanov	8b2ff16cae	schema: Move grace_period from schema_ctxt to schema_registry The schema_registry_grace_period field on schema_ctxt was only used by schema_registry itself for eviction timing. Move it to be a direct member of schema_registry, passed at init() time. This removes one db::config dependency from schema_ctxt. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Closes scylladb/scylladb#30038	2026-05-29 13:42:23 +03:00
Botond Dénes	1384c9523e	Merge 'Simplify handler injection call sites to use appropriate existing API' from Pavel Emelyanov Several error injection call sites use the verbose handler-lambda API when simpler alternatives already exist in the framework. This series converts them to use the appropriate overloads, reducing boilerplate and making the injection intent immediately obvious from the call site. Cleaning up in-code debugging facilities, no need to backport Closes scylladb/scylladb#29962 * github.com:scylladb/scylladb: error_injection: Convert handler-style breakpoints to wait_for_message sugar error_injection: Convert no-op handler injections to enter()/is_enabled() error_injection: Convert handler-throw injections to lambda-throw style utils: Add share_messages parameter to breakpoint injection API	2026-05-29 13:41:09 +03:00
Botond Dénes	3ae88e31bd	Merge 'test/pylib: stop using random ports for MinIO and JMX' from Piotr Smaron Replace random port selection in MinIO and JMX test helpers with fixed ports on unique per-test loopback IPs, eliminating TOCTOU races. Commits: - kmip_wrapper: default hostname to 127.0.0.1 - nodetool: bind JMX to the per-module loopback IP with fixed port 7199 - minio: use fixed service and console ports on a unique HostRegistry IP instead of probing the ephemeral range; raise on start failure Fixes: SCYLLADB-1817 Minor improvement, no need to backport. Closes scylladb/scylladb#29741 * github.com:scylladb/scylladb: test/pylib: use fixed MinIO ports on unique loopback IPs test/nodetool: bind JMX to per-module loopback IP test/pylib: default KMIP wrapper to loopback	2026-05-29 13:40:24 +03:00
Botond Dénes	46631692cd	mutation_fragment_stream_validator: use legacy byte order for same-token partition key comparison When two partition keys share the same token, their relative order is determined by their raw serialized bytes (legacy_tri_compare), which matches the physical on-disk order in SSTables. The validator was using partition_key::tri_compare instead — a type-aware comparator that can disagree with byte order for types like timeuuid. The result was a false-positive "out-of-order partition key" error for any two same-token partitions whose timeuuid (or other type-aware) order is the reverse of their byte order. In scrub mode this caused the second partition to be silently dropped. Fixes: SCYLLADB-2304 Closes scylladb/scylladb#30120	2026-05-29 11:54:20 +02:00
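To illustrate why the two comparators can disagree, here is a sketch using Python's `uuid` module. It treats a partition key as a single version-1 UUID (timeuuid), uses a raw-bytes comparison as a stand-in for legacy_tri_compare, and compares by the 60-bit timestamp as a simplified stand-in for the type-aware order; all helper names are hypothetical. Because a timeuuid serializes time_low before time_mid, a later timestamp can still sort first byte-wise.

```python
import uuid
from typing import List


def make_timeuuid(timestamp: int) -> uuid.UUID:
    """Build a version-1 UUID whose 60-bit timestamp is `timestamp`."""
    time_low = timestamp & 0xFFFFFFFF
    time_mid = (timestamp >> 32) & 0xFFFF
    time_hi_version = ((timestamp >> 48) & 0x0FFF) | 0x1000  # version 1
    # clock_seq_hi_variant=0x80 sets the RFC 4122 variant; clock_seq and node are 0.
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version, 0x80, 0x00, 0))


def byte_order(a: uuid.UUID, b: uuid.UUID) -> int:
    """Stand-in for legacy_tri_compare: compare the serialized bytes."""
    return (a.bytes > b.bytes) - (a.bytes < b.bytes)


def time_order(a: uuid.UUID, b: uuid.UUID) -> int:
    """Simplified type-aware order: compare the embedded timestamps."""
    return (a.time > b.time) - (a.time < b.time)


def is_byte_ordered(keys: List[uuid.UUID]) -> bool:
    """Same-token keys must follow on-disk (byte) order."""
    return all(byte_order(x, y) < 0 for x, y in zip(keys, keys[1:]))
```

With `a = make_timeuuid(1 << 32)` and `b = make_timeuuid((1 << 32) - 1)`, `a` is newer by timestamp but sorts first byte-wise, which is exactly the case a type-aware validator would wrongly flag.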
Tomasz Grabiec	5ceabcbcc5	Merge 'tablets: fix update_tablet_metadata failures during bootstrap' from Aleksandra Martyniuk When partition_split_builder splits a tablet metadata partition into multiple mutations, the first mutation gets the partition tombstone and/or static row while subsequent mutations contain only clustered rows. The hint logic would correctly clear tokens (marking a full partition read) upon seeing the tombstone in the first mutation, but then re-add tokens when processing the subsequent row-only mutations. This caused update_tablet_metadata to attempt a point update via mutate_tablet_map_async on a tablet map that doesn't exist yet during bootstrap, throwing no_such_tablet_map and failing the snapshot transfer. Fix by adding a full_read flag to table_hint. Once a full partition read is decided (due to partition tombstone, range tombstone, static row, or row deletion), the flag prevents subsequent mutations for the same table from re-adding tokens. Additionally, fall back to a full partition read when the tablet map is missing locally, which happens when the joining node receives tablet metadata for a table it has never seen before. Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-2303. Needs backports to 2026.1+. 2026.1 introduces the regression with `b17a36c071` Closes scylladb/scylladb#30115 * github.com:scylladb/scylladb: tablets: fall back to full partition read when tablet map is missing tablets: fix hint re-adding tokens after full partition read decision	2026-05-29 11:53:36 +02:00
Botond Dénes	091e3f5191	Merge 'test.py: reduce resource metrics gathering overhead' from Evgeniy Naydanov Only enable the memory controller in cgroup subtree_control instead of all available controllers. cpu.stat is available in cgroup v2 without enabling the cpu controller (base accounting), and enabling io/pids/cpu controllers adds unnecessary per-operation kernel overhead to Scylla processes - particularly the memory controller's per-page-cache-operation accounting combined with io controller overhead during heavy I/O. Additionally, restrict SystemResourceMonitor to the master process only. System-wide metrics (CPU%, memory) are identical from any process, so running a monitoring thread in each xdist worker was redundant and added unnecessary SQLite write contention and thread scheduling noise. Replace cpu_percent(interval=0.1) with a non-blocking cpu_percent() that returns CPU% since the previous call. Use stop_event.wait(timeout=2.0) as the loop control to both space out iterations and allow immediate shutdown responsiveness. Fixes SCYLLADB-2141 Closes scylladb/scylladb#29987 * github.com:scylladb/scylladb: test: use non-blocking cpu_percent in SystemResourceMonitor test.py: reduce cgroup overhead in resource metrics gathering	2026-05-29 10:52:17 +03:00
Nadav Har'El	7a387a499f	Merge 'cql3: extract vector search select statement into cql3/statements/external_search/' from Szymon Malewski Extract vector_indexed_table_select_statement and its filter logic out of the monolithic select_statement.cc and the vector_search/ module into a dedicated directory cql3/statements/external_search/. This improves modularity and eliminates a circular dependency between cql3 and vector_search: the filter code depends heavily on cql3 types (expressions, query_options, statement_restrictions) and belongs in the cql3 layer. Follow-up to VECTOR-250 which originally addressed the same dependency but has since regressed. This is also a preparatory refactoring for full-text search select statements, which can share some implementation with vector search. Pure refactoring, no semantic changes - no need for backporting. Closes scylladb/scylladb#30100 * github.com:scylladb/scylladb: vector_index: move filter into cql3/statements/external_search cql3: extract vector_indexed_table_select_statement into own compilation unit vector_index: split query_base_table to return raw coordinator_result	2026-05-28 11:26:49 +03:00
Piotr Dulikowski	8dfd455001	Merge 'strong consistency: fix drop table blocking on stuck writes and handle timeout in update()' from Petr Gusev - Fix table drop blocking for the full client timeout when in-flight writes can't reach quorum - Handle unhandled timeout exception in the wait-for-leader loop during group startup When a strongly consistent table is dropped, `schedule_raft_group_deletion()` calls `g->close()` which waits for all in-flight operations to release their gate holders. But other nodes may have already destroyed their raft servers for this group, so an in-flight write on the leader cannot reach quorum and hangs until the client timeout expires (~seconds), unnecessarily delaying group deletion. Additionally, the wait-for-leader loop in groups_manager::update() uses abort_on_expiry with a 60-second timeout but never catches the exception if it fires, leaving the group in an indeterminate state. SCYLLADB-2080 fix: - Reorder `schedule_raft_group_deletion()`: initiate gate close (prevents new operations), then abort the raft server (unblocks stuck writes by causing `raft::stopped_error`), then await the gate future (resolves immediately since holders are released). - Handle `raft::stopped_error` in the coordinator's top-level catch blocks (both write and read paths): if the table no longer exists, return `no_such_column_family` (the CQL layer converts it to InvalidRequest: unconfigured table). Otherwise fall through to the default timeout handling. - Replace gate->hold() with try_hold() + on_internal_error in acquire_server, with a comment explaining why the gate can never be closed at that point (table removal in `schema_applier::commit_on_shard` precedes gate closure, with no scheduling point in between). Timeout handling fix: - Use `coroutine::as_future` in the wait-for-leader loop to catch timeout exceptions gracefully: log a warning and break out instead of propagating unhandled. 
Includes a cluster test reproducer (test_drop_table_unblocks_stuck_write) that: 1. Pauses a write on the leader before add_entry 2. Drops the table (follower destroys its group immediately) 3. Resumes the write — verifies it fails promptly with InvalidRequest ("unconfigured table") instead of hanging for 15 seconds backport: no need, strong consistency is not released yet Fixes: SCYLLADB-2080 Closes scylladb/scylladb#30105 * github.com:scylladb/scylladb: strong consistency/groups_manager: handle timeout in update() wait-for-leader loop strong consistency: abort raft server before gate close when dropping a table test/cluster: rewrite test_queries_while_dropping_table for SCYLLADB-2080	2026-05-28 09:59:20 +02:00
Szymon Malewski	ed1006928f	vector_index: move filter into cql3/statements/external_search Move prepared_filter, prepared_restriction, prepared_rhs types and prepare_filter() from vector_search/filter.{hh,cc} into new files cql3/statements/external_search/filter.{hh,cc} under namespace cql3::statements::external_search. This eliminates a circular dependency between the cql3 and vector_search modules: the filter code depends heavily on cql3 types (expressions, query_options, statement_restrictions) and belongs in the cql3 layer. This is a follow-up to VECTOR-250 which originally addressed the same circular dependency but has since regressed.	2026-05-27 21:43:56 +02:00
Ferenc Szili	76dac2fd8e	test: fix format string typo in error logging in ldap_server.py This change fixes a typo in the error logging format string: s% -> %s Fixes: SCYLLADB-2244 Closes scylladb/scylladb#30088	2026-05-27 17:22:21 +03:00
Aleksandra Martyniuk	d6c1707a04	tablets: fix hint re-adding tokens after full partition read decision When partition_split_builder splits a tablet metadata partition into multiple mutations, the first mutation gets the partition tombstone and/or static row while subsequent mutations contain only clustered rows. The tablet metadata change hint logic would correctly clear tokens (marking a full partition read) upon seeing the tombstone in the first mutation, but then re-add tokens when processing the subsequent row-only mutations. This caused update_tablet_metadata to attempt a point update via mutate_tablet_map_async on a tablet map that doesn't exist yet during bootstrap, throwing no_such_tablet_map and failing the snapshot transfer. Fix by adding a full_read flag to table_hint. Once a full partition read is decided (due to partition tombstone, range tombstone, static row, or row deletion), the flag prevents subsequent mutations for the same table from re-adding tokens.	2026-05-27 15:36:16 +02:00
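A minimal model of the hint logic, assuming each split mutation reports its tokens plus flags for the constructs that force a full partition read. `TableHint` and `add_mutation()` are hypothetical Python stand-ins for the C++ table_hint; the point is that once `full_read` is set, later row-only mutations for the same table cannot re-add tokens.

```python
from dataclasses import dataclass, field
from typing import Iterable, Set


@dataclass
class TableHint:
    """Hypothetical model of the per-table hint."""
    tokens: Set[int] = field(default_factory=set)
    full_read: bool = False

    def add_mutation(self, tokens: Iterable[int], *,
                     has_partition_tombstone: bool = False,
                     has_static_row: bool = False,
                     has_range_tombstone: bool = False,
                     has_row_deletion: bool = False) -> None:
        if self.full_read:
            # A full partition read was already decided; later row-only
            # mutations from the same split partition must not undo it.
            return
        if (has_partition_tombstone or has_static_row
                or has_range_tombstone or has_row_deletion):
            self.tokens.clear()   # empty token set: read the whole partition
            self.full_read = True
            return
        self.tokens.update(tokens)  # point update for these tokens
```

Feeding the first split mutation (with the partition tombstone) and then a row-only one leaves `tokens` empty and `full_read` set, instead of turning into a point update against a tablet map that may not exist yet.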
Nadav Har'El	21ecc12fc6	Merge 'index: fix local vector index locality detection after schema reload' from Michał Hudobski After schema reload, `target_parser::is_local()` did not recognize the vector-index local target format `{"pk": [...], "tc": "..."}`, causing local vector indexes to be treated as global. This broke duplicate detection when both a global and a local vector index existed on the same column. Fix by introducing `vector_index::is_local()` and dispatching to it from `create_index_from_index_row()` based on the index class. Also adds tests for local/global vector index coexistence. Fixes: SCYLLADB-987 backport reasoning: we added local vector index support in 2026.1 Closes scylladb/scylladb#29492 * github.com:scylladb/scylladb: test/cqlpy: add tests for global and local vector index coexistence index: fix local vector index locality detection after schema reload	2026-05-27 15:34:57 +03:00
Petr Gusev	d922c43358	strong consistency: abort raft server before gate close when dropping a table When a strongly consistent table is dropped, schedule_raft_group_deletion() used to call g->close() first, which waits for all in-flight operations to release their gate holders. But other nodes may have already destroyed their raft servers for this group, so an in-flight write on the leader cannot reach quorum and hangs until the client timeout expires, unnecessarily delaying group deletion. Fix: initiate gate close (prevents new operations from entering), then abort the raft server (causes in-flight add_entry/read_barrier to throw raft::stopped_error, releasing their gate holders), then await the gate future (resolves immediately since holders are now released). Handle raft::stopped_error in the coordinator's top-level catch blocks (both write and read paths): if the table no longer exists, return no_such_column_family (which the CQL layer converts to InvalidRequest 'unconfigured table'). Otherwise fall through to the default timeout handling. Also replace gate->hold() with try_hold() + on_internal_error in acquire_server, and handle the timeout exception in the wait-for-leader loop in update() gracefully (log + break instead of propagating). Fixes: SCYLLADB-2080	2026-05-27 12:06:46 +02:00
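The shutdown ordering can be sketched with asyncio, assuming a toy `Gate` and a `FakeRaftServer` whose `add_entry()` never reaches quorum. All names here are hypothetical stand-ins for the C++ gate and raft::server. Closing the gate first rejects new writers, aborting the server releases the stuck holder, and only then is the close future awaited; awaiting it before the abort would block until the write gave up on its own.

```python
import asyncio


class Gate:
    """Toy gate: close() rejects new holders and resolves once holders leave."""

    def __init__(self) -> None:
        self._count = 0
        self._closed = False
        self._drained = None  # asyncio.Event, created by close()

    def try_hold(self) -> bool:
        if self._closed:
            return False
        self._count += 1
        return True

    def leave(self) -> None:
        self._count -= 1
        if self._closed and self._count == 0:
            self._drained.set()

    def close(self) -> "asyncio.Future":
        # Must be called from a running event loop.
        self._closed = True
        self._drained = asyncio.Event()
        if self._count == 0:
            self._drained.set()
        return asyncio.ensure_future(self._drained.wait())


class StoppedError(Exception):
    """Stand-in for raft::stopped_error."""


class FakeRaftServer:
    def __init__(self) -> None:
        self._aborted = asyncio.Event()

    async def add_entry(self) -> None:
        # Quorum is unreachable, so only abort() ends this wait.
        await self._aborted.wait()
        raise StoppedError()

    def abort(self) -> None:
        self._aborted.set()


async def write(gate: Gate, server: FakeRaftServer, results: list) -> None:
    if not gate.try_hold():
        results.append("rejected")
        return
    try:
        await server.add_entry()
    except StoppedError:
        results.append("stopped")  # the coordinator maps this to an error reply
    finally:
        gate.leave()


async def delete_group(gate: Gate, server: FakeRaftServer) -> None:
    closed = gate.close()  # 1. stop new operations from entering
    server.abort()         # 2. unblock in-flight add_entry()
    await closed           # 3. resolves once the holders have left
```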
Petr Gusev	89307064b5	test/cluster: rewrite test_queries_while_dropping_table for SCYLLADB-2080 Rewrite the test to use 2 nodes (RF=2) instead of 1 (RF=1), which exposes the quorum-loss scenario: when a table is dropped, the follower destroys its raft group immediately while the leader's in-flight operations are still holding the gate. The test pauses both a read and a write on the leader, drops the table, then resumes them. Both are expected to fail with 'no such column family' since the raft server is aborted as part of group deletion. A 15-second timeout guard detects the old buggy behavior (write stuck forever). Marked xfail until the fix is applied in the next commit.	2026-05-27 12:06:46 +02:00
Botond Dénes	555cfbcd38	Merge 'treewide: replace deprecated smp::count and smp::all_cpus() with new APIs' from Avi Kivity Replace all uses of the deprecated seastar::smp::count with this_smp_shard_count() and smp::all_cpus() with this_smp_all_shards() across the ScyllaDB codebase (seastar submodule untouched). Both replacement functions require a reactor thread context. All call sites were verified to run on reactor threads. Notable cases: - dht/token-sharding.hh: this_smp_shard_count() is used as a default parameter value. This is safe since all callers are on reactor threads, but the expression is now evaluated at each call site rather than being a reference to a global variable. - service/storage_service.hh, locator/abstract_replication_strategy.hh, ent/encryption/encryption.cc: used in default member initializers and constructor member-init-lists. Objects are always constructed on reactor threads. - schema_builder: sometimes called from BOOST_AUTO_TEST_CASE without a reactor. Added a pre-patch that makes the implicit shard count parameter explicit and passes 1 in those cases. Not changed: - scylla-gdb.py: reads smp::count as a GDB symbol (no reactor context). - Python test files: only reference smp::count in comments/strings. No backport: the Seastar commit that deprecated these functions hasn't made (and won't make) its way into any release branches (and the warnings are cosmetic anyway) Closes scylladb/scylladb#29990 * github.com:scylladb/scylladb: treewide: replace deprecated smp::count and smp::all_cpus() with new APIs scylla-gdb: read shard count from smp::_this_smp instead of smp::count schema_builder: make shard_count an explicit constructor parameter	2026-05-27 09:42:06 +03:00
Avi Kivity	8010e408a2	treewide: replace deprecated smp::count and smp::all_cpus() with new APIs Replace all uses of the deprecated seastar::smp::count with this_smp_shard_count() and smp::all_cpus() with this_smp_all_shards() across the ScyllaDB codebase (seastar submodule untouched). Both replacement functions require a reactor thread context. All call sites were verified to run on reactor threads. Notable cases: - dht/token-sharding.hh: this_smp_shard_count() is used as a default parameter value. This is safe since all callers are on reactor threads, but the expression is now evaluated at each call site rather than being a reference to a global variable. - service/storage_service.hh, locator/abstract_replication_strategy.hh, ent/encryption/encryption.cc: used in default member initializers and constructor member-init-lists. Objects are always constructed on reactor threads. Not changed: - scylla-gdb.py: reads smp::count as a GDB symbol (no reactor context). - Python test files: only reference smp::count in comments/strings.	2026-05-26 17:35:20 +03:00
Wojciech Mitros	ae0d77257f	mv: fix view_update_builder losing fragments across batch boundaries When a mutation generates more view updates than max_rows_for_view_updates (100), view_update_builder::build_some() splits the work into multiple batches. There was a bug in how fragments were read between batches: When should_stop_updates() returned true, the old code called stop() which returned stop_iteration::yes without reading the next fragments. On the next build_some() call, read_both_next_fragments() was called at the start, which advanced BOTH readers - skipping any fragment that was already read but not yet consumed. A row could be left unconsumed if either: - the 100th (last in the batch) update was a row insertion and we still had insertions/updates remaining - the 100th (last in the batch) update was a row deletion and we still had deletions/updates remaining For the most common case where work is split in batches, i.e. range deletions, we couldn't hit this because a range delete generates only view row deletions. On tables with a single materialized view, we also couldn't get this for any batches with fewer than 50 statements (unless the batch also contained range deletions), because one non-range-delete update can generate up to 2 view updates. However, for a range of scenarios outside these two cases, we could lose view updates, resulting in persistent inconsistencies. The fix: - read_*_next_fragment() now accept a stop_iteration parameter, so the next fragments are always read after consuming (even when stopping), but stop_iteration::yes is correctly propagated to break the loop. - build_some() no longer re-reads fragments at the start. Instead, an initialize() method performs the initial read once at construction. - because now we only advance readers after consuming, we won't advance readers after end_of_partition, so we extend the break condition to accept either readers evaluating to `false` or them being at the end_of_partition. 
We also handle the optimization with _skip_row_updates Fixes: scylladb/scylladb#29155 Closes scylladb/scylladb#29498	2026-05-26 14:15:12 +02:00
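The invariant behind the fix (advance a reader right after consuming its fragment, and do the initial read once at construction) can be sketched as a batched merge of two sorted streams. This is a toy model, not the real view_update_builder: each stream is a sorted list of keys and every consumed fragment yields one output entry. A fragment that is read but not yet consumed when a batch ends stays buffered for the next build_some() call, so batching yields the same result as a single pass.

```python
class ViewUpdateBuilder:
    """Toy batched merge of two sorted fragment streams (not the real class)."""

    def __init__(self, existings, updates):
        self._existings = iter(existings)
        self._updates = iter(updates)
        self._existing = None
        self._update = None
        self._initialize()

    def _initialize(self):
        # The initial read happens once, at construction.
        self._existing = next(self._existings, None)
        self._update = next(self._updates, None)

    def build_some(self, max_updates):
        out = []
        while self._existing is not None or self._update is not None:
            if len(out) >= max_updates:
                # Stop without re-reading: the current fragments stay buffered
                # and are consumed first by the next build_some() call.
                break
            e, u = self._existing, self._update
            if u is None or (e is not None and e < u):
                out.append(("existing", e))
                self._existing = next(self._existings, None)  # advance after consuming
            elif e is None or u < e:
                out.append(("update", u))
                self._update = next(self._updates, None)
            else:  # same key in both streams
                out.append(("both", u))
                self._existing = next(self._existings, None)
                self._update = next(self._updates, None)
        return out


def build_all(builder, max_updates):
    """Drain the builder batch by batch."""
    result = []
    while True:
        batch = builder.build_some(max_updates)
        if not batch:
            return result
        result.extend(batch)
```

The buggy variant would correspond to calling `_initialize()` again at the top of every `build_some()`, which silently drops whichever buffered fragment was not consumed before the batch limit was hit.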
Pavel Emelyanov	cd7d9a63bc	error_injection: Convert handler-style breakpoints to wait_for_message sugar Replace verbose handler lambdas that only log and call wait_for_message() with the equivalent one-liner breakpoint sugar. The behavior is identical -- the sugar produces the same log messages in the format "{name}: waiting for message" / "{name}: message received". Update Python tests that waited for the old ad-hoc log messages to match the new standardized format. Converted injections: - topology_state_load_before_update_cdc (storage_service.cc) - migration_streaming_wait x2 (storage_service.cc) - pause_after_streaming_tablet (storage_service.cc) - cdc_generation_publisher_fiber (topology_coordinator.cc) - wait_after_tablet_cleanup (topology_coordinator.cc) - fast_orphan_removal_fiber (topology_coordinator.cc) - split_storage_groups_wait (table.cc) - wait_before_stop_compaction_groups (table.cc) - tasks_vt_get_children (task_manager.cc) - truncate_compaction_disabled_wait (database.cc) Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-05-26 15:01:01 +03:00
Avi Kivity	c59985c38b	Merge 'cql3: limit large allocations when parsing queries' from Botond Dénes Queries are stored and passed around as sstring/std::string_view. While normally they are small enough to not cause problems, as the `test_cdc_large_values.TestLargeColumnsWithCDC.test_single_column_blob_max_size_with_cdc_preimage_full_postimage[unprepared_statements]` demonstrates, queries can be arbitrarily large, putting heavy strain on Scylla internals via large allocations, in the extreme case causing denial of service. This PR attempts to alleviate this by using fragmented storage for queries: read the query as a fragmented string from the input stream in `transport/server.cc`, propagate it as such to `query_processor::prepare()` and also store it as such in `cql3::cql_statement::raw_cql_statement`. Also avoid linearizing raw values in the CQL expression tree: switch `cql3::expr::untyped_constant::raw_text` to fragmented storage. For this to be possible, some infrastructure code had to be made fragmented-storage friendly: ascii/utf8 validation, hashers, from_hex and, importantly, `abstract_type::from_string()`. Unfortunately, the query still has to be linearized for parsing itself, as ANTLR -- although it allows for a custom InputStream implementation -- plays pointer arithmetic games with the pointers obtained from it, so fragmented input cannot be used. Still, this PR limits the places where the query is linearized to the following: * Parsing * Audit * Logs and error messages So the normal query paths for queries that actually can get arbitrarily large (UPDATE and INSERT) should only linearize the query temporarily for parsing. 
Fixes #10779 Improvement, no backport Closes scylladb/scylladb#28619 * github.com:scylladb/scylladb: tracing: add_query(): change query param to utils::chunked_string cql3: store raw query string in utils::chunked_string serializer: add serializer<utils::chunked_string> utils/reusable_buffer: add get_linearized_view(managed_bytes_view) cql3/expr: use utils::chunked_string for untyped_constant::raw_text types: abstract_type::from_string() switch to fragmented buffers (implementation) types: abstract_type::from_string() switch to fragmented buffers (interface) types: use write_fragmented from utils/fragment_range.hh types: timestamp_from_string(): don't assume std::string_view is null-terminated types/duration: don't assume std::string_view is null-terminated utils/hashers: add calculate(managed_bytes_view) overload utils/ascii: add validate(managed_bytes_view) overload utils: add managed_bytes_fwd.hh utils: add chunked_string utils: add managed_bytes_basic_view::byte_iterator	2026-05-26 15:00:53 +03:00
Avi Kivity	f165b396fd	schema_builder: make shard_count an explicit constructor parameter A recent Seastar update deprecated smp::count and introduced this_smp_shard_count() as a replacement. One difference is that this_smp_shard_count() wants to run on a reactor thread. This poses a problem for non-reactor tests (BOOST_AUTO_TEST_CASE) that nevertheless use a schema, as the schema_builder constructor references smp::count. If we replace it with this_smp_shard_count() then it will crash when running without a reactor. To fix, remove the implicit this_smp_shard_count() call from raw_schema's constructor and require callers to pass shard_count explicitly to schema_builder. This allows tests that don't run on a reactor thread to construct schemas without crashing. Production code and reactor-based tests pass this_smp_shard_count(). Non-reactor test files (expr_test, keys_test, nonwrapping_interval_test, wrapping_interval_test, bti_key_translation_test, range_tombstone_list_test) pass a fixed shard count of 1. Note: sstable_test.cc is a Seastar test file (SEASTAR_THREAD_TEST_CASE) but also contains one plain BOOST_AUTO_TEST_CASE (test_empty_key_view_comparison) that constructs a schema_builder without a reactor context. This test also receives a fixed shard count of 1.	2026-05-26 11:55:56 +03:00
Nikos Dragazis	54cb6d4608	test: Order task-wait before finalization in test_migration_wait_task The purpose of this test is to verify that the task manager's "wait" API works correctly for vnodes-to-tablets migration virtual tasks. It starts a `wait_task` HTTP request concurrently with a finalize (or rollback) operation, and asserts that the wait returns the correct final state ("done" or "suspended"). The test uses `asyncio.create_task()` to wrap the wait request into a task, and then immediately calls finalize. With asyncio's lazy task scheduling, the wait coroutine does not start until the event loop yields, so the finalization request reaches the server before wait, and therefore may also complete before it. Once finalization completes, the virtual migration task is no longer discoverable, causing a "task not found" error. Add a log message in Scylla's wait handler and a synchronization point in the test to ensure that the wait request reaches the server before finalization. This follows the same pattern used in `test_tablet_tasks.py::check_and_abort_repair_task`. Fixes SCYLLADB-2077 Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com> Closes scylladb/scylladb#29973	2026-05-26 10:43:22 +03:00
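The scheduling behaviour is easy to reproduce in plain asyncio: `create_task()` only schedules the coroutine, so the code right after it runs first. In this sketch an `asyncio.Event` stands in for the test waiting on the new log message from Scylla's wait handler; `demo()` and the event strings are hypothetical.

```python
import asyncio


async def demo():
    events = []
    wait_registered = asyncio.Event()

    async def wait_request():
        events.append("wait sent")
        # Stands in for Scylla logging that the wait handler was entered.
        wait_registered.set()
        await asyncio.sleep(0)
        events.append("wait done")

    task = asyncio.create_task(wait_request())
    # create_task() only schedules the coroutine; it has not started yet.
    events.append("task created")
    # Synchronization point: block until the wait request is known to be in.
    await wait_registered.wait()
    events.append("finalize sent")
    await task
    return events
```

Without the `await wait_registered.wait()` line, "finalize sent" would be recorded before "wait sent", which mirrors the race the commit fixes.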
Botond Dénes	0fd25dc47c	Merge 'Replace get_injection_parameters() with inject_parameter() where appropriate' from Pavel Emelyanov Several error injection sites use the low-level get_injection_parameters() API to fetch the entire parameters map and then manually look up a single key. The inject_parameter() API is better suited for these cases — it combines the enabled check and typed single-parameter extraction in one call, returning std::optional. Cleaning error injection usage, not backporting Closes scylladb/scylladb#29970 * github.com:scylladb/scylladb: test: Use inject_parameter() in row_cache_test sstables: Use inject_parameter() for mx reader fill buffer timeout streaming: Use inject_parameter() for order_sstables_for_streaming	2026-05-26 10:32:44 +03:00
Nadav Har'El	f65a52f3ec	Merge 'vector_search: test: migrate rescoring tests from C++/Boost to pytest' from Szymon Malewski Migrate mock-based rescoring and oversampling tests from test/vector_search/rescoring_test.cc to pytest and delete the C++ file. Index option validation tests go to test_vector_index.py; rescoring tests go to a new test_vector_search_rescoring.py which introduces shared infrastructure (EmbeddingRow dataclass, TEST_DATA dict, reversed_ann_response() helper, rescoring_test_table() context manager). Two tests have updated assertions (semantic change): filters_invalid_similarity_scores now uses per-function expected result sets including a zero-vector row, and rescoring_with_zerovector_query asserts empty results after NaN filtering (cosine only). Both are marked xfail pending SCYLLADB-924. Follow-up to #29593. Does not require backport - simple refactoring of tests Closes scylladb/scylladb#29906 * github.com:scylladb/scylladb: test/vector_search: migrate zero-vector query rescoring test to pytest; delete rescoring_test.cc test/vector_search: migrate invalid similarity score filtering test to pytest test/vector_search: migrate non-ANN similarity argument rescoring test to pytest test/vector_search: migrate wildcard select rescoring test to pytest test/vector_search: migrate similarity_function rescoring test to pytest test/vector_search: migrate rescoring and f32 quantization tests to pytest test/vector_search: migrate oversampling tests to pytest test/vector_search: migrate vector_index option validation tests to pytest	2026-05-26 09:45:40 +03:00
Botond Dénes	2c9a5f9634	types: abstract_type::from_string() switch to fragmented buffers (implementation) The previous patch changed the interface and callers, this one updates the implementation to actually work with fragmented buffers. Most types just use with_linearized() to linearize the fragmented input buffer for parsing. This is fine, as most types have a fixed or bounded-size string representation that is small. Importantly, the input is not linearized for the 3 types which have unbounded values: ascii, bytes and text. The tuple type can contain any of these types itself, so it is also converted to avoid linearization.	2026-05-26 09:08:06 +03:00
Botond Dénes	597d4252dc	types: abstract_type::from_string() switch to fragmented buffers (interface) Change input: std::string_view -> utils::chunked_string_view. Change return value: bytes -> managed_bytes. This patch only changes the interface, with some to_bytes() sprinkled in the internals to deal with recursive calls. Internals will be updated in the next patch, to keep the churn of updating callers separate from the actually important changes.	2026-05-26 09:08:06 +03:00
Botond Dénes	a9028d88b2	utils/hashers: add calculate(managed_bytes_view) overload Uses update() for each fragment, then finalize. Yields identical hash to calling calculate(std::string_view) with linearized buffer. This is checked by new tests.	2026-05-26 09:08:05 +03:00
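The property checked by the new tests (feeding each fragment to `update()` and then finalizing gives the same digest as hashing the linearized buffer) holds for any streaming hash. This minimal Python sketch illustrates it with `hashlib.sha256` as a stand-in; the actual `utils/hashers` code is C++ and is not modeled here.

```python
import hashlib


def hash_fragments(fragments):
    """Hash a fragmented buffer: update() once per fragment, then finalize."""
    h = hashlib.sha256()
    for frag in fragments:
        h.update(frag)
    return h.digest()


def hash_linearized(data):
    """Hash the same bytes as one contiguous buffer."""
    return hashlib.sha256(data).digest()
```

Fragment boundaries, including empty fragments, do not affect the digest, which is why no linearization is needed.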
Botond Dénes	a2fff12bcd	utils: add chunked_string A thin facade over managed_bytes[_view], offering some extra convenience for working with strings, as well as a strong type communicating the purpose (storing text instead of a blob). Also introduces utils::from_hex(chunked_string_view), a fragmented hex-decode that operates directly on a chunked_string_view without requiring linearization. Hex pairs straddling fragment boundaries are handled via a carry-over nibble.	2026-05-26 09:08:05 +03:00
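The carry-over nibble technique described for `utils::from_hex(chunked_string_view)` can be sketched in Python as follows. This is an illustration only, not the C++ implementation: `from_hex_fragments` is a hypothetical name and fragments are modeled as a list of `str`.

```python
_HEX_DIGITS = frozenset("0123456789abcdefABCDEF")


def from_hex_fragments(fragments):
    """Decode hex text split across fragments without joining them first."""
    out = bytearray()
    carry = None  # high nibble left over from the previous fragment, if any
    for frag in fragments:
        for ch in frag:
            if ch not in _HEX_DIGITS:
                raise ValueError(f"invalid hex digit: {ch!r}")
            nibble = int(ch, 16)
            if carry is None:
                carry = nibble  # first half of a byte; may straddle a boundary
            else:
                out.append((carry << 4) | nibble)  # complete the byte
                carry = None
    if carry is not None:
        raise ValueError("odd number of hex digits")
    return bytes(out)
```

For example, `["0a1", "b2", "c"]` decodes to the same bytes as `"0a1b2c"`, even though the pairs `1b` and `2c` straddle fragment boundaries.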
Botond Dénes	09743aed36	utils: add managed_bytes_basic_view::byte_iterator A byte-wise iterator which works both as a bidirectional iterator and as an output iterator (for mutable views). Allows using managed_bytes_view in iterator-based algorithms. Added unit tests covering the iterator functionality.	2026-05-26 09:08:05 +03:00
Evgeniy Naydanov	8e76763d7b	test: use non-blocking cpu_percent in SystemResourceMonitor Replace cpu_percent(interval=0.1) with a non-blocking cpu_percent() that returns CPU% since the previous call. Use stop_event.wait(timeout=2.0) as the loop control to both space out iterations and allow immediate shutdown responsiveness.	2026-05-26 05:14:01 +00:00
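The loop shape described in this commit can be sketched as below, assuming the sampler is something like `psutil.cpu_percent()` called with no interval (non-blocking, measured since the previous call). The sampler and the `on_sample` callback are injected here so the sketch runs without psutil; `monitor_loop` is a hypothetical name, not the actual `SystemResourceMonitor` code.

```python
import threading


def monitor_loop(sample_cpu_percent, stop_event: threading.Event, on_sample, interval=2.0):
    """Sample CPU%, spacing iterations with Event.wait() instead of a blocking sample."""
    # Prime the sampler: a non-blocking first call has no previous baseline.
    sample_cpu_percent()
    # Event.wait() returns True as soon as stop_event is set, so shutdown is
    # immediate; it returns False on timeout, which triggers the next sample.
    while not stop_event.wait(timeout=interval):
        on_sample(sample_cpu_percent())
```

Using the event wait as the loop control serves two purposes at once: it spaces out iterations and it lets the monitor exit promptly instead of sleeping through a fixed interval.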
Evgeniy Naydanov	901c452c82	test.py: reduce cgroup overhead in resource metrics gathering Only enable the memory controller in cgroup subtree_control instead of all available controllers. cpu.stat is available in cgroup v2 without enabling the cpu controller (base accounting), and enabling io/pids/cpu controllers adds unnecessary per-operation kernel overhead to Scylla processes - particularly the memory controller's per-page-cache-operation accounting combined with io controller overhead during heavy I/O. Additionally, restrict SystemResourceMonitor to the master process only. System-wide metrics (CPU%, memory) are identical from any process, so running a monitoring thread in each xdist worker was redundant and added unnecessary SQLite write contention and thread scheduling noise. Co-Authored-By: Claude Opus 4.6 (200K context) <noreply@anthropic.com>	2026-05-26 05:10:05 +00:00
Szymon Malewski	2151a4fac3	test/vector_search: migrate zero-vector query rescoring test to pytest; delete rescoring_test.cc Migrate rescoring_with_zerovector_query from rescoring_test.cc to pytest as test_rescoring_with_zerovector_query. Tested with cosine similarity only because zero vectors produce NaN only for cosine; other functions yield valid scores. The test is marked xfail: similarity_cosine now returns NaN for zero vectors (SCYLLADB-456 fix) and rescoring should filter out NaN scores, yielding an empty result set. Semantic change: the test now asserts the desired empty-result behavior instead of asserting that the query does not throw. Delete rescoring_test.cc now that all tests have been migrated and remove its entries from configure.py and test/vector_search/CMakeLists.txt.	2026-05-26 00:37:54 +02:00
Szymon Malewski	533a8e65fe	test/vector_search: migrate invalid similarity score filtering test to pytest Migrate no_nulls_in_rescored_results from rescoring_test.cc to pytest, renamed to test_filters_invalid_similarity_scores_in_rescored_results. The test now also inserts a zero-vector row (id=14) to cover the case introduced when similarity_cosine was changed to return NaN for zero vectors instead of throwing (SCYLLADB-456). The expected surviving set of rows is refined per similarity function based on which inputs produce valid (non-NaN, non-Infinity) similarity scores. Marked xfail because rescoring does not yet filter rows with invalid scores. Semantic change: the expected surviving row set is updated per the behavior described above.	2026-05-26 00:37:54 +02:00
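The desired behavior these xfail tests assert (a zero vector yields a NaN cosine score, and rescoring drops rows whose score is NaN or infinite) can be sketched as follows. This is a simplified model: it uses plain cosine similarity, while ScyllaDB's `similarity_cosine` may normalize the range differently, and `rescore` is a hypothetical helper.

```python
import math


def cosine_similarity(a, b):
    """Plain cosine similarity; NaN when either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        return math.nan
    return dot / norm


def rescore(rows, query):
    """Re-rank (key, embedding) rows by exact similarity, dropping invalid scores."""
    scored = [(cosine_similarity(emb, query), key) for key, emb in rows]
    valid = [(s, k) for s, k in scored if math.isfinite(s)]  # filters NaN and +/-inf
    valid.sort(key=lambda pair: pair[0], reverse=True)
    return [k for _, k in valid]
```

With a zero-vector query every score is NaN, so the result is empty, which matches what `test_rescoring_with_zerovector_query` expects.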
Szymon Malewski	63d9b7445f	test/vector_search: migrate non-ANN similarity argument rescoring test to pytest Migrate select_similarity_function_other_than_ann_ordering from rescoring_test.cc to pytest. The test verifies that similarity scores in SELECT are computed against the explicitly supplied argument vector rather than the ANN ordering vector. No semantic change.	2026-05-26 00:37:54 +02:00
Szymon Malewski	0cb557695a	test/vector_search: migrate wildcard select rescoring test to pytest Migrate wildcard_select_is_correctly_rescored from rescoring_test.cc to pytest. The test verifies that SELECT * with rescoring returns rows in the correct similarity order with correct embedding values, covering a slightly different processing path from the explicit-column SELECT test. No semantic change.	2026-05-26 00:37:53 +02:00
Szymon Malewski	cae816a8c6	test/vector_search: migrate similarity_function rescoring test to pytest Migrate similarity_function_returns_correctly_rescored_results from rescoring_test.cc to pytest. The test verifies that similarity scores in the SELECT clause are computed correctly after rescoring, for both argument orderings of the similarity function. No semantic change.	2026-05-26 00:37:53 +02:00
Szymon Malewski	78d72309b8	test/vector_search: migrate rescoring and f32 quantization tests to pytest Introduce shared test infrastructure in test_vector_search_rescoring.py: EmbeddingRow dataclass, TEST_DATA dict keyed by similarity function name, ANN_QUERY_VECTOR, reversed_ann_response() helper, and rescoring_test_table() context manager. Migrate result_returned_by_vector_store_is_rescored and f32_quantization_disables_rescoring from rescoring_test.cc. No semantic change.	2026-05-26 00:37:53 +02:00
Szymon Malewski	400c0dbb22	test/vector_search: migrate oversampling tests to pytest Migrate oversampling_multiplies_limit_for_vector_store_query and oversampled_vector_store_results_are_limited_to_cql_limit from rescoring_test.cc to test_vector_search_rescoring_with_mock.py. No semantic change.	2026-05-26 00:37:53 +02:00
Szymon Malewski	9f632182fb	test/vector_search: migrate vector_index option validation tests to pytest CREATE INDEX option tests for quantization, oversampling, and rescoring are moved from rescoring_test.cc to test_vector_index.py alongside the existing index option tests. These tests exercise only option parsing and validation - no vector store mock needed. No semantic change.	2026-05-26 00:37:52 +02:00
Nadav Har'El	96dd3121e7	Merge 'cql: rewrite CassIO SAI metadata index to regular secondary index' from Szymon Wasik CassIO (the library backing LangChain's `langchain_community.vectorstores.Cassandra` integration) issues the following DDL during schema setup to create a metadata index: ```sql CREATE CUSTOM INDEX IF NOT EXISTS eidx_metadata_s_<table> ON <keyspace>.<table> (ENTRIES(metadata_s)) USING 'org.apache.cassandra.index.sai.StorageAttachedIndex'; ``` ScyllaDB does not support Cassandra's StorageAttachedIndex (SAI) for non-vector columns and previously rejected this statement with: ``` StorageAttachedIndex (SAI) is only supported on vector columns; use a secondary index for non-vector columns ``` This blocks seamless migration of existing LangChain/CassIO applications from Cassandra to ScyllaDB — applications fail during initialization before any application-level workaround can run, even when metadata filtering is not used (`metadata_indexing="none"`). CassIO is no longer actively maintained but remains the only official LangChain integration path for Apache Cassandra over CQL, meaning existing applications will continue using this setup pattern. Instead of rejecting the CassIO metadata-map SAI DDL, detect the pattern and rewrite it to a standard ScyllaDB secondary index on collection entries: - Detection: SAI class name + single `ENTRIES` target on a non-frozen `map` column - Rewrite: Clear the custom class so the index is created through the standard secondary index path (which already fully supports indexing map entries) - Warning: Emit a CQL warning informing the user that SAI is not supported by ScyllaDB, a regular secondary index was created instead, and metadata filtering behavior may differ from Cassandra SAI The rewrite is placed early in `validate_while_executing()`, before the rf-rack-validity check, so the standard secondary index code path handles all subsequent validation naturally — no code duplication. 
After this change, the CassIO schema setup succeeds on ScyllaDB: - `CREATE CUSTOM INDEX ... USING 'sai'` on `ENTRIES(metadata_s)` creates a real secondary index - The index is functional and can accelerate metadata filtering queries - A CQL warning makes the rewrite transparent to operators - SAI on non-vector, non-map-entries columns is still rejected as before - Vector SAI indexes continue to be rewritten to `vector_index` as before - `test_sai_entries_on_map_creates_regular_index` — verifies the index is created and the warning is emitted (fully-qualified SAI class name) - `test_sai_entries_on_map_short_name` — same with the `'sai'` short alias - `test_sai_on_regular_column_rejected` — confirms SAI on regular scalar columns is still rejected All 148 tests in `test_vector_index.py` and `test_secondary_index.py` pass with no regressions (125 passed, 22 xfailed, 1 skipped). Fixes: SCYLLADB-2113 Backport: 2026.2 as this is the version where the support for SAI class needed by LangChain was added. Closes scylladb/scylladb#29981 * github.com:scylladb/scylladb: cql: rewrite CassIO SAI metadata index to regular secondary index db/config: add enable_cassio_compatibility flag	2026-05-26 00:19:03 +03:00
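A minimal Python sketch of the detection rule described above: an SAI class name, a single `ENTRIES` target, and a non-frozen `map` column, gated by `enable_cassio_compatibility`. The `IndexTarget` type and the function name are hypothetical, and only the two class names exercised by the tests are listed. The real check is C++ code in `validate_while_executing()` and may differ in detail.

```python
from dataclasses import dataclass

# The fully-qualified class name and the 'sai' short alias used in the tests.
SAI_CLASS_NAMES = frozenset({
    "org.apache.cassandra.index.sai.StorageAttachedIndex",
    "sai",
})


@dataclass(frozen=True)
class IndexTarget:
    kind: str          # hypothetical: "ENTRIES", "KEYS", "VALUES", "FULL", "REGULAR"
    column_type: str   # hypothetical: "map", "vector", "text", ...
    frozen: bool


def should_rewrite_to_secondary_index(custom_class, targets, cassio_compat_enabled):
    """True if the CassIO metadata-map SAI DDL should become a regular secondary index."""
    if not cassio_compat_enabled:
        return False
    if custom_class not in SAI_CLASS_NAMES:
        return False
    if len(targets) != 1:
        return False
    t = targets[0]
    return t.kind == "ENTRIES" and t.column_type == "map" and not t.frozen
```

When the predicate holds, the custom class is cleared, so the standard secondary index path performs all remaining validation.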
Michał Hudobski	1d17d2144f	index, vector_index: limit primary key columns to 255 The vector-store's InvariantKey type supports at most 255 key components. Reject vector index creation when the base table's primary key (partition + clustering columns) exceeds this limit. Fixes: VECTOR-553 Closes scylladb/scylladb#29317	2026-05-25 19:24:17 +03:00
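The rule amounts to a simple count over partition and clustering columns. Here is a Python sketch of it; the function name and error text are hypothetical, and only the 255 limit comes from the commit.

```python
MAX_KEY_COMPONENTS = 255  # InvariantKey limit stated in the commit


def check_vector_index_key_limit(partition_key_columns, clustering_columns):
    """Reject vector index creation if the base table's primary key is too wide."""
    count = len(partition_key_columns) + len(clustering_columns)
    if count > MAX_KEY_COMPONENTS:
        raise ValueError(
            f"vector index supports at most {MAX_KEY_COMPONENTS} primary key "
            f"columns, but the base table has {count}"
        )
```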
Piotr Smaron	cf6814aff2	test/pylib: use fixed MinIO ports on unique loopback IPs MinIO tests used random ports from the ephemeral range with a retry loop, which had a TOCTOU window between port selection and bind. Run MinIO on fixed service (9000) and console (9001) ports. To avoid collisions between parallel test workers, MinioWrapper now leases a unique loopback IP from HostRegistry for each instance. Also raise RuntimeError when the MinIO binary is missing or fails to start, instead of silently returning a half-initialized object. Fixes: SCYLLADB-1817	2026-05-25 15:35:51 +02:00
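The "fixed ports, unique loopback IP per instance" idea can be sketched as follows. `LoopbackLeaser` is a hypothetical stand-in for HostRegistry, and the `127.1.x.y` numbering is an assumption. This relies on Linux routing all of `127.0.0.0/8` to the loopback interface; other platforms may need extra configuration.

```python
import threading

MINIO_PORT = 9000          # fixed service port
MINIO_CONSOLE_PORT = 9001  # fixed console port


class LoopbackLeaser:
    """Hypothetical HostRegistry stand-in: hands out unique 127.1.x.y addresses."""

    def __init__(self):
        self._lock = threading.Lock()
        self._next = 1
        self._released = []
        self._leased = set()

    def lease(self):
        with self._lock:
            if self._released:
                ip = self._released.pop()  # reuse a returned address first
            else:
                n = self._next
                if n > 0xFFFF:
                    raise RuntimeError("no free loopback addresses")
                self._next += 1
                ip = f"127.1.{n >> 8}.{n & 0xFF}"
            self._leased.add(ip)
            return ip

    def release(self, ip):
        with self._lock:
            self._leased.remove(ip)  # KeyError if ip was never leased
            self._released.append(ip)


def minio_addresses(ip):
    """Service and console endpoints for one MinIO instance on its leased IP."""
    return f"{ip}:{MINIO_PORT}", f"{ip}:{MINIO_CONSOLE_PORT}"
```

Because each worker binds a different IP, the same well-known ports never collide. This removes the TOCTOU window that comes from picking a random free port and binding it later.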
Piotr Smaron	e441ce2fd9	test/nodetool: bind JMX to per-module loopback IP The Cassandra nodetool fixture picked a random JMX port on 127.0.0.1, which can collide with unrelated listeners and has a TOCTOU race between port selection and bind. Bind JMX to the per-module loopback IP with the standard port 7199 instead. Set java.rmi.server.hostname so the RMI endpoint stays on the same leased address.	2026-05-25 15:35:51 +02:00
Piotr Smaron	8fd946c649	test/pylib: default KMIP wrapper to loopback The standalone KMIP CLI wrapper could inherit a non-loopback hostname from defaults when the config did not specify one. Default to 127.0.0.1 so the dynamically assigned port remains local unless a test explicitly overrides the address.	2026-05-25 15:35:51 +02:00
Szymon Wasik	5ee339b11d	cql: rewrite CassIO SAI metadata index to regular secondary index When CassIO creates a SAI ENTRIES index on a map column, ScyllaDB now rewrites it to a regular secondary index and emits a CQL warning. This allows LangChain/CassIO applications to work without DDL errors. The rewrite is gated behind the enable_cassio_compatibility flag (disabled by default). Refs: SCYLLADB-2113	2026-05-25 15:11:43 +02:00
Botond Dénes	db89f3f095	Merge 'compaction_manager: unregister compaction module on early shutdown' from Patryk Jędrzejczak The compaction module is registered with task_manager in the compaction_manager constructor, and unregistered in compaction_manager::really_do_stop(), which was gated behind `_state != state::none` in compaction_manager::do_stop(). Since enable() -- which transitions _state from none to running -- is called later during startup (from database::start() or the disk space monitor callback) than the compaction_manager constructor, an early shutdown could leave the compaction module registered after compaction_manager::do_stop() returned. task_manager::stop() then aborted with 'Tried to stop task manager while some modules were not unregistered'. Fix compaction_manager::do_stop() to call _task_manager_module->stop() even when `_state == state::none`, so that the compaction module is always properly unregistered. Fixes: SCYLLADB-2106 Backport to all supported branches, as the bug is there and it has already caused a failure in 2026.1 CI. Closes scylladb/scylladb#30015 * github.com:scylladb/scylladb: test: add test_stop_before_starting_compaction_manager compaction_manager: unregister compaction module on early shutdown	2026-05-25 16:08:20 +03:00
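A toy Python model of the shutdown ordering bug and its fix, with all class and method names simplified and hypothetical. The module is registered in the constructor, but `enable()` may never run before an early shutdown, so `do_stop()` must unregister the module even in the `NONE` state.

```python
import enum


class State(enum.Enum):
    NONE = "none"
    RUNNING = "running"
    STOPPED = "stopped"


class TaskManager:
    """Toy task manager that refuses to stop while modules are registered."""

    def __init__(self):
        self.modules = set()

    def register(self, name):
        self.modules.add(name)

    def unregister(self, name):
        self.modules.discard(name)

    def stop(self):
        if self.modules:
            raise RuntimeError("Tried to stop task manager while some modules were not unregistered")


class CompactionManager:
    def __init__(self, task_manager):
        self._tm = task_manager
        self._state = State.NONE
        self._tm.register("compaction")  # registered in the constructor

    def enable(self):
        self._state = State.RUNNING  # happens later during startup

    def _really_do_stop(self):
        # ... stop running compactions (omitted) ...
        self._tm.unregister("compaction")

    def do_stop(self):
        if self._state == State.NONE:
            # The fix: enable() never ran, but the module is still registered.
            self._tm.unregister("compaction")
        else:
            self._really_do_stop()
        self._state = State.STOPPED
```

Without the `NONE` branch, stopping before `enable()` would leave `"compaction"` registered, and `TaskManager.stop()` would raise.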
Dmitry Kropachev	74fa423271	transport: report host id in SUPPORTED Currently the driver builds its network layout (node IP addresses and ports) from `system.local`, `system.peers` and `system.client_routes`, and then assumes that this layout is correct without verifying it. If, for example, a node's IP/port (say, behind a proxy) does not match what the driver calculated, the mismatch goes unnoticed. The goal of this feature is to provide the driver with the host ID in the SUPPORTED frame, so that it knows which node it connected to and can decide whether to keep the connection or drop it. - add `SCYLLA_HOST_ID` to the CQL `SUPPORTED` response - add a regression test that hooks the Python driver handshake and verifies the reported host id - `python3.12 -m py_compile test/cqlpy/test_protocol_exceptions.py` - syntax-only compile of `transport/server.cc` with the repo toolchain flags inside `dbuild` Refs #27452 Refs https://scylladb.atlassian.net/browse/DRIVER-610 Closes scylladb/scylladb#29809	2026-05-25 14:36:53 +03:00
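On the driver side, a check could look like the Python sketch below. The body of a CQL `SUPPORTED` response is a string multimap, modeled here as `dict[str, list[str]]`. The function name is hypothetical, and the sketch assumes the `SCYLLA_HOST_ID` value is the host ID in standard UUID string form.

```python
import uuid


def verify_connected_host(supported, expected_host_id):
    """Compare the host ID reported in SUPPORTED with the one the driver expects.

    Returns True/False, or None when the server does not report SCYLLA_HOST_ID
    (e.g. an older version), in which case the connection cannot be verified.
    """
    values = supported.get("SCYLLA_HOST_ID")
    if not values:
        return None
    return uuid.UUID(values[0]) == expected_host_id
```

A driver would keep the connection on `True`, drop it on `False`, and fall back to its current behavior on `None`.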
Avi Kivity	892f22f49c	Merge 'cql: atomic add/subtract operations with LWT' from Nadav Har'El ScyllaDB has special counter columns for which atomic add/subtract operations like `SET a = a + 1` are allowed. Such operations have not been allowed on ordinary non-counter columns, as they would not be properly atomic - the read and the write are separate, and concurrent operations can produce incorrect results. This patch allows such atomic add/subtract operations in LWT statements. For example, `UPDATE ... SET a = a - 7 IF a > 0` or `UPDATE ... SET a = a + 1 IF a != NULL`. The row updated in the operation, and the updated column (`a`), should be initialized before the update. The example `SET a = a + 1 IF a != NULL` will fail the condition if `a` is not set. A different request, `SET a = a + 1 IF EXISTS`, will just leave `a` unset if it is unset (NULL + 1 is NULL, per SQL's null propagation rules). These add/subtract operations are allowed on any numeric (integer or floating point) column. The ability of LWT to fetch the old values of a column and use them to calculate the new value has long been available in our internal CAS implementation - and has been in use for years in Alternator - but until this patch it was not exposed in CQL's LWT. This series does not add new syntax to CQL - the "SET a = a + b" and "SET a = a - b" syntax already existed for counters, and we just allow the same syntax for non-counters. However, the series does add a bit of machinery that will allow us to easily support more general expressions in the future. In particular, this series implements the addition, subtraction, and unary-minus operators for expressions, and adds the machinery needed to run any expression in "SET a = expr()", using existing row values fetched by LWT. This is a new Scylla-only feature that does not exist in Cassandra. 
Fixes #10568 Refs #22918 ("Support arithmetic operators"), SCYLLADB-1576 ("Decimal arithmetic operations OOM") This is a new feature, so it would normally not be backported. Closes scylladb/scylladb#29939 * github.com:scylladb/scylladb: cql: atomic add/subtract operations with LWT cql3: let constants::setter evaluate expressions using prefetched row data cql3/expr: add NEG unary operator for numeric negation cql3/expr: add SUB binary operator for numeric subtraction cql3/expr: add ADD binary operator for numeric addition types: add is_arithmetic() method for types	2026-05-25 14:27:33 +03:00
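The condition and null-propagation semantics described in this merge can be modeled in a few lines of Python. This is a hypothetical single-row model: a row is a `dict` or `None` if absent, `None` stands for NULL, and the Paxos machinery that makes the real operation atomic is not modeled. The helper names are illustrative only.

```python
def if_exists(row):
    return row is not None


def if_not_null(column):
    return lambda row: row is not None and row.get(column) is not None


def if_greater_than(column, value):
    # A comparison with NULL is not true, so the condition fails on NULL.
    return lambda row: row is not None and row.get(column) is not None and row[column] > value


def lwt_add(row, column, delta, condition):
    """Model of `UPDATE ... SET column = column + delta IF <condition>`.

    Returns True if the condition held and the update was applied.
    """
    if not condition(row):
        return False
    old = row.get(column)
    # SQL null propagation: NULL + delta is NULL, so an unset column stays unset.
    row[column] = None if old is None else old + delta
    return True
```

This reproduces the examples above: `SET a = a - 7 IF a > 0` applies to `a = 3` (giving `-4`), `IF a != NULL` rejects an unset `a`, and `IF EXISTS` applies but leaves an unset `a` unset.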

11929 Commits