scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-23 18:10:39 +00:00

Author	SHA1	Message	Date
Michael Litvak	3468e8de8b	test/mv/test_mv_staging: wait for cql after restart Wait for cql on all hosts after restarting a server in the test. The problem that was observed is that the test restarts servers[1] and doesn't wait for the cql to be ready on it. On test teardown it drops the keyspace, trying to execute it on the host that is not ready, and fails. Fixes SCYLLADB-1632 Closes scylladb/scylladb#29562	2026-04-23 12:40:19 +02:00
Marcin Maliszkiewicz	3df951bc9c	Merge 'audit: set audit_info for native-protocol BATCH messages' from Andrzej Jackowski Commit `16b56c2451` ("Audit: avoid dynamic_cast on a hot path") moved audit info into batch_statement via set_audit_info(), but only wired it for the CQL-text BATCH path (raw::batch_statement::prepare()). Native-protocol BATCH messages (opcode 0x0D), handled by process_batch_internal in transport/server.cc, construct a batch_statement without setting audit_info. This causes audit to silently skip the entire batch. Set audit_info on the batch_statement so these batches are audited. Fixes SCYLLADB-1652 No backport - bug introduced recently. Closes scylladb/scylladb#29570 * github.com:scylladb/scylladb: test/audit: add reproducer for native-protocol batch not being audited audit: set audit_info for native-protocol BATCH messages test/audit: rename internal test methods to avoid CI misdetection	2026-04-22 18:56:28 +02:00
Botond Dénes	eb3326b417	Merge 'test.py: migrate all bare skips to typed skip markers' from Artsiom Mishuta

Should be merged after #29235.

Complete the typed skip markers migration started in the plugin PR. Every bare `@pytest.mark.skip` decorator and `pytest.skip()` runtime call across the test suite is replaced with a typed equivalent, making skip reasons machine-readable in JUnit XML and Allure reports. 62 files changed across 8 commits, covering ~127 skip sites in total.

Bare `pytest.skip` provides only a free-text reason string. CI dashboards (JUnit, Allure) cannot distinguish between a test skipped due to a known bug, a missing feature, a slow test, or an environment limitation. This makes it hard to track skip debt, prioritize fixes, or filter dashboards by skip category. The typed markers (`skip_bug`, `skip_not_implemented`, `skip_slow`, `skip_env`) introduced by the `skip_reason_plugin` solve this by embedding a `skip_type` field into every skip report entry.

Decorator sites:

| Type | Count | Files | Description |
|------|-------|-------|-------------|
| `skip_bug` | 24 | 16 | Skip reason references a known bug/issue |
| `skip_not_implemented` | 10 | 5 | Feature not yet implemented in Scylla |
| `skip_slow` | 4 | 3 | Test too slow for regular CI runs |
| `skip_not_implemented` (bare) | 2 | 1 | Bare `@pytest.mark.skip` with no reason (COMPACT STORAGE, #3882) |

Runtime call sites:

| Type | Count | Files | Description |
|------|-------|-------|-------------|
| `skip_env` | ~85 | 34 | Feature/config/topology not available at runtime |
| `skip_bug` | 2 | 2 | Known bugs: Streams on tablets (#23838), coroutine task not found (#22501) |

- Comments: 7 comments/docstrings across 5 files updated from `pytest.skip()` to `skip()`
- Plugin hardened: `warnings.warn()` → `pytest.UsageError` for bare `@pytest.mark.skip` at collection time; bare skips are now a hard error, not a warning
- Guard tests: new `test/pylib_test/test_no_bare_skips.py` with 3 tests that prevent regression:
  - AST scan for bare `@pytest.mark.skip` decorators
  - AST scan for bare `pytest.skip()` runtime calls
  - Real `pytest --collect-only` against all Python test directories

Runtime skip sites use the convenience wrappers from `test.pylib.skip_types`:

```python
from test.pylib.skip_types import skip_env
```

Usage:

```python
skip_env("Tablets not enabled")
```

1. test: migrate @pytest.mark.skip to @pytest.mark.skip_bug for known bugs (24 decorator sites, 16 files)
2. test: migrate @pytest.mark.skip to @pytest.mark.skip_not_implemented (10 decorator sites, 5 files)
3. test: migrate @pytest.mark.skip to @pytest.mark.skip_slow (4 decorator sites, 3 files)
4. test: migrate bare @pytest.mark.skip to skip_not_implemented (2 bare decorators, 1 file)
5. test: migrate runtime pytest.skip() to typed skip_env() (~85 sites, 34 files)
6. test: migrate runtime pytest.skip() to typed skip_bug() (2 sites, 2 files)
7. test: update comments referencing pytest.skip() to skip() (7 comments, 5 files)
8. test/pylib: reject bare pytest.mark.skip and add codebase guards (plugin hardening + 3 guard tests)

- All 60 plugin + guard tests pass (`test/pylib_test/`)
- No bare `@pytest.mark.skip` or `pytest.skip()` calls remain in the codebase
- `pytest --collect-only` succeeds across all test directories with the hardened plugin

SCYLLADB-1349

Closes scylladb/scylladb#29305

* github.com:scylladb/scylladb:
  test/alternator: replace bare pytest.skip() with typed skip helpers
  test: migrate new bare skips introduced by upstream after rebase
  test/pylib: reject bare pytest.mark.skip and add codebase guards
  test: update comments referencing pytest.skip() to skip_env()
  test: migrate runtime pytest.skip() to typed skip_bug()
  test: migrate runtime pytest.skip() to typed skip_env()
  test: migrate bare @pytest.mark.skip to skip_not_implemented
  test: migrate @pytest.mark.skip to @pytest.mark.skip_slow
  test: migrate @pytest.mark.skip to @pytest.mark.skip_not_implemented
  test: migrate @pytest.mark.skip to @pytest.mark.skip_bug for known bugs	2026-04-22 15:48:27 +03:00
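The guard tests are described only as "AST scan for bare `@pytest.mark.skip` decorators" and "AST scan for bare `pytest.skip()` runtime calls". The following is a minimal sketch of such a scan using only the standard-library `ast` module. The helper names (`find_bare_skips` and friends) are hypothetical and not taken from `test_no_bare_skips.py`. The sketch only recognizes the literal attribute chains `pytest.mark.skip` and `pytest.skip`, so aliased imports such as `from pytest import skip` would slip through.

```python
import ast


def _is_pytest_skip(node: ast.AST) -> bool:
    """True if node is the attribute chain `pytest.skip`."""
    return (
        isinstance(node, ast.Attribute)
        and node.attr == "skip"
        and isinstance(node.value, ast.Name)
        and node.value.id == "pytest"
    )


def _is_pytest_mark_skip(node: ast.AST) -> bool:
    """True if node is the attribute chain `pytest.mark.skip`."""
    return (
        isinstance(node, ast.Attribute)
        and node.attr == "skip"
        and isinstance(node.value, ast.Attribute)
        and node.value.attr == "mark"
        and isinstance(node.value.value, ast.Name)
        and node.value.value.id == "pytest"
    )


def find_bare_skips(source: str) -> list:
    """Return (line, kind) for every bare skip decorator or runtime call."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            for dec in node.decorator_list:
                # Covers both `@pytest.mark.skip` and `@pytest.mark.skip(reason=...)`.
                target = dec.func if isinstance(dec, ast.Call) else dec
                if _is_pytest_mark_skip(target):
                    hits.append((dec.lineno, "decorator"))
        elif isinstance(node, ast.Call) and _is_pytest_skip(node.func):
            hits.append((node.lineno, "runtime"))
    return hits


SAMPLE = '''
import pytest

@pytest.mark.skip(reason="flaky")
def test_a():
    pass

@pytest.mark.skip_bug(reason="#123")
def test_b():
    pytest.skip("tablets not enabled")
'''

# Typed markers such as `skip_bug` are not flagged; only bare skips are.
print(sorted(find_bare_skips(SAMPLE)))  # [(4, 'decorator'), (10, 'runtime')]
```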
Avi Kivity	e84e7dfb7a	build: drop utils/rolling_max_tracker.hh from precompiled header Added by mistake. Precompiled headers should only include library headers that rarely change, since any dependency change causes a full rebuild. Closes scylladb/scylladb#29560	2026-04-22 15:46:50 +03:00
Botond Dénes	3aced88586	Merge 'audit: decrease allocations / instructions on will_log() fast path' from Marcin Maliszkiewicz Audit::will_log() runs on every CQL/Alternator request. Since `9646ee05bd` it constructs three temporary sstrings per call to look up the audited keyspaces set / tables map with std::string_view keys. This costs ~180 insns/op, plus 2 allocations when the keys do not fit in sstring's small-string optimization (SSO) buffer. This series switches the containers to std::less<> comparators to enable heterogeneous lookup, then drops the sstring temporaries from will_log(). Benchmark (`perf-simple-query --smp 1 --duration 15 --audit "table" --audit-keyspaces "ks-non-existing" --audit-categories "DCL,DDL,AUTH,DML,QUERY"`): baseline `3d0582d51e`: 36777 insns/op; regression `9646ee05bd`: 36952 (+175); this series: 36768 (-184, fixed). Fixes https://scylladb.atlassian.net/browse/SCYLLADB-1616 Backport: no, offending commit is not backported Closes scylladb/scylladb#29565 * github.com:scylladb/scylladb: audit: drop sstring temporaries on the will_log() fast path audit: enable heterogeneous lookup on audited keyspaces/tables	2026-04-22 15:46:16 +03:00
Marcin Maliszkiewicz	4043d95810	Merge 'storage_service: fix REST API races during shutdown and cross-shard forwarding' from Piotr Smaron REST route removal unregisters handlers but does not wait for requests that already entered storage_service. A request can therefore suspend inside an async operation while restart proceeds to tear the service down. When the coroutine later resumes, it accesses destroyed members such as _topology_state_machine, _group0, or _sys_ks. This use-after-destruction bug surfaces as UBSAN dynamic-type failures (e.g. the crash seen from topology_state_load()). Fix this by holding storage_service::_async_gate from the entry boundary of every externally-triggered async operation so that stop() drains them before teardown begins. The gate is acquired in run_with_api_lock, run_with_no_api_lock, and in individual REST handlers that bypass those wrappers (reload_raft_topology_state, mark_excluded, removenode, schema reload, topology-request waits/abort, cleanup, ring/schema queries, SSTable dictionary training/publish, and sampling). Additionally, fix get_ownership() and abort_topology_request(). Both forward work to shard 0 but still referenced the caller shard's `this` pointer instead of the destination-shard instance, causing silent cross-shard access to shard-local state. Add a cluster regression test that repeatedly exercises the multi-shard ownership REST path to cover the forwarding fix. Fixes: SCYLLADB-1415 Should be backported to all branches; the code was introduced around the 2024.1 release. Closes scylladb/scylladb#29373 * github.com:scylladb/scylladb: storage_service: fix shard-0 forwarding in REST helpers storage_service: gate REST-facing async operations during shutdown storage_service: prepare for async gate in REST handlers	2026-04-22 14:43:31 +02:00
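The shutdown fix relies on gate semantics: once closing starts, new entries are rejected, and closing waits until every in-flight holder has left. Seastar's `gate` is a C++ type. The following is only a toy Python `asyncio` analogue that illustrates the drain-before-teardown ordering. `Gate`, `GateClosedError`, `StorageServiceSketch` and `rest_get_ownership` are all hypothetical names, not ScyllaDB or Seastar APIs.

```python
import asyncio
from contextlib import contextmanager


class GateClosedError(Exception):
    """Raised when entering a gate that is already closing (hypothetical name)."""


class Gate:
    """Toy asyncio analogue of a gate: close() waits for all current holders."""

    def __init__(self) -> None:
        self._count = 0
        self._closing = False
        self._drained = asyncio.Event()
        self._drained.set()  # nothing in flight yet

    @contextmanager
    def hold(self):
        if self._closing:
            raise GateClosedError("gate is closed")
        self._count += 1
        self._drained.clear()
        try:
            yield
        finally:
            self._count -= 1
            if self._count == 0:
                self._drained.set()

    async def close(self) -> None:
        self._closing = True        # reject new entrants
        await self._drained.wait()  # wait for in-flight holders to leave


class StorageServiceSketch:
    """Hypothetical service whose members are torn down by stop()."""

    def __init__(self) -> None:
        self._async_gate = Gate()
        self._state = {"ownership": 1.0}

    async def rest_get_ownership(self) -> float:
        with self._async_gate.hold():        # held from the entry boundary
            await asyncio.sleep(0.05)        # request suspends mid-operation
            return self._state["ownership"]  # still valid: stop() is waiting

    async def stop(self) -> None:
        await self._async_gate.close()  # drain in-flight requests first
        self._state = None              # only then tear down members


async def main():
    svc = StorageServiceSketch()
    request = asyncio.create_task(svc.rest_get_ownership())
    await asyncio.sleep(0)  # let the request enter the gate
    await svc.stop()
    result = await request
    try:
        with svc._async_gate.hold():
            late = "accepted"
    except GateClosedError:
        late = "rejected"
    return result, late


print(asyncio.run(main()))  # (1.0, 'rejected')
```

Without the gate, `stop()` would set `_state` to `None` while the request is suspended, and the resumed request would fail. This is the Python counterpart of the use-after-destruction described above.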
Radosław Cybulski	cc39b54173	alternator: use `stream_arn` instead of `std::string` in list_streams Use a `stream_arn` object instead of a raw `std::string` to store the stream last returned to the user. `stream_arn` was already used to parse ARNs coming from the user. `std::string` was used for returned values only because `stream_arn`'s copy/move operations were buggy. Those operations have since been fixed, so the usage is updated as well. Fixes: SCYLLADB-1241 Closes scylladb/scylladb#29578	2026-04-22 14:02:53 +02:00
Artsiom Mishuta	183c6d120e	test: exclude pylib_test from default test runs Add pylib_test to norecursedirs in pytest.ini so it is not collected during ./test.py or pytest test/ runs, but can still be run directly via 'pytest test/pylib_test'. Also fix pytest log cleanup: worker log files (pytest_gw*) were not being deleted on success because cleanup was restricted to the main process only. Now each process (main and workers) cleans up its own log file on success. Closes scylladb/scylladb#29551	2026-04-22 11:38:40 +02:00
Piotr Smaron	dffb266b79	storage_service: fix shard-0 forwarding in REST helpers get_ownership() and abort_topology_request() forward work to shard 0 via container().invoke_on(0, ...) but the lambda captured 'this' and accessed members through it instead of through the shard-0 'ss' parameter. This means the lambda used the caller-shard's instance, defeating the purpose of the forwarding. Use the 'ss' parameter consistently so the operations run against the correct shard-0 state.	2026-04-22 10:30:33 +02:00
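The forwarding bug is a closure-capture mistake. The forwarded lambda receives the destination instance `ss` but reads state through the captured `this`. The following Python sketch reproduces only that mistake. `ShardLocalService`, `invoke_on` and `local_state` are hypothetical stand-ins for the Seastar `container().invoke_on(0, ...)` pattern. In real Seastar the buggy form also touches another shard's memory from the wrong CPU, which this single-threaded model does not show.

```python
class ShardLocalService:
    """Hypothetical per-shard service instance with shard-local state."""

    def __init__(self, shard_id, all_shards):
        self.shard_id = shard_id
        self._all_shards = all_shards  # stand-in for container()
        self.local_state = f"state@shard{shard_id}"

    def invoke_on(self, shard, fn):
        # Stand-in for container().invoke_on(shard, fn): call fn with the
        # destination shard's instance (no real cross-shard execution here).
        return fn(self._all_shards[shard])

    def get_ownership_buggy(self):
        # Bug: the lambda uses the captured `self` and ignores the shard-0 `ss`.
        return self.invoke_on(0, lambda ss: self.local_state)

    def get_ownership_fixed(self):
        # Fix: read state only through `ss`, the destination-shard instance.
        return self.invoke_on(0, lambda ss: ss.local_state)


shards = []
for i in range(2):
    shards.append(ShardLocalService(i, shards))

caller = shards[1]                   # request arrived on shard 1
print(caller.get_ownership_buggy())  # state@shard1 (wrong shard)
print(caller.get_ownership_fixed())  # state@shard0
```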
Piotr Smaron	6a91d046f3	storage_service: gate REST-facing async operations during shutdown Hold _async_gate in all REST-facing async operations so that stop() drains in-flight requests before teardown, preventing use-after-free crashes when REST calls race with shutdown. A centralized gated() wrapper in set_storage_service (api/storage_service.cc) automatically holds the gate for every REST handler registered there, so new handlers get shutdown-safety by default. run_with_api_lock_internal and run_with_no_api_lock hold _async_gate on shard 0 as well, because REST requests arriving on any shard are forwarded there for execution. Methods that previously self-forwarded to shard 0 (mark_excluded, prepare_for_tablets_migration, set_node_intended_storage_mode, get_tablets_migration_status, finalize_tablets_migration) now assert this_shard_id() == 0. Their REST handlers call them via run_with_no_api_lock, which performs the shard-0 hop and gate hold centrally. Fixes: SCYLLADB-1415	2026-04-22 10:30:33 +02:00
Piotr Smaron	74dd33811e	storage_service: prepare for async gate in REST handlers Add hold_async_gate() public accessor for use by the REST registration layer in a followup commit. Convert run_with_no_api_lock to a coroutine so a followup commit can hold the async gate across the entire forwarded operation. No functional changes.	2026-04-22 10:28:54 +02:00
Botond Dénes	18ceeaf3ef	Merge 'Restrict tombstone GC sstable set to repaired sstables for tombstone_gc=repair mode' from Raphael Raph Carvalho When tombstone_gc=repair, the repaired compaction view's sstable_set_for_tombstone_gc() previously returned all sstables across all three views (unrepaired, repairing, repaired). This is correct but unnecessarily expensive: the unrepaired and repairing sets are never the source of a GC-blocking shadow when tombstone_gc=repair, for base tables. The key ordering guarantee that makes this safe is: - topology_coordinator sends send_tablet_repair RPC and waits for it to complete. Inside that RPC, mark_sstable_as_repaired() runs on all replicas, moving D from repairing → repaired (repaired_at stamped on disk). - Only after the RPC returns does the coordinator commit repair_time + sstables_repaired_at to Raft. - gc_before = repair_time - propagation_delay only advances once that Raft commit applies. Therefore, when a tombstone T in the repaired set first becomes GC-eligible (its deletion_time < gc_before), any data D it shadows is already in the repaired set on every replica. This holds because: - The memtable is flushed before the repairing snapshot is taken (take_storage_snapshot calls sg->flush()), capturing all data present at repair time. - Hints and batchlog are flushed before the snapshot, ensuring remotely-hinted writes arrive before the snapshot boundary. - Legitimate unrepaired data has timestamps close to 'now', always newer than any GC-eligible tombstone (USING TIMESTAMP to write backdated data is user error / UB). Excluding the repairing and unrepaired sets from the GC shadow check cannot cause any tombstone to be wrongly collected. The memtable check is also skipped for the same reason: memtable data is either newer than the GC-eligible tombstone, or was flushed into the repairing/repaired set before gc_before advanced. Safety restriction — materialized views: The optimization IS applied to materialized view tables. 
Two possible paths could inject D_view into the MV's unrepaired set after MV repair: view hints and staging via the view-update-generator. Both are safe: (1) View hints: flush_hints() creates a sync point covering BOTH _hints_manager (base mutations) AND _hints_for_views_manager (view mutations). It waits until ALL pending view hints — including D_view entries queued in _hints_for_views_manager while the target MV replica was down — have been replayed to the target node before take_storage_snapshot() is called. D_view therefore lands in the MV's repairing sstable and is promoted to repaired. When a repaired compaction then checks for shadows it finds D_view in the repaired set, keeping T_mv non-purgeable. (2) View-update-generator staging path: Base table repair can write a missing D_base to a replica via a staging sstable. The view-update-generator processes the staging sstable ASYNCHRONOUSLY: it may fire arbitrarily later, even after MV repair has committed repair_time and T_mv has been GC'd from the repaired set. However, the staging processor calls stream_view_replica_updates() which performs a READ-BEFORE-WRITE via as_mutation_source_excluding_staging(): it reads the CURRENT base table state before building the view update. If T_base was written to the base table (as it always is before the base replica can be repaired and the MV tombstone can become GC-eligible), the view_update_builder sees T_base as the existing partition tombstone. D_base's row marker (ts_d < ts_t) is expired by T_base, so the view update is a no-op: D_view is never dispatched to the MV replica. No resurrection can occur regardless of how long staging is delayed. A potential sub-edge-case is T_base being purged BEFORE staging fires (leaving D_base as the sole survivor, so stream_view_replica_updates would dispatch D_view). 
This is blocked by an additional invariant: for tablet-based tables, the repair writer stamps repaired_at on staging sstables (repair_writer_impl::create_writer sets mark_as_repaired = true and perform_component_rewrite writes repaired_at = sstables_repaired_at + 1 on every staging sstable). After base repair commits sstables_repaired_at to Raft, the staging sstable satisfies is_repaired(sstables_repaired_at, staging_sst) and therefore appears in make_repaired_sstable_set(). Any subsequent base repair that advances sstables_repaired_at further still includes the staging sstable (its repaired_at ≤ new sstables_repaired_at). D_base in the staging sstable thus shadows T_base in every repaired compaction's shadow check, keeping T_base non-purgeable as long as D_base remains in staging. A base table hint also cannot bypass this. A base hint is replayed as a base mutation. The resulting view update is generated synchronously on the base replica and sent to the MV replica via _hints_for_views_manager (path 1 above), not via staging. USING TIMESTAMP with timestamps predating (gc_before + propagation_delay) is explicitly UB and excluded from the safety argument. For tombstone_gc modes other than repair (timeout, immediate, disabled) the invariant does not hold for base tables either, so the full storage-group set is returned. The expected gain is reduced bloom filter and memtable key-lookup I/O during repaired compactions: the unrepaired set is typically the largest (it holds all recent writes), yet for tombstone_gc=repair it never influences GC decisions. Fixes https://scylladb.atlassian.net/browse/SCYLLADB-231. Closes scylladb/scylladb#29310 * github.com:scylladb/scylladb: compaction: Restrict tombstone GC sstable set to repaired sstables for tombstone_gc=repair mode test/repair: Add tombstone GC safety tests for incremental repair	2026-04-22 10:21:37 +03:00
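The core decision can be modelled with a toy function. A tombstone T becomes GC-eligible once it is older than gc_before = repair_time - propagation_delay. T may then be purged only if no sstable in the consulted set still holds older data it shadows. With tombstone_gc=repair, only the repaired set is consulted. This is a simplified sketch, not ScyllaDB's compaction code. `SSTable`, `can_purge` and the `state` strings are hypothetical, and the sketch uses one abstract time unit for both the write timestamp and the deletion time, which ScyllaDB tracks separately. Memtables are not modelled.

```python
from dataclasses import dataclass, field


@dataclass
class SSTable:
    name: str
    state: str                                # "unrepaired" | "repairing" | "repaired"
    data: dict = field(default_factory=dict)  # key -> timestamp of live data


def gc_before(repair_time: int, propagation_delay: int) -> int:
    return repair_time - propagation_delay


def sstable_set_for_tombstone_gc(sstables, tombstone_gc_mode):
    # The optimization: for tombstone_gc=repair only the repaired set is consulted;
    # other modes (timeout, immediate, disabled) keep using every sstable.
    if tombstone_gc_mode == "repair":
        return [s for s in sstables if s.state == "repaired"]
    return list(sstables)


def can_purge(key, tombstone_ts, sstables, tombstone_gc_mode, gcb):
    if tombstone_ts >= gcb:
        return False  # not yet GC-eligible
    for sst in sstable_set_for_tombstone_gc(sstables, tombstone_gc_mode):
        data_ts = sst.data.get(key)
        if data_ts is not None and data_ts < tombstone_ts:
            return False  # older data D shadowed by T still exists: keep T
    return True


gcb = gc_before(repair_time=1000, propagation_delay=100)  # 900
sstables = [
    SSTable("repaired-1", "repaired", {"a": 10}),
    SSTable("repairing-1", "repairing", {"b": 20}),
    SSTable("unrepaired-1", "unrepaired", {"k": 2000}),  # recent write, newer than T
]
print([s.name for s in sstable_set_for_tombstone_gc(sstables, "repair")])  # ['repaired-1']
print(can_purge("k", 800, sstables, "repair", gcb))  # True
# Shadowed data D (ts 700 < 800) already in the repaired set keeps T alive.
sstables.append(SSTable("repaired-2", "repaired", {"k": 700}))
print(can_purge("k", 800, sstables, "repair", gcb))  # False
```

Per the safety argument, legitimate unrepaired data is newer than any GC-eligible tombstone. Consulting the full set would therefore reach the same answer at a higher lookup cost.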
Avi Kivity	f5eb99f149	test: bump multishard_query_test querier_cache TTL to 60s to avoid flake Three test cases in multishard_query_test.cc set the querier_cache entry TTL to 2s and then assert, between pages of a stateful paged query, that cached queriers are still present (population >= 1) and that time_based_evictions stays 0. The 2s TTL is not load-bearing for what these tests exercise — they are checking the paging-cache handoff, not TTL semantics. But on busy CI runners (SCYLLADB-1642 was observed on aarch64 release), scheduling jitter between saving a reader and sampling the population can exceed 2s. When that happens, the TTL fires, both saved queriers are time-evicted, population drops to 0, and the assertion `require_greater_equal(saved_readers, 1u)` fails. The trailing `require_equal(time_based_evictions, 0)` check never runs because the earlier assertion has already aborted the iteration — which is why the Jenkins failure surfaces only as a bare "C++ failure at seastar_test.cc:93". Reproduced deterministically in test_read_with_partition_row_limits by injecting a `seastar::sleep(2500ms)` between the save and the sample: the hook then reports population=0 inserts=2 drops=0 time_based_evictions=2 resource_based_evictions=0 and the assertion fires — matching the Jenkins symptoms exactly. Bump the TTL to 60s in all three affected tests: - test_read_with_partition_row_limits (confirmed repro for SCYLLADB-1642) - test_read_all (same pattern, same invariants — suspect) - test_read_all_multi_range (same pattern, same invariants — suspect) Leave test_abandoned_read (1s TTL, actually tests TTL-driven eviction) and test_evict_a_shard_reader_on_each_page (tests manual eviction via evict_one(); its TTL is not load-bearing but the fix is deferred for a separate review) unchanged. Fixes: SCYLLADB-1642 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Closes scylladb/scylladb#29564	2026-04-22 09:48:59 +03:00
Tomasz Grabiec	cddde464ca	Merge 'service: Support adding/removing a datacenter with tablets by changing RF' from Aleksandra Martyniuk With this change, you can add or remove one or more DCs in a single ALTER KEYSPACE statement. It requires the keyspace to use a rack-list replication factor. In the existing approach, all tablet replicas are rebuilt at once during an RF change; this is no longer the case. In global_topology_request::keyspace_rf_change, the request is added to ongoing_rf_changes, a new column in the system.topology table. The target RF is kept in next_replication, a new column in system_schema.keyspaces. In make_rf_change_plan, the load balancer schedules the necessary migrations, considering node load and other pending tablet transitions. Requests from ongoing_rf_changes are processed concurrently and independently of one another. Within each request, racks are processed concurrently. No tablet replica is removed until all required replicas are added. When adding replicas to a rack, base tables always go first, and views are not started until the base tables are done (when removing, the order is reversed). The intermediate steps aren't reflected in the schema. When the RF change is finished: - in system_schema.keyspaces: - next_replication is cleared; - new keyspace properties are saved; - the request is removed from ongoing_rf_changes; - the request is marked as done in system.topology_requests. Until the request is done, DESCRIBE KEYSPACE shows replication_v2. If a request hasn't started removing replicas, it can be aborted using the task manager: system.topology_requests::error is set (but the request isn't marked as done) and next_replication = replication_v2. The load balancer interprets this and starts rolling back the request. After the rollback is done, we mark the relevant system.topology_requests entry as done (failed), clear the request ID from system.topology::ongoing_rf_changes, and remove next_replication. Fixes: SCYLLADB-567.
No backport needed; new feature. Closes scylladb/scylladb#24421 * github.com:scylladb/scylladb: service: fix indentation docs: update documentation test: test multi RF changes service: tasks: allow aborting ongoing RF changes cql3: allow changing RF by more than one when adding or removing a DC service: handle multi_rf_change service: implement make_rf_change_plan service: add keyspace_rf_change_plan to migration_plan service: extend tablet_migration_info to handle rebuilds service: split update_node_load_on_migration service: rearrange keyspace_rf_change handler db: add columns to system_schema.keyspaces db: service: add ongoing_rf_changes to system.topology gms: add keyspace_multi_rf_change feature	2026-04-22 01:46:11 +02:00
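The ordering rules above can be written down as a toy phase planner. Nothing is removed until all additions finish. Additions handle base tables before views, and removals do the reverse. This is a deliberately simplified sketch. The real make_rf_change_plan works per tablet and weighs node load and pending transitions. `plan_rf_change` and the rack/table names in the example are hypothetical, and per-rack step order is only listed here, not enforced.

```python
def plan_rf_change(racks_to_add, racks_to_remove, base_tables, views):
    """Return ordered phases; each phase maps rack -> ordered list of steps.

    Racks within a phase proceed concurrently; a phase starts only after the
    previous one completes, so no replica is removed before all adds finish.
    """
    phases = []
    if racks_to_add:
        # Adding: base tables first, views only after the base tables are done.
        phases.append({rack: [("add", t) for t in base_tables + views]
                       for rack in racks_to_add})
    if racks_to_remove:
        # Removing: the other way around, views first, then base tables.
        phases.append({rack: [("remove", t) for t in views + base_tables]
                       for rack in racks_to_remove})
    return phases


plan = plan_rf_change(["dc2:r1", "dc2:r2"], ["dc3:r1"], ["ks.t"], ["ks.t_by_v"])
for i, phase in enumerate(plan):
    print(i, phase)
```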
Andrzej Jackowski	b6cb025e9b	test/audit: add reproducer for native-protocol batch not being audited The existing test_batch sends a textual BEGIN BATCH ... APPLY BATCH as a QUERY message, which goes through the CQL parser and raw::batch_statement::prepare(), a path that correctly sets audit_info. This missed the bug where native-protocol BATCH messages (opcode 0x0D), handled by process_batch_internal in transport/server.cc, construct a batch_statement without setting audit_info, causing audit to silently skip the batch. Add _test_batch_native_protocol which uses the driver's BatchStatement (both unprepared and prepared variants) to exercise this code path. Refs SCYLLADB-1652	2026-04-21 21:52:26 +02:00
Andrzej Jackowski	f5bb9b6282	audit: set audit_info for native-protocol BATCH messages Commit `16b56c2451` ("Audit: avoid dynamic_cast on a hot path") moved audit info into batch_statement via set_audit_info(), but only wired it for the CQL-text BATCH path (raw::batch_statement::prepare()). Native-protocol BATCH messages (opcode 0x0D), handled by process_batch_internal in transport/server.cc, construct a batch_statement without setting audit_info. This causes audit to silently skip the entire batch. Set audit_info on the batch_statement so these batches are audited. Fixes SCYLLADB-1652	2026-04-21 21:52:26 +02:00
Andrzej Jackowski	5f93d57d6e	test/audit: rename internal test methods to avoid CI misdetection The CI heuristic picks up any function named test_* in changed files and tries to run it as a standalone pytest test. The AuditTester class methods (test_batch, test_dml, etc.) are not top-level pytest tests — they are internal helpers called from the actual test functions. Prefix them with underscore so CI does not mistake them for standalone tests.	2026-04-21 21:52:26 +02:00
Dario Mirovic	cf237e060a	test: auth_cluster: use safe_driver_shutdown() for Cluster teardown A handful of cassandra-driver Cluster.shutdown() call sites in the auth_cluster tests were missed by the previous sweep that introduced safe_driver_shutdown(), because the local variable holding the Cluster is named "c" rather than "cluster". Direct Cluster.shutdown() is racy: the driver's "Task Scheduler" thread may raise RuntimeError ("cannot schedule new futures after shutdown") during or after the call, occasionally failing tests. safe_driver_shutdown() suppresses this expected RuntimeError and joins the scheduler thread. Replace the remaining c.shutdown() calls in: - test/cluster/auth_cluster/test_startup_response.py - test/cluster/auth_cluster/test_maintenance_socket.py with safe_driver_shutdown(c) and add the corresponding import from test.pylib.driver_utils. No behavioral change to the tests; only the driver teardown is hardened against a known driver-side race. Fixes SCYLLADB-1662 Closes scylladb/scylladb#29576	2026-04-21 17:45:11 +02:00
Radosław Cybulski	6f7bf30a14	alternator: increase wait time to tablet sync When forcing a tablet count change via a CQL command, the underlying tablet machinery takes some time to adjust. The original code waited at most 0.1s for tablet data to be synchronized. This seems not to be enough on debug builds, so add exponential backoff and increase the maximum waiting time. The code now waits 0.1s the first time and doubles the wait on each retry, up to a maximum of 6 waits, for a total of ~6s (0.1 + 0.2 + 0.4 + 0.8 + 1.6 + 3.2 = 6.3s). Fixes: SCYLLADB-1655 Closes scylladb/scylladb#29573	2026-04-21 17:38:07 +02:00
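The following is a minimal sketch of the backoff schedule described above: start at 0.1s, double after each failed check, stop after 6 waits. The helper `wait_with_backoff` is hypothetical, not the actual Alternator test code. `sleep` is injectable so the schedule can be checked without actually waiting.

```python
import time


def wait_with_backoff(predicate, first_delay=0.1, max_waits=6, sleep=time.sleep):
    """Poll `predicate`, sleeping first_delay, 2*first_delay, ... between checks."""
    delay = first_delay
    for _ in range(max_waits):
        if predicate():
            return True
        sleep(delay)
        delay *= 2
    return predicate()  # one last check after the final wait


recorded = []
print(wait_with_backoff(lambda: False, sleep=recorded.append))  # False
print(recorded)                 # [0.1, 0.2, 0.4, 0.8, 1.6, 3.2]
print(round(sum(recorded), 2))  # 6.3
```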
Radosław Cybulski	74b523ea20	treewide: fix spelling errors. Fix various spelling errors. Closes scylladb/scylladb#29574	2026-04-21 18:20:26 +03:00
Piotr Dulikowski	cb8253067d	Merge 'strong_consistency: fix crash when DROP TABLE races with in-flight DML' from Petr Gusev When DROP TABLE races with an in-flight DML on a strongly-consistent table, the node aborts in `groups_manager::acquire_server()` because the raft group has already been erased from `_raft_groups`. A concurrent `DROP TABLE` may have already removed the table from database registries and erased the raft group via `schedule_raft_group_deletion`. The `schema.table()` call in `create_operation_ctx()` might not fail, though, because someone may still hold an `lw_shared_ptr<table>`: the table is dropped, but the table object is still alive. Fix by accepting table_id in acquire_server and checking that the table still exists in the database via `find_column_family` before looking up the raft group. If the table has been dropped, find_column_family throws no_such_column_family instead of the node aborting via on_internal_error. When the table does exist, acquire_server proceeds to acquire state.gate; schedule_raft_group_deletion co_awaits gate::close, so it will wait for the DML operation to complete before erasing the group. backport: not needed (not released feature) Fixes SCYLLADB-1450 Closes scylladb/scylladb#29430 * github.com:scylladb/scylladb: strong_consistency: fix crash when DROP TABLE races with in-flight DML test: add regression test for DROP TABLE racing with in-flight DML	2026-04-21 16:54:20 +02:00
Dario Mirovic	bcda39f716	test: audit: use set diff to identify new audit rows assert_entries_were_added asserted that new audit rows always appear at the tail of each per-node, event_time-sorted sequence. That invariant is not a property of the audit feature: audit writes are asynchronous with respect to query completion, and on a multi-node cluster QUORUM reads of audit.audit_log can reveal a row with an older event_time after a row with a newer one has already been observed. Replace the positional tail slice with a per-node set difference between the rows observed before and after the audited operation. The wait_for retry loop, noise filtering, and final by-value comparison against expected_entries are unchanged, so the test still verifies the real contract, that the expected audit entries appear, without relying on a visibility-ordering invariant that the audit log does not guarantee. Fixes SCYLLADB-1589 Closes scylladb/scylladb#29567	2026-04-21 15:33:36 +02:00
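The difference between the old positional tail slice and the new per-node set difference shows up when a new row with an older event_time becomes visible only on the later read. The following sketch models rows as hashable `(event_time, operation)` tuples. `tail_slice` and `new_audit_rows` are hypothetical names, not the test's actual helpers. The set difference assumes rows are distinct, since identical rows would collapse.

```python
def tail_slice(before_rows, after_rows):
    """Old approach (sketch): assume new rows are appended after the old ones."""
    return after_rows[len(before_rows):]


def new_audit_rows(before, after):
    """Per-node set difference between audit rows seen after and before.

    `before` and `after` map a node name to the rows read from that node;
    rows must be hashable (tuples here) and distinct.
    """
    return {node: set(rows) - set(before.get(node, ())) for node, rows in after.items()}


# Rows are (event_time, operation), sorted by event_time per node.
before = {"n1": [("10:00:02", "SELECT a")]}
# The new INSERT row has an older event_time than a row already observed.
after = {"n1": [("10:00:01", "INSERT b"), ("10:00:02", "SELECT a"), ("10:00:03", "SELECT c")]}

print(tail_slice(before["n1"], after["n1"]))
# [('10:00:02', 'SELECT a'), ('10:00:03', 'SELECT c')]: old row included, new row missed
print(sorted(new_audit_rows(before, after)["n1"]))
# [('10:00:01', 'INSERT b'), ('10:00:03', 'SELECT c')]
```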
Nadav Har'El	6165124fcc	Merge 'cql3: statement_restrictions: analyze during prepare time' from Avi Kivity The statement_restrictions code is responsible for analyzing the WHERE clause, deciding on the query plan (which index to use), and extracting the partition and clustering keys to use for the index. Currently, it suffers from repetition in making its decisions: there are 15 calls to expr::visit in statement_restrictions.cc, and 14 find_binop calls. This reduces to 2 visits (one nested in the other) and 6 find_binop calls. The analysis of binary operators is done once, then reused. The key data structure introduced is the predicate. While an expression takes inputs from the row evaluated, constants, and bind variables, and produces a boolean result, predicates ask which values for a column (or a number of columns) are needed to satisfy (part of) the WHERE clause. The WHERE clause is then expressed as a conjunction of such predicates. The analyzer uses the predicates to select the index, then uses the predicates to compute the partition and clustering keys. The refactoring is composed of these parts (but patches from different parts are interspersed): 1. an exhaustive regression test is added as the first commit, to ensure behavior doesn't change 2. move computation from query time to prepare time 3. introduce, gradually enrich, and use predicates to implement the statement_restrictions API Major refactoring, and no bugs fixed, so definitely not backporting. 
Closes scylladb/scylladb#29114 * github.com:scylladb/scylladb: cql3: statement_restrictions: replace has_eq_restriction_on_column with precomputed set cql3: statement_restrictions: replace multi_column_range_accumulator_builder with direct predicate iteration cql3: statement_restrictions: use predicate fields in build_get_clustering_bounds_fn cql3: statement_restrictions: remove extract_single_column_restrictions_for_column cql3: statement_restrictions: use predicate vectors in prepare_indexed_local cql3: statement_restrictions: use predicate vector size for clustering prefix length cql3: statement_restrictions: replace do_find_idx and is_supported_by with predicate-based versions cql3: statement_restrictions: remove expression-based has_supporting_index and index_supports_some_column cql3: statement_restrictions: replace multi-column and PK index support checks with predicate-based versions cql3: statement_restrictions: add predicate-based index support checking cql3: statement_restrictions: use pre-built single-column maps for index support checks cql3: statement_restrictions: build clustering-prefix restrictions incrementally cql3: statement_restrictions: build partition-range restrictions incrementally cql3: statement_restrictions: build clustering-key single-column restrictions map incrementally cql3: statement_restrictions: build partition-key single-column restrictions map incrementally cql3: statement_restrictions: build non-primary-key single-column restrictions map incrementally cql3: statement_restrictions: use tracked has_mc_clustering for _has_multi_column cql3: statement_restrictions: track has-token state incrementally cql3: statement_restrictions: track partition-key-empty state incrementally cql3: statement_restrictions: track first multi-column predicate incrementally cql3: statement_restrictions: track last clustering column incrementally cql3: statement_restrictions: track clustering-has-slice incrementally cql3: statement_restrictions: track 
has-multi-column-clustering incrementally cql3: statement_restrictions: track clustering-empty state incrementally cql3: statement_restrictions: replace restr bridge variable with pred.filter cql3: statement_restrictions: convert single-column branch to use predicate properties cql3: statement_restrictions: convert multi-column branch to use predicate properties cql3: statement_restrictions: convert constructor loop to iterate over predicates cql3: statement_restrictions: annotate predicates with operator properties cql3: statement_restrictions: annotate predicates with is_not_null and is_multi_column cql3: statement_restrictions: complete preparation early cql3: statement_restrictions: convert expressions to predicates without being directed at a specific column cql3: statement_restrictions: refine possible_lhs_values() function_call processing cql3: statement_restrictions: return nullptr for function solver if not token cql3: statement_restrictions: refine possible_lhs_values() subscript solving cql3: statement_restrictions: return nullptr from possible_lhs_values instead of on_internal_error cql3: statement_restrictions: convert possible_lhs_values into a solver cql3: statement_restrictions: split _where to boolean factors in preparation for predicates conversion cql3: statement_restrictions: refactor IS NOT NULL processing cql3: statement_restrictions: fold add_single_column_nonprimary_key_restriction() into its caller cql3: statement_restrictions: fold add_single_column_clustering_key_restriction() into its caller cql3: statement_restrictions: fold add_single_column_partition_key_restriction() into its caller cql3: statement_restrictions: fold add_token_partition_key_restriction() into its caller cql3: statement_restrictions: fold add_multi_column_clustering_key_restriction() into its caller cql3: statement_restrictions: avoid early return in add_multi_column_clustering_key_restrictions cql3: statement_restrictions: fold add_is_not_restriction() into its 
caller cql3: statement_restrictions: fold add_restriction() into its caller cql3: statement_restrictions: remove possible_partition_token_values() cql3: statement_restrictions: remove possible_column_values cql3: statement_restrictions: pass schema to possible_column_values() cql3: statement_restrictions: remove fallback path in solve() cql3: statement_restrictions: reorder possible_lhs_column parameters cql3: statement_restrictions: prepare solver for multi-column restrictions cql3: statement_restrictions: add solver for token restriction on index cql3: statement_restrictions: pre-analyze column in value_for() cql3: statement_restrictions: don't handle boolean constants in multi_column_range_accumulator_builder cql3: statement_restrictions: split range_from_raw_bounds into prepare phase and query phase cql3: statement_restrictions: adjust signature of range_from_raw_bounds cql3: statement_restrictions: split multi_column_range_accumulator into prepare-time and query-time phases cql3: statement_restrictions: make get_multi_column_clustering_bounds a builder cql3: statement_restrictions: multi-key clustering restrictions one layer deeper cql3: statement_restrictions: push multi-column post-processing into get_multi_column_clustering_bounds() cql3: statement_restrictions: pre-analyze single-column clustering key restrictions cql3: statement_restrictions: wrap value_for_index_partition_key() cql3: statement_restrictions: hide value_for() cql3: statement_restrictions: push down clustering prefix wrapper one level cql3: statement_restrictions: wrap functions that return clustering ranges cql3: statement_restrictions: do not pass view schema back and forth cql3: statement_restrictions: pre-analyze token range restrictions cql3: statement_restrictions: pre-analyze partition key columns cql3: statement_restrictions: do not collect subscripted partition key columns cql3: statement_restrictions: split _partition_range_restrictions into three cases cql3: 
statement_restrictions: move value_list, value_set to header file cql3: statement_restrictions: wrap get_partition_key_ranges cql3: statement_restrictions: prepare statement_restrictions for capturing `this` test: statement_restrictions: add index_selection regression test	2026-04-21 15:44:06 +03:00
Anna Stuchlik	d222e6e2a4	doc: document support for OCI Object Storage This commit extends the object storage configuration section with support for OCI object storage. Fixes SCYLLADB-502 Closes scylladb/scylladb#29503	2026-04-21 15:11:58 +03:00
Botond Dénes	cfebe17592	sstables: fix segfault in parse_assert() when message is nullptr parse_assert() accepts an optional `message` parameter that defaults to nullptr. When the assertion fails and message is nullptr, it is implicitly converted to sstring via the sstring(const char*) constructor, which calls strlen(nullptr) -- undefined behavior that manifests as a segfault in __strlen_evex. This turns what should be a graceful malformed_sstable_exception into a fatal crash. In the case of CUSTOMER-279, a corrupt SSTable triggered parse_assert() during streaming (in continuous_data_consumer::fast_forward_to()), causing a crash loop on the affected node. Fix by guarding the nullptr case with a ternary, passing an empty sstring() when message is null. on_parse_error() already handles the empty-message case by substituting "parse_assert() failed". Fixes: SCYLLADB-1329 Closes scylladb/scylladb#29285	2026-04-21 12:40:33 +02:00
Marcin Maliszkiewicz	935e6a495d	Merge 'transport: add per-service-level cql_requests_serving metric' from Piotr Smaron The existing scylla_transport_requests_serving metric is a single global per-shard gauge counting outstanding CQL requests. When debugging latency spikes, it's useful to know which service level is contributing the most in-flight requests. This PR adds a new per-scheduling-group gauge scylla_transport_cql_requests_serving (with the scheduling_group_name label), using the existing cql_sg_stats per-SG infrastructure. The cql_ prefix is intentional — it follows the convention of all other per-SG transport metrics (cql_requests_count, cql_request_bytes, etc.) and avoids Prometheus confusion with the global requests_serving metric (which lacks the scheduling_group_name label). Fixes: SCYLLADB-1340 New feature, no backport. Closes scylladb/scylladb#29493 * github.com:scylladb/scylladb: transport: add per-service-level cql_requests_serving metric transport: move requests_serving decrement to after response is sent	2026-04-21 12:35:50 +02:00
Aleksandra Martyniuk	cd79b99112	test: fix flaky test_alter_tablets_rf_dc_drop by using read barrier The test was reading system_schema.keyspaces from an arbitrary node that may not have applied the latest schema change yet. Pin the read to a specific node and issue a read barrier before querying, ensuring the node has up-to-date data. Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-1643. Closes scylladb/scylladb#29563	2026-04-21 09:12:51 +03:00
Raphael S. Carvalho	474e962e01	compaction: Restrict tombstone GC sstable set to repaired sstables for tombstone_gc=repair mode When tombstone_gc=repair, the repaired compaction view's sstable_set_for_tombstone_gc() previously returned all sstables across all three views (unrepaired, repairing, repaired). This is correct but unnecessarily expensive: the unrepaired and repairing sets are never the source of a GC-blocking shadow when tombstone_gc=repair, for base tables. The key ordering guarantee that makes this safe is: - topology_coordinator sends send_tablet_repair RPC and waits for it to complete. Inside that RPC, mark_sstable_as_repaired() runs on all replicas, moving D from repairing → repaired (repaired_at stamped on disk). - Only after the RPC returns does the coordinator commit repair_time + sstables_repaired_at to Raft. - gc_before = repair_time - propagation_delay only advances once that Raft commit applies. Therefore, when a tombstone T in the repaired set first becomes GC-eligible (its deletion_time < gc_before), any data D it shadows is already in the repaired set on every replica. This holds because: - The memtable is flushed before the repairing snapshot is taken (take_storage_snapshot calls sg->flush()), capturing all data present at repair time. - Hints and batchlog are flushed before the snapshot, ensuring remotely-hinted writes arrive before the snapshot boundary. - Legitimate unrepaired data has timestamps close to 'now', always newer than any GC-eligible tombstone (USING TIMESTAMP to write backdated data is user error / UB). Excluding the repairing and unrepaired sets from the GC shadow check cannot cause any tombstone to be wrongly collected. The memtable check is also skipped for the same reason: memtable data is either newer than the GC-eligible tombstone, or was flushed into the repairing/repaired set before gc_before advanced. Safety restriction — materialized views: The optimization IS applied to materialized view tables. 
Two possible paths could inject D_view into the MV's unrepaired set after MV repair: view hints and staging via the view-update-generator. Both are safe: (1) View hints: flush_hints() creates a sync point covering BOTH _hints_manager (base mutations) AND _hints_for_views_manager (view mutations). It waits until ALL pending view hints — including D_view entries queued in _hints_for_views_manager while the target MV replica was down — have been replayed to the target node before take_storage_snapshot() is called. D_view therefore lands in the MV's repairing sstable and is promoted to repaired. When a repaired compaction then checks for shadows it finds D_view in the repaired set, keeping T_mv non-purgeable. (2) View-update-generator staging path: Base table repair can write a missing D_base to a replica via a staging sstable. The view-update-generator processes the staging sstable ASYNCHRONOUSLY: it may fire arbitrarily later, even after MV repair has committed repair_time and T_mv has been GC'd from the repaired set. However, the staging processor calls stream_view_replica_updates() which performs a READ-BEFORE-WRITE via as_mutation_source_excluding_staging(): it reads the CURRENT base table state before building the view update. If T_base was written to the base table (as it always is before the base replica can be repaired and the MV tombstone can become GC-eligible), the view_update_builder sees T_base as the existing partition tombstone. D_base's row marker (ts_d < ts_t) is expired by T_base, so the view update is a no-op: D_view is never dispatched to the MV replica. No resurrection can occur regardless of how long staging is delayed. A potential sub-edge-case is T_base being purged BEFORE staging fires (leaving D_base as the sole survivor, so stream_view_replica_updates would dispatch D_view). 
This is blocked by an additional invariant: for tablet-based tables, the repair writer stamps repaired_at on staging sstables (repair_writer_impl::create_writer sets mark_as_repaired = true and perform_component_rewrite writes repaired_at = sstables_repaired_at + 1 on every staging sstable). After base repair commits sstables_repaired_at to Raft, the staging sstable satisfies is_repaired(sstables_repaired_at, staging_sst) and therefore appears in make_repaired_sstable_set(). Any subsequent base repair that advances sstables_repaired_at further still includes the staging sstable (its repaired_at ≤ new sstables_repaired_at). D_base in the staging sstable thus shadows T_base in every repaired compaction's shadow check, keeping T_base non-purgeable as long as D_base remains in staging. A base table hint also cannot bypass this. A base hint is replayed as a base mutation. The resulting view update is generated synchronously on the base replica and sent to the MV replica via _hints_for_views_manager (path 1 above), not via staging. USING TIMESTAMP with timestamps predating (gc_before + propagation_delay) is explicitly UB and excluded from the safety argument. For tombstone_gc modes other than repair (timeout, immediate, disabled) the invariant does not hold for base tables either, so the full storage-group set is returned. Implementation: - Add compaction_group::is_repaired_view(v): pointer comparison against _repaired_view. - Add compaction_group::make_repaired_sstable_set(): iterates _main_sstables and inserts only sstables classified as repaired (repair::is_repaired(sstables_repaired_at, sst)). - Add storage_group::make_repaired_sstable_set(): collects repaired sstables across all compaction groups in the storage group. - Add table::make_repaired_sstable_set_for_tombstone_gc(): collects repaired sstables from all compaction groups across all storage groups (needed for multi-tablet tables). 
- Add compaction_group_view::skip_memtable_for_tombstone_gc(): returns true iff the repaired-only optimization is active; used by get_max_purgeable_timestamp() in compaction.cc to bypass the memtable shadow check. - is_tombstone_gc_repaired_only() private helper gates both methods: requires is_repaired_view(this) && tombstone_gc_mode == repair. No is_view() exclusion. - Add error injection "view_update_generator_pause_before_processing" in process_staging_sstables() to support testing the staging-delay scenario. - New test test_tombstone_gc_mv_optimization_safe_via_hints: stops servers[2], writes D_base + T_base (view hints queued for servers[2]'s MV replica), restarts, runs MV tablet repair (flush_hints delivers D_view + T_mv before snapshot), triggers repaired compaction, and asserts the MV row is NOT visible — T_mv preserved because D_view landed in the repaired set via the hints-before-snapshot path. - New test test_tombstone_gc_mv_safe_staging_processor_delay: runs base repair before writing T_base so D_base is staged on servers[0] via row-sync; blocks the view-update-generator with an error injection; writes T_base + T_mv; runs MV repair (fast path, T_mv GC-eligible); triggers repaired compaction (T_mv purged — no D_view in repaired set); asserts no resurrection; releases injection; waits for staging to complete; asserts no resurrection after a second flush+compaction. Demonstrates that the read-before-write in stream_view_replica_updates() makes the optimization safe even when staging fires after T_mv has been GC'd. The expected gain is reduced bloom filter and memtable key-lookup I/O during repaired compactions: the unrepaired set is typically the largest (it holds all recent writes), yet for tombstone_gc=repair it never influences GC decisions. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-20 16:59:09 -03:00
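The set-selection rule described above can be sketched in Python. This is a minimal model under stated assumptions, not Scylla's code:

- Each sstable carries an integer `repaired_at`, where 0 means "never repaired".
- `is_repaired()` is modeled as `0 < repaired_at <= sstables_repaired_at`. This is an assumption drawn from the "repaired_at ≤ new sstables_repaired_at" wording; the real `repair::is_repaired()` may be defined differently.
- `SSTable`, `TombstoneGcMode` and `sstable_set_for_tombstone_gc` are hypothetical Python stand-ins. They are loosely named after the commit's C++ helpers, and their signatures are invented.

```python
from dataclasses import dataclass
from enum import Enum


class TombstoneGcMode(Enum):
    TIMEOUT = "timeout"
    REPAIR = "repair"
    IMMEDIATE = "immediate"
    DISABLED = "disabled"


@dataclass(frozen=True)
class SSTable:
    name: str
    repaired_at: int  # 0 = never repaired (assumption of this sketch)


def is_repaired(sstables_repaired_at: int, sst: SSTable) -> bool:
    # Assumed classification rule; the real repair::is_repaired() may differ.
    return 0 < sst.repaired_at <= sstables_repaired_at


def is_tombstone_gc_repaired_only(is_repaired_view: bool, mode: TombstoneGcMode) -> bool:
    # Gate described in the commit: repaired view AND tombstone_gc=repair.
    # There is no is_view() exclusion, so MV tables qualify as well.
    return is_repaired_view and mode is TombstoneGcMode.REPAIR


def sstable_set_for_tombstone_gc(sstables, sstables_repaired_at, is_repaired_view, mode):
    """SSTables consulted when checking whether a tombstone shadows live data."""
    if is_tombstone_gc_repaired_only(is_repaired_view, mode):
        return [s for s in sstables if is_repaired(sstables_repaired_at, s)]
    # Any other mode, or a non-repaired view: the full storage-group set.
    return list(sstables)


def skip_memtable_for_tombstone_gc(is_repaired_view: bool, mode: TombstoneGcMode) -> bool:
    # The memtable shadow check is bypassed under exactly the same condition.
    return is_tombstone_gc_repaired_only(is_repaired_view, mode)
```

In this model, consider a staging sstable stamped `repaired_at = 5 + 1`. It joins the shadow-check set once `sstables_repaired_at` is committed as 6, which mirrors the staging argument above. Outside repair mode, or outside the repaired view, the full set is returned.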
Ferenc Szili	a50aa7e689	test/cluster: wait for ready CQL in cross-rack merge test test_tablet_merge_cross_rack_migrations() starts issuing DDL immediately after adding the new cross-rack nodes. In the failing runs the driver is still converging on the updated topology at that point, so the control connection sees incomplete peer metadata while schema changes are in flight. That leaves a race where CREATE TABLE is sent during topology churn and the test can surface a misleading AlreadyExists error even though the table creation has already been committed. Use get_ready_cql(servers) here so the test waits for inter-node visibility and CQL readiness before creating the keyspace and table. Fixes: SCYLLADB-1635 Closes scylladb/scylladb#29561	2026-04-20 20:12:11 +02:00
Łukasz Paszkowski	d18eb9479f	cql/statement: Create keyspace_metadata with correct initial_tablets count In `ks_prop_defs::as_ks_metadata(...)`, a default initial tablets count of 0 is set only when tablets are enabled and the replication strategy is NetworkTopologyStrategy. For the remaining strategies, when no `tablets = {...}` options are specified, this effectively sets _uses_tablets = false in abstract_replication_strategy. As a consequence, it is possible to create vnode-based keyspaces even when tablets are enforced with `tablets_mode_for_new_keyspaces`. The patch sets the default initial tablets count to zero regardless of the chosen replication strategy. Each replication strategy then validates the options and raises a configuration exception when tablets are not supported. All tests are altered as follows: + where it was correct to do so, SimpleStrategy was replaced with NetworkTopologyStrategy + otherwise, tablets were explicitly disabled with ` AND tablets = {'enabled': false}` Fixes https://github.com/scylladb/scylladb/issues/25340 Closes scylladb/scylladb#25342	2026-04-20 17:57:38 +03:00
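The before/after default selection can be modeled in Python. This is a hypothetical sketch with the following assumptions:

- `None` stands for "no initial tablets count", which means vnodes.
- Only NetworkTopologyStrategy accepts tablets.
- None of these function names exist in Scylla.

```python
from typing import Optional


class ConfigurationException(Exception):
    """Stand-in for the configuration error raised on invalid keyspace options."""


# Assumption for this sketch: only NetworkTopologyStrategy accepts tablets.
TABLET_CAPABLE_STRATEGIES = {"NetworkTopologyStrategy"}


def default_initial_tablets_old(strategy: str, tablets_enabled: bool) -> Optional[int]:
    # Old behavior: the default of 0 was filled in only for NetworkTopologyStrategy,
    # so every other strategy fell through to "no tablets" (vnodes).
    if tablets_enabled and strategy == "NetworkTopologyStrategy":
        return 0
    return None


def default_initial_tablets_new(tablets_enabled: bool) -> Optional[int]:
    # New behavior: the default no longer depends on the strategy.
    return 0 if tablets_enabled else None


def validate(strategy: str, initial_tablets: Optional[int]) -> bool:
    """Return whether the keyspace uses tablets; reject unsupported combinations."""
    uses_tablets = initial_tablets is not None
    if uses_tablets and strategy not in TABLET_CAPABLE_STRATEGIES:
        raise ConfigurationException(f"{strategy} does not support tablets")
    return uses_tablets
```

Under the old path, a SimpleStrategy keyspace quietly becomes vnode-based even when tablets are enforced. The new path surfaces the mismatch as an error, which is why tests that still need SimpleStrategy now disable tablets explicitly.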
Botond Dénes	69c58c6589	Merge 'streaming: add oos protection in mutation based streaming' from Łukasz Paszkowski The mutation-fragment-based streaming path in `stream_session.cc` did not check whether the receiving node was in critical disk utilization mode before accepting incoming mutation fragments. This meant that operations like `nodetool refresh --load-and-stream`, which stream data through the `STREAM_MUTATION_FRAGMENTS` RPC handler, could push data onto a node that had already reached critical disk usage. The file-based streaming path in stream_blob.cc already had this protection, but the load&stream path was missing it. This patch adds a check for `is_in_critical_disk_utilization_mode()` in the `stream_mutation_fragments` handler in `stream_session.cc`, throwing a `replica::critical_disk_utilization_exception` when the node is at critical disk usage. This mirrors the existing protection in the blob streaming path and closes the gap that allowed data to be written to a node that should have been rejecting all incoming writes. Fixes https://scylladb.atlassian.net/browse/SCYLLADB-901 The out of space prevention mechanism was introduced in 2025.4. The fix should be backported there and all later versions. Closes scylladb/scylladb#28873 * github.com:scylladb/scylladb: streaming: reject mutation fragments on critical disk utilization test/cluster/storage: Add a reproducer for load-and-stream out-of-space rejection sstables: clean up TemporaryHashes file in wipe() sstables: add error injection point in write_components test/cluster/storage: extract validate_data_existence to module scope test/cluster: enable suppress_disk_space_threshold_checks in tests using data_file_capacity utils/disk_space_monitor: add error injection to suppress threshold checks	2026-04-20 17:56:36 +03:00
David Garcia	16ed338a89	Fix CODEOWNERS to cover nested docs subfolders The `docs/*` pattern only matches files directly inside `docs/`, not files in nested subfolders like `docs/folder_b/test.md` or `docs/alternator/setup.md`. Those files currently have no code owner assigned. Replace with `/docs/` and `/docs/alternator/` which match the directories and all their subdirectories recursively, per GitHub's CODEOWNERS syntax. Ref: https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/customizing-your-repository/about-code-owners Closes scylladb/scylladb#29521	2026-04-20 17:55:43 +03:00
Avi Kivity	5687a4840d	conf: pair sstable_format=ms with column_index_size_in_kb=1 One of the advantages of Trie indexes (with sstable_format=ms) is that the index is more compact, and more suitable for paging from disk (fewer pages required per search). We can exploit it by setting column_index_size_in_kb to 1 rather than 64, increasing the index file size (and requiring more index pages to be loaded and parsed) in return for smaller data file reads. To test this, I created a 1M row partition with 300-byte rows, compacted it into a single sstable, and tested reads of a single row. With column_index_size_in_kb=64: Rows.db file size 60k; 3 pages read from Rows.db (4k each); 2x 32k read from Data.db. With column_index_size_in_kb=1: Rows.db file size 2MB (33X); 5 pages read from Rows.db (4k each, 1.7X); 1x 4107 bytes read from Data.db (0.5X IOPS, 0.06X bandwidth). Given that Rows.db will typically be cached, or at least all but one of its levels (it is 157X smaller than Data.db), we win on both IOPS and bandwidth. I would have expected the Data.db read to be closer to 1k, but this is already an improvement. Given that, set column_index_size_in_kb=1, but only for new clusters where we also select sstable_format=ms. 
Raw data (w1, w64 are working directories with different column_index_size_in_kb): ```console $ ls -l w/data/bench/wide_partition-/{Rows,Data}.db -rw-r--r-- 1 avi avi 314964958 Apr 19 16:17 w1/data/bench/wide_partition-e0b436a03bf111f18587cc3d55b31baf/ms-3gzp_10x9_373io213ox3uf4irhr-big-Data.db -rw-r--r-- 1 avi avi 2001227 Apr 19 16:17 w1/data/bench/wide_partition-e0b436a03bf111f18587cc3d55b31baf/ms-3gzp_10x9_373io213ox3uf4irhr-big-Rows.db -rw-r--r-- 1 avi avi 314963261 Apr 19 16:18 w64/data/bench/wide_partition-69d6adb03bf111f1865f3b0b343d3479/ms-3gzp_10y7_514282x1o2bojimy0q-big-Data.db -rw-r--r-- 1 avi avi 59989 Apr 19 16:18 w64/data/bench/wide_partition-69d6adb03bf111f1865f3b0b343d3479/ms-3gzp_10y7_514282x1o2bojimy0q-big-Rows.db ``` column_index_size_in_kb=64 trace: ``` cqlsh> SELECT FROM bench.wide_partition WHERE pk = 0 AND ck = 654321 BYPASS CACHE; pk \| ck \| v ----+--------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 0 \| 654321 \| 9OXdwmDHRapL2w5YruWLTOtiC3PKbyctSDdQ8YpuPKtWkSYBF10G7bKo2rdnxSAd52HLI21568YM7OwK05B6qAF7X2b6910qsJEA106QBEcFWQVybMCkxkpO4VDRcAVNLRgjB3vygcDBP17GBTb2s7l47UOloy3KtZ7J5YQgKcf7zlFSKGHa49vnRrzoXZCdYexOpix6jcSV2SiwRNqgv6XmYhx43ZwGa4zUtOe0eIKJj7KTxu5bzyWUWGW7US4NLFZRD8Vdb6EasIFkOfVKdiFp2LZHMXGRvtvdF93UTFUb (1 rows) Tracing session: 19219900-3bf3-11f1-bc43-c0a4e62b53d1 activity \| timestamp \| source \| source_elapsed \| client --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------+-----------+----------------+----------- Execute CQL3 query \| 2026-04-19 16:24:30.992000 \| 127.0.0.1 \| 0 
\| 127.0.0.1 Parsing a statement [shard 0/sl:default] \| 2026-04-19 16:24:30.992643+00:00 \| 127.0.0.1 \| 1 \| 127.0.0.1 Processing a statement for authenticated user: anonymous [shard 0/sl:default] \| 2026-04-19 16:24:30.992738+00:00 \| 127.0.0.1 \| 96 \| 127.0.0.1 Executing read query (reversed false) [shard 0/sl:default] \| 2026-04-19 16:24:30.992765+00:00 \| 127.0.0.1 \| 123 \| 127.0.0.1 Creating read executor for token -3485513579396041028 with all: [cf134ebd-5f1b-4844-94e3-e5c7ad9421f0] targets: [cf134ebd-5f1b-4844-94e3-e5c7ad9421f0] repair decision: NONE [shard 0/sl:default] \| 2026-04-19 16:24:30.992781+00:00 \| 127.0.0.1 \| 139 \| 127.0.0.1 Creating never_speculating_read_executor - speculative retry is disabled or there are no extra replicas to speculate with [shard 0/sl:default] \| 2026-04-19 16:24:30.992782+00:00 \| 127.0.0.1 \| 140 \| 127.0.0.1 read_data: querying locally [shard 0/sl:default] \| 2026-04-19 16:24:30.992795+00:00 \| 127.0.0.1 \| 153 \| 127.0.0.1 Start querying singular range {{-3485513579396041028, pk{000400000000}}} [shard 0/sl:default] \| 2026-04-19 16:24:30.992801+00:00 \| 127.0.0.1 \| 160 \| 127.0.0.1 [reader concurrency semaphore sl:default] admitted immediately [shard 0/sl:default] \| 2026-04-19 16:24:30.992805+00:00 \| 127.0.0.1 \| 163 \| 127.0.0.1 [reader concurrency semaphore sl:default] executing read [shard 0/sl:default] \| 2026-04-19 16:24:30.992814+00:00 \| 127.0.0.1 \| 172 \| 127.0.0.1 Reading key {-3485513579396041028, pk{000400000000}} from sstable w64/data/bench/wide_partition-69d6adb03bf111f1865f3b0b343d3479/ms-3gzp_10y7_514282x1o2bojimy0q-big-Data.db [shard 0/sl:default] \| 2026-04-19 16:24:30.992837+00:00 \| 127.0.0.1 \| 195 \| 127.0.0.1 page cache miss: file=w64/data/bench/wide_partition-69d6adb03bf111f1865f3b0b343d3479/ms-3gzp_10y7_514282x1o2bojimy0q-big-Partitions.db, page=0, readahead=1 [shard 0/sl:default] \| 2026-04-19 16:24:30.992851+00:00 \| 127.0.0.1 \| 209 \| 127.0.0.1 page cache miss: 
file=w64/data/bench/wide_partition-69d6adb03bf111f1865f3b0b343d3479/ms-3gzp_10y7_514282x1o2bojimy0q-big-Rows.db, page=14, readahead=1 [shard 0/sl:default] \| 2026-04-19 16:24:30.995294+00:00 \| 127.0.0.1 \| 2653 \| 127.0.0.1 page cache hit: file=w64/data/bench/wide_partition-69d6adb03bf111f1865f3b0b343d3479/ms-3gzp_10y7_514282x1o2bojimy0q-big-Rows.db, page=14 [shard 0/sl:default] \| 2026-04-19 16:24:30.995375+00:00 \| 127.0.0.1 \| 2733 \| 127.0.0.1 page cache miss: file=w64/data/bench/wide_partition-69d6adb03bf111f1865f3b0b343d3479/ms-3gzp_10y7_514282x1o2bojimy0q-big-Rows.db, page=2, readahead=1 [shard 0/sl:default] \| 2026-04-19 16:24:30.995376+00:00 \| 127.0.0.1 \| 2734 \| 127.0.0.1 page cache hit: file=w64/data/bench/wide_partition-69d6adb03bf111f1865f3b0b343d3479/ms-3gzp_10y7_514282x1o2bojimy0q-big-Rows.db, page=14 [shard 0/sl:default] \| 2026-04-19 16:24:30.995463+00:00 \| 127.0.0.1 \| 2821 \| 127.0.0.1 page cache hit: file=w64/data/bench/wide_partition-69d6adb03bf111f1865f3b0b343d3479/ms-3gzp_10y7_514282x1o2bojimy0q-big-Rows.db, page=2 [shard 0/sl:default] \| 2026-04-19 16:24:30.995463+00:00 \| 127.0.0.1 \| 2821 \| 127.0.0.1 w64/data/bench/wide_partition-69d6adb03bf111f1865f3b0b343d3479/ms-3gzp_10y7_514282x1o2bojimy0q-big-Data.db: scheduling bulk DMA read of size 32768 at offset 206057984 [shard 0/sl:default] \| 2026-04-19 16:24:30.995471+00:00 \| 127.0.0.1 \| 2829 \| 127.0.0.1 w64/data/bench/wide_partition-69d6adb03bf111f1865f3b0b343d3479/ms-3gzp_10y7_514282x1o2bojimy0q-big-Data.db: scheduling bulk DMA read of size 32768 at offset 206090752 [shard 0/sl:default] \| 2026-04-19 16:24:30.995475+00:00 \| 127.0.0.1 \| 2833 \| 127.0.0.1 w64/data/bench/wide_partition-69d6adb03bf111f1865f3b0b343d3479/ms-3gzp_10y7_514282x1o2bojimy0q-big-Data.db: finished bulk DMA read of size 32768 at offset 206057984, successfully read 32768 bytes [shard 0/sl:default] \| 2026-04-19 16:24:30.995586+00:00 \| 127.0.0.1 \| 2945 \| 127.0.0.1 Page stats: 1 partition(s) (1 live, 0 dead), 0 
static row(s) (0 live, 0 dead), 1 clustering row(s) (1 live, 0 dead), 0 range tombstone(s) and 1 cell(s) (1 live, 0 dead) [shard 0/sl:default] \| 2026-04-19 16:24:30.995637+00:00 \| 127.0.0.1 \| 2995 \| 127.0.0.1 w64/data/bench/wide_partition-69d6adb03bf111f1865f3b0b343d3479/ms-3gzp_10y7_514282x1o2bojimy0q-big-Data.db: finished bulk DMA read of size 32768 at offset 206090752, successfully read 32768 bytes [shard 0/sl:default] \| 2026-04-19 16:24:30.995645+00:00 \| 127.0.0.1 \| 3003 \| 127.0.0.1 Querying is done [shard 0/sl:default] \| 2026-04-19 16:24:30.995653+00:00 \| 127.0.0.1 \| 3012 \| 127.0.0.1 Done processing - preparing a result [shard 0/sl:default] \| 2026-04-19 16:24:30.995670+00:00 \| 127.0.0.1 \| 3028 \| 127.0.0.1 Request complete \| 2026-04-19 16:24:30.995039 \| 127.0.0.1 \| 3039 \| 127.0.0.1 w64/data/bench/wide_partition-69d6adb03bf111f1865f3b0b343d3479/ms-3gzp_10y7_514282x1o2bojimy0q-big-Data.db: scheduling bulk DMA read of size 32768 at offset 206090752 [shard 0/sl:default] \| 2026-04-19 16:22:43.107215+00:00 \| 127.0.0.1 \| 8685 \| 127.0.0.1 ``` column_index_size_in_kb=1 trace: ``` cqlsh> SELECT * FROM bench.wide_partition WHERE pk = 0 AND ck = 654321 BYPASS CACHE; pk \| ck \| v ----+--------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 0 \| 654321 \| FIA7X52ZqYwvDxEGlmWJUSy1I94WTuWZTdLwXr9HBQ90RJLqYKr5nInTADSI6hzofwawaXphAQK07YMoyzFfRaGeKPQPKUb35XpLEGvLJ4xu9r4es8wUEHPXaFBGdMcWUkyDJSTYCFzZAPCzUHEuPJHMXVrI6UExWrIR0Xujg4GZa9UciU9rbEvrSBwSzoPEfbXJ6qZSGiTD8gcXz5kdAblLxsAeWug8tZqslsTu04HMLKfZ8WopQvHbpR6YlGSnM99CiBgz30LMmllULV4VA4u9kMpzsRV2IE2tKmJOddEl (1 rows) Tracing session: 3953a1f0-3bf3-11f1-b976-4a3dc2a7a57f activity \| timestamp \| source \| source_elapsed \| client 
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------+-----------+----------------+----------- Execute CQL3 query \| 2026-04-19 16:25:25.007000 \| 127.0.0.1 \| 0 \| 127.0.0.1 Parsing a statement [shard 0/sl:default] \| 2026-04-19 16:25:25.007423+00:00 \| 127.0.0.1 \| 1 \| 127.0.0.1 Processing a statement for authenticated user: anonymous [shard 0/sl:default] \| 2026-04-19 16:25:25.007511+00:00 \| 127.0.0.1 \| 89 \| 127.0.0.1 Executing read query (reversed false) [shard 0/sl:default] \| 2026-04-19 16:25:25.007536+00:00 \| 127.0.0.1 \| 114 \| 127.0.0.1 Creating read executor for token -3485513579396041028 with all: [e7bd75e7-6d2a-46dc-9f66-430524f40e0d] targets: [e7bd75e7-6d2a-46dc-9f66-430524f40e0d] repair decision: NONE [shard 0/sl:default] \| 2026-04-19 16:25:25.007551+00:00 \| 127.0.0.1 \| 129 \| 127.0.0.1 Creating never_speculating_read_executor - speculative retry is disabled or there are no extra replicas to speculate with [shard 0/sl:default] \| 2026-04-19 16:25:25.007553+00:00 \| 127.0.0.1 \| 131 \| 127.0.0.1 read_data: querying locally [shard 0/sl:default] \| 2026-04-19 16:25:25.007556+00:00 \| 127.0.0.1 \| 134 \| 127.0.0.1 Start querying singular range {{-3485513579396041028, pk{000400000000}}} [shard 0/sl:default] \| 2026-04-19 16:25:25.007562+00:00 \| 127.0.0.1 \| 139 \| 127.0.0.1 [reader concurrency semaphore sl:default] admitted immediately [shard 0/sl:default] \| 2026-04-19 16:25:25.007564+00:00 \| 127.0.0.1 \| 142 \| 127.0.0.1 [reader concurrency semaphore sl:default] executing read [shard 0/sl:default] \| 2026-04-19 16:25:25.007573+00:00 \| 127.0.0.1 \| 151 \| 127.0.0.1 Reading key {-3485513579396041028, pk{000400000000}} from sstable w1/data/bench/wide_partition-e0b436a03bf111f18587cc3d55b31baf/ms-3gzp_10x9_373io213ox3uf4irhr-big-Data.db [shard 
0/sl:default] \| 2026-04-19 16:25:25.007594+00:00 \| 127.0.0.1 \| 172 \| 127.0.0.1 page cache miss: file=w1/data/bench/wide_partition-e0b436a03bf111f18587cc3d55b31baf/ms-3gzp_10x9_373io213ox3uf4irhr-big-Partitions.db, page=0, readahead=1 [shard 0/sl:default] \| 2026-04-19 16:25:25.007607+00:00 \| 127.0.0.1 \| 184 \| 127.0.0.1 page cache miss: file=w1/data/bench/wide_partition-e0b436a03bf111f18587cc3d55b31baf/ms-3gzp_10x9_373io213ox3uf4irhr-big-Rows.db, page=488, readahead=1 [shard 0/sl:default] \| 2026-04-19 16:25:25.016029+00:00 \| 127.0.0.1 \| 8607 \| 127.0.0.1 page cache hit: file=w1/data/bench/wide_partition-e0b436a03bf111f18587cc3d55b31baf/ms-3gzp_10x9_373io213ox3uf4irhr-big-Rows.db, page=488 [shard 0/sl:default] \| 2026-04-19 16:25:25.016109+00:00 \| 127.0.0.1 \| 8687 \| 127.0.0.1 page cache miss: file=w1/data/bench/wide_partition-e0b436a03bf111f18587cc3d55b31baf/ms-3gzp_10x9_373io213ox3uf4irhr-big-Rows.db, page=486, readahead=1 [shard 0/sl:default] \| 2026-04-19 16:25:25.016111+00:00 \| 127.0.0.1 \| 8688 \| 127.0.0.1 page cache miss: file=w1/data/bench/wide_partition-e0b436a03bf111f18587cc3d55b31baf/ms-3gzp_10x9_373io213ox3uf4irhr-big-Rows.db, page=285, readahead=1 [shard 0/sl:default] \| 2026-04-19 16:25:25.016176+00:00 \| 127.0.0.1 \| 8754 \| 127.0.0.1 page cache hit: file=w1/data/bench/wide_partition-e0b436a03bf111f18587cc3d55b31baf/ms-3gzp_10x9_373io213ox3uf4irhr-big-Rows.db, page=488 [shard 0/sl:default] \| 2026-04-19 16:25:25.016260+00:00 \| 127.0.0.1 \| 8838 \| 127.0.0.1 page cache hit: file=w1/data/bench/wide_partition-e0b436a03bf111f18587cc3d55b31baf/ms-3gzp_10x9_373io213ox3uf4irhr-big-Rows.db, page=486 [shard 0/sl:default] \| 2026-04-19 16:25:25.016261+00:00 \| 127.0.0.1 \| 8839 \| 127.0.0.1 page cache hit: file=w1/data/bench/wide_partition-e0b436a03bf111f18587cc3d55b31baf/ms-3gzp_10x9_373io213ox3uf4irhr-big-Rows.db, page=285 [shard 0/sl:default] \| 2026-04-19 16:25:25.016261+00:00 \| 127.0.0.1 \| 8839 \| 127.0.0.1 
w1/data/bench/wide_partition-e0b436a03bf111f18587cc3d55b31baf/ms-3gzp_10x9_373io213ox3uf4irhr-big-Data.db: scheduling bulk DMA read of size 4107 at offset 206086656 [shard 0/sl:default] \| 2026-04-19 16:25:25.016268+00:00 \| 127.0.0.1 \| 8846 \| 127.0.0.1 w1/data/bench/wide_partition-e0b436a03bf111f18587cc3d55b31baf/ms-3gzp_10x9_373io213ox3uf4irhr-big-Data.db: finished bulk DMA read of size 4107 at offset 206086656, successfully read 4608 bytes [shard 0/sl:default] \| 2026-04-19 16:25:25.016340+00:00 \| 127.0.0.1 \| 8918 \| 127.0.0.1 Page stats: 1 partition(s) (1 live, 0 dead), 0 static row(s) (0 live, 0 dead), 1 clustering row(s) (1 live, 0 dead), 0 range tombstone(s) and 1 cell(s) (1 live, 0 dead) [shard 0/sl:default] \| 2026-04-19 16:25:25.016367+00:00 \| 127.0.0.1 \| 8945 \| 127.0.0.1 Querying is done [shard 0/sl:default] \| 2026-04-19 16:25:25.016385+00:00 \| 127.0.0.1 \| 8963 \| 127.0.0.1 Done processing - preparing a result [shard 0/sl:default] \| 2026-04-19 16:25:25.016401+00:00 \| 127.0.0.1 \| 8979 \| 127.0.0.1 Request complete \| 2026-04-19 16:25:25.015989 \| 127.0.0.1 \| 8989 \| 127.0.0.1 ``` Closes scylladb/scylladb#29552	2026-04-20 17:53:56 +03:00
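The ratios quoted in this commit message can be rechecked from the raw numbers. The file sizes come from the `ls -l` output. The page and read counts are taken as stated in the text; they are not recounted from the traces.

```python
# File sizes (bytes) from the `ls -l` output in the raw data.
rows_db_ci1 = 2_001_227      # Rows.db, column_index_size_in_kb=1
rows_db_ci64 = 59_989        # Rows.db, column_index_size_in_kb=64
data_db_ci1 = 314_964_958    # Data.db, column_index_size_in_kb=1

# Per-query reads as stated in the commit message.
rows_pages_ci1, rows_pages_ci64 = 5, 3
data_reads_ci1, data_reads_ci64 = 1, 2
data_bytes_ci1, data_bytes_ci64 = 4107, 2 * 32768

index_growth = rows_db_ci1 / rows_db_ci64           # quoted as "33X"
page_ratio = rows_pages_ci1 / rows_pages_ci64        # quoted as "1.7X"
iops_ratio = data_reads_ci1 / data_reads_ci64        # quoted as "0.5X IOPS"
bandwidth_ratio = data_bytes_ci1 / data_bytes_ci64   # quoted as "0.06X bandwidth"
data_to_rows = data_db_ci1 / rows_db_ci1             # quoted as "157X smaller"

print(f"{index_growth:.1f}x {page_ratio:.2f}x {iops_ratio}x "
      f"{bandwidth_ratio:.3f}x {data_to_rows:.1f}x")
```

The data-file saving (two 32k reads down to one ~4k read) outweighs the two extra index pages, particularly when most of the much smaller Rows.db stays in cache.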
Marcin Maliszkiewicz	c136b2e640	audit: drop sstring temporaries on the will_log() fast path audit::will_log() is called for every CQL/Alternator request. With non-empty keyspace it does: _audited_keyspaces.find(sstring(keyspace)) should_log_table(sstring(keyspace), sstring(table)) constructing three temporary sstrings from the std::string_view arguments on every call. Now that the underlying associative containers use std::less<> as comparator (previous commit), find() accepts the string_view directly. Switch should_log_table() to take string_view as well so the temporaries disappear entirely. For short keyspace names the temporaries stay in SSO so allocs/op is unchanged at 58.1, but each construction still costs ~60 instructions. perf-simple-query --smp 1 --duration 15 --audit "table" --audit-keyspaces "ks-non-existing" --audit-categories "DCL,DDL,AUTH,DML,QUERY" build: --mode=release --use-profile="" (no PGO) Before (regression introduced in `9646ee05bd`): instructions_per_op: 36952 After: instructions_per_op: 36768 Brings insns/op back to the pre-regression baseline `3d0582d51e` (insns/op ~36777) within the per-run noise of ~15 insns standard deviation, eliminating the ~180 insns/op regression. Fixes https://scylladb.atlassian.net/browse/SCYLLADB-1616	2026-04-20 15:18:22 +02:00
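A back-of-envelope consistency check of the figures above, using only the numbers in the message (this is not a new measurement). Three temporaries at roughly 60 instructions each come to about 180 insns/op, which matches the size of the regression. The post-fix figure also sits within the stated ~15-insn noise of the baseline.

```python
insns_before = 36952      # with the regression (introduced in 9646ee05bd)
insns_after = 36768       # with this fix
insns_baseline = 36777    # pre-regression baseline (3d0582d51e), approximate
noise_stddev = 15         # stated per-run standard deviation

temporaries_per_call = 3  # keyspace for find() + keyspace and table for should_log_table()
insns_per_temporary = 60  # "~60 instructions" per construction, as stated

estimated_cost = temporaries_per_call * insns_per_temporary  # ~180 insns/op
recovered = insns_before - insns_after                       # measured improvement
residual = insns_after - insns_baseline                      # distance from baseline
```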
Marcin Maliszkiewicz	724b9e66ea	audit: enable heterogeneous lookup on audited keyspaces/tables Replace the bare std::set<sstring>/std::map<sstring, std::set<sstring>> member types with named aliases that use std::less<> as the comparator. The transparent comparator enables heterogeneous lookup with string_view keys. This commit is a pure refactor with no behavioral change: the parser return types, constructor parameters, observer template instantiations, and start_audit() locals are all updated to use the aliases.	2026-04-20 15:14:58 +02:00
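A minimal sketch of the heterogeneous lookup that these two commits rely on. To keep the example self-contained, it uses `std::string` in place of Scylla's `sstring`. The aliases `name_set` and `table_map` and the free functions below are illustrative stand-ins, not the actual audit code.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <set>
#include <string>
#include <string_view>

// Illustrative aliases: std::less<> is a transparent comparator, so find()
// accepts any key type comparable with std::string (e.g. std::string_view).
using name_set = std::set<std::string, std::less<>>;
using table_map = std::map<std::string, name_set, std::less<>>;

// No temporary std::string is constructed: the string_view is compared directly.
bool is_audited_keyspace(const name_set& keyspaces, std::string_view ks) {
    return keyspaces.find(ks) != keyspaces.end();
}

// Hypothetical counterpart of should_log_table() taking string_view arguments.
bool should_log_table(const table_map& tables, std::string_view ks, std::string_view table) {
    auto it = tables.find(ks);
    return it != tables.end() && it->second.find(table) != it->second.end();
}
```

With a plain `std::set<std::string>` (default `std::less<std::string>`), `find()` only accepts `const std::string&`, so every call would first build a temporary string.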
Marcin Maliszkiewicz	9f11920b15	Merge 'alternator: fix remaining problems with new Stream ARN format' from Nadav Har'El This small series includes a few followups to the patch that changed Alternator Stream ARNs from using our own UUID format to something that resembles Amazon's Stream ARNs (and that the KCL library won't reject as bogus-looking). The first patch is the most important one, fixing ListStreams's LastEvaluatedStreamArn to also use the new ARN format. It fixes SCYLLADB-539. The following patches are additional cleanups and tests for the new ARN code. Closes scylladb/scylladb#29474 * github.com:scylladb/scylladb: alternator: fix ListStreams paging if table is deleted during paging test/alternator: test DescribeStream on non-existent table alternator: ListStreams: on last page, avoid LastEvaluatedStreamArn alternator: remove dead code stream_shard_id alternator: fix ListStreams to return real ARN as LastEvaluatedStreamArn	2026-04-20 14:42:28 +02:00
Raphael S. Carvalho	a50e6215aa	test/repair: Add tombstone GC safety tests for incremental repair Add three cluster tests that verify no data resurrection occurs when tombstone GC runs on the repaired sstable set under incremental repair with tombstone_gc=repair mode. All tests use propagation_delay_in_seconds=0 so that tombstones become GC-eligible immediately after repair_time is committed (gc_before = repair_time), allowing the scenarios to exercise the actual GC eligibility path without artificial sleeps. (test_tombstone_gc_no_resurrection_basic_ordering) Data D (ts=1) and tombstone T (ts=2) are written to all replicas and flushed before repair. Repair captures both in the repairing snapshot and promotes them to repaired. Once repair_time is committed, T is GC-eligible (T.deletion_time < gc_before = repair_time). The test verifies that compaction on the repaired set does NOT purge T, because D is already in repaired (mark_sstable_as_repaired() completes on all replicas before repair_time is committed to Raft) and clamps max_purgeable to D.timestamp=1 < T.timestamp=2. (test_tombstone_gc_no_resurrection_hints_flush_failure) The repair_flush_hints_batchlog_handler_bm_uninitialized injection causes hints flush to fail on one node. When hints flush fails, flush_time stays at gc_clock::time_point{} (epoch). This propagates as repair_time=epoch committed to system.tablets, so gc_before = epoch - propagation_delay is effectively the minimum possible time. No tombstone has a deletion_time older than epoch, so T is never GC-eligible from this repair. The test verifies that repair_time does not advance to a meaningful value after a failed hints flush, and that compaction on the repaired set does not purge T (key remains deleted, no resurrection). 
(test_tombstone_gc_no_resurrection_propagation_delay) Simulates a write D carrying an old CQL USING TIMESTAMP (ts_d = now-2h) that was stored as a hint while a replica was down, and a tombstone T with a higher timestamp (ts_t = now-90min, ts_t > ts_d) that was written to all live replicas. After the replica restarts, repair flushes hints synchronously before taking the repairing snapshot, guaranteeing D is delivered and captured in repairing before the snapshot. After mark_sstable_as_repaired() promotes D to repaired, the coordinator commits repair_time. gc_before = repair_time > T.deletion_time so T is GC-eligible. The test verifies that compaction on the repaired set does NOT purge T: D (ts_d < ts_t) is already in repaired, clamping max_purgeable = ts_d < ts_t = T.timestamp, so T is not purgeable. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-20 09:09:39 -03:00
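The purge rule that the three tests exercise can be modeled as a small predicate. This is a hypothetical, simplified sketch: the names `tombstone`, `can_purge`, and `no_shadowed_data` are illustrative, and real compaction tracks more state. A tombstone is purgeable only if its deletion_time is older than gc_before and its timestamp is below max_purgeable.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical, simplified model of the tombstone purge decision.
struct tombstone {
    std::int64_t timestamp;      // write timestamp of the deletion
    std::int64_t deletion_time;  // compared against gc_before
};

// Illustrative value meaning "no data that T could shadow clamps max_purgeable".
constexpr std::int64_t no_shadowed_data = std::numeric_limits<std::int64_t>::max();

// gc_before: derived from the committed repair_time (minus propagation_delay,
// which the tests set to 0). max_purgeable: clamped to the lowest timestamp
// of data that T might shadow (D in the tests).
bool can_purge(const tombstone& t, std::int64_t gc_before, std::int64_t max_purgeable) {
    return t.deletion_time < gc_before && t.timestamp < max_purgeable;
}
```

In the basic-ordering scenario, D (ts=1) clamps max_purgeable to 1, so T (ts=2) fails the second condition; in the failed-hints-flush scenario, gc_before stays at epoch, so T fails the first.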
Wojciech Mitros	6011cb8a4c	db/view: track range tombstones in update stream during view update building The view update builder ignored range tombstone changes from the update stream when all existing mutation fragments had already been consumed. The old code assumed range tombstones 'remove nothing pre-existing, so we can ignore it', but this failed to update _update_current_tombstone. Consequently, when a range delete and an insert within that range appeared in the same batch, the range tombstone was not applied to the inserted row, or was applied to a row outside the range it covered, causing that row to incorrectly survive or be deleted in the materialized view. Fix by handling is_range_tombstone_change() fragments in the update-only branch, updating _update_current_tombstone so subsequent clustering rows correctly have the range tombstone applied to them. Fixes SCYLLADB-1555 Closes scylladb/scylladb#29483	2026-04-20 13:38:52 +02:00
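A simplified model of the fix, assuming a toy fragment stream. The types `range_tombstone_change` and `clustering_row` and the function `live_rows` are hypothetical, not Scylla's. Every range tombstone change updates the tombstone in effect, and each subsequent row is checked against it. This sketch lets the tombstone win timestamp ties, which is what happens when a range delete and an insert share a batch timestamp.

```cpp
#include <cassert>
#include <cstdint>
#include <variant>
#include <vector>

// Hypothetical fragment model: a range_tombstone_change sets the tombstone in
// effect for all following clustering rows until the next change.
struct range_tombstone_change { std::int64_t tomb_ts; };  // 0 means "no tombstone"
struct clustering_row { int key; std::int64_t write_ts; };
using fragment = std::variant<range_tombstone_change, clustering_row>;

// Returns the keys of rows not shadowed by the range tombstone in effect at
// their position. The essential step is updating `current` on every change.
std::vector<int> live_rows(const std::vector<fragment>& update_stream) {
    std::int64_t current = 0;
    std::vector<int> live;
    for (const auto& f : update_stream) {
        if (const auto* rtc = std::get_if<range_tombstone_change>(&f)) {
            current = rtc->tomb_ts;
        } else {
            const auto& row = std::get<clustering_row>(f);
            if (row.write_ts > current) {  // equal timestamps: tombstone wins
                live.push_back(row.key);
            }
        }
    }
    return live;
}
```

If the `range_tombstone_change` branch were skipped, as in the bug, row 1 in the usage below would wrongly be reported as live.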
Wojciech Mitros	073710a661	view: apply existing range tombstones after exhausting the update reader When view_update_builder::on_results() hits the path where the update fragment reader is already exhausted, it still needs to keep tracking existing range tombstones and apply them to encountered rows. Otherwise a row covered by an existing range tombstone can appear alive while generating the view update and create a spurious view row. Update the existing tombstone state even on the exhausted-reader path and apply the effective tombstone to clustering rows before generating the row tombstone update. Add a cqlpy regression test covering the partition-delete-after-range-tombstone case. Fixes: SCYLLADB-1554 Closes scylladb/scylladb#29481	2026-04-20 13:29:05 +02:00
Dario Mirovic	40740104ab	test: use DROP KEYSPACE IF EXISTS in new_test_keyspace cleanup The new_test_keyspace context manager in test/cluster/util.py uses DROP KEYSPACE without IF EXISTS during cleanup. The Python driver has a known bug (scylladb/python-driver#317) where connection pool renewal after concurrent node bootstraps causes double statement execution. The DROP succeeds server-side, but the response is lost when the old pool is closed. The driver retries on the new pool, and gets ConfigurationException message "Cannot drop non existing keyspace". The CREATE KEYSPACE in create_new_test_keyspace already uses IF NOT EXISTS as a workaround for the same driver bug. This patch applies the same approach to fix DROP KEYSPACE. Fixes SCYLLADB-1538 Closes scylladb/scylladb#29487	2026-04-20 12:51:17 +02:00
Botond Dénes	ad7647c3c7	test/commitlog: reduce resource usage in test_commitlog_handle_replayed_segments The test was using max_size_mb = 8*1024 (8 GB) with 100 iterations, causing it to create up to 260 files of 32 MB each per iteration via fallocate. On a loaded CI machine this totals hundreds of GB of file operations, easily exceeding the 15-minute test timeout (SCYLLADB-1496). The test only needs enough files to verify that delete_segments keeps the disk footprint within [shard_size, shard_size + seg_size]. Reduce max_size_mb to 128 (8 files of 32 MB per iteration) and the iteration count to 10, which is sufficient to exercise the serialized-deletion and recycle logic without imposing excessive I/O load. Closes scylladb/scylladb#29510	2026-04-20 11:02:25 +03:00
Ernest Zaslavsky	e5e6608f20	sstables_loader: prevent use-after-free on table drop during streaming sstables_loader::load_and_stream holds a replica::table& reference via the sstable_streamer for the entire streaming operation. If the table is dropped concurrently (e.g. DROP TABLE or DROP KEYSPACE), the reference becomes dangling and the next access crashes with SEGV. This was observed in a longevity-50gb-12h-master test run where a keyspace was dropped while load_and_stream was still streaming SSTables from a previous batch. Fix by acquiring a stream_in_progress() phaser guard in load_and_stream before creating the streamer. table::stop() calls _pending_streams_phaser.close() which blocks until all outstanding guards are released, keeping the table alive for the duration of the streaming operation. Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-1352 Closes scylladb/scylladb#29403	2026-04-20 07:39:51 +03:00
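The lifetime guarantee comes from a guard/close handshake. The sketch below is a hypothetical single-threaded analog: `stream_gate` is illustrative, and while Seastar's phaser is future-based, this version uses a callback in place of a future. `close()` completes only after every guard obtained from `enter()` has been destroyed.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical analog of a phaser: close() runs its continuation only once
// every outstanding guard has been released.
class stream_gate {
    int _outstanding = 0;
    bool _closing = false;
    std::function<void()> _on_closed;

    void maybe_finish() {
        if (_closing && _outstanding == 0 && _on_closed) {
            auto fn = std::move(_on_closed);
            _on_closed = nullptr;
            fn();  // e.g. let table teardown proceed
        }
    }
public:
    // RAII guard held for the whole streaming operation.
    class guard {
        stream_gate* _g;
    public:
        explicit guard(stream_gate& g) : _g(&g) { ++_g->_outstanding; }
        guard(const guard&) = delete;
        guard& operator=(const guard&) = delete;
        ~guard() { --_g->_outstanding; _g->maybe_finish(); }
    };

    guard enter() { return guard(*this); }  // relies on C++17 guaranteed elision

    void close(std::function<void()> on_closed) {
        _closing = true;
        _on_closed = std::move(on_closed);
        maybe_finish();
    }
};
```

Acquiring the guard before creating the streamer means a concurrent drop cannot finish tearing down the table while streaming still references it.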
Benny Halevy	34adb0e069	test/cluster/dtest: fix test_scrub_static_table flakiness Pass jvm_args=["--smp", "1"] on both cluster.start() calls to ensure consistent shard count across restarts, avoiding resharding on restart. Also pass wait_for_binary_proto=True to cluster.start() to ensure the CQL port is ready before connecting. Fixes: SCYLLADB-824 Closes scylladb/scylladb#29548	2026-04-20 06:53:49 +03:00
Piotr Szymaniak	378bcd69e3	tree: add AGENTS.md router and improve AI instruction files Add AGENTS.md as a minimal router that directs AI agents to the relevant instruction files based on what they are editing. Improve the instruction files: - cpp.instructions.md: clarify seastarx.hh scope (headers, not "many files"), explain std::atomic restriction (single-shard model, not "blocking"), scope macros prohibition to new ad-hoc only, add coroutine exception propagation pattern, add invariant checking section preferring throwing_assert() over SCYLLA_ASSERT (issue #7871) - python.instructions.md: demote PEP 8 to fallback after local style, clarify that only wildcard imports are prohibited - copilot-instructions.md: show configure.py defaults to dev mode, add frozen toolchain section, clarify --no-gather-metrics applies to test.py, fix Python test paths to use .py extension, add license header guidance for new files Closes scylladb/scylladb#29023	2026-04-19 21:59:52 +03:00
Dario Mirovic	f77ff28081	test: manager_client: use safe_driver_shutdown for exclusive_clusters Using cluster.shutdown() is an incorrect way to shut down a Cassandra Cluster. The correct way is using safe_driver_shutdown. Fixes SCYLLADB-1434 Closes scylladb/scylladb#29390	2026-04-19 21:31:18 +03:00
Avi Kivity	d584bd7358	cql3: statement_restrictions: replace has_eq_restriction_on_column with precomputed set has_eq_restriction_on_column() walked expression trees at prepare time to find binary_operators with op==EQ that mention a given column on the LHS. Its only caller is ORDER BY validation in select_statement, which checks that clustering columns without an explicit ordering have an EQ restriction. Replace the 50-line expression-walking free function with a precomputed unordered_set<const column_definition*> (_columns_with_eq) populated during the main predicate loop in analyze_statement_restrictions. For single-column EQ predicates the column is taken from on_column; for multi-column EQ like (ck1, ck2) = (1, 2), all columns in on_clustering_key_prefix are included. The member function becomes a single set::contains() call.	2026-04-19 20:57:09 +03:00
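A sketch of the precomputed lookup, using hypothetical stand-ins for `column_definition` and the predicate type. The set is filled once while predicates are analyzed, and the query becomes a hash lookup. Multi-column EQ contributes every column in its tuple.

```cpp
#include <cassert>
#include <unordered_set>
#include <vector>

// Hypothetical stand-ins for Scylla's column_definition and predicate types.
struct column_definition { int id; };
enum class oper_t { EQ, LT, GT, IN };
struct predicate {
    oper_t op;
    // One column, or several for a multi-column restriction like (ck1, ck2) = (1, 2).
    std::vector<const column_definition*> columns;
};

class restrictions {
    std::unordered_set<const column_definition*> _columns_with_eq;
public:
    explicit restrictions(const std::vector<predicate>& preds) {
        // Populated once, during the main predicate loop (prepare time).
        for (const auto& p : preds) {
            if (p.op == oper_t::EQ) {
                _columns_with_eq.insert(p.columns.begin(), p.columns.end());
            }
        }
    }
    // Formerly an expression-tree walk; now a single lookup
    // (std::unordered_set::contains() in C++20).
    bool has_eq_restriction_on_column(const column_definition& cdef) const {
        return _columns_with_eq.find(&cdef) != _columns_with_eq.end();
    }
};
```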
Avi Kivity	b7f86eaabc	cql3: statement_restrictions: replace multi_column_range_accumulator_builder with direct predicate iteration build_get_multi_column_clustering_bounds_fn() used expr::visit() to dispatch each restriction through a 15-handler visitor struct. Only the binary_operator handler did real work; the conjunction handler just recursed, and the remaining 13 handlers were dead-code on_internal_error calls (the filter expression of each predicate is always a binary_operator). Replace the visitor with a loop over predicates that does as<binary_operator>(pred.filter) directly, building the same query-time lambda inline. Promote intersect_all() and process_in_values() from static methods of the deleted struct to free functions in the anonymous namespace -- they are still called from the query-time lambda.	2026-04-19 20:57:09 +03:00
Avi Kivity	ece9af229d	cql3: statement_restrictions: use predicate fields in build_get_clustering_bounds_fn Replace find_binop(..., is_multi_column) with pred.is_multi_column in build_get_clustering_bounds_fn() and add_clustering_restrictions_to_idx_ck_prefix(). Replace is_clustering_order(binop) with pred.order == comparison_order::clustering and iterate predicates directly instead of extracting filter expressions. Remove the now-dead is_multi_column() free function.	2026-04-19 20:57:09 +03:00
Avi Kivity	72da1207d7	cql3: statement_restrictions: remove extract_single_column_restrictions_for_column The previous commit made prepare_indexed_local() use the pre-built predicate vectors instead of calling extract_single_column_restrictions_for_column(). That was the last production caller. Remove the function definition (65 lines of expression-walking visitor) and its declaration/doc-comment from the header. Replace the unit test expression_extract_column_restrictions, which called the removed function directly with synthetic column_definitions, with per_column_restriction_routing, which exercises the same routing logic through the public analyze_statement_restrictions() API. The new test verifies not just factor counts but the exact (column_name, oper_t) pairs in each per-column entry, catching misrouted restrictions that a count-only check would miss.	2026-04-19 20:57:09 +03:00
Avi Kivity	b093477cf7	cql3: statement_restrictions: use predicate vectors in prepare_indexed_local Replace the extract_single_column_restrictions_for_column(_where, ...) call in prepare_indexed_local() with a direct lookup in the pre-built predicate vectors. The old code walked the entire WHERE expression tree to extract binary operators mentioning the indexed column, wrapped them in a conjunction, translated column definitions to the index schema, then called to_predicate_on_column() which walked the expression again to convert back to predicates. The new code selects the appropriate predicate vector map (PK, CK, or non-PK) based on the indexed column's kind, looks up the column's predicates directly, applies replace_column_def to each, and folds them with make_conjunction -- producing the same result without any expression tree walks. This removes the last production caller of extract_single_column_restrictions_for_column (unit tests in statement_restrictions_test.cc still exercise it).	2026-04-19 20:57:09 +03:00
Avi Kivity	a725e39218	cql3: statement_restrictions: use predicate vector size for clustering prefix length Replace the body of num_clustering_prefix_columns_that_need_not_be_filtered() with a single return of _clustering_prefix_restrictions.size(). The old implementation called get_single_column_restrictions_map() to rebuild a per-column map from the clustering expression tree, then iterated it in schema order counting columns until it hit a gap, a needs-filtering predicate, or a slice. But _clustering_prefix_restrictions is already built with exactly that same logic during the constructor (lines 1234-1248): it iterates CK columns in schema order, appending predicates until it encounters a gap in column_id, a predicate that needs_filtering, or a slice -- at which point it stops. So the vector's size is, by construction, the answer to the same question the old code was re-deriving at query time. This makes four helper functions dead code: - get_single_column_restrictions_map(): walked the expression tree to build a map<column_definition*, expression> of per-column restrictions. Was a ~15-line function that called get_sorted_column_defs() and extract_single_column_restrictions_for_column() for each column. - get_the_only_column(): extracted the single column_value from a restriction expression, asserting it was single-column. Called by the old loop body. - is_single_column_restriction(): thin wrapper around get_single_column_restriction_column(). - get_single_column_restriction_column(): ~25-line function that walked an expression tree with for_each_expression<column_value> to determine whether all column_value nodes refer to the same column. Called by the above two. Remove all four functions and their forward declarations (-95 lines).	2026-04-19 20:57:08 +03:00
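A sketch of the prefix rule as stated above, under two assumptions: predicates arrive already sorted by clustering column position, and a slice is kept as the last element of the prefix (nothing after it is). The `ck_predicate` type and `clustering_prefix` function are hypothetical. Under this model, the number of columns that need not be filtered is simply the size of the returned vector.

```cpp
#include <cassert>
#include <vector>

// Hypothetical simplified single-column clustering-key predicate.
struct ck_predicate {
    unsigned column_id;    // position of the column in the clustering key
    bool needs_filtering;
    bool is_slice;
};

// Walk predicates in schema order; stop at a gap in column_id, at a predicate
// that needs filtering, or right after a slice.
std::vector<ck_predicate> clustering_prefix(const std::vector<ck_predicate>& sorted_preds) {
    std::vector<ck_predicate> prefix;
    unsigned expected_id = 0;
    for (const auto& p : sorted_preds) {
        if (p.column_id != expected_id || p.needs_filtering) {
            break;
        }
        prefix.push_back(p);
        if (p.is_slice) {
            break;  // assumption: the slice ends the prefix but is part of it
        }
        ++expected_id;
    }
    return prefix;
}
```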
Avi Kivity	68c2e292ac	cql3: statement_restrictions: replace do_find_idx and is_supported_by with predicate-based versions Convert do_find_idx() from a member function that walks expression trees via index_restrictions()/for_each_expression/extract_single_column_restrictions to a static free function that iterates index_search_group spans using are_predicates_supported_by(). Convert calculate_column_defs_for_filtering_and_erase_restrictions_used_for_index() to use predicate vectors instead of expression-based is_supported_by(). Remove now-dead code: is_supported_by(), is_supported_by_helper(), score() member function, and do_find_idx() member function.	2026-04-19 20:57:08 +03:00
Avi Kivity	c42397e995	cql3: statement_restrictions: remove expression-based has_supporting_index and index_supports_some_column These functions are no longer called now that all index support checks in the constructor use predicate-based alternatives. The expression-based is_supported_by and is_supported_by_helper are still needed by choose_idx() and calculate_column_defs_for_filtering_and_erase_restrictions_used_for_index().	2026-04-19 20:57:08 +03:00
Avi Kivity	1aafe0708a	cql3: statement_restrictions: replace multi-column and PK index support checks with predicate-based versions Replace clustering_columns_restrictions_have_supporting_index(), multi_column_clustering_restrictions_are_supported_by(), get_clustering_slice(), and partition_key_restrictions_have_supporting_index() with predicate-based equivalents that use the already-accumulated mc_ck_preds and sc_pk_pred_vectors locals. The new multi_column_predicates_have_supporting_index() checks each multi-column predicate's columns list directly against indexes, avoiding expression tree walks through find_in_expression and bounds_slice.	2026-04-19 20:57:08 +03:00
Avi Kivity	fa6f239cc7	cql3: statement_restrictions: add predicate-based index support checking Add `op` and `is_subscript` fields to `struct predicate` and populate them in all predicate creation sites in `to_predicates()`. These fields record the binary operator and whether the LHS is a subscript (map element access), which are the two pieces of information needed to query index support. Add `is_predicate_supported_by()` which mirrors `is_supported_by_helper()` but operates on a single predicate's fields instead of walking the expression tree. Add a predicate-vector overload of `index_supports_some_column()` and use it in the constructor to replace expression-based index support checks for single-column partition key, clustering key, and non-primary-key restrictions. The multi-column clustering key case still uses the existing expression-based path.	2026-04-19 20:57:08 +03:00
Avi Kivity	25ba3bd649	cql3: statement_restrictions: use pre-built single-column maps for index support checks Replace index_supports_some_column(expression, ...) with index_supports_some_column(single_column_restrictions_map, ...) to eliminate get_single_column_restrictions_map() tree walks when checking index support. The three call sites now use the maps already built incrementally in the constructor loop: _single_column_nonprimary_key_restrictions, _single_column_clustering_key_restrictions, and _single_column_partition_key_restrictions. Also replace contains_multi_column_restriction() tree walk in clustering_columns_restrictions_have_supporting_index() with _has_multi_column.	2026-04-19 20:57:08 +03:00
Avi Kivity	fab90224b3	cql3: statement_restrictions: build clustering-prefix restrictions incrementally Replace the extract_clustering_prefix_restrictions() tree walk with incremental collection during the main loop. Two new locals -- mc_ck_preds and sc_ck_preds -- accumulate multi-column and single-column clustering key predicates respectively. A short post-loop block computes the longest contiguous prefix from sc_ck_preds (or uses mc_ck_preds directly for multi-column), replacing the removed function. Also remove the now-unused to_predicate_on_clustering_key_prefix(), with_current_binary_operator() helper, and the visitor_with_binary_operator_context concept.	2026-04-19 20:57:08 +03:00
Avi Kivity	3bd308986a	cql3: statement_restrictions: build partition-range restrictions incrementally Replace the extract_partition_range() tree walk with incremental collection during the main loop. Two new locals before the loop -- token_pred and pk_range_preds -- accumulate token and single-column EQ/IN partition key predicates respectively. A short post-loop block materializes _partition_range_restrictions from these locals, replacing the removed function. This removes the last tree walk over partition-key restrictions.	2026-04-19 20:57:08 +03:00
Avi Kivity	db28411548	cql3: statement_restrictions: build clustering-key single-column restrictions map incrementally Instead of accumulating all clustering-key restrictions into a conjunction tree and then decomposing it by column via get_single_column_restrictions_map() post-loop, build the per-column map incrementally as each single-column clustering-key predicate is processed. The post-loop guard (!has_mc_clustering) is no longer needed: multi-column predicates go through the is_multi_column branch and never insert into this map, and mixing multi with single-column is rejected with an exception. This eliminates a post-loop tree walk over _clustering_columns_restrictions.	2026-04-19 20:57:08 +03:00
Avi Kivity	a4608804d8	cql3: statement_restrictions: build partition-key single-column restrictions map incrementally Instead of accumulating all partition-key restrictions into a conjunction tree and then decomposing it by column via get_single_column_restrictions_map() post-loop, build the per-column map incrementally as each single-column partition-key predicate is processed. The post-loop guard (!has_token_restrictions()) is no longer needed: token predicates go through the on_partition_key_token branch and never insert into this map, and mixing token with non-token is rejected with an exception. This eliminates a post-loop tree walk over _partition_key_restrictions.	2026-04-19 20:57:08 +03:00
Avi Kivity	e9b16a11ba	cql3: statement_restrictions: build non-primary-key single-column restrictions map incrementally Instead of accumulating all non-primary-key restrictions into a conjunction tree and then decomposing it by column via get_single_column_restrictions_map() post-loop, build the per-column map incrementally as each non-primary-key predicate is processed. This eliminates a post-loop tree walk over _nonprimary_key_restrictions.	2026-04-19 20:57:08 +03:00
Avi Kivity	701366a8d1	cql3: statement_restrictions: use tracked has_mc_clustering for _has_multi_column Replace the two post-loop find_binop(_clustering_columns_restrictions, is_multi_column) tree walks and the contains_multi_column_restriction() tree walk with the already-tracked local has_mc_clustering. The redundant second assignment inside the _check_indexes block is removed entirely.	2026-04-19 20:57:08 +03:00
Avi Kivity	da438507d0	cql3: statement_restrictions: track has-token state incrementally Replace the two in-loop calls to has_token_restrictions() (which walks the _partition_key_restrictions expression tree looking for token function calls) with a local bool has_token, set to true when a token predicate is processed. The member function is retained since it's used outside the constructor. With this change, the constructor loop's non-error control flow performs zero expression tree scanning. The only remaining tree walks are on error paths (get_sorted_column_defs, get_columns_in_commons for formatting exception messages) and structural (make_conjunction for building accumulated expressions).	2026-04-19 20:57:07 +03:00
Avi Kivity	1344278a19	cql3: statement_restrictions: track partition-key-empty state incrementally Replace the in-loop call to partition_key_restrictions_is_empty() (which walks the _partition_key_restrictions expression tree via is_empty_restriction()) with a local bool pk_is_empty, set to false at the two sites where partition key restrictions are added. The member function is retained since it's used outside the constructor.	2026-04-19 20:57:07 +03:00
Avi Kivity	14812ea1e0	cql3: statement_restrictions: track first multi-column predicate incrementally Replace find_in_expression<binary_operator>(_clustering_columns_restrictions, always_true), which walks the accumulated expression tree to find the first binary_operator, with a tracked pointer first_mc_pred set when the first multi-column predicate is added. This eliminates the tree scan, the null check, and the is_lower_bound/is_upper_bound lambdas, replacing them with direct predicate field accesses: first_mc_pred->order, first_mc_pred->is_lower_bound, first_mc_pred->is_upper_bound, and first_mc_pred->filter for error messages.	2026-04-19 20:57:07 +03:00
Avi Kivity	ef005c10ba	cql3: statement_restrictions: track last clustering column incrementally Replace get_last_column_def(_clustering_columns_restrictions), which walks the entire accumulated expression tree to collect and sort all column definitions, with a local pointer ck_last_column that tracks the column with the highest schema position as single-column clustering restrictions are added.	2026-04-19 20:57:07 +03:00
Avi Kivity	88bd5ea1b7	cql3: statement_restrictions: track clustering-has-slice incrementally Replace has_slice(_clustering_columns_restrictions), which walks the accumulated expression tree looking for slice operators, with a local bool ck_has_slice set when any clustering predicate with is_slice is added. Updated at all three clustering insertion points: multi-column first assignment, multi-column slice conjunction, and single-column conjunction.	2026-04-19 20:57:07 +03:00
Avi Kivity	1071c39f17	cql3: statement_restrictions: track has-multi-column-clustering incrementally Replace find_binop(_clustering_columns_restrictions, is_tuple_constructor), which walks the accumulated expression tree looking for multi-column restrictions, with a local bool has_mc_clustering set when a multi-column predicate is first added. This serves both the multi-column branch (checking existing restrictions are also multi-column) and the single-column branch (checking no multi-column restrictions exist).	2026-04-19 20:57:07 +03:00
Avi Kivity	aa6a0ad326	cql3: statement_restrictions: track clustering-empty state incrementally Replace is_empty_restriction(_clustering_columns_restrictions), which recursively walks the accumulated expression tree, with a local bool ck_is_empty that is set to false when a clustering restriction is first added. Updated at both insertion points: multi-column first assignment and single-column make_conjunction.	2026-04-19 20:57:07 +03:00
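The incremental-tracking commits in this series (ck_is_empty, has_mc_clustering, ck_has_slice, ck_last_column) share one pattern: update a flag in O(1) at each insertion point instead of re-walking the accumulated expression tree. Below is a hypothetical sketch with a simplified `predicate`, where an integer column position stands in for the `ck_last_column` pointer.

```cpp
#include <cassert>

// Hypothetical simplified predicate; the real struct predicate has more fields.
struct predicate {
    bool is_multi_column;
    bool is_slice;
    int column_position;  // schema position of the restricted column
};

// State that used to be recomputed by walking the accumulated expression tree.
struct clustering_state {
    bool ck_is_empty = true;
    bool has_mc_clustering = false;
    bool ck_has_slice = false;
    int ck_last_position = -1;  // stands in for ck_last_column

    // Called at each clustering insertion point.
    void add(const predicate& p) {
        ck_is_empty = false;
        has_mc_clustering = has_mc_clustering || p.is_multi_column;
        ck_has_slice = ck_has_slice || p.is_slice;
        if (!p.is_multi_column && p.column_position > ck_last_position) {
            ck_last_position = p.column_position;
        }
    }
};
```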
Avi Kivity	d4ff613c0a	cql3: statement_restrictions: replace restr bridge variable with pred.filter The constructor loop no longer needs to extract a binary_operator reference from each predicate. All remaining uses (make_conjunction, get_columns_in_commons, assignment to accumulated restriction members, _where.push_back, and error formatting) accept expression directly, which is what pred.filter already is. This eliminates the unnecessary as<binary_operator> cast at the top of the loop.	2026-04-19 20:57:07 +03:00
Avi Kivity	44b18f3399	cql3: statement_restrictions: convert single-column branch to use predicate properties In the single-column partition-key and clustering-key sub-branches, replace direct binary_operator field inspections with pre-computed predicate booleans: !pred.equality && !pred.is_in instead of restr.op != EQ && restr.op != IN, pred.is_in instead of find(restr, IN), and pred.is_slice instead of has_slice(restr). Also fix a leftover restr.order in the multi-column branch error message.	2026-04-19 20:57:07 +03:00
Avi Kivity	b0c5eed384	cql3: statement_restrictions: convert multi-column branch to use predicate properties Replace direct operator comparisons with predicate boolean fields: pred.equality, pred.is_in, pred.is_slice, pred.is_lower_bound, pred.is_upper_bound, and pred.order.	2026-04-19 20:57:07 +03:00
Avi Kivity	afd68187ea	cql3: statement_restrictions: convert constructor loop to iterate over predicates Convert the constructor loop to first build predicates from the prepared where clause, then iterate over the predicates. The IS_NOT branch now uses pred.is_not_null_single_column and pred.on instead of inspecting the expression directly. The branch conditions for multi-column (pred.is_multi_column), token (on_partition_key_token), and single-column (on_column) now use predicate properties instead of expression helpers. Remove extract_column_from_is_not_null_restriction() which is no longer needed.	2026-04-19 20:57:07 +03:00
Avi Kivity	440d9f2d82	cql3: statement_restrictions: annotate predicates with operator properties Add boolean fields to struct predicate that describe the operator: equality, is_in, is_slice, is_upper_bound, is_lower_bound, and comparison_order. Populate them in all to_predicates() return sites. These fields will allow the constructor loop to inspect predicate properties directly instead of re-examining the expression.	2026-04-19 20:57:07 +03:00
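A sketch of deriving the boolean fields from the operator once, when the predicate is built. The `oper_t` subset, the `predicate_props` struct, and the `props_for` helper are hypothetical; only the field names follow the commit message.

```cpp
#include <cassert>

// Hypothetical subset of CQL operators, for illustration only.
enum class oper_t { EQ, IN, LT, LTE, GT, GTE, CONTAINS };
enum class comparison_order { cql, clustering };

struct predicate_props {
    bool equality = false;
    bool is_in = false;
    bool is_slice = false;
    bool is_upper_bound = false;
    bool is_lower_bound = false;
    comparison_order order = comparison_order::cql;
};

// Computed once at predicate creation, so later code tests flags instead of
// re-inspecting the binary_operator.
predicate_props props_for(oper_t op, comparison_order order) {
    predicate_props p;
    p.order = order;
    p.equality = op == oper_t::EQ;
    p.is_in = op == oper_t::IN;
    p.is_upper_bound = op == oper_t::LT || op == oper_t::LTE;
    p.is_lower_bound = op == oper_t::GT || op == oper_t::GTE;
    p.is_slice = p.is_upper_bound || p.is_lower_bound;
    return p;
}
```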
Avi Kivity	e0eb3bde8d	cql3: statement_restrictions: annotate predicates with is_not_null and is_multi_column To avoid having to dig deep into the expression, compute is_not_null and is_multi_column early and store them in the predicate.	2026-04-19 20:57:06 +03:00
Avi Kivity	6892642176	cql3: statement_restrictions: complete preparation early We want to move away from the unprepared domain to the prepared domain to avoid confusion. Ideally we'd receive prepared expressions via the constructor, but that is left for later.	2026-04-19 20:57:06 +03:00
Avi Kivity	ed5dd645e8	cql3: statement_restrictions: convert expressions to predicates without being directed at a specific column Currently, possible_lhs_values accepts a column_definition parameter that tells it which column we are interested in. This works because callers pre-analyze the expression and only pass a subexpression that contains the specified columns. We wish to convert expressions to predicates early, and so won't have the benefit of knowing which columns we're interested in. Generally, this is simple: a binary operator contains a column on the left-hand side, so use that. If the expression is on a token, use that. When the expression is a boolean constant (not expressible by the grammar, but it somehow found its way into the code), we invent a new `on_row` designator meaning it's not about a specific column. It will be useful one day when we allow things like `WHERE some_boolean_function(c1, c2)` that aren't specific to any single column. Finally, we introduce helpers that, given such an expression decomposed into predicates and a column_definition, extract the predicate related to the given column. This mimics the possible_lhs_values API and allows us to make minimal changes to callers, deferring that until later. possible_lhs_values() is renamed to to_predicates() and loses the column_definition parameter to indicate its new role.	2026-04-19 20:57:06 +03:00
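A hypothetical sketch of column-agnostic predicates and the extraction helper described above. Each predicate records what it constrains (a column, the partition-key token, or `on_row`), and `predicates_for_column`, an illustrative name, selects the ones for a single column, mimicking the old column-directed API. A string stands in for the filter expression.

```cpp
#include <cassert>
#include <string>
#include <variant>
#include <vector>

// Hypothetical designators for what a predicate constrains.
struct on_column { std::string name; };
struct on_partition_key_token {};
struct on_row {};  // not about any single column, e.g. a boolean constant
using target = std::variant<on_column, on_partition_key_token, on_row>;

struct predicate {
    target on;
    std::string filter;  // stands in for the filter expression
};

// Given predicates built without a target column, pick those for one column.
std::vector<predicate> predicates_for_column(const std::vector<predicate>& preds,
                                             const std::string& column) {
    std::vector<predicate> out;
    for (const auto& p : preds) {
        if (const auto* c = std::get_if<on_column>(&p.on); c && c->name == column) {
            out.push_back(p);
        }
    }
    return out;
}
```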
Avi Kivity	bfd1302311	cql3: statement_restrictions: refine possible_lhs_values() function_call processing Currently, we are careful to call possible_lhs_values() for a token function only when slice/equality operators are used. We wish to relax this, so return nullptr (must filter) for the other cases instead of raising an internal error.	2026-04-19 20:57:06 +03:00
Avi Kivity	736011b663	cql3: statement_restrictions: return nullptr for function solver if not token Currently, possible_lhs_values() for a function call expression will only be called when we're sure it's the token() function. But soon this will no longer be the case. Return nullptr for non-token functions to indicate we can't solve for a column value instead of an internal error.	2026-04-19 20:57:06 +03:00
Avi Kivity	8faf62a1aa	cql3: statement_restrictions: refine possible_lhs_values() subscript solving Do more work at prepare time.	2026-04-19 20:57:06 +03:00
Avi Kivity	a28689a99a	cql3: statement_restrictions: return nullptr from possible_lhs_values instead of on_internal_error Since we're now a first-resort call, and there's a last-resort fallback (evaluate), return nullptr when we cannot solve instead of raising an internal error. Logically, this should be part of the previous patch, but the rest of the code is still careful enough not to call here when not expecting a solution, so the split does not break bisectability.	2026-04-19 20:57:06 +03:00
Avi Kivity	370f3fd2e8	cql3: statement_restrictions: convert possible_lhs_values into a solver Convert from an execute-time function to a prepare-time function by returning a solver function instead of directly solving. When not possible to solve, but still possible to evaluate (filter), return nullptr.	2026-04-19 20:57:06 +03:00
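A minimal sketch of the solver shape, assuming a toy model of integer columns and bound markers; `restriction`, `make_solver`, `query_values`, and `value_set` are hypothetical names, not the real cql3 types. The analysis runs once when the solver is built, and an empty function plays the role of nullptr, telling the caller to fall back to filtering.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-ins: bound values indexed by marker position, and the
// set of values a column may take.
using query_values = std::vector<int>;
using value_set = std::vector<int>;

enum class op { eq, in, lt };

struct restriction {
    op oper;
    std::vector<std::size_t> markers;  // indexes into query_values
};

// A solver turns execute-time values into the column values that satisfy
// the restriction.
using solver = std::function<value_set(const query_values&)>;

// Prepare time: inspect the restriction once. Return a solver, or an empty
// solver when this toy model cannot enumerate the values and the
// restriction must be evaluated as a filter instead.
solver make_solver(const restriction& r) {
    switch (r.oper) {
    case op::eq:
    case op::in:
        return [markers = r.markers](const query_values& qv) {
            value_set out;
            for (std::size_t m : markers) {
                out.push_back(qv.at(m));
            }
            return out;
        };
    case op::lt:
        break;  // a range: not a finite value set in this toy model
    }
    return nullptr;
}
```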
Avi Kivity	92a43557dc	cql3: statement_restrictions: split _where to boolean factors in preparation for predicates conversion Expressions are a tree-like structure so a single expression is sufficient (for complicated ones, a conjunction is used), but predicates are flat. Prepare for conversion to predicates by storing the expressions that will correspond to predicates, namely the boolean factors of the WHERE clause.	2026-04-19 20:57:06 +03:00
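The flattening into boolean factors can be sketched with a toy expression tree; `expr` and `boolean_factors` are hypothetical and far simpler than the real cql3 expression type. Nested ANDs are unwrapped so that each leaf relation becomes one boolean factor, which is the flat shape predicates need.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy expression: a leaf relation (kept as text here) or a
// conjunction of child expressions.
struct expr {
    std::string relation;                         // used when children is empty
    std::vector<std::shared_ptr<expr>> children;  // non-empty => AND node
};

// Collect the boolean factors: the leaves of the conjunction tree, in
// left-to-right order, with all nesting removed.
void boolean_factors(const expr& e, std::vector<std::string>& out) {
    if (e.children.empty()) {
        out.push_back(e.relation);
        return;
    }
    for (const auto& child : e.children) {
        boolean_factors(*child, out);
    }
}
```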
Avi Kivity	694c1aed98	cql3: statement_restrictions: refactor IS NOT NULL processing Move some code to a helper, but don't let it mutate state.	2026-04-19 20:57:06 +03:00
Avi Kivity	35f14544dc	cql3: statement_restrictions: fold add_single_column_nonprimary_key_restriction() into its caller The goal is to simplify flow-control where the order in which variables are updated depends on their location in the source. With functions, this is difficult.	2026-04-19 20:57:06 +03:00
Avi Kivity	1965741914	cql3: statement_restrictions: fold add_single_column_clustering_key_restriction() into its caller The goal is to simplify flow-control where the order in which variables are updated depends on their location in the source. With functions, this is difficult.	2026-04-19 20:57:06 +03:00
Avi Kivity	1d631f7bac	cql3: statement_restrictions: fold add_single_column_partition_key_restriction() into its caller The goal is to simplify flow-control where the order in which variables are updated depends on their location in the source. With functions, this is difficult.	2026-04-19 20:57:05 +03:00
Avi Kivity	24cd98e454	cql3: statement_restrictions: fold add_token_partition_key_restriction() into its caller The goal is to simplify flow-control where the order in which variables are updated depends on their location in the source. With functions, this is difficult.	2026-04-19 20:57:05 +03:00
Avi Kivity	be3239fc58	cql3: statement_restrictions: fold add_multi_column_clustering_key_restriction() into its caller The goal is to simplify flow-control where the order in which variables are updated depends on their location in the source. With functions, this is difficult.	2026-04-19 20:57:05 +03:00
Avi Kivity	8990346c75	cql3: statement_restrictions: avoid early return in add_multi_column_clustering_key_restrictions Prepare for inlining it into its caller, which doesn't work easily if there's an early return.	2026-04-19 20:57:05 +03:00
Avi Kivity	fa130051a6	cql3: statement_restrictions: fold add_is_not_restriction() into its caller The goal is to simplify flow-control where the order in which variables are updated depends on their location in the source. With functions, this is difficult.	2026-04-19 20:57:05 +03:00
Avi Kivity	63f9362c89	cql3: statement_restrictions: fold add_restriction() into its caller The goal is to simplify flow-control where the order in which variables are updated depends on their location in the source. With functions, this is difficult.	2026-04-19 20:57:05 +03:00
Avi Kivity	9cbb1b851e	cql3: statement_restrictions: remove possible_partition_token_values() It's just a call to possible_lhs_values() with a different signature. Now possible_lhs_values() is our only solver.	2026-04-19 20:57:05 +03:00
Avi Kivity	c1fc596203	cql3: statement_restrictions: remove possible_column_values Replace it with the now-identical possible_lhs_values(). This paves the way to having only one solver function (after we remove possible_partition_token_values).	2026-04-19 20:57:05 +03:00
Avi Kivity	b26e6f7330	cql3: statement_restrictions: pass schema to possible_column_values() This unifies the signature with possible_lhs_values(), paving the way to deduplicating the two functions. We always have the schema and may as well pass it.	2026-04-19 20:57:05 +03:00
Avi Kivity	c6f6e81fe5	cql3: statement_restrictions: remove fallback path in solve() All query plans that try to solve for the possible values a column (or token, or column-tuple) can take have been converted to set analyzed_column::solve_for. Recognize that by removing the fallback path. This removes the last possible_column_values() call that isn't bound (using std::bind_front), and will allow moving it to prepare time.	2026-04-19 20:57:05 +03:00
Avi Kivity	e0445269e5	cql3: statement_restrictions: reorder possible_lhs_column parameters By moving query_options to the end, we can use std::bind_front to convert it from a build-time to a run-time function that depends only on the query_options.	2026-04-19 20:57:05 +03:00
Avi Kivity	e42ad62561	cql3: statement_restrictions: prepare solver for multi-column restrictions Multi-column restrictions such as (a, b) > (:v1, :v2) do not obey normal comparison rules. For example, given (a, b) > (5, 1) AND a <= 5, we see that (a, b) = (5, 2) satisfies the constraint, but if we tried to solve for the interval ((5, 1), (5)], we'd have to conclude that (5, 1) <= (5). It's possible to extend the CQL type system to support this, but that would be a lot of work, and in fact the current code doesn't depend on it: it solves these intersections in its own code path (multi_column_range_accumulator_builder's prefix3cmp). So, we just mark such solvers as non-comparable, and generate an internal error if we try to compare them in make_conjunction.	2026-04-19 20:57:05 +03:00
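The mismatch can be seen with plain integer tuples. This is a hypothetical illustration (`lex_less` and `at_or_below` are not Scylla functions): ordinary lexicographic order says (5) < (5, 1), so the interval ((5, 1), (5)] would be empty, yet under clustering-prefix semantics the inclusive upper bound (5) admits (5, 2).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using prefix = std::vector<int>;  // values of (a, b, ...)

// Ordinary lexicographic order: a proper prefix sorts before its
// extensions, so (5) < (5, 1) and the interval ((5, 1), (5)] is empty.
bool lex_less(const prefix& x, const prefix& y) {
    return std::lexicographical_compare(x.begin(), x.end(), y.begin(), y.end());
}

// Clustering-prefix reading of an inclusive upper bound: a bound shorter
// than the key constrains only the leading columns it names, so (5)
// admits every key whose first column is 5.
bool at_or_below(const prefix& key, const prefix& bound) {
    const auto n = static_cast<std::ptrdiff_t>(std::min(key.size(), bound.size()));
    return !std::lexicographical_compare(bound.begin(), bound.begin() + n,
                                         key.begin(), key.begin() + n);
}
```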
Avi Kivity	96e8414963	cql3: statement_restrictions: add solver for token restriction on index possible_column_values() knows how to find the values that the token can take, so add a solve_for implementation for tokens.	2026-04-19 20:57:04 +03:00
Avi Kivity	135809d97b	cql3: statement_restrictions: pre-analyze column in value_for() Since we pre-analyze the column, return a built function, and remove the corresponding lambda from the caller.	2026-04-19 20:57:04 +03:00
Avi Kivity	0a16d90acb	cql3: statement_restrictions: don't handle boolean constants in multi_column_range_accumulator_builder In statement_restrictions' constructor, we check that all the boolean factors are relations. This means the code to handle a constant here is dead code. Remove it; while it's good to handle constants, they should be handled at the top level, not in multi-column restriction processing.	2026-04-19 20:57:04 +03:00
Avi Kivity	56ae02d8a3	cql3: statement_restrictions: split range_from_raw_bounds into prepare phase and query phase range_from_raw_bounds processes restrictions of the form (a, b) > SCYLLA_CLUSTERING_BOUND(?, ?), indicating that comparisons respect whether columns are reversed or not. Iterate over expressions during the prepare phase only, generating "builder" functions to be executed during the query phase.	2026-04-19 20:57:04 +03:00
Avi Kivity	2c75123bbd	cql3: statement_restrictions: adjust signature of range_from_raw_bounds The get_clustering_bounds() family works in terms of vectors of clustering ranges (to support IN) and in fact the only caller converts it to a vector. Converting it immediately simplifies later patching.	2026-04-19 20:57:04 +03:00
Avi Kivity	e646b763e7	cql3: statement_restrictions: split multi_column_range_accumulator into prepare-time and query-time phases multi_column_range_accumulator analyzes an expression containing multi-column restrictions of the form (a, b) > (?, ?) and, in the same pass, solves for the set of intervals that satisfy those restrictions. Split this into a prepare-time phase (that generates "builders", functions that operate on the accumulator) and a query phase that executes the builders. Importantly, the expression visitor ends up in the prepare phase, so it can be merged with other parts of the analysis. Helper functions of the visitor are made static, since they need to run during the query phase but the visitor only exists during the prepare phase.	2026-04-19 20:57:04 +03:00
Avi Kivity	ea26186043	cql3: statement_restrictions: make get_multi_column_clustering_bounds a builder Lay the groundwork for analyzing multi column clustering bounds by splitting the function into prepare-time and execute-time parts. To start with, all of the work is done at query time, but later patches will move bits into prepare time.	2026-04-19 20:57:04 +03:00
Avi Kivity	c60e3d5cf7	cql3: statement_restrictions: multi-key clustering restrictions one layer deeper For the multi column binary operator case, perform more of the work at prepare time in preparation for consolidating the analysis.	2026-04-19 20:57:04 +03:00
Avi Kivity	b520e74128	cql3: statement_restrictions: push multi-column post-processing into get_multi_column_clustering_bounds() Doing this splits the multi-column processing code into a preparation phase and an evaluation phase in a single call, making it easier to further split prepare/evaluate.	2026-04-19 20:57:04 +03:00
Avi Kivity	c4ab0ddb85	cql3: statement_restrictions: pre-analyze single-column clustering key restrictions Change _clustering_prefix_restrictions and _idx_tbl_ck_prefix (the latter is the equivalent of the former, for indexed queries) to use predicate instead of expressions. This lets us do more of the work of solving restrictions during prepare time. We only handle single-column restrictions here. Multi-column restrictions use the existing path. We introduce two helpers: value_set_to_singleton(), which converts a restriction solution to a singleton when we know that's the only possible answer; and a replace_column_def() overload for predicate, similar to the existing overload for expressions. There is a wart in get_single_column_clustering_bounds(): we arrive at this point with the two vectors possibly pointing at different columns. Previously, possible_lhs_values() did this check while solving. We now check for it here. The predicate::on variant gets another member, for clustering key prefixes. Since everything is still handled by the legacy paths, we mostly error out.	2026-04-19 20:57:04 +03:00
Avi Kivity	201ed53837	cql3: statement_restrictions: wrap value_for_index_partition_key() To allow more work to be carried out during prepare time, wrap the body in an std::function, which will be called at execution time. For now, the work is still done at execution time, but the way is prepared.	2026-04-19 20:57:04 +03:00
Avi Kivity	325497d460	cql3: statement_restrictions: hide value_for() value_for() is a general function that solves for values that satisfy an expression set to TRUE. This goes against our goal to prepare solvers for all the expressions we use. Fortunately, it's only called with one expression, which comes from statement_restrictions, so we can add an accessor that provides the expression from our own state. Later, we'll be able to do prepare-time work on it.	2026-04-19 20:57:04 +03:00
Avi Kivity	dcdd2f7e72	cql3: statement_restrictions: push down clustering prefix wrapper one level This allows us to tackle each case separately.	2026-04-19 20:57:03 +03:00
Avi Kivity	1039ed9ed2	cql3: statement_restrictions: wrap functions that return clustering ranges During prepare time, build functions for use during execution time. Currently, the wrappers are very shallow, and practically all the work is done at execution time. But the stage is set for more peeling. The index clustering-range functions called on_internal_error() if an index was not used. They're converted to returning a null function. If executed (which is never supposed to happen), it will throw std::bad_function_call.	2026-04-19 20:57:03 +03:00
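A small sketch of the null-function behavior, with a hypothetical `prepare_index_ranges` standing in for the real wrappers: a `std::function` constructed from nullptr is empty, and calling it throws `std::bad_function_call`.

```cpp
#include <cassert>
#include <functional>
#include <vector>

using clustering_ranges_fn = std::function<std::vector<int>()>;  // hypothetical

// When no index is used, the prepared wrapper is an empty function
// instead of a call to on_internal_error().
clustering_ranges_fn prepare_index_ranges(bool uses_index) {
    if (!uses_index) {
        return nullptr;
    }
    return [] { return std::vector<int>{1, 2, 3}; };
}

// Reports whether invoking the function throws std::bad_function_call.
bool call_throws(const clustering_ranges_fn& f) {
    try {
        f();
    } catch (const std::bad_function_call&) {
        return true;
    }
    return false;
}
```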
Avi Kivity	620df7103f	cql3: statement_restrictions: do not pass view schema back and forth For indexed queries, statement_restrictions calculates _view_schema, which is passed via get_view_schema() to indexed_select_statement(), which passes it right back to statement_restrictions via one of three functions to calculate clustering ranges. Avoid the back-and-forth and use the stored value. Using a different value would be broken. This change allows unifying the signatures of the four functions that get clustering ranges.	2026-04-19 20:57:03 +03:00
Avi Kivity	6fce090e30	cql3: statement_restrictions: pre-analyze token range restrictions Convert token range restrictions to the predicate format we introduced earlier, where we have a function to solve for the token range rather than running the analysis at runtime. Admittedly, the function still delegates to possible_partition_token_values(), which does the analysis at runtime, but it's one step closer. We add a new variant element for predicate::on, since the token doesn't fit the existing element (the token isn't a column).	2026-04-19 20:57:03 +03:00
Avi Kivity	941011bb4a	cql3: statement_restrictions: pre-analyze partition key columns The expression tree for partition keys is analyzed during runtime: in partition_range_from_singles() (for example), we call find_binop and get_subscripted_column() to understand the expression structure. This analysis is problematic because it has to match the analysis done during prepare time, and the two have to evolve in lockstep. Here, we move the analysis to the prepare stage. This is done by augmenting the expression into a new predicate struct. It contains the original expression (as a fallback for paths not yet converted), as well as a solve_for function, built at prepare time, that embeds all the necessary analysis. We introduce the `predicate` type, which is an augmentation of boolean expressions. In addition to the expression, we remember what column the expression is on, and a function that computes what values the column can take on that would make the expression true. The field that says what column the predicate is about is typed as a variant since later on we will have predicates on non-columns (the token, or a clustering prefix). Note that currently the function engages in some run-time analysis of its own, since it calls possible_lhs_values, which itself does analysis, but this is a step in the right direction.	2026-04-19 20:57:03 +03:00
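A sketch of the shape such a predicate could take, assuming simplified stand-in types. Apart from `predicate`, `on`, and `solve_for`, which appear in the description, every name here is hypothetical, and the real members certainly differ in type. The `on` member is a variant so that non-column targets such as the token or a clustering prefix can be added as alternatives.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <variant>
#include <vector>

// Hypothetical, simplified shapes; the real types live in cql3.
struct column_def { std::string name; };
struct on_column { const column_def* column; };
struct on_token {};              // e.g. token(pk) > ?
struct on_clustering_prefix {};  // e.g. (ck1, ck2) > (?, ?)

struct query_options { std::vector<int> values; };
using value_set = std::vector<int>;

struct predicate {
    std::string expression;  // original expression, kept as a fallback
    std::variant<on_column, on_token, on_clustering_prefix> on;
    // Built at prepare time; empty when the predicate can only be filtered.
    std::function<value_set(const query_options&)> solve_for;
};
```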
Avi Kivity	c73f3ac55f	cql3: statement_restrictions: do not collect subscripted partition key columns An indexed SELECT of the form SELECT ... WHERE pk['sub'] = ? is impossible because our indexes do not support frozen maps, and partition key collections must be frozen. Stop collecting such constructs for the purpose of determining the partition range. This reduces the combinations of restrictions on the column and its entries that we have to deal with later on. In case we start supporting indexes on frozen maps, leave an on_internal_error to remind us.	2026-04-19 20:57:03 +03:00
Avi Kivity	531f137ed3	cql3: statement_restrictions: split _partition_range_restrictions into three cases _partition_range_restrictions is a vector of expressions, one per partition key column, except that it can be empty if there is no restriction on the partition that can be translated to a read command, and if the restriction is on a token range, only the first element is used. Separate the three cases into distinct structs. After this, additional work can be done utilizing the specialization.	2026-04-19 20:57:03 +03:00
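One way to express the three cases, sketched with `std::variant` and `std::visit`; the struct names and the `describe` helper are hypothetical. Each alternative carries only the data its case needs, so later code can dispatch on the case instead of inspecting the vector's length.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <variant>
#include <vector>

// Hypothetical names for the three cases.
struct no_partition_restriction {};                                  // full scan
struct token_range_restriction { std::string expr; };                // token(pk) ...
struct per_column_restrictions { std::vector<std::string> exprs; };  // one per pk column

using partition_range_restrictions = std::variant<
    no_partition_restriction, token_range_restriction, per_column_restrictions>;

std::string describe(const partition_range_restrictions& r) {
    return std::visit([](const auto& c) -> std::string {
        using T = std::decay_t<decltype(c)>;
        if constexpr (std::is_same_v<T, no_partition_restriction>) {
            return "full scan";
        } else if constexpr (std::is_same_v<T, token_range_restriction>) {
            return "token range: " + c.expr;
        } else {
            return "single columns: " + std::to_string(c.exprs.size());
        }
    }, r);
}
```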
Avi Kivity	fcf7c4c90d	cql3: statement_restrictions: move value_list, value_set to header file They don't really need to be public, but will be used in intermediate storage.	2026-04-19 20:57:03 +03:00
Avi Kivity	926886fcfb	cql3: statement_restrictions: wrap get_partition_key_ranges statement_restrictions::get_partition_key_ranges() re-interprets the expressions used to specify the partition key. This means that the analysis phase (determining what those expressions are and how they are to be used) and the execution phase (using them) are in separate places. This makes it very hard to refactor while preserving correctness. As a first step in unifying the two phases, we move the selection of the strategy (using token, cartesian product, or single partition) from execution to analysis, by making the if-tree return a function to be executed at execution time, rather than running the if-tree itself at execution time.	2026-04-19 20:57:03 +03:00
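A sketch of moving the strategy choice to analysis time; `choose_strategy`, `query_options`, and the string-valued ranges are hypothetical simplifications, and the IN case stands in for a single-column cartesian product. The if-tree runs once and returns a lambda; only the returned lambda runs per query.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

struct query_options { std::vector<int> values; };  // hypothetical
using partition_ranges = std::vector<std::string>;  // stand-in for real ranges
using ranges_fn = std::function<partition_ranges(const query_options&)>;

// Prepare time: run the if-tree once and return the chosen strategy.
ranges_fn choose_strategy(bool has_token_restriction, bool has_in_restriction) {
    if (has_token_restriction) {
        return [](const query_options& o) {
            return partition_ranges{"token >= " + std::to_string(o.values.at(0))};
        };
    }
    if (has_in_restriction) {
        // With one key column, the cartesian product is just the IN list.
        return [](const query_options& o) {
            partition_ranges out;
            for (int v : o.values) {
                out.push_back("pk = " + std::to_string(v));
            }
            return out;
        };
    }
    // Single partition.
    return [](const query_options& o) {
        return partition_ranges{"pk = " + std::to_string(o.values.at(0))};
    };
}
```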
Avi Kivity	eec0b20dbc	cql3: statement_restrictions: prepare statement_restrictions for capturing `this` Prevent copying and moving, which can change the object's address, and instead enforce the use of shared_ptr. Most of the code is already using shared_ptr, so the changes aren't very large. To forbid non-shared_ptr construction, the constructors are annotated with a private_tag tag class.	2026-04-19 20:57:03 +03:00
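A sketch of one common way to build this pattern, assuming the tag is a private nested type; `restrictions`, `make`, and `make_filter` are hypothetical. The tag's explicit default constructor also stops outside code from passing `{}` in its place, and deleting the copy operations suppresses the implicit moves.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <type_traits>

// Hypothetical class illustrating the pattern; not the real
// statement_restrictions.
class restrictions {
    // Only members can name this type, and its explicit constructor
    // rejects `{}` from outside, so only members can create one.
    struct private_tag {
        explicit private_tag() = default;
    };

    int _threshold;

public:
    // Public so that std::make_shared can call it, yet unusable by
    // outsiders, who cannot produce a private_tag.
    restrictions(private_tag, int threshold) : _threshold(threshold) {}

    // Deleting the copy operations also leaves the class without
    // implicit move operations, so the object can never be relocated.
    restrictions(const restrictions&) = delete;
    restrictions& operator=(const restrictions&) = delete;

    static std::shared_ptr<restrictions> make(int threshold) {
        return std::make_shared<restrictions>(private_tag{}, threshold);
    }

    // Capturing `this` is safe while the shared_ptr keeps the object
    // alive, because its address never changes.
    std::function<bool(int)> make_filter() const {
        return [this](int v) { return v > _threshold; };
    }
};
```

Outside code can then only obtain instances through `restrictions::make()`.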
Avi Kivity	374be94faa	test: statement_restrictions: add index_selection regression test In preparation for refactoring statement_restrictions, add a simple and an exhaustive regression test, encoding the index selection algorithm into the test. We cannot change the index selection algorithm because then mixed-node clusters will alter the sorting key mid-query (if paging takes place). Because the exhaustive test has such a large stack frame, and because AddressSanitizer bloats the stack frame, increase the stack size for debug builds.	2026-04-19 20:57:01 +03:00
Artsiom Mishuta	dce0c24a02	test/alternator: replace bare pytest.skip() with typed skip helpers	2026-04-19 17:34:41 +02:00
Artsiom Mishuta	b078cd1e72	test: migrate new bare skips introduced by upstream after rebase Migrate 3 bare skip sites that appeared in upstream/master after the initial migration: - test/cluster/test_strong_consistency.py: 2 @pytest.mark.skip → @pytest.mark.skip_bug (SCYLLADB-1056) - test/cqlpy/conftest.py: pytest.skip() → skip_env() in skip_on_scylla_vnodes fixture	2026-04-19 17:34:41 +02:00
Artsiom Mishuta	9c4d3ce097	test/pylib: reject bare pytest.mark.skip and add codebase guards Harden the skip_reason_plugin to reject bare @pytest.mark.skip at collection time with pytest.UsageError instead of warnings.warn(). Add test/pylib_test/test_no_bare_skips.py with three guard tests: - AST scan for bare pytest.skip() runtime calls - Real pytest --collect-only against all Python test directories	2026-04-19 17:34:31 +02:00
Avi Kivity	a15294d601	Revert "Update seastar submodule" This reverts commit `2943d30b0c`. It introduces a regression where --unsafe-bypass-fsync is not honored. Fixes https://scylladb.atlassian.net/browse/SCYLLADB-1496	2026-04-19 15:14:48 +03:00
Avi Kivity	9fb67e3e96	Revert "alternator: optional stripping of http response headers" This reverts commit `73f0deef6d`. It prevents `2943d30b0c`, which causes high flakiness, from being reverted.	2026-04-19 15:14:48 +03:00
Artsiom Mishuta	0b6b380b80	test: update comments referencing pytest.skip() to skip_env() Update 7 comments/docstrings across 5 files that still referenced pytest.skip() to reference the typed skip_env() wrapper for consistency with the migrated code.	2026-04-19 11:14:03 +02:00
Artsiom Mishuta	b10028e556	test: migrate runtime pytest.skip() to typed skip_bug() Migrate 2 runtime pytest.skip() calls referencing known bugs to use the typed skip_bug() wrapper from test.pylib.skip_types: - test/alternator/test_ttl.py: Streams on tablets (#23838) - test/scylla_gdb/test_task_commands.py: coroutine task not found (#22501)	2026-04-19 11:10:42 +02:00
Artsiom Mishuta	8a80e2c3be	test: migrate runtime pytest.skip() to typed skip_env() Migrate runtime pytest.skip() calls across 34 files to use the typed skip_env() wrapper from test.pylib.skip_types. These sites skip at runtime because a required feature, config option, library version, build mode, or runtime topology is not available. Also fixes 'raise pytest.skip(...)' in test_audit.py — skip_env() already raises internally, so the explicit raise was incorrect. Each file gains one new import: from test.pylib.skip_types import skip_env	2026-04-19 11:09:29 +02:00
Artsiom Mishuta	fb0974a329	test: migrate bare @pytest.mark.skip to skip_not_implemented Migrate 2 bare @pytest.mark.skip decorators (no reason string) to @pytest.mark.skip_not_implemented with an explicit reason referencing issue #3882 (COMPACT STORAGE not implemented).	2026-04-19 11:06:30 +02:00
Artsiom Mishuta	a39fb9d29a	test: migrate @pytest.mark.skip to @pytest.mark.skip_slow Migrate 4 @pytest.mark.skip decorator sites to @pytest.mark.skip_slow across 3 test files where the skip reason indicates a slow test.	2026-04-19 11:06:30 +02:00
Artsiom Mishuta	638efedc3c	test: migrate @pytest.mark.skip to @pytest.mark.skip_not_implemented Migrate 10 @pytest.mark.skip decorator sites to @pytest.mark.skip_not_implemented across 5 test files where the skip reason indicates a feature not yet implemented.	2026-04-19 11:06:30 +02:00
Artsiom Mishuta	465636bc53	test: migrate @pytest.mark.skip to @pytest.mark.skip_bug for known bugs Migrate 24 @pytest.mark.skip decorator sites to @pytest.mark.skip_bug across 16 test files where the reason references a known bug or issue.	2026-04-19 11:06:30 +02:00
Szymon Malewski	73f0deef6d	alternator: optional stripping of http response headers In Alternator's HTTP API, response headers can dominate bandwidth for small payloads. The Server, Date, and Content-Type headers were sent on every response but many clients never use them. This patch introduces three Alternator config options: - alternator_http_response_server_header, - alternator_http_response_disable_date_header, - alternator_http_response_disable_content_type_header, which allow customizing or suppressing the respective HTTP response headers. All three options support live update (no restart needed). The Server header is no longer sent by default; the Date and Content-Type defaults preserve the existing behavior. The Server and Date header suppression uses Seastar's set_server_header() and set_generate_date_header() APIs added in https://github.com/scylladb/seastar/pull/3217. This patch also fixes deprecation warnings from older Seastar HTTP APIs. Tests are in test/alternator/test_http_headers.py. Fixes https://scylladb.atlassian.net/browse/SCYLLADB-70 Closes scylladb/scylladb#28288	2026-04-19 09:22:04 +03:00
Nadav Har'El	f83270df12	Merge 'alternator/streams: Block tablet merges for Alternator Streams on tablet tables' from Piotr Szymaniak DynamoDB Streams API can only convey a single parent per stream shard. Tablet merges produce two parents, making them incompatible with Alternator Streams. This series blocks tablet merges when streams are active on a tablet table. For CreateTable, a freshly created table has no pending merges, so streams are enabled immediately with tablet merges blocked. For UpdateTable on an existing table, stream enablement is deferred: the user's intent is stored via `enable_requested`, tablet merges are blocked (new merge decisions are suppressed and any active merge decision is revoked), and the topology coordinator finalizes enablement once no in-flight merges remain. The topology coordinator is woken promptly on error injection release and tablet split completion, reducing finalization latency from ~60s to seconds. `test_parent_children_merge` is marked xfail (merges are now blocked), and downward (merge) steps are removed from `test_parent_filtering` and `test_get_records_with_alternating_tablets_count`. Not addressed here: using a topology request to preempt long-running operations like repair (tracked in SCYLLADB-1304). Refs SCYLLADB-461 Closes scylladb/scylladb#29224 * github.com:scylladb/scylladb: topology: Wake coordinator promptly for stream enablement lifecycle test/cluster: Test deferred stream enablement on tablet tables alternator/streams: Block tablet merges when Alternator Streams are enabled	2026-04-19 09:15:13 +03:00
Nadav Har'El	0d05e3b4a4	alternator: fix ListStreams paging if table is deleted during paging Currently, ListStreams paging works by looking in the list of tables for ExclusiveStartStreamArn and starting there. But it's possible that during the paging process, one of the tables got deleted and ExclusiveStartStreamArn no longer points to an existing table. In the current implementation, this caused the paging to stop (as if it had reached the end). The solution is simple: ListStreams will now sort the list of tables by name (it needs to be sorted by something anyway to be consistent across pages), and will use std::upper_bound to look for the first table after the ExclusiveStartStreamArn - we don't need to find that table name itself. The patch also includes a test reproducing this bug. As usual, the test passes on DynamoDB, fails on Alternator before this patch, and passes with the patch. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2026-04-19 09:12:02 +03:00
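A sketch of the paging lookup, with bare table names standing in for stream ARNs and a hypothetical `list_page` function. `std::upper_bound` on the sorted list finds the first name strictly after the cookie whether or not the cookie's table still exists; the sketch also hands out a cookie only when names remain.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

struct page {
    std::vector<std::string> tables;
    std::optional<std::string> last_evaluated;  // paging cookie
};

// `names` must be sorted so every page sees the same order.
page list_page(const std::vector<std::string>& names,
               const std::optional<std::string>& exclusive_start,
               std::size_t limit) {
    auto it = names.begin();
    if (exclusive_start) {
        // First name strictly after the cookie. This works even when the
        // cookie's table was deleted between pages, where an exact-match
        // lookup would find nothing.
        it = std::upper_bound(names.begin(), names.end(), *exclusive_start);
    }
    page p;
    while (it != names.end() && p.tables.size() < limit) {
        p.tables.push_back(*it);
        ++it;
    }
    // Only hand out a cookie if something is left to list.
    if (it != names.end() && !p.tables.empty()) {
        p.last_evaluated = p.tables.back();
    }
    return p;
}
```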
Nadav Har'El	930fb4c330	test/alternator: test DescribeStream on non-existent table We already had a test verifying that calling DescribeStream on a bogus ARN returns a ValidationException. But if the stream ARN is more legitimate-looking but refers to a non-existent table (e.g., an ARN taken in the past from a table that no longer exists), we should return ResourceNotFoundException. In this patch we add a test that verifies we indeed do this correctly. Moreover, Alternator's current stream ARNs include both a keyspace name and a table name, and either one being incorrect should lead to ResourceNotFoundException, and indeed the new test validates that it works as expected - there is no bug here (AI guessed we have a bug in the missing keyspace case, but this guess was wrong).	2026-04-19 09:12:02 +03:00
Nadav Har'El	02d474fca8	alternator: ListStreams: on last page, avoid LastEvaluatedStreamArn When ListStreams is on its last page and has run out of streams to list, it shouldn't return a paging cookie (LastEvaluatedStreamArn) at all. Before this patch it does, forcing the user to make another call just to get an empty page, which is silly. This patch includes a fix and a reproducer test (that, as usual, passes on DynamoDB, fails on Alternator before the patch, and passes after). Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2026-04-19 09:12:02 +03:00
Nadav Har'El	68b783103e	alternator: remove dead code stream_shard_id The class "stream_shard_id" was used in the past (with the old name stream_arn) for representing stream ARNs. It was renamed "stream_shard_id" under the mistaken belief that it would be used to represent DynamoDB Streams "shards" - but it wasn't used for that either (we have a separate "struct shard_id" in the code). So this class is now dead code and can be removed. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2026-04-19 09:12:01 +03:00
Nadav Har'El	1ac910c2ab	alternator: fix ListStreams to return real ARN as LastEvaluatedStreamArn Alternator Streams' "ListStreams" does paging by returning a "cookie", LastEvaluatedStreamArn, from one request, which the user passes to the next request as ExclusiveStartStreamArn. In the past, Alternator's stream ARNs were UUIDs, but we recently changed them to match DynamoDB's ARN format, which the KCL library requires. However, we didn't change ListStreams' cookie format, and it remained a UUID. This goes against the DynamoDB documentation, which states that LastEvaluatedStreamArn should be "the stream ARN of the item where the operation stopped". It shouldn't be some weird opaque cookie. So in this patch we add a test that confirms that, in DynamoDB, LastEvaluatedStreamArn is indeed the last returned ARN and not an opaque cookie. The new test passes on DynamoDB, and fails on Alternator before the simple fix that this patch then makes. Fixes SCYLLADB-539.	2026-04-19 09:12:01 +03:00
Piotr Szymaniak	a2a0868c7d	topology: Wake coordinator promptly for stream enablement lifecycle The topology coordinator sleeps on a condition variable between iterations. Several events relevant to Alternator stream enablement did not wake it, causing delays of up to 60s (the periodic load stats refresh interval) at each step: 1. Error injection release: when a test disables the delay_cdc_stream_finalization injection, the coordinator was not notified. Add an on_disable callback mechanism to the error injection framework (register_on_disable / unregister_on_disable) so subsystems can react when an injection is released. The topology coordinator uses this to broadcast its event. 2. Tablet split completion: after all local storage groups for a table finish splitting, split_ready_seq_number is set but the coordinator only discovered this via the periodic stats refresh. Add an on_tablet_split_ready callback to topology_state_machine that the coordinator sets to trigger_load_stats_refresh(). The split monitor in storage_service calls it when all compaction groups are split-ready, giving the coordinator fresh stats immediately so it can finalize the resize. These changes reduce test_deferred_stream_enablement_on_tablets from ~120s to ~13s and fix a production issue where Alternator stream enablement could be delayed by up to 60s at each step of the lifecycle (error injection release, split completion).	2026-04-19 03:54:33 +02:00
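A single-threaded sketch of the on-disable hook; the `injections` class is hypothetical and only borrows the `register_on_disable` name from the description, so it is not Scylla's error-injection API. Releasing an injection runs the registered callbacks immediately, which is how a waiter such as the topology coordinator can be woken without waiting for a periodic refresh.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, single-threaded model of an error-injection registry.
class injections {
    std::map<std::string, bool> _enabled;
    std::map<std::string, std::vector<std::function<void()>>> _on_disable;

public:
    void enable(const std::string& name) { _enabled[name] = true; }

    bool is_enabled(const std::string& name) const {
        auto it = _enabled.find(name);
        return it != _enabled.end() && it->second;
    }

    void register_on_disable(const std::string& name, std::function<void()> cb) {
        _on_disable[name].push_back(std::move(cb));
    }

    void disable(const std::string& name) {
        _enabled[name] = false;
        // Notify subscribers right away instead of letting them discover
        // the change on their next periodic poll.
        for (auto& cb : _on_disable[name]) {
            cb();
        }
    }
};
```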
Piotr Szymaniak	a5d35d2b4c	test/cluster: Test deferred stream enablement on tablet tables Async cluster test exercising the deferred enablement lifecycle: ENABLING -> ENABLED -> disabled, verifying tablet merge blocking and unblocking at each stage. Uses delay_cdc_stream_finalization error injection and CQL ALTER TABLE with tablet count constraints. Also adds tablet scheduler config to test_config.yaml (fast refresh interval, scale factor 1) for reliable tablet count changes.	2026-04-19 03:54:33 +02:00
Piotr Szymaniak	4b6937b570	alternator/streams: Block tablet merges when Alternator Streams are enabled DynamoDB Streams API can only convey a single parent per stream shard. Tablet merges produce 2 parents, which is incompatible. When streams are requested on a tablet table, block tablet merges via tablet_merge_blocked (the allocator suppresses new merge decisions and revokes any active merge decision). add_stream_options() sets tablet_merge_blocked=true alongside enabled=true, so CreateTable needs no special handling — the flag is inert on vnode tables and immediately effective on tablet tables. For UpdateTable, CDC enablement is deferred: store the user's intent via enable_requested, and let the topology coordinator finalize enablement once no in-progress merges remain. A new helper, defer_enabling_streams_block_tablet_merges(), amends the CDC options to this deferred state. Disabling streams clears all flags, immediately re-allowing merges. The tablet allocator accesses the merge-blocked flag through a schema::tablet_merges_forbidden() accessor rather than reaching into CDC options directly. Mark test_parent_children_merge as xfail and remove downward (merge) steps from tablet_multipliers in test_parent_filtering and test_get_records_with_alternating_tablets_count.	2026-04-19 03:54:33 +02:00
Avi Kivity	f5886b4fdd	Merge 'Add virtual task for vnodes-to-tablets migrations' from Nikos Dragazis This PR exposes vnodes-to-tablets migrations through the task manager API via a virtual task. This allows users to list, query the status of, and wait on ongoing migrations through a standard interface, consistent with how other global operations, such as tablet operations and topology requests, are already exposed. The virtual task exposes all migrations that are currently in progress. Each migrating keyspace appears as a separate task, identified by a deterministic name-based (v3) UUID derived from the keyspace name. Progress is reported as the number of nodes that have switched to tablets vs. the total. The number increases on the forward path and decreases on rollback. The task is not abortable - rolling back a migration requires a manual procedure. The `wait` API blocks until the migration either completes (returning `done`) or is rolled back (returning `suspended`). Example output:
```
$ scylla nodetool tasks list vnodes_to_tablets_migration
task_id type kind scope state sequence_number keyspace table entity shard start_time end_time
1747b573-6cd6-312d-abb1-9b66c1c2d81f vnodes_to_tablets_migration cluster keyspace running 0 ks 0

$ scylla nodetool tasks status 1747b573-6cd6-312d-abb1-9b66c1c2d81f
id: 1747b573-6cd6-312d-abb1-9b66c1c2d81f
type: vnodes_to_tablets_migration
kind: cluster
scope: keyspace
state: running
is_abortable: false
start_time:
end_time:
error:
parent_id: none
sequence_number: 0
shard: 0
keyspace: ks
table:
entity:
progress_units: nodes
progress_total: 3
progress_completed: 0
```
Fixes SCYLLADB-1150. New feature, no backport needed. 
Closes scylladb/scylladb#29256 * github.com:scylladb/scylladb: test: cluster: Verify vnodes-to-tablets migration virtual task distributed_loader: Link resharding tasks to migration virtual task distributed_loader: Make table_populator aware of migration rollbacks service: Add virtual task for vnodes-to-tablets migrations storage_service: Guard migration status against uninitialized group0 compaction: Add parent_id to table_resharding_compaction_task_impl storage_service: Add keyspace-level migration status function storage_service: Replace migration status string with enum utils: Add UUID::is_name_based()	2026-04-19 00:56:33 +03:00
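A deterministic, name-based task ID can be illustrated with Python's standard `uuid` module. This is a sketch, not ScyllaDB's code. The namespace UUID below is an assumption, so the IDs it produces will not match the ones in the nodetool output above. `make_task_id` here is a Python stand-in for the C++ function of the same name.

```python
import uuid

# Hypothetical namespace: the PR only says the ID is a name-based (v3) UUID
# derived from the keyspace name, not which namespace UUID is used.
MIGRATION_TASK_NAMESPACE = uuid.NAMESPACE_OID


def make_task_id(keyspace: str) -> uuid.UUID:
    # uuid3 hashes (namespace, name) with MD5, so the result is deterministic.
    return uuid.uuid3(MIGRATION_TASK_NAMESPACE, keyspace)


first = make_task_id("ks")
second = make_task_id("ks")  # e.g. after a restart: nothing needs persisting
print(first == second, first.version)  # True 3
```

The same property implies that a keyspace which is migrated, rolled back, and migrated again gets the same task ID every time.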
Nadav Har'El	2943d30b0c	Update seastar submodule * seastar 4d268e0e...22a5aa13 (36): > apps/httpd: replace deprecated reply::done() with write_body() > missing header(s) > net: Fix missing throw for runtime_error in create_native_net_device > tests/io_queue: account for token bucket refill granularity in bandwidth checks > Merge 'iovec: fix iovec_trim_front infinite loop on zero-length iovecs' from Travis Downs tests: add regression tests for zero-length iovec handling iovec: fix iovec_trim_front infinite loop on zero-length iovecs > util/process: graduate process management API from experimental > cooking: don't register ready.txt as a build output > sstring: make make_sstring not static > Add SparkyLinux to debian list in install-dependencies.sh > http: allow control over default response headers > Merge 'chunked_fifo: make cached chunk retention configurable' from Brandon Allard tests/perf: add chunked_fifo microbenchmarks chunked_fifo: set the default free chunk retention to 0 chunked_fifo: make free chunk retention configurable > Merge 'reactor_backend: fix pollable_fd_state_completion reuse in io_uring' from Kefu Chai tests: add regression test for pollable_fd_state_completion reuse reactor_backend: use reset() in AIO and epoll poll paths reactor_backend: fix pollable_fd_state_completion reuse after co_await in io_uring > Merge 'coroutine: Generator cleanups' from Kefu Chai coroutine/generator: extract schedule_or_resume helper coroutine/generator: remove unused next_awaiter classes coroutine/generator: remove write-only _started field coroutine/generator: assert on unreachable path in buffered await_resume coroutine/generator: add elements_of tag and #include <ranges> coroutine/generator: add empty() to bounded_container concept > cmake: bump minimum Boost version to 1.79.0 > seastar_test: remove unnecessary headers > cmake: bump minimum GnuTLS version to 3.7.4 > Merge 'reactor: add get_all_io_queues() method' from Travis Downs tests: add unit test for 
reactor::get_all_io_queues() reactor: add get_all_io_queues() method reactor: move get_io_queue and try_get_io_queue to .cc file > http: deprecate reply::done(), remove _response_line dead field > core: Deprecate scattered_message > ci: add workflow dispatch to tests workflow > perf_tests: exit non-zero when -t pattern matches no tests > Replace duplicate SEGV_MAPERR check in sigsegv_action() with SEGV_ACCERR. > perf_tests: add total runtime to json output > Merge 'Relax large allocation error originating from json_list_template' from Robert Bindar implement move assignment operator for json_list_template json_list_template copy assignment operator reserves capacity upfront > perf_tests: add --no-perf-counters option > Merge 'Fix to_human_readable_value() ability to work with large values' from Pavel Emelyanov memory: Add compile-time test for value-to-human-readable conversion memory: Extend list of suffixes to have peta-s memory: Fix off-by-one in suffix calculation memory: Mark to_human_readable_value() and others constexpr > http: Improve writing of response_line() into the output > Merge 'websocket: add template parameter for text/binary frame mode and implement client-side WebSocket' from wangyuwei websocket: add template parameter for text/binary frame mode websocket: impl client side websocket function > file: Fix checks for file being read-only > reactor: Make do_dump_task_queue a task_queue method > Merge 'Implement fully mixed mode for output_stream-s' from Pavel Emelyanov tests/output_stream: sample type patterns in sanitizer builds tests/output_stream: extend invariant test to cover mixed write modes iostream: allow unrestricted mixing of buffered and zero-copy writes tests/output_stream: remove obsolete ad-hoc splitting tests tests/output_stream: add invariant-based splitting tests iostream: rename output_stream::_size to ::_buffer_size > reactor_backend: replace virtual bool methods with const bool_class members > resource: Avoid copying CPU vector 
to break it into groups > perf_tests: increase overhead column precision to 3 decimal places > Merge 'Move reactor::fdatasync() into posix_file_impl' from Pavel Emelyanov reactor: Deprecate fdatasync() method file: Do fdatasync() right in the posix_file_impl::flush() file: Propagate aio_fdatasync to posix_file_impl reactor: Move reactor::fdatasync() code to file.cc reactor,file: Make full use of file_open_options::durable bit file: Add file_open_options::durable boolean file: Account io_stats::fsyncs in posix_file_impl::flush() reactor: Move _fsyncs counter onto io_stats > http: Remove connection::write_body()	2026-04-18 11:52:33 +03:00
Nadav Har'El	31e0315710	Merge 'alternator: fix unnecesary cdc log entries' from Radosław Cybulski Fix cdc writing unnecessary entries to its log, for example when Alternator deletes an item that doesn't actually exist. Originally @wps0 tackled this issue. This patch is an extension of his work. His work involved adding a `should_skip` function to cdc, which would process a `mutation` object and decide whether the changes in the object should be added to the cdc log or not. The issue with his approach is that a `mutation` object might contain changes for more than one row. If, for example, the `mutation` object contains two changes, a delete of a non-existing row and a create of a non-existing row, the `should_skip` function will detect changes in the second item and allow the whole `mutation` (BOTH items) to be added. For example (using python's boto3) running this on an empty table: ``` with table.batch_writer() as batch: batch.put_item({'p': 'p', 'c': 'c0'}) batch.delete_item(Key={'p': 'p', 'c': 'c1'}) ``` will emit two events ("put" event and "delete" event), even though the item with `c` set to `c1` does not exist (thus can't be deleted). Note that both entries in the batch write must use the same partition key; otherwise the upper layer will split them into separate `mutation` objects and the issue will not happen. The solution is to do similar processing, but consider each change separately from the others. This is tricky to implement due to the way cdc works. When cdc processes a `mutation` object (containing X changes), it emits cdc entries in phases. Phase 1 - emit `preimage` (old state) for each change (if requested). Phase 2 - for each change emit the actual "diff" (update / delete and so on). Phase 3 - emit `postimage` (new state). We only know whether a change needs to be skipped during phase 2. By that time phase 1 is completed and the preimage for the change has been emitted.
At that moment we set a flag that the change (identified by its clustering key value) needs to be skipped: we add the clustering key to an `ignore-rows` set (the `_alternator_clustering_keys_to_ignore` variable) and continue normally. Once all phases finish we add a `postprocess` phase (the `clean_up_noop_rows` function). It goes through the generated cdc mutations and skips all modifications whose clustering key is in the `ignore-rows` set. After skipping we need to do a "cleanup" operation: each generated cdc mutation contains an index (incremented by one), and if we skipped some parts, the index is no longer consecutive, so we reindex the final changes. There's a special case worth mentioning: Alternator tables without clustering keys. In that case the `mutation` object passed to cdc can contain exactly one change, since different partition keys are split by upper layers and Alternator will never emit a `mutation` object containing two (or more) changes with the same primary key. Here, when we decide the change is to be skipped, we add an empty `bytes` object to the `ignore-rows` set. When checking the `ignore-rows` set, we check whether it's empty or not (we don't check for the presence of the empty `bytes` object). Note: there might be some confusion between this patch and the #28452 patch. Both started from the same error observation and use similar tests for validation, as both are easily triggered by BatchWrite commands (both need the `mutation` object passed to cdc to contain more than a single change). This issue, though, is about wrong data written to the cdc log and is fixed in cdc, whereas #28452 is about wrongly parsing correct cdc data and is fixed on the Alternator side. Note that we need #28452 to truly verify this fix (otherwise we would emit correct cdc entries, but Alternator would parse them incorrectly). Note: to benefit from / notice this patch you need the `alternator_streams_increased_compatibility` flag turned on.
Note: the rework is quite "broad" and covers a lot of ground: every operation that might result in no change to the database state should be tested. An additional test was added, trying to remove a column from a non-existing item, as well as trying to remove a non-existing column from an existing item. Fixes: #28368 Fixes: SCYLLADB-1528 Fixes: SCYLLADB-538 Closes scylladb/scylladb#28544 * github.com:scylladb/scylladb: alternator: remove unnecesary code alternator: fix Alternator writing unnecesary cdc entries alternator: add failing tests for Streams	2026-04-18 00:07:51 +03:00
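The post-processing step can be modeled in a few lines of Python. This is an illustrative sketch only. `CdcLogRow` is a hypothetical stand-in for a cdc log row, and the 0-based reindexing is an assumption. The real `clean_up_noop_rows` operates on C++ `mutation` objects.

```python
from dataclasses import dataclass, replace
from typing import List, Set


@dataclass(frozen=True)
class CdcLogRow:
    batch_seq_no: int  # models the cdc$batch_seq_no index
    base_ck: bytes     # clustering key of the base-table row this entry describes
    operation: str     # "preimage", "update", "delete", "postimage", ...


def clean_up_noop_rows(rows: List[CdcLogRow], keys_to_ignore: Set[bytes],
                       table_has_clustering_key: bool = True) -> List[CdcLogRow]:
    if not keys_to_ignore:
        return rows
    if not table_has_clustering_key:
        # Exactly one change per mutation: a non-empty set means "drop it all".
        return []
    kept = [row for row in rows if row.base_ck not in keys_to_ignore]
    # Reindex so the remaining entries are consecutive again (0-based here).
    return [replace(row, batch_seq_no=i) for i, row in enumerate(kept)]


# The boto3 example above: put c0, delete the non-existent c1.
rows = [
    CdcLogRow(0, b"c0", "update"),
    CdcLogRow(1, b"c1", "delete"),
    CdcLogRow(2, b"c0", "postimage"),
]
for row in clean_up_noop_rows(rows, {b"c1"}):
    print(row.batch_seq_no, row.base_ck, row.operation)
# 0 b'c0' update
# 1 b'c0' postimage
```

Filtering after all phases have run avoids having to undo an already-emitted preimage in the middle of phase 2.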
Nadav Har'El	32060d73df	Merge 'alternator: Add stream support for tablets' from Radosław Cybulski Implements the necessary changes for Streams to work with tablet-based tables. - add utility functions to `system_keyspace` that help read cdc content from cdc log tables for tablet-based base tables (an API similar to the existing one for vnodes) - remove antitablet `if` checks, update tests that fail / skip if tablets are selected - add two tests to extensively test the tablet-based version, especially while manipulating the stream count Fixes #23838 Fixes SCYLLADB-463 Closes scylladb/scylladb#28500 * github.com:scylladb/scylladb: alternator: add streams with tablets tests alternator: remove antitablet guards when using Streams alternator: implement streams for tablets treewide: add cdc helper functions to system_keyspace alternator: add system_keyspace reference	2026-04-17 23:48:31 +03:00
Radosław Cybulski	586bb1d345	alternator: fix issues with stream_arn copy / move `stream_arn` object holds a full ARN as `std::string` and two `std::string_view` fields (`table_name_` and `keyspace_name_`) pointing into the ARN itself. This prevents the object from being safely copied (in that case both `table_name_` and `keyspace_name_` of the copy would point into the original object's ARN). A similar issue might happen with move, when the ARN is short enough for the small string optimization to kick in (although in practice this is not possible, as ARN requirements make its minimal length exceed 15 characters, the current limit for small string optimization in the most popular string libraries). The patch drops the `std::string_view` objects in favor of integer offsets and sizes. An offset equal to 0 means the beginning of the ARN string. The API is preserved: both the `table_name` and `keyspace_name` functions return a `std::string_view` reconstructed on the fly. Closes scylladb/scylladb#29507	2026-04-17 23:13:17 +03:00
Piotr Szymaniak	caaef45b7a	audit: restore static_cast for batch inspect Closes scylladb/scylladb#29545	2026-04-17 23:11:18 +03:00
Nikos Dragazis	d361a0dd83	test: cluster: Verify vnodes-to-tablets migration virtual task Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2026-04-17 21:13:52 +03:00
Nikos Dragazis	295e434781	distributed_loader: Link resharding tasks to migration virtual task When a table is loaded on startup during a vnodes-to-tablets migration (forward or rollback), the `table_populator` runs a resharding compaction. Set the migration virtual task as parent of the resharding task. This enables users to easily find all node-local resharding tasks related to a particular migration. Make `migration_virtual_task::make_task_id()` public so that the `distributed_loader` can compute the migration's task ID. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2026-04-17 20:59:05 +03:00
Nikos Dragazis	a3aa4f6cb4	distributed_loader: Make table_populator aware of migration rollbacks The `table_populator` uses a `migrate_to_tablets` flag to distinguish normal tables from tables under vnodes-to-tablets migration (forward path), since the two require different resharding. The next patch will set the parent info of migration-related resharding compaction tasks so they appear as children of the migration virtual task. For that, the table populator needs to recognize not only migrations in the forward path, but rollbacks as well. Replace the flag with a tri-state `migration_direction` enum (none, forward, rollback). Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2026-04-17 20:59:05 +03:00
Nikos Dragazis	696f9f8954	service: Add virtual task for vnodes-to-tablets migrations Add a virtual task that exposes in-progress vnodes-to-tablets migrations through the task manager API. The task is synthesized from the current migration state, so completed migrations are not shown. Progress is reported as the number of nodes that currently use tablets: it increases on the forward path and decreases on rollback. For simplicity, per-node storage modes are not exposed in the task status; callers that need them should use the migration status REST endpoint. Unlike regular tasks that use time-based UUIDs, this task uses deterministic named UUIDs derived from the keyspace names. This keeps the implementation simple (no need to persist them) and gives each keyspace a stable task ID. The downside is that the start time of each task is unknown and repeated migrations of the same keyspace (migration -> rollback -> new migration) cannot be distinguished. Introduce a new task manager module to keep them separate from other tasks. Add support for `wait()`. While its practical value is debatable (migration is a manual procedure, rolling restart will interrupt it), it keeps the task consistent with the task manager interface. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2026-04-17 20:59:05 +03:00
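The progress and `wait()` semantics described above can be modeled as follows. The enum members and function names are hypothetical. Only the three keyspace-level states (not started, in progress, completed), the node-count progress units and the `done`/`suspended` results come from this series.

```python
from enum import Enum
from typing import Dict, Iterable, Tuple


class KeyspaceMigrationStatus(Enum):
    # Hypothetical keyspace-level states (not started / in progress / completed).
    VNODES = "vnodes"
    MIGRATING = "migrating"
    TABLETS = "tablets"


def task_progress(node_uses_tablets: Dict[str, bool]) -> Tuple[int, int]:
    # (progress_completed, progress_total), counted in nodes: the first value
    # grows on the forward path and shrinks on rollback.
    completed = sum(1 for on_tablets in node_uses_tablets.values() if on_tablets)
    return completed, len(node_uses_tablets)


def wait(statuses: Iterable[KeyspaceMigrationStatus]) -> str:
    # Consume successive status observations until the migration ends.
    for status in statuses:
        if status is KeyspaceMigrationStatus.TABLETS:
            return "done"        # forward path completed
        if status is KeyspaceMigrationStatus.VNODES:
            return "suspended"   # migration was rolled back
    raise RuntimeError("still migrating when observations ran out")


print(task_progress({"n1": True, "n2": False, "n3": False}))  # (1, 3)
```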
Nikos Dragazis	d1ca01b25d	storage_service: Guard migration status against uninitialized group0 `storage_service::get_tablets_migration_status()` reads a group0 virtual table, so it requires group0 to be initialized. When invoked via the migration REST API, this condition is satisfied since the API is only available after joining group0. However, once this function is integrated into the task API later in this series, the assumption will no longer hold, as the task API is exposed earlier in the startup process. Add a guard to detect this condition and return a clear error message. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2026-04-17 20:59:05 +03:00
Nikos Dragazis	ca830c7bce	compaction: Add parent_id to table_resharding_compaction_task_impl Required to link it with the migration task in the next patches. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2026-04-17 20:59:05 +03:00
Nikos Dragazis	46e3902daa	storage_service: Add keyspace-level migration status function `storage_service::get_tablets_migration_status()` returns the keyspace-level migration status, indicating whether migration has not started, is in progress, or has completed, and for migrating keyspaces also returns per-node migration statuses. Rename it to `get_tablets_migration_status_with_node_details()` and introduce a new `get_tablets_migration_status()` that returns only the keyspace-level status. This prepares the function for reuse in the next patches, which will add a virtual task for vnodes-to-tablets migrations. Several task-manager paths will only need the keyspace-level migration state, not per-node information. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2026-04-17 20:59:05 +03:00
Nikos Dragazis	3096ba0577	storage_service: Replace migration status string with enum Using a string was sufficient while this status was only exposed through the REST API, but the next patches will also consume it internally. Use an enum for the internal representation and convert it back to the existing string values in the REST API. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2026-04-17 20:59:05 +03:00
Nikos Dragazis	a00056381f	utils: Add UUID::is_name_based() The UUID class already provides `is_timestamp()` for identifying time-based (version 1) UUIDs. Add the analogous `is_name_based()` predicate for version 3 (name-based) UUIDs, along with a test. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2026-04-17 20:58:39 +03:00
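Python's `uuid.UUID.version` attribute exposes the same version field, which makes the two predicates easy to illustrate. This is an analogy in Python, not the C++ `UUID` class itself.

```python
import uuid


def is_timestamp(u: uuid.UUID) -> bool:
    # Version 1: time-based UUID.
    return u.version == 1


def is_name_based(u: uuid.UUID) -> bool:
    # Version 3: name-based UUID (MD5 over namespace + name).
    return u.version == 3


print(is_name_based(uuid.uuid3(uuid.NAMESPACE_DNS, "ks")),  # True
      is_name_based(uuid.uuid1()),                          # False
      is_timestamp(uuid.uuid1()))                           # True
```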
Radosław Cybulski	9a6aed721b	alternator: add streams with tablets tests Add tests for Streams when the table uses tablets underneath. One test verifies filtering using the CHILD_SHARDS feature. The other makes sure we read all data while the table undergoes a tablet count change. Add `--tablet-load-stats-refresh-interval-in-seconds=1` to the `alternator/run` script, as otherwise the newly added tests will fail. The setting changes how often Scylla refreshes tablet metadata. This can't be done using `scylla_config_temporary`, because 1) the default is 60 seconds and 2) Scylla will wait out the full 60s period before reading the configuration variable again.	2026-04-17 18:58:27 +02:00
Radosław Cybulski	6be16cf224	alternator: remove antitablet guards when using Streams Remove the `if` condition that prevented tables with tablets from working with Streams. Remove a test that verifies that Alternator rejects enabling the Streams feature on tables with tablets underneath. Update a few tests that were expected to fail on tablets so they execute normally.	2026-04-17 18:58:26 +02:00
Radosław Cybulski	d5df3ec07c	alternator: implement streams for tablets Add code that handles reading Streams when the table uses tablets underneath. Fixes #23838	2026-04-17 18:57:44 +02:00
Radosław Cybulski	eb35a7b6ce	treewide: add cdc helper functions to system_keyspace Add helper functions to the `system_keyspace` object that read cdc content for tablet-based tables. `read_cdc_for_tablets_current_generation_timestamp` reads the current generation's timestamp. `read_cdc_for_tablets_versioned_streams` builds a timestamp -> `cdc::streams_version` map, similar to how `system_distributed_keyspace::cdc_get_versioned_streams` works. We're adding these helper functions because their siblings in `system_distributed_keyspace` work only when the base table is backed by vnodes. The new additions work only when the base table is backed by tablets.	2026-04-17 18:57:44 +02:00
Radosław Cybulski	d93299b605	alternator: add system_keyspace reference Add a reference to the `system_keyspace` object to the `executor` object in alternator. The reference is needed because a future commit will add (and use) helper functions there that read `cdc_log` tables for tablet-based tables, similar to the existing siblings for vnodes that live in `system_distributed_keyspace`.	2026-04-17 18:57:43 +02:00
Radosław Cybulski	04b9d3875f	alternator: remove unnecesary code With our fix that prevents no-op changes from being written to the cdc log in place, we remove Piotr Wieczorek's previous attempt, which is now unnecessary.	2026-04-17 18:02:00 +02:00
Radosław Cybulski	6e5aaa85b6	alternator: fix Alternator writing unnecesary cdc entries The work in this patch is the result of two bugs: a spurious MODIFY event when a column removal is used in `update_item` on a non-existing item, and spurious events when a batch write item mixed no-op operations with operations involving actual changes (the no-op operations would still emit cdc log entries). The latter issue required a rework of Piotr Wieczorek's algorithm, which fixed the former issue as well. Piotr Wieczorek previously wrote checks that should prevent unnecessary cdc events from being written. His implementation missed the fact that a single `mutation` object passed to cdc code to be analysed for cdc log entries can contain modifications for multiple rows (with the same timestamp, for example as a result of a BatchWriteItem call). His code tries to skip the whole `mutation`, which in such a case is not possible, because BatchWriteItem might have one item that does nothing and a second item that makes a modification (this is the reason for the second bug). His algorithm was extended and moved. Originally it worked as follows: the user would send a `mutation` object with some changes to be "augmented". cdc would process those changes and build a set of cdc log changes based on them, which would be added to the cdc log table. Piotr added a `should_skip` function, which processes the user changes and tries to determine whether they should all be dropped or not. The new version, instead of trying to skip adding rows to the cdc log `mutation` object, builds a rows-to-ignore set. After the whole cdc log `mutation` object is completed, it goes through it row by row. Any row that was previously added to the `rows_to_ignore` set is now removed. The remaining rows are written to a new cdc log `mutation` with a new clustering key (the `cdc$batch_seq_no` index value should probably be consecutive; we just want to be safe here), and the new `mutation` object is returned to be sent to the cdc log table.
The first bug is fixed as a side effect of the new algorithm, which contains more precise checks detecting whether a given mutation actually made a difference. Fixes: #28368 Fixes: SCYLLADB-538 Fixes: SCYLLADB-1528 Refs: #28452	2026-04-17 18:00:25 +02:00
Botond Dénes	6ce0968960	compaction: release GC'ed sstables incrementally during compaction Garbage collected sstables created during incremental compaction are deleted only at the end of the compaction, which increases the memory footprint. This is inefficient, especially considering that the related input sstables are released regularly during compaction. This commit implements incremental release of GC sstables after each output sstable is sealed. Unlike regular input sstables, GC sstables use a different exhaustion predicate: a GC sstable is only released when its token range no longer overlaps with any remaining input sstable. This is because GC sstables hold tombstones that may shadow data in still-alive overlapping input sstables; releasing them prematurely would cause data resurrection. Fixes #5563 Closes scylladb/scylladb#28984	2026-04-17 18:20:47 +03:00
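The exhaustion predicate for GC sstables is an interval-overlap check. The sketch below models it in Python with integer tokens and hypothetical `SSTable` fields. It illustrates the rule stated above, not the compaction code itself.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SSTable:
    name: str
    first_token: int  # token range is inclusive: [first_token, last_token]
    last_token: int


def overlaps(a: SSTable, b: SSTable) -> bool:
    return a.first_token <= b.last_token and b.first_token <= a.last_token


def releasable_gc_sstables(gc_sstables: List[SSTable],
                           remaining_inputs: List[SSTable]) -> List[SSTable]:
    # A GC sstable may still shadow data in any overlapping input sstable,
    # so it is released only once no remaining input overlaps its range.
    return [gc for gc in gc_sstables
            if not any(overlaps(gc, inp) for inp in remaining_inputs)]


gc = [SSTable("gc-1", 0, 10), SSTable("gc-2", 50, 60)]
remaining = [SSTable("input-7", 40, 100)]
print([s.name for s in releasable_gc_sstables(gc, remaining)])  # ['gc-1']
```

Evaluating the predicate after each output sstable is sealed lets GC sstables go away as soon as the inputs they might shadow are gone, instead of at the end of the compaction.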
Radosław Cybulski	2894542e57	alternator: add failing tests for Streams Add failing tests for Streams functionality. Trying to remove column from non-existing item is producing a MODIFY event (while it should none). Doing batch write with operations working on the same partition, where one operation is without side effects and second with will produce events for both operations, even though first changes nothing. First test has two versions - with and without clustering key. Second has only with clustering key, as we can't produce batch write with two items for the same partition - batch write can't use primary key more than once in single call. We also add a test for batch write, where one of three operations has no observable side effects and should not show up in Streams output, but in current scylla's version it does show.	2026-04-17 16:28:14 +02:00
Piotr Smaron	218f8adc8f	transport: add per-service-level cql_requests_serving metric Add a per-scheduling-group gauge that tracks the number of in-flight CQL requests for each service level. The existing scylla_transport_requests_serving metric is a single global per-shard counter; the new metric breaks it down by scheduling group so operators can see which service level contributes the most in-flight requests when debugging latency. The metric is named cql_requests_serving (exposed as scylla_transport_cql_requests_serving) following the cql_ prefix convention used by all other per-scheduling-group transport metrics (cql_requests_count, cql_request_bytes, cql_response_bytes, cql_pending_response_memory). Using a cql_ prefix avoids Prometheus confusion with the global requests_serving metric, which lacks the scheduling_group_name label. The counter is incremented when a request enters process_request() and decremented in the same 'leave' defer block as the global requests_serving, ensuring the request is counted as in-flight until the response is sent.	2026-04-17 15:07:14 +02:00
Piotr Smaron	4988077249	transport: move requests_serving decrement to after response is sent The requests_serving metric was decremented right after query processing completed, but before the response was written to the client. This means requests whose responses were queued in the write pipeline were no longer counted as in-flight, understating the actual load. Move the decrement into the 'leave' defer block, which fires after the response is fully sent via _ready_to_respond. This makes the shedding check (max_concurrent_requests_per_shard) more accurate: requests that have finished processing but are still waiting in the response queue now correctly count toward the in-flight limit.	2026-04-17 15:05:29 +02:00
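Taken together, the two transport changes above keep both gauges raised until the response has been sent. A rough Python model follows. The class, the scheduling-group label and the exact shedding comparison are assumptions. The gauge names and `max_concurrent_requests_per_shard` come from the commit messages.

```python
from collections import defaultdict
from contextlib import contextmanager


class TransportStats:
    def __init__(self, max_concurrent_requests_per_shard: int):
        self.requests_serving = 0                      # global, per shard
        self.cql_requests_serving = defaultdict(int)   # per scheduling group
        self.max_concurrent = max_concurrent_requests_per_shard

    def should_shed(self) -> bool:
        # Assumed comparison: shed when admitting one more would exceed the limit.
        return self.requests_serving >= self.max_concurrent

    @contextmanager
    def serving(self, scheduling_group: str):
        self.requests_serving += 1
        self.cql_requests_serving[scheduling_group] += 1
        try:
            yield  # process the request, then wait until the response is sent
        finally:
            # Counterpart of the 'leave' defer block: both gauges drop together.
            self.requests_serving -= 1
            self.cql_requests_serving[scheduling_group] -= 1


stats = TransportStats(max_concurrent_requests_per_shard=1)
with stats.serving("sl:default"):
    # Processing finished but the response is still queued: still in flight.
    print(stats.should_shed(), dict(stats.cql_requests_serving))  # True {'sl:default': 1}
print(stats.requests_serving, dict(stats.cql_requests_serving))   # 0 {'sl:default': 0}
```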
Botond Dénes	6eb2d15f39	Merge 'Replace CAS estimated histogram with estimated_histogram_with_max' from Amnon Heiman ScyllaDB uses estimated_histogram in many places. We already have a more efficient alternative: estimated_histogram_with_max. It is both CPU- and memory-efficient, and it can be exported as Prometheus native histograms. Its main limitation (which also has benefits) is that the bucket layout is fixed at compile time, so histograms with different configurations cannot be mixed. The end goal is to replace all uses of estimated_histogram in the codebase. That migration requires a few small API adjustments, so it is done in steps. This PR replaces estimated_histogram for CAS contention. The PR includes a patch that adds functionality to the base approx_exponential_histogram, which will be used by the API. The specific histograms are defined in a single place and cover the range 1-100; this makes future changes easy. New feature, no need to backport Closes scylladb/scylladb#29017 * github.com:scylladb/scylladb: storage_proxy: migrate CAS contention histograms to estimated_histogram_with_max estimated_histogram.hh: Add bucket offset and count to approx_exponential_histogram	2026-04-17 13:12:59 +03:00
Andrzej Jackowski	e256d9f69d	test: retry get_coordinator_host() after topology coordinator stop After stopping the topology coordinator, a new topology coordinator may not yet be started when get_coordinator_host() is called. Make the function always retry via wait_for so that every caller is protected against this race. Fixes SCYLLADB-1553 Closes scylladb/scylladb#29489	2026-04-17 12:08:26 +02:00
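The fix amounts to wrapping the lookup in a polling loop. Below is a self-contained asyncio sketch. `wait_for` here is a hypothetical re-implementation rather than the helper in ScyllaDB's test library, and `find_coordinator` stands in for whatever query the real function performs.

```python
import asyncio
import time
from typing import Awaitable, Callable, Optional, TypeVar

T = TypeVar("T")


async def wait_for(pred: Callable[[], Awaitable[Optional[T]]], deadline: float,
                   period: float = 0.1) -> T:
    # Hypothetical stand-in for the test library's wait_for helper: poll until
    # pred() returns something other than None, or give up at the deadline.
    while True:
        result = await pred()
        if result is not None:
            return result
        if time.monotonic() > deadline:
            raise TimeoutError("condition not met before deadline")
        await asyncio.sleep(period)


async def get_coordinator_host(find_coordinator: Callable[[], Awaitable[Optional[str]]],
                               timeout: float = 60) -> str:
    # Always retry: right after the old topology coordinator stops, there
    # may briefly be no coordinator at all.
    return await wait_for(find_coordinator, time.monotonic() + timeout)


attempts = 0


async def find_coordinator() -> Optional[str]:
    # Simulated race: no coordinator is visible for the first two polls.
    global attempts
    attempts += 1
    return None if attempts < 3 else "host-2"


print(asyncio.run(get_coordinator_host(find_coordinator)))  # host-2
```

Putting the retry inside the function, rather than in each test, means every caller gets the protection automatically.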
Botond Dénes	fbcfe3f88f	test: use uuid4 for DockerizedServer container names to avoid collisions Container names were generated as {name}-{pid}-{counter}, where the counter is a per-process itertools.count. This scheme breaks across CI runs on the same host: if a prior job was killed abruptly (SIGKILL, cancellation) its containers are left running since --rm only removes containers on exit. A subsequent run whose worker inherits the same PID (common in containerized CI with small PID namespaces) and reaches the same counter value will collide with the orphaned container. Replace pid+counter with uuid.uuid4(), which generates a random UUID, making names unique across processes, hosts, and time without any shared state or leaking host identifiers. Fixes: SCYLLADB-1540 Closes scylladb/scylladb#29509	2026-04-17 11:56:51 +02:00
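The naming change is small. A Python sketch of the new scheme follows. The `container_name` helper is hypothetical, not the actual `DockerizedServer` code.

```python
import uuid


def container_name(name: str) -> str:
    # Random suffix: no PID, no per-process counter, no host identifiers.
    return f"{name}-{uuid.uuid4()}"


a, b = container_name("minio"), container_name("minio")
print(a != b)  # True
```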
Botond Dénes	57f8be49e9	Merge 'Move ignore_component_digest_mismatch flag on sstables_manager' from Pavel Emelyanov The PR serves two purposes. First, it makes the flag usage consistent across the multiple ways of loading sstable components. For example, sstable::load_metadata() doesn't set it (unlike .load()), and thus may refuse to load "corrupted" components that the flag is meant to allow. Second, it removes the fanout of db.get_config().ignore_component_digest_mismatch() over the code. This call is made pretty much everywhere to initialize the sstable_open_config, while the option in question is a "scylla state" parameter, not an "sstable opening" one. Code cleanup, not backporting Closes scylladb/scylladb#29513 * github.com:scylladb/scylladb: sstables: Remove ignore_component_digest_mismatch from sstable_open_config sstables: Move ignore_component_digest_mismatch initialization to constructor sstables: Add ignore_component_digest_mismatch to sstables_manager config	2026-04-17 12:54:17 +03:00
Avi Kivity	cad3c0de94	test: write minio log to testlog dir for Jenkins artifact collection Write the MinIO server log directly to tempdir_base (testlog/<arch>/) instead of the per-server temp directory that gets destroyed on shutdown. This preserves the log for Jenkins artifact collection, helping debug S3-related flaky test failures like the stcs_reshape_overlapping_s3_test hang (SCYLLADB-1481). Closes scylladb/scylladb#29458	2026-04-17 12:51:55 +03:00
Botond Dénes	facb50cbf9	Merge 'test.py: refactor test.py' from Andrei Chekun With the latest changes, there is a lot of redundant code in test.py. This PR cleans that code up. It also narrows the use of dynamic scope for fixtures to test/alternator and test/cqlpy; everything else will have module scope by default. test.py will mostly be a wrapper for pytest, for CI use. For now, test.py still has the important job of calculating the number of threads to start pytest with, which is not possible to do in pytest itself. No backport needed, framework enhancement only. Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-666 Closes scylladb/scylladb#28852 * github.com:scylladb/scylladb: test.py: remove testpy_test_fixture_scope test.py: add logger for 3rd party service test.py: delete dead code in test.py	2026-04-17 12:51:14 +03:00
Pawel Pery	7883f161bb	vector-store: fix creating local vector search indexes with a part of the partition key Users should be able to create a local index for Vector Search based on only a part of the partition key. This commit enables that by removing the 'full partition key only' requirement for custom local indexes. The commit updates the docs to explain that a local vector index can use only a part of the partition key. The commit adds a cqlpy test to check the fixed functionality. Fixes: SCYLLADB-953 Needs to be backported to 2026.1 as it is a fix for local vector indexes. Closes scylladb/scylladb#28931	2026-04-17 11:44:15 +02:00
Karol Nowacki	c643f321af	vector_search: decrease default connection timeout to 3s Decrease the default connection timeout to 3s to better align with the default CQL query timeout of 10s. The previous timeout allowed only one failover request in a high-availability scenario before hitting the CQL query timeout. By decreasing the timeout to 3s, we can perform up to three failover requests within the CQL query timeout, which significantly improves the chances of successfully completing the query in high-availability scenarios. Fixes: SCYLLADB-95	2026-04-17 12:26:39 +03:00
Karol Nowacki	9269ca9cf7	vector_search: add unreachable node detection time config Add option `vector_store_unreachable_node_detection_time_in_ms` to control parameters related to detecting unreachable vector store nodes. This parameter is used to set the TCP connect timeout, keepalive parameters, and TCP_USER_TIMEOUT. By configuring these parameters, we can detect unreachable vector store nodes faster and trigger failover mechanisms in a timely manner.	2026-04-17 12:26:38 +03:00
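For readers less familiar with these socket options, here is a Python sketch of applying one detection budget to a client socket. The commit does not describe how ScyllaDB splits `vector_store_unreachable_node_detection_time_in_ms` among the options, so the split in `tcp_settings()` below (a hypothetical helper) is purely an assumption. The option names themselves are standard Linux TCP socket options.

```python
import socket


def tcp_settings(detection_time_ms: int, probes: int = 3) -> dict:
    # Hypothetical split of a single detection budget across the TCP knobs
    # the commit mentions; ScyllaDB's actual derivation may differ.
    budget_s = max(2, detection_time_ms // 1000)
    keepidle = budget_s // 2                       # idle seconds before the first probe
    return {
        "connect_timeout_s": detection_time_ms / 1000,
        "TCP_KEEPIDLE": keepidle,
        "TCP_KEEPINTVL": max(1, (budget_s - keepidle) // probes),  # seconds between probes
        "TCP_KEEPCNT": probes,                     # unanswered probes before giving up
        "TCP_USER_TIMEOUT": detection_time_ms,     # ms sent data may stay unacknowledged
    }


def apply_tcp_settings(sock: socket.socket, settings: dict) -> None:
    sock.settimeout(settings["connect_timeout_s"])  # bounds connect() and later I/O
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    for name in ("TCP_KEEPIDLE", "TCP_KEEPINTVL", "TCP_KEEPCNT", "TCP_USER_TIMEOUT"):
        if hasattr(socket, name):  # platform-specific constants (Linux has all four)
            sock.setsockopt(socket.IPPROTO_TCP, getattr(socket, name), settings[name])


settings = tcp_settings(10_000)
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
    apply_tcp_settings(sock, settings)
```

Keepalive probes catch a peer that silently disappears while the connection is idle, while `TCP_USER_TIMEOUT` catches one that stops acknowledging data in flight; a single budget can drive both.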
Piotr Smaron	686029f52c	audit: disable caching for the audit log table The audit table had caching enabled by default, which provides no value since audit data is write-heavy and rarely read back through the cache. This wastes cache space that could be used for more important user data. Disable caching by setting keys and rows_per_partition to NONE and enabled to false, consistent with get_disabled_caching_options() and other system tables such as system.batchlog, system.large_partitions, and CDC log tables. Closes scylladb/scylladb#29506	2026-04-17 11:17:10 +02:00
Piotr Dulikowski	37fc1507f0	Merge 'Alternator: Add vector search support' from Nadav Har'El This series adds support for vector search in Alternator based on the existing implementation in CQL. The series adds APIs for `CreateTable` and `UpdateTable` to add or remove vector indexes to Alternator tables, `DescribeTable` to list them and check the indexing status, and `Query` to perform a vector search - which contacts the vector store for the actual ANN (approximate nearest neighbor) search. Correct functionality of these features depends on some features of the vector store that were already done (see https://github.com/scylladb/vector-store/pull/394). This initial implementation is fully functional, and can already be useful, but we do not yet support all the features we hope to eventually support. Here are things that we have not done yet, and plan to do later in follow-up pull requests: 1. Support a new optimized vector type ("V") - in addition to the "list of numbers" type supported in this version. 2. Allow choosing a different similarity function when creating an index, by SimilarityFunction in VectorIndex definition. 3. Allow choosing quantization (f32/f16/bf16/i8/b1) to ask the vector index to compress stored vectors. 4. Support oversampling and rescoring, defined per-index and per-query. 5. Support HNSW tuning parameters — maximum_node_connections, construction_beam_width, search_beam_width. 6. Support pre-filtering over key columns, which are available at the vector store, by sending the filter to the vector store (translated from DynamoDB filter syntax to the vector store's filter syntax). A decision still needs to be made on whether this will use KeyConditionExpression or FilterExpression. This version supports only post-filtering (with `FilterExpression`). 7. Support projecting non-key attributes into the index (Projection=INCLUDE and Projection=ALL), and then 1. pre-filtering using these attributes, and 2.
efficiently return these attributes (using Select=ALL_PROJECTED_ATTRIBUTES, which today returns just the key columns). 8. Optimize the performance of `Query`, which today is inefficient for Select=ALL_ATTRIBUTES because it serially retrieves the matching items one at a time. 9. Return the similarity scores with the items (the design proposes ReturnVectorSearchSimilarity). 10. Add more vector-search-specific metrics, beyond the metric we already have counting Query requests. For example, separate latency and request-count metrics for vector-search Queries (distinct from GSI/LSI queries), and a metric accumulating the total Limit (K) across all vector search queries. 11. Consider how (and if at all) we want to run the tests in test/alternator/test_vector.py that need the vector store in the CI. Currently they are skipped in CI and only run manually (with `test/alternator/run --vs test_vector`). 12. UpdateTable 'Update' operation to modify index parameters. Only some can be modified, e.g., Oversampling. 13. Support for "local index" (separate index for each partition). 14. Make sure that vector search and Streams can be enabled concurrently on the same table - both need CDC but we need to verify that one doesn't confuse the other or disable options that the other needs. We can only do this after we have Alternator Streams running on tablets (since the vector store requires tablets). Testing the new Alternator vector search end-to-end requires running both Scylla and the vector store together. We will have such end-to-end tests in the vector store repository (see https://github.com/scylladb/vector-store/pull/392), but this pull request also adds many end-to-end tests written in Python that can be run with the command "test/alternator/run --vs test_vector.py". The "--vs" option tells the run script to run both Scylla and the vector store (currently assumed to be in `.../vector-store/target/release/vector-store`).
About 65% of the tests in this pull request check supported syntax and error paths and can therefore run without the vector store, while about 35% of the tests perform actual Query operations and require the vector store to be running. Currently, the tests that require the vector store are not run by CI, but they can easily be run manually with `test/alternator/run --vs test_vector.py`. In total, this series includes 78 functional tests in 2200 lines of Python code. This series also includes documentation for the new Alternator feature and the new APIs introduced. You can see a more detailed design document here: https://docs.google.com/document/d/1cxLI7n-AgV5hhH1DTyU_Es8_f-t8Acql-1f58eQjZLY/edit Two patches in this series split the huge alternator/executor.cc, which this series grew even further until it reached a whopping 7,000 lines. These patches only reorganize code, with no functional changes. But it's time that we finally do this (Refs #5783); we can't just continue to grow executor.cc with no end...
Closes scylladb/scylladb#29046 * github.com:scylladb/scylladb: test/alternator: add option to "run" script to run with vector search alternator: document vector search test/alternator: fix retries in new_dynamodb_session test/alternator: test for allowed characters in attribute names test/alternator: tests for vector index support alternator, vector: add validation of non-finite numbers in Query alternator: Query: improve error message when VectorSearch is missing alternator: add per-table metrics for vector query alternator: clean up duplicated code alternator: fix default Select of Query alternator: split executor.cc even more alternator: split alternator/executor.cc alternator: validate vector index attribute values on write alternator: DescribeTable for vector index: add IndexStatus and Backfilling alternator: implement Query with a vector index alternator: fix bug in describe_multi_item() alternator: prevent adding GSI conflicting with a vector index alternator: implement UpdateTable with a vector index alternator: implement DescribeTable with a vector index alternator: implement CreateTable with a vector index alternator: reject empty attribute names cdc: fix on_pre_create_column_families to create CDC log for vector search	2026-04-17 10:25:45 +02:00
Aleksandra Martyniuk	b4c0ad20cf	service: fix indentation	2026-04-17 09:58:08 +02:00
Aleksandra Martyniuk	88c55cf7ed	docs: update documentation	2026-04-17 09:58:08 +02:00
Aleksandra Martyniuk	2c0de7d9b3	test: test multi RF changes	2026-04-17 09:58:08 +02:00
Aleksandra Martyniuk	1b2b453782	service: tasks: allow aborting ongoing RF changes Allow aborting an ongoing RF change using task manager. RF change can only be aborted if: - it is currently paused (existing); - it is a multi-RF change that still has replicas to be added. In the second case, we set an error for the request in system.topology_requests and set next_replication to replication_v2. This makes the load balancer roll back the RF change.	2026-04-17 09:58:08 +02:00
Aleksandra Martyniuk	38bad5f316	cql3: allow changing RF by more than one when adding or removing a DC rf_rack_valid_keyspaces relies on the fact that replicas of base table and mv are streamed concurrently. This is no longer true for newly introduced method of adding a DC. Disable rf_rack_valid_keyspaces in test_mv_first_replica_in_dc to force the old method.	2026-04-17 09:58:08 +02:00
Aleksandra Martyniuk	1bafc8394c	service: handle multi_rf_change Extend keyspace_rf_change handler to handle multi_rf_change. multi_rf_change is allowed only if we add or remove DCs and the keyspace uses rack list replication factor. The handler adds the request id to topology::ongoing_rf_changes. The request is further processed by the load balancer.	2026-04-17 09:58:07 +02:00
Aleksandra Martyniuk	8fb91e245f	service: implement make_rf_change_plan In make_rf_change_plan, the load balancer schedules necessary migrations, considering the load of nodes and other pending tablet transitions. Requests from ongoing_rf_changes are processed concurrently, independently from one another. Within each request, racks are processed concurrently. No tablet replica will be removed until all required replicas are added. When adding replicas to a rack, we always start with base tables and do not proceed to views until they are done (when removing, the other way around). Node availability is checked at two levels for extending actions: 1) In prepare_per_rack_rf_change_plan: the entire RF change request is aborted if any node in the target dc+rack is down, or if there are no live (non-excluded) nodes at all. Shrinking is never aborted. 2) In make_rf_change_plan: extending is skipped for a given round if any normal, non-excluded node in the target dc+rack is missing from the balanced node set. Shrinking always proceeds regardless. The resulting behavior per node state combination (extending only): - all up -> proceed - some excluded + some up -> proceed (excluded nodes are skipped) - any down node -> abort - all excluded (no live) -> abort When the last step is finished: - in system_schema.keyspaces: - next_replication is cleared; - new keyspace properties are saved (if request succeeded); - request is removed from ongoing_rf_changes; - the request is marked as done in system.topology_requests.	2026-04-17 09:58:07 +02:00
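The per-node-state table for extending can be sketched as a small decision function. The enum and function names below are hypothetical and do not match ScyllaDB's internal types. The sketch models only the abort-or-proceed table. It does not model the second-level "skip this round" check against the balanced node set.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of the per-rack check for *extending* (adding replicas);
// names are illustrative only. Shrinking is never aborted, so it is not modeled.
enum class node_state { up, down, excluded };
enum class rf_action { proceed, abort };

rf_action extend_decision(const std::vector<node_state>& rack_nodes) {
    bool any_live = false;
    for (auto s : rack_nodes) {
        if (s == node_state::down) {
            return rf_action::abort;   // any down node aborts the whole request
        }
        if (s == node_state::up) {
            any_live = true;           // excluded nodes are simply skipped
        }
    }
    // All excluded (or no nodes at all): nowhere to place replicas.
    return any_live ? rf_action::proceed : rf_action::abort;
}
```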
Aleksandra Martyniuk	89a17491db	service: add keyspace_rf_change_plan to migration_plan Add keyspace_rf_change_plan to migration_plan. The keyspace_rf_change_plan consists of: - completion - info about the request for which all migrations are done. Only one request can be completed at a time, even if more have finished migrations (the rest will be completed later). Based on it: - next_replication is cleared; - new keyspace properties are saved (only if succeeded); - request is removed from ongoing_rf_changes; - the request is marked as done in system.topology_requests. - aborts - info about requests that cannot complete because the required rf change is impossible (e.g. no available nodes in a required rack). Multiple requests can be aborted in a single plan. Based on each: - next_replication is set to current_replication (rolling back); - the request is marked as aborted with an error in system.topology_requests. The scheduled rebuilds will be kept in migration_plan::_migrations. Based on that the canonical_mutations are generated. Add update_topology_state_with_mixed_change and use it if any schema changes are required, i.e. if plan contains keyspace_rf_change_plan::completion.	2026-04-17 09:58:07 +02:00
Aleksandra Martyniuk	bcdab2e012	service: extend tablet_migration_info to handle rebuilds Make tablet_migration_info::{src,dst} optional, so that it can be reused by rebuild, for respectively leaving and pending replica.	2026-04-17 09:58:07 +02:00
Aleksandra Martyniuk	d41c5a7db4	service: split update_node_load_on_migration Split update_node_load_on_migration into decrease_node_load and increase_node_load - in the following changes for rebuilds we will need only one of those at a time.	2026-04-17 09:58:07 +02:00
Aleksandra Martyniuk	dd83666733	service: rearrange keyspace_rf_change handler In the following changes, keyspace_rf_change handler will also consider a change of RF by more than one. Rearrange the handler, so that it first chooses a kind of RF change and then creates relevant updates. Do not wrap the code in schedule_migration function, as we no longer need a quick return possibility.	2026-04-17 09:58:07 +02:00
Aleksandra Martyniuk	72bb3113ac	db: add columns to system_schema.keyspaces Add a new next_replication column to system_schema.keyspaces table. While there is an ongoing RF change: - next_replication keeps the target RF values; - existing replication_v2 column keeps initial RF values - the ones we started the RF change with. DESCRIBE KEYSPACE statement shows replication_v2. When there is no ongoing RF change for this keyspace, its next_replication is empty. In this commit no data is kept in the new column.	2026-04-17 09:58:07 +02:00
Aleksandra Martyniuk	751af38f2a	db: service: add ongoing_rf_changes to system.topology The following changes will allow adding or removing all keyspace replicas in a DC with a single ALTER KEYSPACE. For such operations, the tablet load balancer needs to schedule rebuilds. To track which RF change requests require rebuilds, we maintain a vector of RF changes along with their ongoing rebuild phases. Add a new ongoing_rf_changes column to system.topology to keep track of those requests. In this commit no data is kept in the new column.	2026-04-17 09:58:07 +02:00
Aleksandra Martyniuk	7cdf7d62a2	gms: add keyspace_multi_rf_change feature	2026-04-17 09:58:05 +02:00
Łukasz Paszkowski	4657d9e32c	streaming: reject mutation fragments on critical disk utilization The stream_mutation_fragments RPC handler did not check is_in_critical_disk_utilization_mode before accepting incoming mutation fragments. This meant load-and-stream (nodetool refresh --load-and-stream) could push data onto a node at critical disk utilization, potentially filling the disk completely. Add a critical disk utilization check in the get_next_mutation_fragment lambda, throwing critical_disk_utilization_exception when the node is in critical mode. This mirrors the existing protection in stream_blob.cc. Also remove the xfail marker from the corresponding test added in the previous commit.	2026-04-17 09:31:26 +02:00
Avi Kivity	04b54f363b	Merge 'Enable vnodes-to-tablets migrations with arbitrary tokens' from Nikos Dragazis This PR removes the power-of-two token constraint from vnodes-to-tablets migrations, allowing clusters with randomly generated tokens to migrate without manual token reassignment. Previously, migrations required vnode tokens to be a power of two and aligned. In practice, these conditions are not met with Scylla's default random token assignment, so the constraint is a blocker for real-world use. Since PR #28459, the tablet layer supports arbitrary tablet boundaries. This PR builds on that capability to allow arbitrary vnode tokens during migration. When the highest vnode token does not coincide with the end of the token ring, the vnode wraps around, but tablets do not support wrap-around ranges. This is handled by splitting the wrapping vnode into two tablets: one covering the tail end of the ring and one covering the beginning. Testing has been updated accordingly: existing cluster tests now use randomly generated tokens instead of precomputed power-of-two values, and a new Boost test validates the wrap-around tablet boundary logic. Fixes SCYLLADB-724. New feature, no backport is needed. Closes scylladb/scylladb#29319 * github.com:scylladb/scylladb: test: Use arbitrary tokens in vnodes->tablets migration tests test: boost: Add test for wrap-around vnodes storage_service: Support vnodes->tablets migrations w/ arbitrary tokens storage_service: Hoist migration precondition	2026-04-17 00:46:35 +03:00
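The wrap-around split can be sketched by deriving tablet "last token" boundaries from the vnode tokens. This is a minimal sketch, not the storage_service implementation. It assumes signed 64-bit tokens and assumes that each vnode owns the range (previous token, its token]. Under those assumptions, the vnode ending at the lowest token wraps around the ring, so an extra tablet ending at the ring's maximum token is added for its tail.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Minimal sketch (not ScyllaDB's code): tablet last tokens for arbitrary vnode
// tokens, assuming int64 tokens and vnode ranges of the form (prev, token].
std::vector<int64_t> tablet_last_tokens(std::vector<int64_t> vnode_tokens) {
    std::sort(vnode_tokens.begin(), vnode_tokens.end());
    constexpr int64_t ring_end = std::numeric_limits<int64_t>::max();
    // The vnode ending at the lowest token covers (highest token, ring end]
    // plus [ring start, lowest token]. Tablets cannot wrap, so unless the
    // highest token already is the ring end, add a tablet for the tail part.
    if (!vnode_tokens.empty() && vnode_tokens.back() != ring_end) {
        vnode_tokens.push_back(ring_end);
    }
    return vnode_tokens;
}
```

For tokens `{100, -50, 7}` this yields four tablets ending at -50, 7, 100 and the ring end. The first and last of these together cover the single wrapping vnode.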
Andrei Chekun	745debe9ec	test.py: remove testpy_test_fixture_scope With the migration to pytest, this fixture is no longer useful. Remove it and set the fixture scope to module for most of the tests. Add a dynamic_scope function to support running alternator fixtures in session scope while Test and TestSuite still exist. This is for the migration period; later on, this function should be deleted.	2026-04-16 22:08:33 +02:00
Andrei Chekun	21addb2173	test.py: add logger for 3rd party service Now that environment preparation and the startup of 3rd party services have moved to pytest, these services output their logs to the terminal. This PR gives each of them its own log file to avoid polluting the terminal.	2026-04-16 22:08:33 +02:00
Andrei Chekun	13770ab394	test.py: delete dead code in test.py With the latest changes, a lot of the code in test.py is redundant. This PR removes it. Changes in other files are related to this cleanup, especially the removal of the redundant --test-py-init parameter and moving prepare_environment to pytest itself.	2026-04-16 22:08:31 +02:00
Avi Kivity	999e108139	Merge 'test: lib: fix broken retry in start_docker_service' from Dario Mirovic The retry loop in `start_docker_service` passes the parse callbacks via `std::move` into `create_handler` on each iteration. After the first iteration, the moved-from `std::function` objects are empty. All subsequent retries skip output parsing entirely and immediately treat the service as successfully started. This defeats the entire purpose of the retry mechanism. Fix by passing the callbacks by copy instead of move, so the original callbacks remain valid across retries. Fixes SCYLLADB-1542 This is a CI stability issue and should be backported. Closes scylladb/scylladb#29504 * github.com:scylladb/scylladb: test/lib: fix typos in proc_utils, gcs_fixture, and dockerized_service test: gcs_fixture: rename container from "local-kms" to "fake-gcs-server" test: fix proc_utils.cc formatting from previous commit test: lib: use unique container name per retry attempt test: lib: fix broken retry in start_docker_service	2026-04-16 21:48:25 +03:00
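The difference between moving and copying a callback inside a retry loop can be sketched with `std::function`. The names `create_handler` and `start_with_retries` here are hypothetical stand-ins for the test/lib helpers, not their real signatures. After `std::move`, the standard leaves a `std::function` in a valid but unspecified state, and common standard libraries leave it empty. That is why the sketch passes the callback by copy on every attempt.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical stand-ins for the test/lib helpers (not their real signatures).
using parse_fn = std::function<bool(const std::string&)>;

// Takes the callback by value. An empty callback means "no parsing", so the
// output is treated as success: exactly the failure mode described above.
bool create_handler(parse_fn parse, const std::string& output) {
    return parse ? parse(output) : true;
}

// Retry loop that copies the callback into each attempt, so every retry still
// parses the output. Calling create_handler(std::move(parse), ...) here would
// risk an empty callback on every attempt after the first.
int start_with_retries(const parse_fn& parse, const std::string& output, int max_attempts) {
    for (int attempt = 1; attempt <= max_attempts; ++attempt) {
        if (create_handler(parse, output)) {
            return attempt;   // number of the attempt that succeeded
        }
    }
    return 0;                 // all attempts failed
}
```

With the copy, a service whose output never signals readiness is correctly reported as failed after all attempts. Every attempt invokes the parser.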
Radosław Cybulski	c5ed6b22ae	alternator: add CHILD_SHARDS filtering Add a `CHILD_SHARDS` filter to the `DescribeStream` command. When it is used, the user needs to pass a parent stream shard id in the JSON's ShardFilter.ShardId field. DescribeStream will then return only the stream shards that are direct descendants of the given parent stream shard. Each stream shard covers a consecutive part of the token space. A stream shard Q is considered to be a child of stream shard W when at least one token belongs to the token spaces of both shards. The filtering algorithm itself is somewhat complicated - more details in comments in streams.cc. CHILD_SHARDS is Amazon's functionality and is required by KCL. Add unit tests. Fixes: #25160 Closes scylladb/scylladb#28189	2026-04-16 18:27:55 +03:00
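The child relation, "at least one token belongs to both shards", reduces to an interval-overlap test. This is a simplified sketch that assumes each shard covers a closed range [first, last] of int64 tokens with no wrap-around. The real filtering in streams.cc is more involved than this single check.

```cpp
#include <cassert>
#include <cstdint>

// Simplified model: a stream shard covering the closed token range [first, last].
// Only the overlap rule from the commit message is shown here.
struct shard_range {
    int64_t first;
    int64_t last;
};

// Q is a candidate child of W when some token lies in both ranges.
bool overlaps(const shard_range& q, const shard_range& w) {
    return q.first <= w.last && w.first <= q.last;
}
```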
Andrei Chekun	ba04e1e2c3	codeowners: add owner for the test framework Add @xtrey as a codeowner of the test framework Closes scylladb/scylladb#29518	2026-04-16 17:57:21 +03:00
Piotr Szymaniak	d0c3f78d76	test/alternator: extend local TTL streams timeout Increase the non-AWS wait in the TTL streams test to reduce vnode CI flakes caused by delayed expiration visibility. Fixes SCYLLADB-1556 Closes scylladb/scylladb#29516	2026-04-16 15:53:35 +03:00
copilot-swe-agent[bot]	ec7450bff8	topology_coordinator, tablets: Log active tablet transitions when going idle This will make debugging of stalled tablet transitions easier. We saw several issues when the topology state machine was blocked by active tablet migrations, which was not obvious at first glance of the logs. Now it will be easy to tell if tablet transitions are blocking progress and which transitions are stuck. Closes scylladb/scylladb#28616	2026-04-16 14:34:37 +03:00
Benny Halevy	05a00fe140	compaction_manager: fix use-after-free in postponed_compactions_reevaluation() drain() signals the postponed_reevaluation condition variable to terminate the postponed_compactions_reevaluation() coroutine but does not await its completion. When enable() is called afterwards, it overwrites _waiting_reevalution with a new coroutine, orphaning the old one. During shutdown, really_do_stop() only awaits the latest coroutine via _waiting_reevalution, leaving the orphaned coroutine still alive. After sharded::stop() destroys the compaction_manager, the orphaned coroutine resumes and reads freed memory (is_disabled() accesses _state). Fix by introducing stop_postponed_compactions(), awaiting the reevaluation coroutine in both drain() and stop() after signaling it, if postponed_compactions_reevaluation() is running. It uses an std::optional<future<>> for _waiting_reevalution and std::exchange to leave _waiting_reevalution disengaged when postponed_compactions_reevaluation() is not running. This prevents a race between drain() and stop(). While at it, fix typo in _waiting_reevalution -> _waiting_reevaluation. Fixes: SCYLLADB-1463 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes scylladb/scylladb#29443	2026-04-16 14:33:31 +03:00
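The `std::optional<future<>>` plus `std::exchange` pattern can be sketched with `std::future` standing in for a Seastar `future<>`. The class and member names here are illustrative only, and the condition-variable signaling is omitted. The point is that stopping awaits the task and leaves the optional disengaged, so a later `drain()` or `stop()` call has nothing left to await.

```cpp
#include <cassert>
#include <future>
#include <optional>
#include <utility>

// Sketch using std::future as a stand-in for a Seastar future<>; names are
// illustrative and do not match compaction_manager exactly.
class reevaluator {
    std::optional<std::future<void>> _waiting_reevaluation;
public:
    int completed = 0;   // how many reevaluation tasks were awaited to completion

    void enable() {
        // std::launch::deferred keeps the sketch single-threaded: the task body
        // runs when get() is called, standing in for the reevaluation coroutine.
        _waiting_reevaluation = std::async(std::launch::deferred, [this] { ++completed; });
    }

    // Await the running task, if any, and leave the optional disengaged so that
    // a second call (e.g. drain() followed by stop()) is a harmless no-op.
    void stop_postponed_compactions() {
        if (auto f = std::exchange(_waiting_reevaluation, std::nullopt)) {
            f->get();
        }
    }

    bool running() const { return _waiting_reevaluation.has_value(); }
};
```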
Nadav Har'El	d3d5db37d7	test/alternator: add option to "run" script to run with vector search Add to test/alternator/run the option "-vs", which runs a vector store alongside Scylla, to allow running Alternator tests with vector indexing. To get the vector store, run `git clone git@github.com:scylladb/vector-store.git` and then `cargo build --release`. "run -vs" looks for an executable in ../vector-store/target/*/vector-store but can also be overridden by the VECTOR_STORE environment variable. test/alternator/run runs the vector store exactly like it runs Scylla - in a temporary directory, on a temporary IP address in the localhost subnet (127.0.0.0/8), killing it when the test ends, and showing the output of both programs (Scylla and vector store). These transient runs of Scylla and vector store are configured to communicate with each other. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2026-04-16 14:30:18 +03:00
Nadav Har'El	3d8463ccd2	alternator: document vector search This patch adds a new document, docs/alternator/vector-search.md, on the new vector search feature in Alternator. It introduces this feature, and the DynamoDB APIs that we extended to support it. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2026-04-16 14:30:17 +03:00
Nadav Har'El	164b0e37e1	test/alternator: fix retries in new_dynamodb_session The new_dynamodb_session() function had a bug which we never noticed because we hardly used it, but it became more noticeable when the new test/alternator/test_vector.py started to use it: By default, boto3 retries a request up to 9 times when it encounters a retriable error (such as an Internal Server Error). We don't want such retries in our tests - it makes failures slower, but more importantly it can hide "flaky" bugs by retrying 9 times until it happens to succeed. The new_dynamodb_session() had code (copied from the dynamodb fixture) to set boto3's "max_attempts" configuration to 0, to disable this retry. But this code had an incorrect "if" to only be done if we're testing on "localhost". This is wrong: We almost never use "localhost" as the target of the test; Both test/cqlpy/run and test.py pick an IP address in the localhost subnet (127/8) and use that IP address - not the string "localhost". This bug only existed in new_dynamodb_session() - the more commonly used "dynamodb" fixture didn't have this bug. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2026-04-16 14:30:17 +03:00
Nadav Har'El	858dee0b30	test/alternator: test for allowed characters in attribute names One of the tests in the previous patch checked that strange characters are allowed in attribute names used for vector indexing. It turns out we never had a test that verifies that, regardless of vector indexes, any character whatsoever is allowed in attribute names. This is different from table names which are much more limited. So this patch adds the missing test. As usual, the new test also passes on DynamoDB, showing that these strange characters in attribute names are also allowed by DynamoDB.	2026-04-16 14:30:17 +03:00
Nadav Har'El	58538e18e8	test/alternator: tests for vector index support In this patch we add a large collection of basic functional tests for the vector index support, covering the CreateTable, UpdateTable, DescribeTable and Query operations and the various ways in which those are allowed to work - or expected to fail. These tests were written in parallel with writing the code so they (hopefully) cover all the corner cases considered during development, and make sure these corner cases are all handled correctly and will not regress in the future. Some of these tests do not involve querying of the index and focus on the structure of requests and the kind of syntax allowed. But other tests are end-to-end, requiring the vector store to be running and trying to index Alternator data and query it. These tests are marked "needs_vector_store", and are immediately skipped if Scylla is not configured to connect to a vector store. In a later patch we'll add an option to test/alternator/run to be able to run these end-to-end tests by automatically running both Scylla and the Vector Store. We'll have additional end-to-end tests in the vector-store repository. Note that vector search is a new API feature that doesn't exist in DynamoDB, so we are adding new parameters and outputs to existing operations. The AWS SDKs don't normally allow doing that, so the test added here begins by teaching the Python SDK to use the new APIs we added. This piece of code can also be used by end-users to use vector search (at least in Python...) before we officially add this support to ScyllaDB's SDK wrappers.	2026-04-16 14:30:17 +03:00
Nadav Har'El	fe5a5a813f	alternator, vector: add validation of non-finite numbers in Query Non-finite numbers (Inf, NaN) don't make sense in vector search, and they are also not allowed as numbers in the DynamoDB API. But the parsing code in Query's QueryVector accepted "Inf" and "NaN" and then failed to send the request to the vector store, resulting in a strange error message. Let's fix it in the parsing code. We have a test (test_query_vectorsearch_queryvector_bad_number_string) that verifies this fix. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2026-04-16 14:30:17 +03:00
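Here is a minimal sketch of why an explicit finiteness check is needed. It is not Alternator's parser. The function `parse_query_vector` is hypothetical and assumes the vector arrives as a list of number strings. C's `std::strtof` accepts "inf", "infinity" and "nan" (case-insensitively) as valid input, so parsing alone does not reject these values. The sketch therefore rejects anything that `std::isfinite()` refuses, which also covers overflowing values such as "1e999".

```cpp
#include <cassert>
#include <cmath>
#include <cstdlib>
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper (not Alternator's code): parse number strings into a
// float vector, rejecting malformed input and non-finite values.
std::optional<std::vector<float>> parse_query_vector(const std::vector<std::string>& numbers) {
    std::vector<float> out;
    for (const auto& s : numbers) {
        char* end = nullptr;
        float v = std::strtof(s.c_str(), &end);
        // strtof happily parses "Inf"/"NaN", and overflow yields infinity,
        // so std::isfinite() is what actually rejects them.
        if (s.empty() || *end != '\0' || !std::isfinite(v)) {
            return std::nullopt;
        }
        out.push_back(v);
    }
    return out;
}
```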
Nadav Har'El	aa070fae5b	alternator: Query: improve error message when VectorSearch is missing Before this patch, if we attempt a Query where IndexName is a vector index but forget the "VectorSearch" parameter, the error is misleading: The code expects a GSI or LSI, and when it can't find a GSI or LSI with that name, it reports that the index is missing. But this is not helpful. So in this patch we produce a more helpful message: That the index does exist, and is a vector index, so a "VectorSearch" parameter is mandatory and is missing.	2026-04-16 14:30:16 +03:00
Nadav Har'El	f932f94422	alternator: add per-table metrics for vector query The per-table metrics for Query were not incremented for the vector variant of the Query operations, only the global metrics were incremented. This patch fixes this oversight, and adds a test that reproduces it (the new test fails before this patch, and passes after).	2026-04-16 14:30:16 +03:00
Nadav Har'El	8cf510e06c	alternator: clean up duplicated code De-duplicate some code introduced in earlier patches, such as two nearly-identical loops over the indexes (one to check if there is a vector index, the second to get its dimensions), and two nearly-identical chunks of code to get the item contents when there is or there isn't a clustering key. There should be no functional changes in this patch. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2026-04-16 14:30:16 +03:00
Nadav Har'El	f15c6634a7	alternator: fix default Select of Query In earlier patches, when Query'ing a vector index, we set the default Select to ALL_ATTRIBUTES. However, according to the DynamoDB documentation for Query, "If neither Select nor ProjectionExpression are specified, DynamoDB defaults to ALL_ATTRIBUTES when accessing a table, and ALL_PROJECTED_ATTRIBUTES when accessing an index." This default should also apply to vector indexes, so this patch fixes this. The new behavior is not only more compatible with DynamoDB, it is also much more efficient by default, as ALL_PROJECTED_ATTRIBUTES does not need to read from the base table - it returns the results that the vector store returned. Of course, if the user needs ALL_ATTRIBUTES, this option is still available - it's just no longer the default. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2026-04-16 14:30:16 +03:00
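The quoted DynamoDB default rule can be sketched as a small function. The function and its parameters are hypothetical, not Alternator's code. The sketch also assumes the documented behavior that a ProjectionExpression without an explicit Select implies SPECIFIC_ATTRIBUTES.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical sketch of the documented default for Query's Select; after the
// fix, vector indexes count as "accessing an index".
std::string default_select(const std::optional<std::string>& select,
                           bool has_projection_expression,
                           bool querying_index) {
    if (select) {
        return *select;                    // an explicit Select always wins
    }
    if (has_projection_expression) {
        return "SPECIFIC_ATTRIBUTES";      // assumption: ProjectionExpression implies this
    }
    return querying_index ? "ALL_PROJECTED_ATTRIBUTES" : "ALL_ATTRIBUTES";
}
```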
Nadav Har'El	2e274bbdba	alternator: split executor.cc even more This patch continues the effort to split the huge executor.cc (5000 lines before this patch) even more. In this patch we introduce a new source file, executor_util.cc, for various utility functions that are used for many different operations and therefore are useful to have in a header file. These utility functions will now be in executor_util.cc and executor_util.hh - instead of executor.cc and executor.hh. Various source files, including executor.cc, the executor_read.cc introduced in the previous patch, as well as older source files like streams.cc, ttl.cc and serialization.cc, use the new header file. This patch removes over 700 lines of code from executor.cc, and also removes a large number of utility function declarations from executor.hh. Originally, executor.hh was meant to be about the interface that the Alternator server needs to execute the different DynamoDB API operations - and after this patch it moves closer to this original goal. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2026-04-16 14:30:16 +03:00
Nadav Har'El	751da00692	alternator: split alternator/executor.cc Already six years ago, in #5783, we noticed that alternator/executor.cc has grown too large. The previous patches added hundreds more lines to it to implement vector search, and it reached a whopping 7,000 lines of code. This is too much. This patch splits from executor.cc two major chunks: 1. The implementation of read requests - GetItem, BatchGetItem, Query (base table, GSI/LSI, and vector-search), and Scan - was moved to a new source file alternator/executor_read.cc. The new file has 2,000 lines. 2. Moved 250 lines of template functions dealing with attribute paths and maps of them to a new header file, attribute_path.hh. These utilities are used for many different operations - various read operations use them for ProjectionExpression, and UpdateItem uses them for modifications to nested attributes, so both executor.cc and executor_read.cc need the new header file. The remaining executor.cc is still pretty big, 5,000 lines, and contains write operations (PutItem, UpdateItem, DeleteItem, BatchWriteItem) as well as various table and other operations, and also many utility functions used by many types of operations, so we can later continue this refactoring effort. Refs #5783 Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2026-04-16 14:30:10 +03:00
Emil Maskovsky	91df3795fc	encryption: cover system.raft table in system_info_encryption Extend system_info_encryption to encrypt system.raft SSTables. system.raft contains the Raft log, which may hold sensitive user data (e.g. batched mutations), so it warrants the same treatment as system.batchlog and system.paxos. During upgrade, existing unencrypted system.raft SSTables remain readable. Existing data is rewritten encrypted via compaction, or immediately via nodetool upgradesstables -a. Update the operator-facing system_info_encryption description to mention system.raft and add a focused test that verifies the schema extension is present on system.raft. Fixes: CUSTOMER-268 Backport: 2026.1 - closes an encryption-at-rest coverage gap: system.raft may persist sensitive user-originated data unencrypted; backport to the current LTS. Closes scylladb/scylladb#29242	2026-04-16 13:22:10 +02:00
Botond Dénes	d006c4c476	Merge 'Untie (partially) cql3/statements from db::config' from Pavel Emelyanov There's a bunch of db::config options that are used by cql3/statements/ code. For that they use data_dictionary/database as a proxy to get db::config reference. This PR moves most of these accessed options onto cql_config Options migrated to cql_config: 1. select_internal_page_size 2. strict_allow_filtering 3. enable_parallelized_aggregation 4. batch_size_warn_threshold_in_kb 5. batch_size_fail_threshold_in_kb 6. 7 keyspace replication restriction options 7. 2 TWCS restriction options 8. restrict_future_timestamp 9. strict_is_not_null_in_views (with view_restrictions struct) 10. enable_create_table_with_compact_storage Some options need special treatment and are still abused via database, namely: 1. enable_logstor 2. cluster_name 3. partitioner 4. endpoint_snitch Fixing components inter-dependencies, not backporting Closes scylladb/scylladb#29424 * github.com:scylladb/scylladb: cql3: Move enable_create_table_with_compact_storage to cql_config cql3: Move strict_is_not_null_in_views to cql_config cql3: Move restrict_future_timestamp to cql_config cql3: Move TWCS restriction options to cql_config cql3: Move keyspace restriction options to cql_config cql3: Move batch_size_fail_threshold_in_kb to cql_config cql3: Move batch_size_warn_threshold_in_kb to cql_config cql3: Move enable_parallelized_aggregation to cql_config cql3: Move strict_allow_filtering to cql_config cql3: Move select_internal_page_size to cql_config test: Fix cql_test_env to use updateable cql_config from db::config cql3: Add cql_config parameter to parsed_statement::prepare()	2026-04-16 14:04:43 +03:00
Botond Dénes	88a8324e68	Merge 'db: store large data records in SSTable metadata and serve via virtual tables' from Benny Halevy `system.large_partitions`, `system.large_rows`, and `system.large_cells` store records keyed by SSTable name. When SSTables are migrated between shards or nodes (resharding, streaming, decommission), the records are lost because the destination never writes entries for the migrated SSTables. This patch series moves the source of truth for large data records into the SSTable's scylla metadata component (new `LargeDataRecords` tag 13) and reimplements the three `system.large_` tables as virtual tables that query live SSTables on demand. A cluster feature flag (`LARGE_DATA_VIRTUAL_TABLES`) gates the transition for safe rolling upgrades. When the cluster feature is enabled, each node drops the old system large_ tables and starts serving the corresponding tables using virtual tables that represent the large data records now stored on the sstables. Note that the virtual tables will be empty after upgrade until the sstables that contained large data are rewritten; therefore, it is recommended to run upgradesstables compaction or major compaction to repopulate the sstables' scylla metadata with large data records. 1. keys: move key_to_str() to keys/keys.hh — make the helper reusable across large_data_handler, virtual tables, and scylla-sstable 2. sstables: add LargeDataRecords metadata type (tag 13) — new struct with binary-serialized key fields, scylla-sstable JSON support, format documentation 3. large_data_handler: rename partition_above_threshold to above_threshold_result — generalize the struct for reuse 4. large_data_handler: return above_threshold_result from maybe_record_large_cells — separate booleans for cell size vs collection elements thresholds 5. sstables: populate LargeDataRecords from writer — bounded min-heaps (one per large_data_type), configurable top-N via `compaction_large_data_records_per_sstable` 6. test: add LargeDataRecords round-trip unit tests — verify write/read, top-N bounding, below-threshold behavior 7. db: call initialize_virtual_tables from shard 0 only — preparatory refactoring to enable cross-shard coordination 8. db: implement large_data virtual tables with feature flag gating — three virtual table classes, feature flag activation, legacy SSTable fallback, dual-threshold dedup, cross-shard collection Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-1276 * Although this fixes a bug where large data entries are effectively lost when sstables are renamed or migrated, the changes are intrusive and do not warrant a backport Closes scylladb/scylladb#29257 * github.com:scylladb/scylladb: db: implement large_data virtual tables with feature flag gating db: call initialize_virtual_tables from shard 0 only test: add LargeDataRecords round-trip unit tests sstables: populate LargeDataRecords from writer large_data_handler: return above_threshold_result from maybe_record_large_cells large_data_handler: rename partition_above_threshold to above_threshold_result sstables: add LargeDataRecords metadata type (tag 13) sstables: add fmt::formatter for large_data_type keys: move key_to_str() to keys/keys.hh	2026-04-16 14:03:31 +03:00
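The "bounded min-heaps ... configurable top-N" idea from step 5 can be sketched as follows. This is a minimal illustration, not the writer's actual code: `large_record`, `top_n_tracker` and the strict "above threshold" comparison are assumptions. The smallest retained record sits at the top of the heap, so a new candidate only has to beat that one to get in.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <queue>
#include <string>
#include <vector>

// Hypothetical record type; the real LargeDataRecords entry has more fields.
struct large_record {
    std::string key;
    uint64_t size;
};

// Comparator that turns std::priority_queue into a min-heap on size.
struct by_size_greater {
    bool operator()(const large_record& a, const large_record& b) const {
        return a.size > b.size;
    }
};

class top_n_tracker {
    std::size_t _limit;
    uint64_t _threshold;
    std::priority_queue<large_record, std::vector<large_record>, by_size_greater> _heap;
public:
    top_n_tracker(std::size_t limit, uint64_t threshold) : _limit(limit), _threshold(threshold) {}

    void maybe_record(std::string key, uint64_t size) {
        // Assumption: only values strictly above the threshold are recorded.
        if (size <= _threshold || _limit == 0) {
            return;
        }
        if (_heap.size() < _limit) {
            _heap.push({std::move(key), size});
        } else if (size > _heap.top().size) {
            _heap.pop();  // evict the smallest retained record
            _heap.push({std::move(key), size});
        }
    }

    // Empties the heap and returns the retained records, largest first.
    std::vector<large_record> drain_sorted_desc() {
        std::vector<large_record> out;
        while (!_heap.empty()) {
            out.push_back(_heap.top());
            _heap.pop();
        }
        std::reverse(out.begin(), out.end());
        return out;
    }
};
```

Memory per SSTable stays O(N) regardless of how many oversized partitions, rows or cells the writer sees, which is presumably why a heap is used rather than collecting everything and sorting at the end.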
Pavel Emelyanov	4d352c7cf5	sstables: Remove ignore_component_digest_mismatch from sstable_open_config The ignore_component_digest_mismatch flag is now initialized at sstable construction time from sstables_manager::config (which is populated from db::config at boot time). Remove the flag from sstable_open_config struct and all call sites that were setting it explicitly. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-16 13:49:14 +03:00
Pavel Emelyanov	9107e055b3	sstables: Move ignore_component_digest_mismatch initialization to constructor Initialize the ignore_component_digest_mismatch flag from sstables_manager::config in the sstable constructor initializer list instead of in load(). This ensures the flag value is set at construction time when the manager config is available, rather than at load time. Mark the member const to reflect its immutability after construction. Fixes the bootstrap path which now correctly reads the flag from manager config initialized from db::config at boot time, instead of using the default value. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-16 13:49:00 +03:00
Pavel Emelyanov	8abfd9af00	sstables: Add ignore_component_digest_mismatch to sstables_manager config Copy the ignore_component_digest_mismatch flag from db::config to sstables_manager::config during database initialization. This makes the flag available early in the boot process, before SSTables are loaded, enabling later commits to move the flag initialization from load-time to construction-time. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-16 13:48:49 +03:00
Nadav Har'El	83670d2493	alternator: validate vector index attribute values on write When a table has a vector index, writes to the indexed attribute (via PutItem, UpdateItem, or BatchWriteItem) must supply a value that is a vector of the appropriate length: It must be a list of exactly the declared number of elements, where each element is a numeric type ("N") representable as a 32-bit float. Before this patch, invalid values were silently accepted and the item was simply not indexed (it was skipped by the vector store when it read this item). Now these writes are rejected with a ValidationException. This is analogous to the existing validation of GSI/LSI key attribute values: in DynamoDB, after a certain attribute becomes the key of a GSI or LSI, the user is no longer allowed to write a value of a different type to it. The implementation we add here is also analogous to the implementation of the GSI/LSI key validation. The GSI/LSI key validation is done by validate_value_if_index_key / si_key_attributes, and in this patch we add the vector-index parallels: vector_index_attributes() collects the attribute name and declared dimensions for every vector index in the schema, and validate_value_if_vector_index_attribute() enforces the type limitations. For efficiency in the common case where a table has no vector indexes and no GSIs/LSIs, both validation functions are out-of-line and each call site guards the call with an explicit empty() check, so no function-call overhead is incurred when there is nothing to validate. For UpdateItem, the map of vector index attributes is cached in update_item_operation (alongside the existing _key_attributes cache) to avoid recomputing it on every call to update_attribute().	2026-04-16 13:31:49 +03:00
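A minimal sketch of the check described above, assuming a simplified attribute-value model (`attr_value`, `num`, `list_of` and `validate_vector_value` are hypothetical names, not Alternator's). DynamoDB transports numbers as strings, so each element is parsed and rejected if it is not finite or lies outside the 32-bit float range. Treating underflow (`ERANGE`) as invalid is an extra assumption of this sketch.

```cpp
#include <cassert>
#include <cerrno>
#include <cfloat>
#include <cmath>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical, simplified DynamoDB attribute value: a number or a list.
struct attr_value {
    enum class kind { number, string, list };
    kind type = kind::string;
    std::string n;                 // number payload; DynamoDB sends numbers as strings
    std::vector<attr_value> list;  // list payload
};

attr_value num(std::string s) {
    attr_value v;
    v.type = attr_value::kind::number;
    v.n = std::move(s);
    return v;
}

attr_value list_of(std::vector<attr_value> elems) {
    attr_value v;
    v.type = attr_value::kind::list;
    v.list = std::move(elems);
    return v;
}

// Returns an error message, or an empty string if `v` is a valid vector
// of exactly `dimensions` float32-representable numbers.
std::string validate_vector_value(const attr_value& v, std::size_t dimensions) {
    if (v.type != attr_value::kind::list) {
        return "vector index attribute must be a list";
    }
    if (v.list.size() != dimensions) {
        return "wrong number of vector elements";
    }
    for (const auto& elem : v.list) {
        if (elem.type != attr_value::kind::number) {
            return "vector elements must be numbers";
        }
        errno = 0;
        char* end = nullptr;
        double d = std::strtod(elem.n.c_str(), &end);
        bool unparsed = end == elem.n.c_str() || *end != '\0';
        if (unparsed || errno == ERANGE || !std::isfinite(d) || std::fabs(d) > FLT_MAX) {
            return "vector element is not representable as a 32-bit float";
        }
    }
    return {};
}
```

Returning an error string rather than throwing keeps the sketch small; a real request handler would turn a non-empty result into a ValidationException.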
Nadav Har'El	aea7b6a66b	alternator: DescribeTable for vector index: add IndexStatus and Backfilling Add to DescribeTable's output for VectorIndexes two fields - IndexStatus and Backfilling - which are intended to exactly mirror these two fields that exist for GlobalSecondaryIndexes: When a vector index is added, IndexStatus is "CREATING" before the index is usable, and "ACTIVE" when it is finally usable for a Query. During "CREATING" phase, "Backfilling" may be set to true when the index is currently being backfilled (the table is scanned and an index is built). A user is expected to call DescribeTable in a loop after creating a vector index (via either CreateTable or UpdateTable) and only call Query on the index after the IndexStatus is finally ACTIVE. Calling Query earlier, while IndexStatus is still CREATING, will result in an error. In the current implementation, Alternator does not track the state of the vector index, so it needs to contact the vector store to inquire about the state of the index - using a new function introduced in this patch that uses an existing vector-store API. This makes DescribeTable slower on tables that have vector indexes, because the vector store is contacted on every DescribeTable call. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2026-04-16 13:31:49 +03:00
Nadav Har'El	e43a2e5086	alternator: implement Query with a vector index We introduce to the Query request a new "VectorSearch" parameter, which takes a mandatory "QueryVector" (a value which must be a numeric vector of the right length) and "Limit". The "Limit" of a vector search (Query with VectorSearch) determines the number of nearest neighbors to return, and does not allow pagination (ExclusiveStartKey is not allowed). ConsistentRead=True is also not allowed on a vector search query. The "Select"/"ProjectionExpression"/"AttributesToGet" parameters are also supported, requesting which attributes to fetch. Using Select=ALL_PROJECTED_ATTRIBUTES means read only the attributes found in the vector index - currently only the key columns - so it is significantly faster than ALL_ATTRIBUTES because it doesn't require reading the items from the base table. The "FilterExpression" parameter is also supported. Like in DynamoDB's traditional Query, this does post-filtering, i.e., removing some of the results returned by the vector index that don't match the filter, and as a result fewer than Limit results may be returned. Pre-filtering (done on the vector store, and always returns Limit results) is not yet implemented.	2026-04-16 13:31:47 +03:00
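Why post-filtering can return fewer than Limit results is easiest to see in a small sketch. Here a brute-force sort by squared Euclidean distance stands in for the vector store's nearest-neighbor search. `item`, `squared_l2` and `vector_query_post_filter` are hypothetical names, and the distance metric is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <iterator>
#include <vector>

// Hypothetical item: key, embedding, and one attribute a filter can test.
struct item {
    int key;
    std::vector<float> vec;
    int category;
};

float squared_l2(const std::vector<float>& a, const std::vector<float>& b) {
    float sum = 0;
    for (std::size_t i = 0; i < a.size() && i < b.size(); ++i) {
        float d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

std::vector<item> vector_query_post_filter(const std::vector<item>& all,
                                           const std::vector<float>& query,
                                           std::size_t limit,
                                           const std::function<bool(const item&)>& filter) {
    // Step 1 (stand-in for the vector store): the `limit` nearest neighbors.
    std::vector<item> nearest = all;
    std::sort(nearest.begin(), nearest.end(), [&](const item& x, const item& y) {
        return squared_l2(x.vec, query) < squared_l2(y.vec, query);
    });
    if (nearest.size() > limit) {
        nearest.erase(nearest.begin() + static_cast<std::ptrdiff_t>(limit), nearest.end());
    }
    // Step 2 (post-filtering): drop neighbors that fail the filter. Nothing
    // refills the result, so it may hold fewer than `limit` items.
    std::vector<item> out;
    std::copy_if(nearest.begin(), nearest.end(), std::back_inserter(out), filter);
    return out;
}
```

With Limit=2, if one of the two nearest items fails the filter, only one item comes back even though farther matching items exist. Pre-filtering would avoid that by filtering inside the nearest-neighbor search itself.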
Nadav Har'El	68e34c57e1	alternator: fix bug in describe_multi_item() In commit `a55c5e9ec7`, the function describe_multi_item() got a new item_callback parameter, that can be used to calculate the size of the item. This new parameter has a default, an empty noncopyable_function. But an empty noncopyable_function shouldn't be called - exactly like std::function, it throws std::bad_function_call if called when empty. So describe_multi_item() should only call this item_callback if it's not empty. This became a problem in the next patch, implementing vector search query, which called describe_multi_item with the default item_callback. But in general, the function should be usable with the default parameter (or we shouldn't have defined a default value for this parameter!). Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2026-04-16 13:30:02 +03:00
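The fix comes down to guarding an optional callback before invoking it. The sketch below uses `std::function` as a stand-in for seastar's `noncopyable_function`, which the message says behaves the same way when empty. `describe_items` is a hypothetical function, not `describe_multi_item()` itself.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

using item_callback_t = std::function<void(std::size_t)>;

// The default argument is an empty callback, so the body must never call it unguarded.
std::size_t describe_items(const std::vector<std::string>& items,
                           const item_callback_t& item_callback = {}) {
    std::size_t described = 0;
    for (const auto& it : items) {
        ++described;
        if (item_callback) {  // an empty std::function throws std::bad_function_call if called
            item_callback(it.size());
        }
    }
    return described;
}
```

A default argument is only safe if every use of the parameter handles the default. Here the `if (item_callback)` check is what makes that true.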
Nadav Har'El	ffe1029b7c	alternator: prevent adding GSI conflicting with a vector index All the "indexes" we implement in Alternator - GSI, LSI and the new vector index - share the same IndexName namespace, which we'll use in Query to refer to the index. In the previous patch we already prevented adding a vector index with the same name as an existing GSI or LSI. In this patch we also prevent the reverse - adding a GSI with the name of an existing vector index. Additionally, one cannot add a GSI on a key that is already the key of a vector index: The types conflict: The key of a vector index must be a vector column, while the key of a GSI must have a standard key type (string, binary or number). We have tests for this later, in the big test patch.	2026-04-16 13:30:02 +03:00
Nadav Har'El	82de16f92c	alternator: implement UpdateTable with a vector index After an earlier patch allowed CreateTable to create vector indexes together with a table, in this patch we add to UpdateTable the ability to add a new vector index to an existing table, as well as the ability to delete a vector index from an existing table. The implementation is inspired by DynamoDB's syntax for GSI - just like GSI has GlobalSecondaryIndexUpdates with "Create" and "Delete" operations, for vector indexes we have VectorIndexUpdates supporting Create and Delete. "Update" is not yet supported - we haven't yet implemented any parameter that can be updated - but we can easily implement it in the future.	2026-04-16 13:30:02 +03:00
Nadav Har'El	217090a996	alternator: implement DescribeTable with a vector index In this patch we add to DescribeTable the ability to list the vector indexes enabled on an Alternator table.	2026-04-16 13:30:02 +03:00
Nadav Har'El	e156d67177	alternator: implement CreateTable with a vector index ScyllaDB supports the "vector search" feature in CQL. In this patch we start the path to adding vector search support also to Alternator. In this patch, we implement CreateTable support - allowing the user to enable vector search in a new table. The following patches will enable additional operations like UpdateTable (adding a vector index to an existing table or deleting a vector index from an existing table) and DescribeTable. Extensive tests for all these features will come at the end of the series. Those tests were written in parallel with writing this implementation, so they (hopefully) cover every nook and cranny of the implementation.	2026-04-16 13:29:58 +03:00
Nadav Har'El	0afc730b7b	alternator: reject empty attribute names Alternator has a function validate_attr_name_length() used to validate an attribute name passed in different operations like PutItem, UpdateItem, GetItem, etc. It fails the request if the attribute name is longer than 65535 characters. It turns out that we forgot to check if the attribute name length isn’t 0 - which should be forbidden as well! This patch fixes the validation code, and also adds a test that confirms that after this patch empty attribute names are rejected - just like DynamoDB does - whereas before this patch they were silently accepted. We want to fix this issue now, because in a later patch we intend to use the same validation function also for vector indexes - and want it to be accurate. Fixes SCYLLADB-1069. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2026-04-16 13:28:15 +03:00
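The corrected rule is a closed range check on the name length. The sketch below is only an illustration: `is_valid_attr_name_length` is a hypothetical name, it returns a bool rather than failing a request, and it measures length in bytes, which is an assumption, since the message says "characters".

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Valid attribute names are non-empty and at most 65535 long (measured in bytes here).
bool is_valid_attr_name_length(std::string_view name) {
    constexpr std::size_t max_attr_name_length = 65535;
    return !name.empty() && name.size() <= max_attr_name_length;
}
```

Before the fix only the upper bound was checked, so an empty name passed validation.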
Nadav Har'El	8948a50f3b	cdc: fix on_pre_create_column_families to create CDC log for vector search The vector-search feature, which is already supported in CQL, introduced the somewhat confusing feature of enabling CDC without explicitly enabling CDC: When a vector index is enabled on a table, CDC is "enabled" for it even if the user didn't ask to enable CDC. For this, some code in cdc/log.cc began to use cdc_enabled() instead of checking schema.cdc_options.enabled() directly. This cdc_enabled() function checks if either this enabled() is true, or has_vector_index() is true. But there's another twist to this story: To write with CDC, we also need to create the CDC log table: 1. In CQL, a vector index can only be added on an existing table (with CREATE INDEX), so the hook on_before_update_column_family() is the one that noticed that a vector index was added, and created the CDC log table. 2. But in Alternator, a vector index can be created up-front with a brand-new table (in CreateTable), so the hook for a new table - on_pre_create_column_families(), also needs to create the CDC log table. It already did, but incorrectly checked just the explicit CDC-enabled flag instead of the new cdc_enabled() function that also allows vector index. So this patch just fixes on_pre_create_column_families to use cdc_enabled(). Before this patch, when a vector index was created in Alternator with CreateTable, an attempt to write to the table (PutItem) failed because it tried to write to the CDC log, which had not been created. After this patch, it works. The reproducing test is test_putitem_vectorindex_createtable (introduced in a later patch). Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2026-04-16 13:28:15 +03:00
Roy Dahan	d2d7604188	ci: pin GitHub Actions to commit SHAs and migrate to Node.js 24 Pin all external GitHub Actions to full commit SHAs and upgrade to their latest major versions to reduce supply chain attack surface: - actions/checkout: v3/v4/v5 -> v6.0.2 - actions/github-script: v7 -> v8.0.0 - actions/setup-python: v5 -> v6.2.0 - actions/upload-artifact: v4 -> v7.0.0 - astral-sh/setup-uv: v6 -> v8.0.0 - mheap/github-action-required-labels: v5.5.2 (pinned) - redhat-plumbers-in-action/differential-shellcheck: v5.5.6 (pinned) - codespell-project/actions-codespell: v2.2 (pinned, was @master) Set FORCE_JAVASCRIPT_ACTIONS_TO_NODE24=true in all 21 workflows that use JavaScript-based actions to opt into the Node.js 24 runtime now. This resolves the deprecation warning: "Node.js 20 actions are deprecated. Please check if updated versions of these actions are available that support Node.js 24. Actions will be forced to run with Node.js 24 by default starting June 2nd, 2026. Node.js 20 will be removed from the runner on September 16th, 2026." See: https://github.blog/changelog/2025-09-19-deprecation-of-node-20-on-github-actions-runners/ scylladb/github-automation references are intentionally left at @main as they are org-internal reusable workflows. Fixes: SCYLLADB-1410 Backport: Backport is required for live branches that run GH actions: 2026.1, 2025.4, 2025.1 and 2024.1 Closes scylladb/scylladb#29421	2026-04-16 13:03:33 +03:00
Pavel Emelyanov	207d3b4a68	test_backup: Remove create_schema() helper Test Remove the create_schema() helper function and inline its logic directly into the four call sites. This simplifies the code by eliminating a trivial wrapper. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Closes scylladb/scylladb#29406	2026-04-16 12:57:26 +03:00
Botond Dénes	830d28a889	Merge 'Use standard helpers to create ks:cf and populate it in test_backup.py' from Pavel Emelyanov The PR removes the create_ks_and_cf() helper from the backup test and patches all callers to create the keyspace and table and populate them with standard explicit facilities. While patching, it turned out that one test doesn't need to populate the table, so it even becomes a tiny bit shorter and faster. Enhancing test, not backporting Closes scylladb/scylladb#29417 * github.com:scylladb/scylladb: test_backup: Remove create_ks_and_cf helper Test test_backup: Replace create_ks_and_cf with async patterns Test test_backup: Add if-True blocks for indentation Test	2026-04-16 12:54:21 +03:00
Nikos Dragazis	7abcf94823	test: Use arbitrary tokens in vnodes->tablets migration tests The migration tests used to start nodes with pre-computed power-of-two tokens. This was required because the migration itself only supported power-of-two aligned tokens. Now that arbitrary tokens are supported, switch the tests to use Scylla's default random token assignment. Switching to arbitrary tokens makes the tests non-deterministic, but the migration aspects that are affected by the token distribution (resharding, wrap-around vnode split) are out of scope for these tests and covered by dedicated tests. Add a `get_all_vnode_tokens()` helper that queries system.topology at runtime to discover the actual token layout, and derive expected tablet counts from that. Also account for the possible extra wrap-around tablet when the last vnode token does not coincide with MAX_TOKEN. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2026-04-16 12:47:27 +03:00
Nikos Dragazis	26f0c038af	test: boost: Add test for wrap-around vnodes Add a Boost test to verify that `prepare_for_tablets_migration()` produces the correct tablet boundaries when a wrap-around vnode exists. Tablets cannot wrap around the token ring as vnodes do; the last token of the last tablet must always be MAX_TOKEN. When the last vnode token does not coincide with MAX_TOKEN, the wrap-around vnode must be split into two tablets. The test is parameterized over both cases: unaligned (split expected) and aligned (no split expected). Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2026-04-16 12:47:16 +03:00
Botond Dénes	c355df4461	Merge 'test: Lower default log level from DEBUG to INFO' from Artsiom Mishuta 1. test.py — Removed --log-level=DEBUG flag from pytest args 2. test/pytest.ini — Changed log_level to INFO (it was previously set to DEBUG via test.py), changed log_file_level from DEBUG to INFO, added clarifying comments +minor fix [test/pylib: save logs on success only during teardown phase](`0ede308a04`) Previously, when --save-log-on-success was enabled, logs were saved for every test phase (setup, call, teardown) in 3 files. Restrict it to only the teardown phase, which contains all 3 in case of test success, to avoid redundant log entries. Closes scylladb/scylladb#29086 * github.com:scylladb/scylladb: test/pylib: save logs on success only during teardown phase test: Lower default log level from DEBUG to INFO	2026-04-16 12:46:11 +03:00
Nikos Dragazis	098732ff76	storage_service: Support vnodes->tablets migrations w/ arbitrary tokens The vnodes-to-tablets migration creates tablet maps that mirror the vnode layout: one tablet per vnode, preserving token boundaries and replica placement. However, due to tablet restrictions, the migration requires vnode tokens to be a power of two and uniformly distributed across the token ring. In practice, this restriction is too limiting. Real clusters use randomly generated tokens and a node's token assignment is immutable. To solve this problem, prior work (`01fb97ee78`) has been done to relax the tablet constraints by allowing arbitrary tablet boundaries, removing the requirement for power-of-two sizing and uniform distribution. This patch leverages the relaxed tablet constraints to enable tablet map creation from arbitrary vnode tokens: * Removes all token-related constraints. * Handles wrap-around vnodes. If a vnode wraps (i.e., the highest vnode token is not `dht::token::last()`), it is split into two tablets: - (last_vnode_token, dht::token::last()] - [dht::token::first(), first_vnode_token] The migration ops guide has been updated to remove the power-of-two constraint. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2026-04-16 12:39:23 +03:00
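The tablet boundary rule above, including the wrap-around split, can be sketched with tokens modelled as `int64_t`. This is a hedged illustration. `tablet_last_tokens` is a hypothetical helper, and `std::numeric_limits<int64_t>::max()` is only a stand-in for `dht::token::last()`. Each tablet is described by its last token and covers (previous last, last]. The first tablet starts at the ring's first token, so it becomes the [first, first_vnode_token] half of the wrap-around vnode.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Stand-in for dht::token::last(); the real token type is not a bare int64_t.
constexpr int64_t token_last = std::numeric_limits<int64_t>::max();

// Returns the last token of each tablet in ring order, one tablet per vnode,
// plus an extra tablet when the wrap-around vnode has to be split.
std::vector<int64_t> tablet_last_tokens(std::vector<int64_t> vnode_tokens) {
    std::sort(vnode_tokens.begin(), vnode_tokens.end());
    std::vector<int64_t> last = vnode_tokens;  // same boundaries as the vnodes
    if (last.empty() || last.back() != token_last) {
        // The wrap-around vnode (last_vnode_token, first_vnode_token] becomes two
        // tablets: [first, first_vnode_token] (already the first tablet above)
        // and (last_vnode_token, token_last], added here.
        last.push_back(token_last);
    }
    return last;
}
```

The resulting tablet count is either the number of vnodes (aligned case) or one more (unaligned case). These are the two cases the accompanying Boost test and the adjusted migration tests account for.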
Nikos Dragazis	8ea8c05120	storage_service: Hoist migration precondition `prepare_for_tablets_migration()` is idempotent; it filters out tables that already have tablet maps and returns early if no tablet maps need to be created. However, this precondition is currently misplaced. Move it higher to skip extra work. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2026-04-16 12:19:34 +03:00
Botond Dénes	9bfcc25cf7	Merge 'streaming: stream_blob: hold table for streaming' from Michael Litvak When initializing streaming sources in tablet_stream_files_handler we use a reference to the table. We should hold the table while doing so, because otherwise the table may be dropped and destroyed when we yield. Use the table.stream_in_progress() phaser to hold the table while we access it. For sstable file streaming we can release the table after the snapshot is initialized, and the table may be dropped safely because the files are held by the snapshot and we don't access the table anymore. There was a single access to the table for logging but it is replaced by a pre-calculated variable. For logstor segment streaming, currently it doesn't support discarding the segments while they are streamed - when the table is dropped it discards the segments by overwriting and freeing them, so they shouldn't be accessed after that. Therefore, in that case continue to hold the table until streaming is completed. Fixes [SCYLLADB-1533](https://scylladb.atlassian.net/browse/SCYLLADB-1533) It's a pre-existing use-after-free issue in sstable file streaming so should be backported to all releases. It's also made worse with the recent changes of logstor, and affects also non-logstor tables, so the logstor fixes should be in the same release (2026.2). [SCYLLADB-1533]: https://scylladb.atlassian.net/browse/SCYLLADB-1533?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ Closes scylladb/scylladb#29488 * github.com:scylladb/scylladb: test: test drop table during streaming streaming: stream_blob: hold table for streaming	2026-04-16 12:12:42 +03:00
Dario Mirovic	50e498ac0d	test/lib: fix typos in proc_utils, gcs_fixture, and dockerized_service Fix assorted typos in comments, strings, and identifiers: - path_preprend -> path_prepend (proc_utils.hh, proc_utils.cc) - laúnch -> launch (proc_utils.cc) - hand/fail -> hang/fail (dockerized_service.py) - inconvinient -> inconvenient (dockerized_service.py) - priviledges -> privileges (gcs_fixture.hh) - remove double semicolon (gcs_fixture.cc) Refs SCYLLADB-1542	2026-04-16 10:58:55 +02:00
Dario Mirovic	11b5997eaf	test: gcs_fixture: rename container from "local-kms" to "fake-gcs-server" The GCS fixture's fake-gcs-server container was named "local-kms", copy-pasted from the AWS KMS fixture. It happened when both were refactored to use the shared start_docker_service helper (`bc544eb08e`). Rename to "fake-gcs-server" to match the Python-side naming and avoid confusion in logs. Refs SCYLLADB-1542	2026-04-16 10:58:52 +02:00
Dario Mirovic	dc7f848bf8	test: fix proc_utils.cc formatting from previous commit Fix indentation of lines moved inside the for-loop in start_docker_service (lines 208-225). Refs SCYLLADB-1542	2026-04-16 10:55:48 +02:00
Dario Mirovic	be4d32c474	test: lib: use unique container name per retry attempt The container name is generated once before the retry loop, so all retry attempts reuse the same name. Move the name generation inside the loop so each attempt gets a fresh name via the incrementing counter, consistent with the comment "publish port ephemeral, allows parallel instances". Formatting changes (indentation) of lines 208-225 in test/lib/proc_utils.cc will be fixed in the next commit. Refs SCYLLADB-1542	2026-04-16 10:55:04 +02:00
Botond Dénes	33682fd14e	Merge 'sstables/storage_manager: fix race between object storage config update and keyspace creation' from Dimitrios Symonidis Previously, config_updater used a serialized_action to trigger update_config() when object_storage_endpoints changed. Because serialized_action::trigger() always schedules the action as a new reactor task (via semaphore::wait().then()), there was a window between the config value becoming visible to the REST API and update_config() actually running. This allowed a concurrent CREATE KEYSPACE to see the new endpoint via is_known_endpoint() before storage_manager had registered it in _object_storage_endpoints. Now config observers run synchronously in a reactor turn and must not suspend. Split the previous monolithic async update_config() coroutine into two phases: - Sync (in the observer, never suspends): storage_manager::_object_storage_endpoints is updated in place; for already-instantiated clients, update_config_sync swaps the new config atomically - Async (per-client gate): background fibers finish the work that can't run in the observer — S3 refreshes credentials under _creds_sem; GCS drains and closes the replaced client. Config reloads triggered by SIGHUP are applied on shard 0 and then broadcast to all other shards. An rwlock has also been introduced to make sure that the configuration has been propagated to all cores. This guarantees that a client requesting a config via the REST API will see a consistent snapshot. Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-757 Fixes: [28141](https://github.com/scylladb/scylladb/issues/28141) Closes scylladb/scylladb#28950 * github.com:scylladb/scylladb: test/object_store: verify object storage client creation and live reconfiguration sstables/utils/s3: split config update into sync and async parts test_config: improve logging for wait_for_config API db: introduce read-write lock to synchronize config updates with REST API	2026-04-16 10:20:43 +03:00
Michael Litvak	43c76aaf2b	logstor: split log record to header and data Split the `log_record` to `log_record_header` type that has the record metadata fields and the mutation as a separate field which is the actual record data: struct log_record { log_record_header header; canonical_mutation mut; }; Both the header and mutation have variable serialized size. When a record is serialized in a write_buffer, we first put a small `record_header` that has the header size and data size, then the serialized header and data follow. The `log_location` of a record points to the beginning of the `record_header`, and the size includes the `record_header`. This allows us to read a record header without reading the data when it's not needed and avoid deserializing it: * on recovery, when scanning all segments, we read only the record headers. * on compaction, we read the record header first to determine if the record is alive, if yes then we read the data. Closes scylladb/scylladb#29457	2026-04-16 10:00:35 +03:00
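The on-buffer layout described above, a small size prefix followed by the serialized header and then the data, can be sketched as follows. `record_prefix`, `append_record` and `read_header_only` are hypothetical names. The two `uint32_t` fields in native byte order are assumptions, not logstor's actual `record_header` encoding. The point is that a scanner can read just the header and compute where the next record starts without touching the data.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical fixed-size prefix; the real record_header may differ in fields and widths.
struct record_prefix {
    uint32_t header_size;
    uint32_t data_size;
};

// Layout per record: [record_prefix][serialized header][serialized data].
void append_record(std::vector<char>& buf, const std::string& header, const std::string& data) {
    record_prefix p{static_cast<uint32_t>(header.size()), static_cast<uint32_t>(data.size())};
    const char* raw = reinterpret_cast<const char*>(&p);
    buf.insert(buf.end(), raw, raw + sizeof(p));
    buf.insert(buf.end(), header.begin(), header.end());
    buf.insert(buf.end(), data.begin(), data.end());
}

struct header_view {
    std::string header;
    std::size_t record_size;  // includes the prefix, like the size in log_location
};

// Reads only the header of the record at `offset`; the data bytes are skipped.
header_view read_header_only(const std::vector<char>& buf, std::size_t offset) {
    record_prefix p;
    std::memcpy(&p, buf.data() + offset, sizeof(p));
    const char* h = buf.data() + offset + sizeof(p);
    return {std::string(h, p.header_size), sizeof(p) + p.header_size + p.data_size};
}
```

During a recovery-style scan, the next record begins at `offset + record_size`, so every header can be visited while every mutation payload is skipped.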
Botond Dénes	8e7ba7efe2	Merge 'commitlog: fix segment replay order by using ordered map per shard' from Sergey Zolotukhin The commitlog replayer groups segments by shard using a std::unordered_multimap, then iterates per-shard segments via equal_range(). However, equal_range() does not guarantee iteration order for elements with the same key, so segments could be replayed out of order within a shard. Correct segment ordering is required for: - Fragmented entry reconstruction, which accumulates fragments across segments and depends on ascending order for efficient processing. - Commitlog-based storage used by the strongly consistent tables feature, which relies on replayed raft items being stored in order. Fix by changing the data structure from std::unordered_multimap<unsigned, commitlog::descriptor> to std::unordered_map<unsigned, utils::chunked_vector<commitlog::descriptor>> Since the descriptors are inserted from a std::set ordered by ID, the vector preserves insertion (and thus ID) order. The per-shard iteration now simply iterates the vector, guaranteeing correct replay order. Fixes: SCYLLADB-1411 Backport: It looks like this issue doesn't cause any trouble, and is required only by the strong consistent tables, so no backporting required. Closes scylladb/scylladb#29372 * github.com:scylladb/scylladb: commitlog: add test to verify segment replay order commitlog: fix replay order by using ordered map per shard	2026-04-16 09:55:27 +03:00
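Why the new structure preserves order: iterating a `std::set` yields descriptors in ascending ID order, and `push_back` into a per-shard vector keeps that order. The sketch uses `std::vector` as a stand-in for `utils::chunked_vector` and a hypothetical two-field `descriptor`. The real `commitlog::descriptor` carries more information.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for commitlog::descriptor: segment id and owning shard.
struct descriptor {
    uint64_t id;
    unsigned shard;
    bool operator<(const descriptor& o) const { return id < o.id; }
};

std::unordered_map<unsigned, std::vector<descriptor>>
group_by_shard(const std::set<descriptor>& segments) {
    std::unordered_map<unsigned, std::vector<descriptor>> per_shard;
    for (const auto& d : segments) {      // std::set iterates in ascending id order
        per_shard[d.shard].push_back(d);  // push_back keeps that order within each shard
    }
    return per_shard;
}
```

The outer `unordered_map` may still visit shards in any order. That is harmless because only the order within a shard matters for replay, which `equal_range()` on an `unordered_multimap` did not guarantee.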
Łukasz Paszkowski	61877e9dfb	test/cluster/storage: Add a reproducer for load-and-stream out-of-space rejection Add `test_load_and_stream_rejected_on_critical_disk` which verifies that `nodetool refresh --load-and-stream` is rejected when the target node reaches critical disk utilization during streaming. The test is marked xfail because the stream_mutation_fragments handler does not yet check whether the node is in the critical disk utilization mode (introduced in the next patch). The test sets up a 3-node cluster, writes data and snapshots SSTables on one node, wipes another node's data, and copies the snapshot to its upload directory. It then starts load-and-stream and uses the `write_components_writer_created` error injection to pause SSTable writing. While paused, the test fills the disk past the critical threshold, then releases the injection. The next streamed mutation fragment is rejected with critical_disk_utilization_exception. The test verifies that: - The operation fails with the expected error. - No data is persisted on the target node. - Partial SSTable files created during streaming are deleted (via the implicit mark-for-deletion mechanism in the SSTable lifecycle).	2026-04-16 08:38:34 +02:00
Łukasz Paszkowski	8d34127684	sstables: clean up TemporaryHashes file in wipe() The TemporaryHashes.db.tmp file is created during SSTable writing to store intermediate bloom filter hashes and is deleted before the SSTable is sealed. Since it is not tracked in the TOC, it is also absent from _recognized_components and all_components(). When an SSTable write fails before sealing (e.g. streaming rejected due to critical disk utilization), wipe() is called to clean up the partial SSTable. However, wipe() only iterates over all_components(), so the TemporaryHashes file was left behind as an orphan. Previously, the only cleanup mechanism for this file was the startup-time directory scanner in sstable_directory, which would not help when the orphan needs to be cleaned up at runtime. Explicitly remove the TemporaryHashes file in wipe(), ignoring ENOENT for the common case where the file was already removed before sealing.	2026-04-16 08:38:34 +02:00
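"Remove the file, ignoring ENOENT" can be sketched with `std::filesystem`, used here as a stand-in for Scylla's own file API. The error-code overload of `std::filesystem::remove` returns `false` without reporting an error when the path does not exist. `remove_if_present` and `wipe_sketch` are hypothetical names. The sketch only shows the idea of deleting the tracked components plus one untracked temporary file.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Deletes `p` if it exists. A missing file is not an error; other failures are.
bool remove_if_present(const fs::path& p) {
    std::error_code ec;
    fs::remove(p, ec);  // returns false and leaves ec clear when p does not exist
    return !ec;
}

// Hypothetical wipe: tracked components first, then the untracked temporary file.
bool wipe_sketch(const std::vector<fs::path>& components, const fs::path& temporary_hashes) {
    bool ok = true;
    for (const auto& c : components) {
        ok = remove_if_present(c) && ok;
    }
    ok = remove_if_present(temporary_hashes) && ok;
    return ok;
}
```

Ignoring only the "does not exist" case keeps the common path (the temporary file was already removed before sealing) quiet while still surfacing real I/O errors.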
Łukasz Paszkowski	159675e975	sstables: add error injection point in write_components Add a `write_components_writer_created` error injection point in `sstable::write_components()` between writer creation and fragment consumption. This injection is needed by the out-of-space streaming test (added in the next patch) to reliably pause SSTable writing at the right moment: after the SSTable writer has been created and files exist on disk, but before mutation fragments are consumed. Pausing earlier (before writer creation) would not work because there are no files on disk yet, while pausing later (after consuming fragments) would be too late to reliably push the node into critical disk utilization.	2026-04-16 08:38:34 +02:00
Łukasz Paszkowski	d1a24aa16a	test/cluster/storage: extract validate_data_existence to module scope Move validate_data_existence out of test_user_writes_rejection into module scope so it can be reused by other tests in the file. No functional change.	2026-04-16 08:38:33 +02:00
Łukasz Paszkowski	9c82b76755	test/cluster: enable suppress_disk_space_threshold_checks in tests using data_file_capacity Tests that override disk capacity via the data_file_capacity config option trigger the disk space monitor's critical utilization mode and as a consequence activate out-of-space prevention mechanisms. This will cause bootstrap failures with critical_disk_utilization_exception during mutation-based streaming introduced later in the series. Enable the `suppress_disk_space_threshold_checks` error injection at startup in the affected tests to prevent the disk space monitor from interfering with the test-configured capacity values. Affected tests: - test_balance_empty_tablets (test/cluster/test_size_based_load_balancing.py) - test_load_stats_on_coordinator_failover (test/cluster/test_tablet_stats.py)	2026-04-16 08:38:33 +02:00
Łukasz Paszkowski	3726e31c03	utils/disk_space_monitor: add error injection to suppress threshold checks Add the `suppress_disk_space_threshold_checks` error injection point to the disk space monitor. When enabled, the threshold listener short-circuits without evaluating disk utilization. This is useful for tests that override disk capacity via `data_file_capacity`, where the real disk usage causes the monitor to incorrectly report critical utilization and activate out-of-space prevention mechanisms.	2026-04-16 08:38:33 +02:00
Pavel Emelyanov	335261f351	cql3: Move enable_create_table_with_compact_storage to cql_config Move enable_create_table_with_compact_storage option from db::config to cql_config. This improves separation of concerns by consolidating CQL-specific table creation policies in the cql_config structure. Update the CREATE TABLE statement prepare() function to use the new location for the configuration check. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-16 08:52:20 +03:00
Pavel Emelyanov	f20ede79f9	cql3: Move strict_is_not_null_in_views to cql_config Move strict_is_not_null_in_views option from db::config to cql_config via new view_restrictions sub-struct. This improves separation of concerns by keeping view-specific validation policies with other CQL configuration. Update prepare_view() to take view_restrictions reference instead of reaching into db::config, and update all callsites to pass the sub-struct. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-16 08:52:19 +03:00
Pavel Emelyanov	027c91f45e	cql3: Move restrict_future_timestamp to cql_config Move restrict_future_timestamp option from db::config to cql_config. This improves separation of concerns as timestamp validation is part of CQL query execution behavior. Update validate_timestamp() function signature to take cql_config reference instead of db::config, and update all callsites in modification_statement and batch_statement to pass cql_config. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-16 08:51:53 +03:00
Pavel Emelyanov	7264581881	cql3: Move TWCS restriction options to cql_config Move twcs_max_window_count and restrict_twcs_without_default_ttl options from db::config to cql_config via new twcs_restrictions sub-struct. This improves separation of concerns by keeping TWCS-specific validation policies with other CQL configuration. Update check_restricted_table_properties() to remove unused db parameter and take twcs_restrictions reference instead. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-16 08:51:52 +03:00
Pavel Emelyanov	8b853505cd	cql3: Move keyspace restriction options to cql_config Introduce replication_restrictions, a sub-struct of cql_config, to hold the seven keyspace-level policy options that govern how CREATE/ALTER KEYSPACE statements are validated: - restrict_replication_simplestrategy - replication_strategy_warn_list / replication_strategy_fail_list - minimum/maximum_replication_factor_warn/fail_threshold Pass replication_restrictions into check_against_restricted_replication_strategies() instead of having it reach into db::config directly (via both qp.db().get_config() and qp.proxy().data_dictionary().get_config()). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-16 08:51:24 +03:00
Benny Halevy	ce00d61917	db: implement large_data virtual tables with feature flag gating Replace the physical system.large_partitions, system.large_rows, and system.large_cells CQL tables with virtual tables that read from LargeDataRecords stored in SSTable scylla metadata (tag 13). The transition is gated by a new LARGE_DATA_VIRTUAL_TABLES cluster feature flag: - Before the feature is enabled: the old physical tables remain in all_tables(), CQL writes are active, no virtual tables are registered. This ensures safe rollback during rolling upgrades. - After the feature is enabled: old physical tables are dropped from disk via legacy_drop_table_on_all_shards(), virtual tables are registered on all shards, and CQL writes are skipped via skip_cql_writes() in cql_table_large_data_handler. Key implementation details: - Three virtual table classes (large_partitions_virtual_table, large_rows_virtual_table, large_cells_virtual_table) extend streaming_virtual_table with cross-shard record collection. - generate_legacy_id() gains a version parameter; virtual tables use version 1 to get different UUIDs than the old physical tables. - compaction_time is derived from SSTable generation UUID at display time via UUID_gen::unix_timestamp(). - Legacy SSTables without LargeDataRecords emit synthetic summary rows based on above_threshold > 0 in LargeDataStats. - The activation logic uses two paths: when the feature is already enabled (test env, restart), it runs as a coroutine; when not yet enabled, it registers a when_enabled callback that runs inside seastar::async from feature_service::enable(). - sstable_3_x_test updated to use a simplified large_data_test_handler and validate LargeDataRecords in SSTable metadata directly.	2026-04-16 08:49:02 +03:00
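Deriving `compaction_time` this way depends on the generation UUID being time-based. The sketch below recovers a millisecond Unix timestamp from the most-significant 64 bits of an RFC 4122 version-1 UUID. The helper names are hypothetical, and this is not `UUID_gen::unix_timestamp()` itself, which may differ in detail. `make_v1_msb()` exists only so the decoding can be checked round-trip.

```cpp
#include <cstdint>

// 100 ns intervals between the UUID epoch (1582-10-15) and the Unix epoch.
constexpr std::uint64_t uuid_epoch_offset_100ns = 0x01B21DD213814000ULL;

// Layout of the high 64 bits: time_low(32) | time_mid(16) | version(4) | time_hi(12).
std::uint64_t uuid_v1_timestamp_100ns(std::uint64_t msb) {
    const std::uint64_t time_low = msb >> 32;
    const std::uint64_t time_mid = (msb >> 16) & 0xFFFF;
    const std::uint64_t time_hi = msb & 0x0FFF;  // drop the 4 version bits
    return (time_hi << 48) | (time_mid << 32) | time_low;
}

// Milliseconds since the Unix epoch encoded in a version-1 UUID.
std::int64_t unix_timestamp_ms(std::uint64_t msb) {
    const std::uint64_t ts = uuid_v1_timestamp_100ns(msb);
    return static_cast<std::int64_t>(ts - uuid_epoch_offset_100ns) / 10'000;
}

// Inverse of the decoding above (for illustration and testing only).
std::uint64_t make_v1_msb(std::uint64_t ts_100ns) {
    return ((ts_100ns & 0xFFFFFFFFULL) << 32)
         | (((ts_100ns >> 32) & 0xFFFF) << 16)
         | 0x1000                          // version 1
         | ((ts_100ns >> 48) & 0x0FFF);
}
```

Computing the time at display time avoids storing a separate timestamp field in each record.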
Benny Halevy	cb6004b625	db: call initialize_virtual_tables from shard 0 only Move the smp::invoke_on_all dispatch from the callers into initialize_virtual_tables() itself, so the function is called once from shard 0 and internally distributes the per-shard virtual table setup to all shards. This simplifies the callers and allows a single place to add cross-shard coordination logic (e.g. feature-gated table registration) in future commits.	2026-04-16 08:49:02 +03:00
Benny Halevy	90d4ff34fb	test: add LargeDataRecords round-trip unit tests Add three new test cases to sstable_3_x_test.cc that verify the LargeDataRecords metadata written by the SSTable writer can be read back after open_data(): - test_large_data_records_round_trip: verifies partition_size, row_size, and cell_size records are written with correct field semantics when thresholds are exceeded - test_large_data_records_top_n_bounded: verifies the bounded min-heap keeps only the top-N largest entries per type - test_large_data_records_none_when_below_threshold: verifies no records are written when data is below all thresholds Also wire large_data_records_per_sstable from db_config into the test env's sstables_manager::config so that config changes propagate through the updateable_value chain to configure_writer().	2026-04-16 08:49:02 +03:00
Benny Halevy	1f7faeef57	sstables: populate LargeDataRecords from writer During compaction (SSTable writing), maintain bounded min-heaps (one per large_data_type) that collect the top-N above-threshold records. On stream end, drain all five heaps into a single LargeDataRecords array and write it into the SSTable's scylla metadata component. Five separate heaps are used: - partition_size, row_size, cell_size: ordered by value (size bytes) - rows_in_partition, elements_in_collection: ordered by elements_count A new config option 'compaction_large_data_records_per_sstable' (default 10) controls the maximum number of records kept per type.	2026-04-16 08:49:02 +03:00
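A bounded top-N min-heap like the one described above can be sketched with `std::priority_queue` and `std::greater<>`. This is an illustration only, with several simplifications:

- The class name `top_n_collector` is hypothetical.
- It tracks bare sizes rather than full records with keys.
- A writer following the description would keep one such heap per `large_data_type`.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <queue>
#include <vector>

// Keeps the N largest values that exceed a threshold. The heap is a min-heap,
// so its top is the smallest retained value: the one evicted when a larger
// value arrives.
class top_n_collector {
public:
    top_n_collector(std::size_t n, std::uint64_t threshold)
        : _n(n), _threshold(threshold) {}

    void add(std::uint64_t value) {
        if (_n == 0 || value <= _threshold) {
            return;  // only above-threshold values are recorded
        }
        if (_heap.size() < _n) {
            _heap.push(value);
        } else if (value > _heap.top()) {
            _heap.pop();  // drop the smallest of the current top N
            _heap.push(value);
        }
    }

    // Empties the heap and returns the retained values, largest first.
    std::vector<std::uint64_t> drain() {
        std::vector<std::uint64_t> out;
        while (!_heap.empty()) {
            out.push_back(_heap.top());
            _heap.pop();
        }
        std::reverse(out.begin(), out.end());
        return out;
    }

private:
    std::size_t _n;
    std::uint64_t _threshold;
    std::priority_queue<std::uint64_t, std::vector<std::uint64_t>, std::greater<>> _heap;
};
```

Memory stays bounded by N no matter how many large items a compaction sees, and each insertion costs O(log N).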
Benny Halevy	8f4976f65d	large_data_handler: return above_threshold_result from maybe_record_large_cells Change maybe_record_large_cells to return above_threshold_result with separate booleans for cell size (.size) and collection elements (.elements) thresholds. This allows the writer to track above_threshold counts for cell_size and elements_in_collection independently.	2026-04-16 08:49:02 +03:00
Benny Halevy	c1b797f288	large_data_handler: rename partition_above_threshold to above_threshold_result Rename partition_above_threshold to above_threshold_result and its 'rows' field to 'elements', making it a generic struct that can be reused for other large data types (e.g., cells with collection elements). Use designated initializers for clarity.	2026-04-16 08:49:02 +03:00
Benny Halevy	d92cd42fe6	sstables: add LargeDataRecords metadata type (tag 13) Add a new scylla metadata component LargeDataRecords (tag 13) that stores per-SSTable top-N large data records. Each record carries: - large_data_type (partition_size, row_size, cell_size, etc.) - binary serialized partition key and clustering key - column name (for cell records) - value (size in bytes) - element count (rows or collection elements, type-dependent) - range tombstones and dead rows (partition records only) The struct uses disk_string<uint32_t> for key/name fields and is serialized via the existing describe_type framework into the SSTable Scylla metadata component. Add JSON support in scylla-sstable and format documentation.	2026-04-16 08:49:01 +03:00
Benny Halevy	85e2c6f2a7	sstables: add fmt::formatter for large_data_type Add a fmt::formatter specialization for sstables::large_data_type and use it in scylla-sstable.cc instead of the local to_string() overload, which is removed.	2026-04-16 08:42:54 +03:00
Benny Halevy	d4283d0ffc	keys: move key_to_str() to keys/keys.hh Move the key_to_str() template function from a file-local static in db/large_data_handler.cc to keys/keys.hh so it can be reused by: - large_data_handler.cc for log messages - virtual tables (db/virtual_tables.cc) for converting binary keys to human-readable CQL display - scylla-sstable for JSON output of LargeDataRecords No functional change.	2026-04-16 08:42:54 +03:00
Pavel Emelyanov	1af26a1dd6	cql3: Move batch_size_fail_threshold_in_kb to cql_config The batch_size_fail_threshold_in_kb option controls the batch size at which an oversized batch error is returned to the client. It belongs in cql_config rather than db::config as it directly governs CQL batch statement behavior. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-16 07:57:27 +03:00
Pavel Emelyanov	4d255cf533	cql3: Move batch_size_warn_threshold_in_kb to cql_config The batch_size_warn_threshold_in_kb option controls the batch size at which a client warning is emitted during batch execution. It belongs in cql_config rather than db::config as it directly governs CQL batch statement behavior. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-16 07:57:27 +03:00
Pavel Emelyanov	a3f097f100	cql3: Move enable_parallelized_aggregation to cql_config The enable_parallelized_aggregation option controls whether aggregation queries are fanned out across shards for parallel execution. It belongs in cql_config rather than db::config as it directly governs CQL query behavior at prepare time. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-16 07:57:27 +03:00
Pavel Emelyanov	4314fc0642	cql3: Move strict_allow_filtering to cql_config The strict_allow_filtering option controls whether queries that require ALLOW FILTERING are silently accepted, warned about, or rejected. It belongs in cql_config rather than db::config as it directly governs CQL query behavior at prepare time. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-16 07:57:26 +03:00
Pavel Emelyanov	3411ed8bcc	cql3: Move select_internal_page_size to cql_config The select_internal_page_size option controls CQL query execution behavior (internal paging for aggregate/filtered SELECTs) and belongs in cql_config rather than being read directly from db::config at execution time. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-16 07:57:26 +03:00
Pavel Emelyanov	728eb20b42	test: Fix cql_test_env to use updateable cql_config from db::config The test environment was creating cql_config with hardcoded default values that were never updated when system.config was modified via CQL. This broke tests that dynamically change configuration values (e.g., TWCS tests). Fix by creating cql_config from db::config using sharded_parameter, which ensures updateable_value fields track the actual db::config sources and reflect changes made during test execution. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-04-16 07:57:26 +03:00
Pavel Emelyanov	60a834d9fa	cql3: Add cql_config parameter to parsed_statement::prepare() Pass cql_config to prepare() so that statement preparation can use CQL-specific configuration rather than reaching into db::config directly. Callers that use default_cql_config: - db/view/view.cc: builds a SELECT statement internally to compute view restrictions, not in response to a user query - cql3/statements/create_view_statement.cc: same -- parses the view's WHERE clause as a synthetic SELECT to extract restrictions - tools/schema_loader.cc: offline schema loading tool, no runtime config available - tools/scylla-sstable.cc: offline sstable inspection tool, no runtime config available Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-16 07:57:25 +03:00
Nadav Har'El	f0e9177130	Merge 'audit/alternator: Make Alternator requests audited' from Piotr Szymaniak Each Alternator API call results in the request being audited, provided auditing is enabled. Both successful and failed requests are audited, with a few exceptions. The chosen audit types for the operations: - CreateTable: DDL - DescribeTable: QUERY - DeleteTable: DDL - UpdateTable: DDL - PutItem: DML - UpdateItem: DML - GetItem: QUERY - DeleteItem: DML - ListTables: QUERY - Scan: QUERY - DescribeEndpoints: QUERY - BatchWriteItem: DML - BatchGetItem: QUERY - Query: QUERY - TagResource: DDL - UntagResource: DDL - ListTagsOfResource: QUERY - UpdateTimeToLive: DDL - DescribeTimeToLive: QUERY - ListStreams: QUERY - DescribeStream: QUERY - GetShardIterator: QUERY - GetRecords: QUERY - DescribeContinuousBackups: QUERY FIXME: The tests currently cover the new functionality only partially. Fixes: scylladb/scylla-enterprise#3796 Fixes: SCYLLADB-467 No need to backport, new functionality. Closes scylladb/scylladb#27953 * github.com:scylladb/scylladb: audit/alternator: support audit_tables=alternator.<table> shorthand audit/alternator: Add negative audit tests audit/alternator: Add testing of auditing audit/alternator: Audit requests audit/alternator: Refactor in preparation for auditing Alternator	2026-04-15 22:17:57 +03:00
Nikos Dragazis	d38f44208a	test/cqlpy: Harden mutation_fragments tests against background flushes Several tests in test_select_from_mutation_fragments.py assume that all mutations end up in a single SSTable. This assumption can be violated by background memtable flushes triggered by commitlog disk pressure. Since the Scylla node is taken from a pool, it may carry unflushed data from prior tests that prevents closed segments from being recycled, thereby increasing the commitlog disk usage. A main source of such pressure is keyspace-level flushes from earlier tests in this module, which rotate commitlog segments without flushing system tables (e.g., `system.compaction_history`), leaving closed segments dirty. Additionally, prior tests in the same module may have left unflushed data on the shared test table (`test_table` fixture), keeping commitlog segments dirty on its behalf as well. When commitlog disk usage exceeds its threshold, the system flushes the test table to reclaim those segments, potentially splitting a running test's mutations across multiple SSTables. This was observed in CI, where test_paging failed because its data was split across two SSTables, resulting in more mutation fragments than the hardcoded expected count. This patch fixes the affected tests in two ways: 1. Where possible, tests are reworked to not assume a single SSTable: - test_paging - test_slicing_rows - test_many_partition_scan 2. Where rework is impractical, major compaction is added after writes and before validation to ensure that only one SSTable will exist: - test_smoke - test_count - test_metadata_and_value - test_slicing_range_tombstone_changes Fixes SCYLLADB-1375. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com> Closes scylladb/scylladb#29389	2026-04-15 21:46:00 +03:00
Michael Litvak	cc94467097	test: test drop table during streaming Add a test that drops a table while tablet streaming is running for the table. The table is dropped after taking the storage snapshot and initializing streaming sources; after that, streaming should be able to complete or abort correctly if the table is dropped. We want to verify there is no incorrect access to the destroyed table. The test covers both types of streaming in stream_blob: sstables and logstor segments.	2026-04-15 19:23:00 +02:00
Michael Litvak	69d2a90106	streaming: stream_blob: hold table for streaming When initializing streaming sources in tablet_stream_files_handler we use a reference to the table. We should hold the table while doing so, because otherwise the table may be dropped and destroyed when we yield. Use the table.stream_in_progress() phaser to hold the table while we access it. For sstable file streaming we can release the table after the snapshot is initialized, and the table may be dropped safely because the files are held by the snapshot and we don't access the table anymore. There was a single access to the table for logging, but it has been replaced by a pre-calculated variable. Logstor segment streaming currently doesn't support discarding segments while they are being streamed: when the table is dropped, its segments are discarded by overwriting and freeing them, so they shouldn't be accessed after that. Therefore, in that case we continue to hold the table until streaming is completed. Fixes SCYLLADB-1533	2026-04-15 19:22:42 +02:00
Avi Kivity	59ec93b86b	Merge 'Allow arbitrary tablet boundaries and count' from Tomasz Grabiec There are several reasons we want to do that. One is that it will give us more flexibility in distributing the load. We can subdivide tablets at any token, and achieve more evenly-sized tablets. In particular, we can isolate large partitions into separate tablets. We can also split and merge incrementally individual tablets. Currently, we do it for the whole table or nothing, which makes splits and merges take longer and cause wide swings of the count. This is not implemented in this PR yet, we still split/merge the whole table. Another reason is vnode to tablets migration. We now could construct a tablet map which matches exactly the vnode boundaries, so migration can happen transparently from CQL-coordinator point of view. Tablet count is still a power-of-two by default for newly created tables. It may be different if tablet map is created by non-standard means, or if per-table tablet option "pow2_count" is set to "false". 
build/release/scylla perf-tablets: Memory footprint for 131k tablets increased from 56 MiB to 58.1 MiB (+3.5%) Before: ``` Generating tablet metadata Total tablet count: 131072 Size of tablet_metadata in memory: 57456 KiB Copied in 0.014346 [ms] Cleared in 0.002698 [ms] Saved in 1234.685303 [ms] Read in 445.577881 [ms] Read mutations in 299.596313 [ms] 128 mutations Read required hosts in 247.482742 [ms] Size of canonical mutations: 33.945053 [MiB] Disk space used by system.tablets: 1.456761 [MiB] Tablet metadata reload: full 407.69ms partial 2.65ms ``` After: ``` Generating tablet metadata Total tablet count: 131072 Size of tablet_metadata in memory: 59504 KiB Copied in 0.032475 [ms] Cleared in 0.002965 [ms] Saved in 1093.877441 [ms] Read in 387.027100 [ms] Read mutations in 255.752121 [ms] 128 mutations Read required hosts in 211.202805 [ms] Size of canonical mutations: 33.954453 [MiB] Disk space used by system.tablets: 1.450162 [MiB] Tablet metadata reload: full 354.50ms partial 2.19ms ``` Closes scylladb/scylladb#28459 * github.com:scylladb/scylladb: test: boost: tablets: Add test for merge with arbitrary tablet count tablets, database: Advertise 'arbitrary' layout in snapshot manifest tablets: Introduce pow2_count per-table tablet option tablets: Prepare for non-power-of-two tablet count tablets: Implement merged tablet_map constructor on top of for_each_sibling_tablets() tablets: Prepare resize_decision to hold data in decisions tablets: table: Make storage_group handle arbitrary merge boundaries tablets: Make stats update post-merge work with arbitrary merge boundaries locator: tablets: Support arbitrary tablet boundaries locator: tablets: Introduce tablet_map::get_split_token() dht: Introduce get_uniform_tokens()	2026-04-15 18:57:22 +03:00
Andrzej Jackowski	78926d9c96	test/random_failures: remove gossip shadow round injection Commit `c17c4806a1` removed check_for_endpoint_collision() from the fresh bootstrap path, which was the only code path that called do_shadow_round() for new nodes. Since the gossip shadow round is no longer executed during bootstrap, remove the stop_during_gossip_shadow_round error injection from the test. The entry is marked as REMOVED_ rather than deleted to preserve the shuffle order for seed-based test reproducibility. The injection point in gms/gossiper.cc is also removed since it is no longer used by any test. Fixes: SCYLLADB-1466 Closes scylladb/scylladb#29460	2026-04-15 16:30:55 +02:00
Dario Mirovic	336dab1eec	test: lib: fix broken retry in start_docker_service The retry loop in start_docker_service passes the parse callbacks via std::move into create_handler on each iteration. After the first iteration, the moved-from std::function objects are empty. All subsequent retries skip output parsing entirely and immediately treat the service as successfully started. This defeats the entire purpose of the retry mechanism. Fix by passing the callbacks by copy instead of move, so the original callbacks remain valid across retries. Fixes SCYLLADB-1542	2026-04-15 15:25:52 +02:00
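The failure mode can be reproduced outside the test library. In the sketch below, `create_handler_sketch()` and `start_with_retries()` are hypothetical stand-ins, not the actual `start_docker_service` code. The stand-in handler treats a missing parser as success, which mirrors the symptom described above.

The standard leaves a moved-from `std::function` in a valid but unspecified state, and in practice it is usually empty. Passing a `const std::function&` into a by-value parameter instead copies it on every attempt, so each retry still parses output.

```cpp
#include <functional>
#include <string>
#include <vector>

using parse_fn = std::function<bool(const std::string&)>;

// Hypothetical stand-in for create_handler(): takes the parser by value and
// treats a missing parser as "service started", the symptom of the bug.
bool create_handler_sketch(parse_fn parse, const std::string& output) {
    if (!parse) {
        return true;
    }
    return parse(output);
}

// Returns the 1-based attempt that reported success, or -1 if none did.
int start_with_retries(const parse_fn& parse, const std::vector<std::string>& outputs) {
    int attempt = 0;
    for (const auto& output : outputs) {
        ++attempt;
        // Buggy pattern: create_handler_sketch(std::move(cb), output) on a local
        // `cb` leaves it unusable after the first iteration.
        // Fix: pass a copy, so `parse` stays intact for every attempt.
        if (create_handler_sketch(parse, output)) {
            return attempt;
        }
    }
    return -1;
}
```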
Asias He	4137a4229c	test: Stabilize tablet incremental repair error test Use async tablet repair task flow to avoid a race where client timeout returns while server-side repair continues after injections are disabled. Start repair with await_completion=false, assert it does not complete within timeout under injection, abort/wait the task, then verify sstables_repaired_at is unchanged. Fixes SCYLLADB-1184 Closes scylladb/scylladb#29452	2026-04-15 16:24:43 +03:00
Dimitrios Symonidis	ca003680a7	test/object_store: verify object storage client creation and live reconfiguration	2026-04-15 14:28:39 +02:00
Dimitrios Symonidis	24a7b146fa	sstables/utils/s3: split config update into sync and async parts Config observers run synchronously in a reactor turn and must not suspend. Split the previous monolithic async update_config() coroutine into two phases: Sync (runs in the observer, never suspends): - S3: atomically swap _cfg (lw_shared_ptr) and set a credentials refresh flag. - GCS: install a freshly constructed client; stash the old one for async cleanup. - storage_manager: update _object_storage_endpoints and fire the async cleanup via a gate-guarded background fiber. Async (gate-guarded background fiber): - S3: acquire _creds_sem, invalidate and rearm credentials only if the refresh flag is set. - GCS: drain and close stashed old clients.	2026-04-15 14:28:31 +02:00
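The sketch below shows the two-phase pattern under a few assumptions:

- `std::shared_ptr` stands in for `seastar::lw_shared_ptr`.
- A plain method call replaces the gate-guarded background fiber.
- `s3_client_sketch` and its members are hypothetical.

The synchronous part only swaps the config pointer and records that credentials need refreshing. Code still holding the old pointer keeps a consistent old snapshot.

```cpp
#include <memory>
#include <string>
#include <utility>

struct s3_config_sketch {  // hypothetical subset of a client config
    std::string endpoint;
    std::string region;
};

class s3_client_sketch {
public:
    explicit s3_client_sketch(s3_config_sketch cfg)
        : _cfg(std::make_shared<const s3_config_sketch>(std::move(cfg))) {}

    // Sync phase: safe to call from a config observer because it never waits,
    // it only replaces a pointer and sets a flag.
    void update_config(s3_config_sketch cfg) {
        _cfg = std::make_shared<const s3_config_sketch>(std::move(cfg));
        _creds_refresh_needed = true;
    }

    // Async phase body: in the described design this would run later in a
    // background fiber (under a semaphore); here it is an ordinary call.
    bool maybe_refresh_credentials() {
        if (!_creds_refresh_needed) {
            return false;
        }
        _creds_refresh_needed = false;
        ++_refresh_count;  // stands in for invalidating and rearming credentials
        return true;
    }

    std::shared_ptr<const s3_config_sketch> config() const { return _cfg; }
    int refresh_count() const { return _refresh_count; }

private:
    std::shared_ptr<const s3_config_sketch> _cfg;
    bool _creds_refresh_needed = false;
    int _refresh_count = 0;
};
```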
Dimitrios Symonidis	a958da0ab9	test_config: improve logging for wait_for_config API	2026-04-15 14:28:31 +02:00
Dimitrios Symonidis	71714fdc0e	db: introduce read-write lock to synchronize config updates with REST API Config is reloaded from SIGHUP on shard 0 and broadcast to all shards under a write lock. REST API callers reading find_config_id acquire a read lock via value_as_json_string_for_name() and are guaranteed a consistent snapshot even when a reload is in progress.	2026-04-15 14:28:31 +02:00
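The locking discipline can be sketched with `std::shared_mutex`. This is an analogy, not the patch's code: the actual change presumably uses an asynchronous lock suited to Seastar's cooperative scheduling, and `config_store_sketch` is hypothetical. The reload applies every change under the exclusive lock while readers take the shared lock, so a reader never observes a partially applied reload.

```cpp
#include <mutex>
#include <optional>
#include <shared_mutex>
#include <string>
#include <unordered_map>

class config_store_sketch {
public:
    // Writer side (e.g. a SIGHUP-triggered reload): all changes are applied
    // while holding the exclusive lock.
    void reload(const std::unordered_map<std::string, std::string>& changes) {
        std::unique_lock lock(_mutex);
        for (const auto& [name, value] : changes) {
            _values[name] = value;
        }
    }

    // Reader side (e.g. a REST handler): many readers may hold the shared lock at once.
    std::optional<std::string> value_as_string(const std::string& name) const {
        std::shared_lock lock(_mutex);
        auto it = _values.find(name);
        if (it == _values.end()) {
            return std::nullopt;
        }
        return it->second;
    }

private:
    mutable std::shared_mutex _mutex;
    std::unordered_map<std::string, std::string> _values;
};
```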
dependabot[bot]	d584e8e321	build(deps): bump sphinx-scylladb-theme from 1.9.1 to 1.9.2 in /docs Bumps [sphinx-scylladb-theme](https://github.com/scylladb/sphinx-scylladb-theme) from 1.9.1 to 1.9.2. - [Release notes](https://github.com/scylladb/sphinx-scylladb-theme/releases) - [Commits](https://github.com/scylladb/sphinx-scylladb-theme/compare/1.9.1...1.9.2) --- updated-dependencies: - dependency-name: sphinx-scylladb-theme dependency-version: 1.9.2 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Closes scylladb/scylladb#29476	2026-04-15 14:57:37 +03:00
Gleb Natapov	ca24dd4a5f	topology coordinator: log request cancellation only when requests are really canceled Currently cancellation is logged in get_next_task, but the function is also called by tablets code, where we do not act upon its result and only yield to the topology coordinator. The topology coordinator will not necessarily perform the cancellation either, since it can be busy with tablets migration. As a result, the cancellation is logged but not done, which is confusing. Fix it by logging cancellation when it actually happens. Fixes https://scylladb.atlassian.net/browse/SCYLLADB-1409 Closes scylladb/scylladb#29471	2026-04-15 14:46:59 +03:00
Botond Dénes	280fe7cfb7	Merge 'Make inclusion of config.hh cheaper' from Nadav Har'El This is an attempt (mostly suggested and implemented by AI, but with a few hours of human babysitting...) to somewhat reduce compilation time by picking one template, named_value<T>, which is used in more than a hundred source files through the config.hh header, and making it use external instantiation: The different methods of named_value<T> for various T are instantiated only once (in config.cc), and the individual translation units don't need to compile them a hundred times. The resulting saving is a little underwhelming: The total object-file size goes down about 1% (from 346,200 before the patch to 343,488 after the patch), and previous experience shows that this object-file size is proportional to the compilation time, most of which involves code generation. But I haven't been able to measure speedup of the build itself. 1% is not nothing, but not a huge saving either. Though arguably, with 50 more of these patches, we can make the build twice as fast :-) Refs #1. Closes scylladb/scylladb#28992 * github.com:scylladb/scylladb: config: move named_value<T> method bodies out-of-line config: suppress named_value<T> instantiation in every source file	2026-04-15 14:40:15 +03:00
Botond Dénes	00d8470554	Merge 'test: filter benign shutdown errors in tests that grep logs directly' from Marcin Maliszkiewicz Tests that call grep_for_errors() directly and assert no errors can fail spuriously due to benign RPC errors during graceful shutdown (e.g. "connection dropped: Semaphore broken"), which are already filtered by the after_test hook via filter_errors(). Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-1464 Backport: no, tests fix (we may decide to backport later if it occurs on release branches) Closes scylladb/scylladb#29463 * github.com:scylladb/scylladb: test: filter benign errors in tests that grep logs during shutdown test: filter_errors: support list[list[str]] error groups	2026-04-15 14:40:15 +03:00
Piotr Szymaniak	5b00675bf0	storage_proxy: expedite speculative retry on replica disconnect When a replica disconnects during a digest read (e.g., during decommission), the speculating_read_executor now immediately fires the pending speculative retry instead of waiting for the timer. On DISCONNECT, the digest_read_resolver invokes an _on_disconnect callback set by the executor. The callback cancels the speculate timer and rearms it to clock_type::now() (lowres_clock::now() = thread-local memory read, no syscall). The existing timer callback fires on the next reactor poll with all its logic intact — checking is_completed(), calling add_wait_targets(1), sending the request, and incrementing speculative_digest_reads/speculative_data_reads. The notification is fire-and-forget: on_error() does NOT absorb the DISCONNECT. The existing error arithmetic in digest_read_resolver already handles this correctly because _target_count_for_cl accounts for the speculative target. For never_speculating_read_executor (no spare target) and always_speculating_read_executor (all requests sent upfront), _on_disconnect is never set — no behavior change. Fixes scylladb/scylladb#26307 Closes scylladb/scylladb#29428	2026-04-15 14:40:15 +03:00
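The rearm-to-now technique can be modelled with a toy timer driven by explicit integer timestamps. Everything here (`toy_timer`, `speculating_read_sketch`) is hypothetical and ignores the real resolver's bookkeeping. It shows only one thing: rearming the existing timer to the current time reuses the timer callback unchanged, and the callback fires on the next poll instead of at the original deadline. Rearming only a still-pending timer is an assumption of this sketch, made so the extra request is never sent twice.

```cpp
#include <cstdint>
#include <functional>
#include <optional>
#include <utility>

// Toy timer driven by explicit integer time (stand-in for a reactor timer).
class toy_timer {
public:
    explicit toy_timer(std::function<void()> cb) : _cb(std::move(cb)) {}
    void arm(std::int64_t deadline) { _deadline = deadline; }
    // Returns true if the timer was still pending.
    bool cancel() {
        const bool was_armed = _deadline.has_value();
        _deadline.reset();
        return was_armed;
    }
    // Called on every reactor poll; fires at most once per arming.
    void poll(std::int64_t now) {
        if (_deadline && now >= *_deadline) {
            _deadline.reset();
            _cb();
        }
    }
private:
    std::function<void()> _cb;
    std::optional<std::int64_t> _deadline;
};

struct speculating_read_sketch {
    bool completed = false;
    int speculative_reads = 0;
    toy_timer speculate_timer{[this] { on_speculate(); }};

    speculating_read_sketch() = default;
    // The timer callback captures `this`, so copying would be unsafe.
    speculating_read_sketch(const speculating_read_sketch&) = delete;
    speculating_read_sketch& operator=(const speculating_read_sketch&) = delete;

    // Timer callback: identical whether it fires on schedule or early.
    void on_speculate() {
        if (completed) {
            return;
        }
        ++speculative_reads;  // stands in for sending the extra request
    }

    // Disconnect hook: pull the pending timer forward instead of duplicating
    // the callback's logic.
    void on_disconnect(std::int64_t now) {
        if (speculate_timer.cancel()) {
            speculate_timer.arm(now);
        }
    }
};
```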
Raphael S. Carvalho	a2eed4bb45	service: Use optimistic replicas in all_sibling_tablet_replicas_colocated all_sibling_tablet_replicas_colocated was using committed ti.replicas to decide whether sibling tablets are co-located and merge can be finalized. This caused a false non-co-located window when a co-located pair was moved by the load balancer: as both tablets migrate together, their del_transition commits may land in different Raft rounds. After the first commit, ti.replicas diverge temporarily (one tablet shows the new position, the other the old), causing all_sibling_tablet_replicas_colocated to return false. This clears finalize_resize, allowing the load balancer to start new cascading migrations that delay merge finalization by tens of seconds. Fix this by using the optimistic replica view (trinfo->next when transitioning, ti.replicas otherwise) — the same view the load balancer uses for load accounting — so finalize_resize stays populated throughout an in-flight migration and no spurious cascades are triggered. Steps that lead to the problem: 1. Merge is triggered. The load balancer generates co-location migrations for all sibling pairs that are not yet on the same shard. Some pairs finish co-location before others. 2. Once all pairs are co-located in committed state, all_sibling_tablet_replicas_colocated returns true and finalize_resize is set. Meanwhile the load balancer may have already started a regular LB migration on one co-located pair (both tablets are stable and the load balancer is free to move them). 3. The LB migration moves both tablets together (colocated_tablets). Their two del_transition commits land in separate Raft rounds. After the first commit, ti.replicas[t1] = new position but ti.replicas[t2] = old position. 4. In this window, all_sibling_tablet_replicas_colocated sees the pair as NOT co-located, clears finalize_resize, and the load balancer generates new migrations for other tablets to rebalance the load that the pair move created. 5. 
Those new migrations can take tens of seconds to stream, keeping the coordinator in handle_tablet_migration mode and preventing maybe_start_tablet_resize_finalization from being called. The merge finalization is delayed until all those cascaded migrations complete. Fixes https://scylladb.atlassian.net/browse/SCYLLADB-821. Fixes https://scylladb.atlassian.net/browse/SCYLLADB-1459. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Closes scylladb/scylladb#29465	2026-04-15 14:40:15 +03:00
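The sketch below contrasts the two replica views, using hypothetical types:

- `replicas` plays the role of the committed `ti.replicas`.
- The optional `next` plays the role of `trinfo->next`.

It reproduces the window from step 3, where one sibling's transition has been committed and the other's has not. Replica lists are compared in order here for simplicity.

```cpp
#include <optional>
#include <vector>

using replica_set = std::vector<int>;  // hypothetical: one id per replica

struct tablet_sketch {
    replica_set replicas;             // committed placement (like ti.replicas)
    std::optional<replica_set> next;  // target while transitioning (like trinfo->next)
};

// Optimistic view: the target placement if a transition is in flight,
// otherwise the committed placement.
const replica_set& optimistic_replicas(const tablet_sketch& t) {
    return t.next ? *t.next : t.replicas;
}

bool colocated_committed(const tablet_sketch& a, const tablet_sketch& b) {
    return a.replicas == b.replicas;
}

bool colocated_optimistic(const tablet_sketch& a, const tablet_sketch& b) {
    return optimistic_replicas(a) == optimistic_replicas(b);
}
```

With the committed view, the pair looks split during the window. With the optimistic view it stays co-located throughout, so finalize_resize is not cleared.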
Marcin Maliszkiewicz	53b6e9fda5	Merge 'Make DESCRIBE CLUSTER get cluster information from storage_service' from Pavel Emelyanov Currently the statement returns cluster, partitioner and snitch names by accessing global db::config via database. As the part of an effort to detach components from global db::config, this PR tweaks the statement handler to get the cluster information from some other source. Currently the needed cluster information is stored in different components, but they are all under storage_service umbrella which seems to be a good central source of this truth. Unit test included. Cleaning components inter-dependencies, not backporting Closes scylladb/scylladb#29429 * github.com:scylladb/scylladb: test: Add test_describe_cluster_sanity for DESCRIBE CLUSTER validation describe_statement: Get cluster info from storage_service storage_service: Add describe_cluster() method query_processor: Expose storage_service accessor	2026-04-15 14:40:15 +03:00
Botond Dénes	d0e99e018b	reader_concurrency_semaphore: drop unused stop_ext_{pre,post}() Left over from primordial times, when reader_concurrency_semaphore was a base class for extensions in the separate enterprise repository. Also remove the now unneeded virtual marker from the destructor. Closes scylladb/scylladb#29399	2026-04-15 14:40:15 +03:00
Botond Dénes	4a2d032c6f	Merge 'query: result_set: change row member to a chunked vector' from Benny Halevy To prevent large memory allocations. This series shows over 3% improvement in perf-simple-query throughput. ``` $ build/release/scylla perf-simple-query --default-log-level=error --smp=1 --random-seed=1855519715 random-seed=1855519715 enable-cache=1 Running test with config: {partitions=10000, concurrency=100, mode=read, query_single_key=no, counters=no} Disabling auto compaction Creating 10000 partitions... Before: random-seed=1775976514 enable-cache=1 enable-index-cache=1 sstable-summary-ratio=0.0005 sstable-format=me Running test with config: {partitions=10000, concurrency=100, mode=read, query_single_key=no, counters=no} Disabling auto compaction Creating 10000 partitions... 336345.11 tps ( 58.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 32788 insns/op, 12430 cycles/op, 0 errors) 348748.14 tps ( 58.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 32794 insns/op, 12335 cycles/op, 0 errors) 349012.63 tps ( 58.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 32800 insns/op, 12326 cycles/op, 0 errors) 350629.97 tps ( 58.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 32770 insns/op, 12270 cycles/op, 0 errors) 348585.00 tps ( 58.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 32804 insns/op, 12338 cycles/op, 0 errors) throughput: mean= 346664.17 standard-deviation=5825.77 median= 348748.14 median-absolute-deviation=2348.46 maximum=350629.97 minimum=336345.11 instructions_per_op: mean= 32791.35 standard-deviation=13.60 median= 32794.47 median-absolute-deviation=8.65 maximum=32804.45 minimum=32769.57 cpu_cycles_per_op: mean= 12340.05 standard-deviation=57.57 median= 12335.05 median-absolute-deviation=13.94 maximum=12430.42 minimum=12270.28 After: random-seed=1775976514 enable-cache=1 enable-index-cache=1 sstable-summary-ratio=0.0005 sstable-format=me Running test with config: {partitions=10000, concurrency=100, mode=read, query_single_key=no, counters=no} Disabling auto 
compaction Creating 10000 partitions... 353770.85 tps ( 58.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 32762 insns/op, 11893 cycles/op, 0 errors) 364447.98 tps ( 58.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 32738 insns/op, 11818 cycles/op, 0 errors) 365268.97 tps ( 58.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 32734 insns/op, 11788 cycles/op, 0 errors) 344304.87 tps ( 58.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 32746 insns/op, 12506 cycles/op, 0 errors) 362263.57 tps ( 58.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 32756 insns/op, 11888 cycles/op, 0 errors) throughput: mean= 358011.25 standard-deviation=8916.76 median= 362263.57 median-absolute-deviation=6436.74 maximum=365268.97 minimum=344304.87 instructions_per_op: mean= 32747.06 standard-deviation=11.85 median= 32745.80 median-absolute-deviation=9.36 maximum=32762.18 minimum=32734.01 cpu_cycles_per_op: mean= 11978.65 standard-deviation=298.06 median= 11887.96 median-absolute-deviation=160.96 maximum=12505.72 minimum=11788.49 ``` Refs #28511 (Refs rather than Fixes for the lack of a reproducer unit test) * No backport needed as the issue is rare and not severe Closes scylladb/scylladb#28631 * github.com:scylladb/scylladb: query: result_set: change row member to a chunked vector query: result_set_row: make noexcept query: non_null_data_value: assert is_nothrow_move_constructible and assignable types: data_value: assert is_nothrow_move_constructible and assignable	2026-04-15 14:40:15 +03:00
Nadav Har'El	1eb8d170dd	Merge 'vector_index: allow recreating vector indexes on the same column' from Dawid Pawlik This series allows creating multiple vector indexes on the same column so users can rebuild an index without losing query availability. The intended flow is: 1. Create a new vector index on a column that already has one. 2. Keep serving ANN queries from the old index while the new one is being built. 3. Verify the new index is ready. 4. Automatically switch to the remaining index. 5. Drop the old index. To make that deterministic, `index_version` is changed from the base table schema version to a real creation timeuuid. When multiple vector indexes exist on the same column, ANN query planning now picks the index according to the routing implemented in Vector Store (newest serving index). This keeps queries on the old index until the new one is up and ready. This patch also removes the create-time restriction that rejected a second vector index on the same column. Name collisions are still rejected as before. Test coverage is updated accordingly: - Scylla now verifies that two vector indexes can coexist on the same column. - Cassandra/SAI behavior is still covered and is still expected to reject duplicate indexes on the same column. Fixes: VECTOR-610 Closes scylladb/scylladb#29407 * github.com:scylladb/scylladb: docs: document vector index metadata and duplicate handling test/cqlpy: cover vector index duplicate creation rules vector_index: allow multiple named indexes on one column vector_index: store `index_version` as creation timeuuid	2026-04-15 14:40:15 +03:00
Botond Dénes	a9c86fc2e4	docs: document schema subcomponent in sstable-scylla-format.md Commit `234f905` (sstables: scylla_metadata: add schema member) added a new Schema subcomponent (tag 11) to scylla_metadata. Document it in the sstable Scylla format reference: - Add schema to the subcomponent grammar enumeration - Add a summary entry describing the subcomponent (tag 11) and its purpose - Add a detailed ## schema subcomponent section with the binary grammar, covering table_id, table_schema_version, keyspace_name, table_name and the column_description array (column_kind, column_name, column_type) Fixes https://github.com/scylladb/scylladb/issues/27960 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Closes scylladb/scylladb#28983	2026-04-15 14:40:15 +03:00
Botond Dénes	5891efc2ca	Merge 'service: add missing replicas if tablet rebuild was rolled back' from Aleksandra Martyniuk An RF change of a tablet keyspace starts tablet rebuilds. Even if some of the rebuilds are rolled back (because the pending replica was excluded), the RF change request finishes successfully. In this case the replicas end up in a state that isn't compatible with the expected keyspace replication. Modify the topology coordinator so that, when it would otherwise be idle, it checks whether there are any missing replicas. If there are, it moves to transition_state::tablet_migration and runs the required rebuilds. If a new RF change request encounters an invalid replica state, it fails; once the state has been fixed, the same ALTER KEYSPACE statement will be allowed. Fixes: SCYLLADB-109. Requires backport to all versions with tablet keyspace RF change. Closes scylladb/scylladb#28709 * github.com:scylladb/scylladb: test: add test_failed_tablet_rebuild_is_retried_on_alter test: add a test to ensure that failed rebuilds are retried service: fail ALTER KEYSPACE if replicas do not satisfy the replication service: retry failed tablet rebuilds service: maybe_start_tablet_migration returns std::optional<group0_guard>	2026-04-15 14:40:15 +03:00
David Garcia	0eaa42c846	docs: Makefile: drop redundant -t $(FLAG) from sphinx options Related scylladb/scylladb-docs-homepage#153. make multiversion failed under Sphinx 8+ with: ``` sphinx-build: error: argument --tag/-t: expected one argument subprocess.CalledProcessError: Command '(..., '-m', 'sphinx', '-t', '-D', 'smv_metadata_path=...', ..., 'manual')' returned non-zero exit status 2. make: *** [multiversion] Error 1 ``` sphinx-multiversion's arg forwarding splits `-t manual`, sending `-t` into the options slot and `manual` to the trailing FILENAMES positional. Sphinx 7 silently tolerated the dangling `-t`; Sphinx 8+'s stricter argparse CLI rejects it. Instead, it now reads FLAGS from an env variable. How to test: ```` make multiversion make FLAG=opensource multiversion ```` Both complete and switch variants correctly. chore: rm empty lines Closes scylladb/scylladb#29472	2026-04-15 14:40:15 +03:00
dependabot[bot]	280ffe107f	build(deps): bump sphinx-multiversion-scylla in /docs Bumps [sphinx-multiversion-scylla](https://holzhaus.github.io/sphinx-multiversion/) from 0.3.7 to 0.3.8. --- updated-dependencies: - dependency-name: sphinx-multiversion-scylla dependency-version: 0.3.8 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Closes scylladb/scylladb#29466	2026-04-15 14:40:15 +03:00
Raphael S. Carvalho	1529605b32	logstor: Fix dangling reference captures and shadowed loc variable Three bugs fixed in segment_manager.cc: 1. write_to_separator(): captured [&index] where index was a local coroutine-frame reference. The future is stored in buf.pending_updates and resolved later in flush_separator_buffer(), by which time the enclosing coroutine frame is destroyed, making &index a dangling pointer. This is a use-after-free that manifests as a segfault. Fix: capture index_ptr (raw pointer by value) instead. 2. add_segment_to_compaction_group(): same dangling [&index] pattern inside the for_each_live_record lambda during recovery. Same fix applied. 3. write(): local 'auto loc = seg->allocate(...)' shadowed the outer 'log_location loc', causing the function to always return a zero-initialized log_location{}. Fix: remove 'auto' so the assignment targets the outer variable. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Closes scylladb/scylladb#29451	2026-04-15 14:40:15 +03:00
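Bug 3 above (the shadowed `loc`) is easy to reproduce in isolation. The sketch below is a hypothetical reduction: `log_location`'s fields and the `allocate()` helper stand in for the real `seg->allocate(...)` call and are assumptions. The inner `auto loc` declares a new variable, so the outer one is returned untouched; compiling with `-Wshadow` (GCC/Clang) flags this pattern.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for logstor's log_location.
struct log_location {
    std::uint64_t segment = 0;
    std::uint64_t offset = 0;
};

// Hypothetical stand-in for seg->allocate(...).
log_location allocate() { return {7, 4096}; }

log_location write_buggy() {
    log_location loc;              // outer variable, zero-initialized
    {
        auto loc = allocate();     // BUG: declares a new 'loc' shadowing the outer one
        (void)loc;
    }
    return loc;                    // always returns {0, 0}
}

log_location write_fixed() {
    log_location loc;
    {
        loc = allocate();          // fix: assign to the outer variable
    }
    return loc;
}
```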
Tomasz Grabiec	266a225416	utils: avoid exceptions in disk_space_monitor polling loop The poll loop used condition_variable::wait(timeout) to sleep between iterations. On every normal timeout expiry, this threw a condition_variable_timed_out exception, which incremented the C++ exception metric and triggered false alerts for support. Replace the timed wait with a seastar::timer that broadcasts the condition variable on expiry, combined with an untimed wait(). The timer is cancelled automatically on scope exit when the wait is woken early by trigger_poll() or abort. Fixes SCYLLADB-1477 Closes scylladb/scylladb#29438	2026-04-15 14:40:15 +03:00
Pavel Emelyanov	a428472e50	db: Remove redundant enable_logstor config option The enable_logstor configuration option is redundant with the 'logstor' experimental feature flag. Consolidate to a single gate: use the experimental feature to control both whether logstor is available for table creation and whether it is initialized at database startup. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#29427	2026-04-15 14:40:15 +03:00
Botond Dénes	87eb20ba33	Merge 'cql: Include parallelized queries in the scylla_cql_select_partition_range_scan_no_bypass_cache metric' from Tomasz Grabiec This metric is used to catch execution of scans which go via row cache, which can have bad effect on performance. Since `f344bd0aaa`, aggregate queries go via new statement class: parallelized_select_statement. This class inherits from select_statement directly rather than from primary_key_select_statement. The range scan detection logic (_range_scan, _range_scan_no_bypass_cache) was only in primary_key_select_statement's constructor, so parallelized queries were not counted in select_partition_range_scan and select_partition_range_scan_no_bypass_cache metrics. Fix by moving the range scan detection into select_statement's constructor, so that all subclasses get it. No backport: enhancement Closes scylladb/scylladb#29422 * github.com:scylladb/scylladb: cql: Include parallelized queries in the scylla_cql_select_partition_range_scan_no_bypass_cache metric test: cluster: dtest: Fix double-counting of metrics	2026-04-15 14:40:15 +03:00
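The fix moves range-scan detection from a subclass constructor into the common base, so every statement type inherits it. A heavily simplified sketch of that shape follows; the constructor parameters (`is_range_scan`, `bypass_cache`) are hypothetical and stand in for whatever the real constructor derives from the query restrictions and the `BYPASS CACHE` clause.

```cpp
#include <cassert>

// Hypothetical, heavily simplified hierarchy: detection lives in the base.
struct select_statement {
    bool _range_scan;
    bool _range_scan_no_bypass_cache;

    select_statement(bool is_range_scan, bool bypass_cache)
        : _range_scan(is_range_scan)
        , _range_scan_no_bypass_cache(is_range_scan && !bypass_cache) {}
    virtual ~select_statement() = default;
};

// Both subclasses now get the flags without duplicating the logic.
struct primary_key_select_statement : select_statement {
    using select_statement::select_statement;
};

struct parallelized_select_statement : select_statement {
    using select_statement::select_statement;
};
```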
Botond Dénes	aecb6b1d76	Merge 'auth: sanitize {USER} substitution in LDAP URL template' from Piotr Smaron `LDAPRoleManager` interpolated usernames directly into `ldap_url_template`, allowing LDAP filter injection and URL structure manipulation via crafted usernames. This PR adds two layers of encoding when substituting `{USER}`: 1. RFC 4515 filter escaping: neutralises `*`, `(`, `)`, `\`, and NUL. 2. URL percent-encoding: prevents `%`, `?`, and `#` from breaking `ldap_url_parse`'s component splitting or undoing the filter escaping. It also adds `validate_query_template()` at startup to reject templates that place `{USER}` outside the filter component (e.g. in the host or base DN), where filter escaping would be the wrong defense. Fixes: SCYLLADB-1309 Compatibility note: Templates with `{USER}` in the host, base DN, attributes, or extensions were previously silently accepted. They are now rejected at startup with a descriptive error. Only templates with `{USER}` in the filter component (after the third `?`) are valid. Due to its severity, this should be backported to all maintained versions. Closes scylladb/scylladb#29388 * github.com:scylladb/scylladb: auth: sanitize {USER} substitution in LDAP URL templates test/ldap: add LDAP filter-injection reproducers	2026-04-15 14:40:15 +03:00
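The two encoding layers can be sketched as below. This is not Scylla's implementation: the function names are hypothetical, and the choice to percent-encode every byte outside the RFC 3986 unreserved set is an assumption (the PR only states that `%`, `?` and `#` must not survive). The RFC 4515 step replaces each filter metacharacter with a backslash and two hex digits; the URL step then protects those backslashes and any `%` in the username from being reinterpreted by the URL parser.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <string_view>

// Layer 1: RFC 4515 escaping of an assertion value inside a search filter.
std::string escape_ldap_filter_value(std::string_view in) {
    std::string out;
    for (char c : in) {
        switch (c) {
        case '*':  out += "\\2a"; break;
        case '(':  out += "\\28"; break;
        case ')':  out += "\\29"; break;
        case '\\': out += "\\5c"; break;
        case '\0': out += "\\00"; break;
        default:   out += c;
        }
    }
    return out;
}

// Layer 2 (assumed encoding set): percent-encode everything except RFC 3986
// unreserved characters, so '%', '?', '#' and the escaping '\' reach the
// filter intact after URL decoding.
std::string percent_encode(std::string_view in) {
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char c : in) {
        if (std::isalnum(c) || c == '-' || c == '.' || c == '_' || c == '~') {
            out += static_cast<char>(c);
        } else {
            out += '%';
            out += hex[c >> 4];
            out += hex[c & 0x0F];
        }
    }
    return out;
}

// Hypothetical helper: the value substituted for {USER}.
std::string sanitize_user(std::string_view user) {
    return percent_encode(escape_ldap_filter_value(user));
}
```

The order matters: escaping the filter first and percent-encoding second means URL decoding yields exactly the RFC 4515-escaped value, which the LDAP server then treats as literal text.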
Artsiom Mishuta	146a67cf6f	test: explicitly wait for schema agreement in create_new_test_keyspace Add an explicit wait_for_schema_agreement() call after CREATE KEYSPACE in create_new_test_keyspace to ensure all nodes have applied the schema before proceeding. Closes scylladb/scylladb#29371	2026-04-15 14:40:15 +03:00
Pavel Emelyanov	54e3c648a5	test/cluster/dtest: improve diagnostics in test_update_schema_while_node_is_killed The alter_table case has a known failure where point lookups at QUORUM return 0 rows after node2 restarts, even though: - the schema was correctly synced (ALTER TABLE received from cluster) - the data commitlog was replayed (21 mutations, 0 skipped) - all 3 nodes were alive, so QUORUM (2/3) should be satisfiable by node1+node3 regardless of node2's state The LIMIT 1 table scan succeeds (data is present somewhere), but specific key lookups return empty. This points to a bug in how node2, acting as coordinator after restart, routes single-partition reads — most likely stale tablet routing metadata. Add diagnostics to help distinguish data loss from a coordinator/routing bug on the next failure: - log which key is missing - dump all rows visible at QUORUM - query each node individually at ONE consistency for the missing key Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Closes scylladb/scylladb#29350	2026-04-15 14:40:15 +03:00
Piotr Szymaniak	4c93c2af62	audit/alternator: support audit_tables=alternator.<table> shorthand The real keyspace name of an Alternator table T is "alternator_T". Expand the "alternator.T" format used in the audit_tables config flag to the real keyspace name at parse time, so users don't need to spell out the internal "alternator_T.T" form.	2026-04-15 12:29:15 +02:00
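The parse-time expansion can be sketched as a small string rewrite. The function name `expand_audit_table` is hypothetical; the mapping itself (`alternator.T` to `alternator_T.T`) is the one stated in the commit, and all other entries pass through unchanged.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical helper: expand "alternator.T" to the real "alternator_T.T".
std::string expand_audit_table(std::string_view entry) {
    constexpr std::string_view prefix = "alternator.";
    if (entry.size() > prefix.size() && entry.substr(0, prefix.size()) == prefix) {
        std::string table(entry.substr(prefix.size()));
        return "alternator_" + table + "." + table;
    }
    return std::string(entry);  // regular "keyspace.table" entries are unchanged
}
```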
Piotr Szymaniak	0714d8aded	audit/alternator: Add negative audit tests Add tests for the unhappy path of Alternator audit logging: - Category filtering: operations are not logged when their category (DML, QUERY, DDL) is excluded from audit_categories. - Keyspace filtering: operations on a keyspace not listed in audit_keyspaces are not logged. - Error entries: a failed operation (thrown exception after audit_info is set) produces an audit entry with error=true. - Empty-keyspace bypass: global operations like ListTables and DescribeEndpoints are logged regardless of audit_keyspaces because should_log() short-circuits on an empty keyspace.	2026-04-15 12:29:15 +02:00
Piotr Szymaniak	ad05b44931	audit/alternator: Add testing of auditing There is a new test file created, `test/alternator/test_audit.py`. The file contains a suite of tests of all auditing operations.	2026-04-15 12:29:15 +02:00
Piotr Szymaniak	6913efab5c	audit/alternator: Audit requests Both the successful ones as well as the failed ones are audited. Each Alternator operation sets up audit metadata via an executor::maybe_audit() helper, which checks will_log() and only heap-allocates audit_info_alternator when auditing is enabled. DDL and metadata operations pass no consistency level; data read/write operations pass the actual CL used. BatchWriteItem and BatchGetItem guard table name collection with will_log() to avoid unnecessary work when auditing is disabled. ListStreams audits the input table name rather than collecting output table names during iteration. UntagResource sets up auditing after parameter validation. Exception re-throw in server.cc uses co_return coroutine::exception(). The chosen audit types for the operations: - CreateTable - DDL - DescribeTable - QUERY - DeleteTable - DDL - UpdateTable - DDL - PutItem - DML - UpdateItem - DML - GetItem - QUERY - DeleteItem - DML - ListTables - QUERY - Scan - QUERY - DescribeEndpoints - QUERY - BatchWriteItem - DML - BatchGetItem - QUERY - Query - QUERY - TagResource - DDL - UntagResource - DDL - ListTagsOfResource - QUERY - UpdateTimeToLive - DDL - DescribeTimeToLive - QUERY - ListStreams - QUERY - DescribeStream - QUERY - GetShardIterator - QUERY - GetRecords - QUERY - DescribeContinuousBackups - QUERY	2026-04-15 11:55:42 +02:00
Piotr Szymaniak	9646ee05bd	audit/alternator: Refactor in preparation for auditing Alternator Prepare the audit API for auditing Alternator. The API provides externally callable `inspect()` functions for both CQL and Alternator. Both variants unpack their parameters and then call a common `maybe_log()`, which calls `log()` when the conditions are met. Also, while I was at it, (const) references were favoured over raw pointers. The Alternator audit_info subclass (audit_info_alternator) carries an optional consistency level: only data read/write operations have a meaningful CL, while DDL and metadata queries store an empty string in the audit table and syslog (matching the existing write_login behavior). The storage helpers are updated accordingly. Add a will_log(category, keyspace, table) method that checks whether an operation should be audited (category check AND keyspace/table filtering) without requiring a constructed audit_info object. should_log() delegates to will_log().	2026-04-15 11:46:44 +02:00
Tomasz Grabiec	84361194c2	test: boost: tablets: Add test for merge with arbitrary tablet count	2026-04-15 10:40:56 +02:00
Tomasz Grabiec	7af9f5366d	tablets, database: Advertise 'arbitrary' layout in snapshot manifest Currently, the manifest advertises "powof2", which is wrong for arbitrary count and boundaries. Introduce a new kind of layout called "arbitrary", and produce it if the tablet map doesn't conform to "powof2" layout. We should also produce tablet boundaries in this case, but that's worked on in a different PR: https://github.com/scylladb/scylladb/pull/28525	2026-04-15 10:40:56 +02:00
Tomasz Grabiec	50fbac6ea6	tablets: Introduce pow2_count per-table tablet option By default it's true, in which case the tablet count of the table is rounded up to a power of two. Setting it to false lifts this restriction, so the count can be arbitrary. This will allow testing the logic of arbitrary tablet count.	2026-04-15 10:40:56 +02:00
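The effect of the option on the tablet count can be sketched as follows. The helper names are hypothetical and this is not the tablet allocator's code; it only illustrates "round up to a power of two unless `pow2_count` is false".

```cpp
#include <cassert>
#include <cstddef>

// Smallest power of two >= n (returns 1 for n <= 1).
std::size_t round_up_to_pow2(std::size_t n) {
    std::size_t p = 1;
    while (p < n) {
        p <<= 1;
    }
    return p;
}

// Hypothetical helper: tablet count actually used for a table.
std::size_t effective_tablet_count(std::size_t desired, bool pow2_count = true) {
    return pow2_count ? round_up_to_pow2(desired) : desired;
}
```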
Tomasz Grabiec	b6a7023f68	tablets: Prepare for non-power-of-two tablet count This is a step towards more flexibility in managing tablets. A prerequisite before we can split individual tablets, isolating hot partitions, and evening-out tablet sizes by shifting boundaries. After this patch, the system can handle tables with arbitrary tablet count. Tablet allocator is still rounding up desired tablet count to the nearest power of two when allocating tablets for a new table, so unless the tablet map is allocated in some other way, the counts will be still a power of two. We plan to utilize arbitrary count when migrating from vnodes to tablets, by creating a tablet map which matches vnode boundaries. One of the reasons we don't give up on power-of-two by default yet is that it creates an issue with merges. If tablet count is odd, one of the tablets doesn't have a sibling and will not be merged. That can obviously cause imbalance of token space and tablet sizes between tablets. To limit the impact, this patch dynamically chooses which tablet to isolate when initiating a merge. The largest tablet is chosen, as that will minimize imbalance. Otherwise, if we always chose the last tablet to isolate, its size would remain the same while other tablets double in size with each odd-count merge, leading to imbalance. The imbalance will still be there, but the difference in tablet sizes is limited to 2x. Example (3 tablets): [0] owns 1/3 of tokens [1] owns 1/3 of tokens [2] owns 1/3 of tokens After merge: [0] owns 2/3 of tokens [1] owns 1/3 of tokens What we would like instead: Step 1 (split [1]): [0] owns 1/3 of tokens [1] old 1.left, owns 1/6 of tokens [2] old 1.right, owns 1/6 of tokens [3] owns 1/3 of tokens Step 2 (merge): [0] owns 1/2 of tokens [1] owns 1/2 of tokens To do that, we need to be able to split individual tablets, but we're not there yet.	2026-04-15 10:40:55 +02:00
Tomasz Grabiec	f54daef4ec	tablets: Implement merged tablet_map constructor on top of for_each_sibling_tablets() This way it doesn't need to know how the scheduler chose to merge tablets. We'll have less duplication of logic.	2026-04-15 10:40:55 +02:00
Tomasz Grabiec	66fc7967b8	tablets: Prepare resize_decision to hold data in decisions merge decision will carry a plan - which replica to isolate. So construction from a string will no longer do.	2026-04-15 10:40:55 +02:00
Tomasz Grabiec	d543f260bd	tablets: table: Make storage_group handle arbitrary merge boundaries We only assume that new tablets have boundaries which are equal to some boundaries of old tablets. In preparation for supporting arbitrary merge plan, where any replica can be isolated (not merged with siblings) by the merge plan.	2026-04-15 10:40:55 +02:00
Nadav Har'El	022add117e	test/cluster: fix flaky test test_row_ttl_scheduling_group The test test/cluster/test_ttl_row.py::test_row_ttl_scheduling_group wants to verify that the new CQL per-row TTL feature does all its work (expiration scanning, deletion of expired items) on all nodes in the "streaming" scheduling group, not in the statement scheduling group. As originally written, the test couldn't require that it uses exactly zero time in the statement scheduling group - because some things do happen there - specifically the ALTER TABLE request we use to enable TTL. So the test checked that the time in the "wrong" group is less than 0.2 of the total time, not zero. But in one CI run, we got to exactly 0.2 and the test failed. Running this test locally, I see the margin is pretty narrow: The test almost always fails if I set the threshold ratio to 0.1. The solution in this patch is to move the ALTER TABLE work to a different scheduling group (by using an additional service level). After doing that the CPU usage in sl:default goes down to exactly zero - not close to zero but exactly zero. However, it seems that there is always some rare background work in sl:default, and in debug builds it can come out to more than 0ms (e.g., in one test we saw 1ms), so we keep checking that sl:default is much lower than sl:stream, not that it is exactly zero. Incidentally, I converted the serial loop adding the 200 rows in the test's setup to a parallel loop, to make the test setup slightly faster. I also added a sanity check that sl:default, the scheduling group in which we verify that TTL does zero work, is actually the scheduling group in which normal writes run (to avoid the risk of having a test that verifies that some irrelevant scheduling group is unsurprisingly getting zero usage...). Fixes SCYLLADB-1495. Closes scylladb/scylladb#29447	2026-04-15 08:42:29 +03:00
Jenkins Promoter	3d0582d51e	Update pgo profiles - aarch64	2026-04-15 05:26:22 +03:00
Jenkins Promoter	a4d3ab9f0e	Update pgo profiles - x86_64	2026-04-15 04:26:28 +03:00
Tomasz Grabiec	6d510bcd1c	tablets: Make stats update post-merge work with arbitrary merge boundaries We only assume that new tablets share boundaries with some old tablets. In preparation for supporting arbitrary merge plan, where any replica can be isolated (not merged with siblings) by the merge plan.	2026-04-15 01:25:16 +02:00
Tomasz Grabiec	01fb97ee78	locator: tablets: Support arbitrary tablet boundaries There are several reasons we want to do that. One is that it will give us more flexibility in distributing the load. We can subdivide tablets at any points, and achieve more evenly-sized tablets. In particular, we can isolate large partitions into separate tablets. Another reason is vnode-to-tablet migration. We could construct a tablet map which matches exactly the vnode boundaries, so migration can happen transparently from the CQL-coordinator's point of view. Implementation details: We store a vector of tokens which represent tablet boundaries in the tablet_id_map. tablet_id keeps its meaning: it's an index into the vector of tablets. To avoid logarithmic lookup of tablet_id from the token, we introduce a lookup structure with power-of-two aligned buckets, and store the tablet_id of the tablet which owns the first token in the bucket. This way, a lookup only needs to consider the range of tablet ids that overlaps one bucket. If boundaries are more or less aligned, there are around 1-2 tablets overlapping with a bucket, and the lookup is still O(1). The amount of memory used increased, but not significantly relative to the old size (because tablet_info is currently fat): For 131'072 tablets: Before: Size of tablet_metadata in memory: 57456 KiB After: Size of tablet_metadata in memory: 59504 KiB	2026-04-15 01:25:14 +02:00
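The bucketed lookup described above can be sketched like this. Assumptions, all hypothetical: tokens are modelled as unsigned 64-bit values (Scylla's tokens are signed), each tablet is represented by its last owned token, and the class name `tablet_lookup` is invented. Each bucket stores the tablet owning its first token, and a lookup scans forward only over tablets overlapping that bucket.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

class tablet_lookup {
    std::vector<std::uint64_t> _last_token;   // _last_token[i]: last token owned by tablet i
    std::vector<std::size_t> _bucket_first;   // tablet owning the first token of each bucket
    unsigned _shift;                          // 64 - log2(bucket count)

    // O(log n) reference lookup: first tablet whose last token is >= t.
    std::size_t slow_lookup(std::uint64_t t) const {
        return std::lower_bound(_last_token.begin(), _last_token.end(), t) - _last_token.begin();
    }

public:
    // Requires: last_tokens sorted, non-empty, ending with UINT64_MAX,
    // and 1 <= log2_buckets <= 63.
    tablet_lookup(std::vector<std::uint64_t> last_tokens, unsigned log2_buckets)
        : _last_token(std::move(last_tokens)), _shift(64 - log2_buckets) {
        std::size_t n = std::size_t(1) << log2_buckets;
        _bucket_first.reserve(n);
        for (std::size_t j = 0; j < n; ++j) {
            _bucket_first.push_back(slow_lookup(std::uint64_t(j) << _shift));
        }
    }

    std::size_t tablet_of(std::uint64_t t) const {
        std::size_t id = _bucket_first[t >> _shift];
        // Only tablets overlapping this bucket are visited; with roughly
        // aligned boundaries that is 1-2 iterations.
        while (_last_token[id] < t) {
            ++id;
        }
        return id;
    }
};
```

The forward scan always terminates because the last tablet ends at `UINT64_MAX`, and it never overshoots because the bucket's starting tablet owns a token no greater than `t`.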
Tomasz Grabiec	82acdae74b	locator: tablets: Introduce tablet_map::get_split_token() And reimplement existing split-related methods around it. This way we avoid calling dht::compaction_group_of(), and assuming anything about tablet boundaries or tablet count being a power of two. This will make later refactoring easier.	2026-04-15 01:24:48 +02:00
Tomasz Grabiec	2e1d41c206	dht: Introduce get_uniform_tokens()	2026-04-15 01:24:48 +02:00
Tomasz Grabiec	a58243bc1e	Merge 'hint_sender: send hints to all tablet replicas if the tablet leaving due to RF--' from Ferenc Szili Currently, hints that are sent to tablet replicas which are leaving due to RF-- can be lost, because `hint_sender` only checks if the destination host is leaving. To avoid this, we add a new method `effective_replication_map::is_leaving(host, token)` which checks if the tablet identified by the given token is leaving the host. This method is called by the `hint_sender` to check if the hint should be sent only to the destination host, or to all the replicas. This way, we increase consistency. For v-node based ERPs, `is_leaving()` calls `token_metadata::is_leaving(host)`. Fixes: SCYLLADB-287 This is an improvement, and backport is not needed. Closes scylladb/scylladb#28770 * github.com:scylladb/scylladb: test: verify hints are delivered during tablet RF reduction hint_sender: use per-tablet is_leaving() to avoid losing hints on RF reduction erm: add is_leaving() to effective_replication_map	2026-04-14 22:51:34 +02:00
Tomasz Grabiec	7fe4ae16f0	Merge 'table: don't create new split compaction groups if main compaction group is disabled' from Ferenc Szili Fixes a race condition where tablet split can crash the server during truncation. `truncate_table_on_all_shards()` disables compaction on all existing compaction groups, then later calls `discard_sstables()` which asserts that compaction is disabled. Between these two points, tablet split can call `set_split_mode()`, which creates new compaction groups via `make_empty_group()` — these start with `compaction_disabled_counter == 0`. When `discard_sstables()` checks its assertion, it finds these new groups and fires `on_internal_error`, aborting the server. In `storage_group::set_split_mode()`, before creating new compaction groups, check whether the main compaction group has compaction disabled. If it does, bail out early and return `false` (not ready). This is safe because the split will be retried once truncation completes and re-enables compaction. A new regression test `test_split_emitted_during_truncate` reproduces the exact interleaving using two error injection points: - `database_truncate_wait` — pauses truncation after compaction is disabled but before `discard_sstables()` runs. - `tablet_split_monitor_wait` (new, in `service/storage_service.cc`) — pauses the split monitor at the start of `process_tablet_split_candidate()`. The test creates a single-tablet table, triggers both operations, uses the injection points to force the problematic ordering, then verifies that truncation completes successfully and the split finishes afterward. Fixes: SCYLLADB-1035 This needs to be backported to all currently supported version. Closes scylladb/scylladb#29250 * github.com:scylladb/scylladb: test: add test_split_emitted_during_truncate table: fix race between tablet split and truncate	2026-04-14 22:00:40 +02:00
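The guard added to `storage_group::set_split_mode()` can be modelled as follows. This is a simplified, hypothetical model (the member names and the fixed two-group split are assumptions), meant only to show the ordering: when the main group has compaction disabled, no new groups are created and the caller sees "not ready", so truncation never observes a group whose `compaction_disabled_counter` started at 0.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified compaction group.
struct compaction_group {
    int compaction_disabled_counter = 0;
    bool compaction_disabled() const { return compaction_disabled_counter > 0; }
};

// Hypothetical, simplified storage group.
struct storage_group {
    compaction_group main;
    std::vector<compaction_group> split_ready;

    // Returns true once the group is ready for split.
    bool set_split_mode() {
        if (main.compaction_disabled()) {
            // e.g. truncate is in progress: bail out, the split is retried later.
            return false;
        }
        if (split_ready.empty()) {
            split_ready.resize(2);  // new groups start with compaction enabled
        }
        return true;
    }
};
```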
Avi Kivity	21d9f54a9a	partition_snapshot_row_cursor: fix reversed maybe_refresh() losing latest version entry In partition_snapshot_row_cursor::maybe_refresh(), the !is_in_latest_version() path calls lower_bound(_position) on the latest version's rows to find the cursor's position in that version. When lower_bound returns null (the cursor is positioned above all entries in the latest version in table order), the code unconditionally sets _background_continuity = true and allows the subsequent if(!it) block to erase the latest version's entry from the heap. This is correct for forward traversal: null means there are no more entries ahead, so removing the version from the heap is safe. However, in reversed mode, null from lower_bound means the cursor is above all entries in table order -- those entries are BELOW the cursor in query order and will be visited LATER during reversed traversal. Erasing the heap entry permanently loses them, causing live rows to be skipped. The fix mirrors what prepare_heap() already does correctly: when lower_bound returns null in reversed mode, use std::prev(rows.end()) to keep the last entry in the heap instead of erasing it. Add test_reversed_maybe_refresh_keeps_latest_version_entry to mvcc_test, alongside the existing reversed cursor tests. The test creates a two-version partition snapshot (v0 with range tombstones, v1 with a live row positioned below all v0 entries in table order), and traverses in reverse calling maybe_refresh() at each step -- directly exercising the buggy code path. The test fails without the fix. The bug was introduced by `6b7473be53` ("Handle non-evictable snapshots", 2022-11-21), which added null-iterator handling for non-evictable snapshots (memtable snapshots lack the trailing dummy entry that evictable snapshots have). prepare_heap() got correct reversed-mode handling at that time, but maybe_refresh() received only forward-mode logic. 
The bug is intermittent because multiple mechanisms cause iterators_valid() to return false, forcing maybe_refresh() to take the full rebuild path via prepare_heap() (which handles reversed mode correctly): - Mutation cleaner merging versions in the background (changes change_mark) - LSA segment compaction during reserve() (invalidates references) - B-tree rebalancing on partition insertion (invalidates references) - Debug mode's always-true need_preempt() creating many multi-version partitions via preempted apply_monotonically() A dtest reproducer confirmed the same root cause: with 100K overlapping range tombstones creating a massively multi-version memtable partition (287K preemption events), the reversed scan's latest_iterator was observed jumping discontinuously during a version transition -- the latest version's heap entry was erased -- causing the query to walk the entire partition without finding the live row. Fixes: SCYLLADB-1253 Closes scylladb/scylladb#29368	2026-04-14 21:50:25 +02:00
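A simplified Python model of the null-iterator branch described above, assuming the latest version's rows are a sorted list of keys in table order and that `bisect_left` plays the role of `lower_bound()`. The function name and its return convention (an index to keep in the heap, or `None` to erase the heap entry) are hypothetical; the non-null case is not modeled in detail.

```python
import bisect
from typing import Optional, Sequence


def keep_latest_entry(rows: Sequence[int], position: int,
                      reversed_mode: bool) -> Optional[int]:
    """Index of the latest-version entry to keep in the heap, or None to
    erase that version's heap entry. `rows` is sorted in table order."""
    i = bisect.bisect_left(rows, position)  # plays the role of lower_bound()
    if i < len(rows):
        return i
    # lower_bound() returned "null": every entry lies below the cursor
    # in table order.
    if not reversed_mode or not rows:
        # Forward traversal (or an empty version): nothing is left to visit,
        # so dropping the version from the heap is safe.
        return None
    # Reversed traversal: those entries are still to be visited, so keep the
    # last one, like std::prev(rows.end()) in prepare_heap().
    return len(rows) - 1
```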
Nadav Har'El	986167a416	Merge 'cql3: fix authorization bypass via BATCH prepared cache poisoning' from Marcin Maliszkiewicz execute_batch_without_checking_exception_message() inserted entries into the authorized prepared cache before verifying that check_access() succeeded. A failed BATCH therefore left behind cached 'authorized' entries that later let a direct EXECUTE of the same prepared statement skip the authorization check entirely. Move the cache insertion after the access check so that entries are only cached on success. This matches the pattern already used by do_execute_prepared() for individual EXECUTE requests. Introduced in `98f5e49ea8` Fixes https://scylladb.atlassian.net/browse/SCYLLADB-1221 Backport: all supported versions Closes scylladb/scylladb#29432 * github.com:scylladb/scylladb: test/cqlpy: add reproducer for BATCH prepared auth cache bypass cql3: fix authorization bypass via BATCH prepared cache poisoning	2026-04-14 22:31:54 +03:00
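A hypothetical Python model of the corrected ordering (the real code is C++ in cql3): a statement is inserted into the authorized cache only after `check_access()` has returned without raising. `PreparedAuthCache`, `execute_batch()` and `execute_prepared()` are illustrative names, not Scylla APIs.

```python
class Unauthorized(Exception):
    pass


class PreparedAuthCache:
    """Remembers (user, statement id) pairs that already passed check_access()."""

    def __init__(self):
        self._authorized = set()

    def is_authorized(self, user, stmt_id):
        return (user, stmt_id) in self._authorized

    def insert(self, user, stmt_id):
        self._authorized.add((user, stmt_id))


def execute_batch(cache, user, stmt_ids, check_access):
    for stmt_id in stmt_ids:
        if not cache.is_authorized(user, stmt_id):
            check_access(user, stmt_id)   # raises Unauthorized on failure
            cache.insert(user, stmt_id)   # reached only on success (the fix)


def execute_prepared(cache, user, stmt_id, check_access):
    # Same pattern as do_execute_prepared(): check first, then cache.
    if not cache.is_authorized(user, stmt_id):
        check_access(user, stmt_id)
        cache.insert(user, stmt_id)
```

With the buggy order (insert before `check_access()`), the failed BATCH would leave the pair in the cache and a later direct EXECUTE would skip the check.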
Pavel Emelyanov	cec44dc68d	test: Add test_describe_cluster_sanity for DESCRIBE CLUSTER validation Add a parametrized integration test that verifies DESCRIBE CLUSTER returns correct values in both normal and maintenance modes. The parametrization keeps the validation logic (CQL queries and assertions) identical for both modes, while the setup phase is mode-specific. This ensures the same assertions apply to both cluster states: - partitioner is org.apache.cassandra.dht.Murmur3Partitioner - snitch is org.apache.cassandra.locator.SimpleSnitch - cluster name matches system.local cluster_name Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-14 19:33:21 +03:00
Pavel Emelyanov	debfb147f5	describe_statement: Get cluster info from storage_service Update cluster_describe_statement::describe() to retrieve cluster metadata from storage_service::describe_cluster() instead of directly from db::config or gossiper. The storage_service provides a centralized API for accessing cluster metadata (cluster_name, partitioner, snitch_name) that works in both normal and maintenance modes, improving separation of concerns. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-14 19:33:06 +03:00
Pavel Emelyanov	53361358ef	storage_service: Add describe_cluster() method Add a cluster_info struct containing cluster_name, partitioner, and snitch_name. Implement a describe_cluster() method that provides cluster metadata by combining data from the gossiper (cluster_name, partitioner) and the snitch (snitch_name). It will be used by the next patch. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-14 19:29:24 +03:00
Pavel Emelyanov	0d4a8a04ec	query_processor: Expose storage_service accessor Add a storage_service() method to expose the sharded storage service to callers. To be used by the next patch. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-14 19:29:11 +03:00
Radosław Cybulski	4b984212ba	alternator: improve parsing / generating of StreamArn parameter Previously, the ARNs emitted by Alternator did not follow Amazon's standard format. While attempting to run KCL with Scylla, we discovered a few issues. Amazon's ARN looks like this: arn:partition:service:region:account-id:resource-type/resource-id for example: arn:aws:dynamodb:us-west-2:111122223333:table/TestTable/stream/2015-05-11T21:21:33.291 KCL checks that: - ARNs returned by Alternator calls match the basic Amazon ARN pattern shown above, - the region consists only of lowercase letters and `-` (no underscore character), - the account-id consists only of digits (exactly 12), - the service is `dynamodb`, - the partition starts with `aws`. The patch updates our ARN-handling code to match those findings. 1. Split the `stream_arn` object into `stream_arn` (ARN for streams only) and `stream_shard_id` (id value for stream shards). The latter receives the original implementation. The former emits and parses ARNs in Amazon style, for example: 2. Update the new `stream_arn` class to encode keyspace and table together, separated by `@`. The new ARN looks like this: arn:aws:dynamodb:us-east-1:000000000000:table/TestKeyspace@TestTable/stream/2015-05-11T21:21:33.291 3. Hardcode `dynamodb` as service, `aws` as partition, `us-east-1` as region and `000000000000` as account-id (must have 12 digits). 4. Update the ARN-handling code for tag manipulation so that it can parse Amazon-style ARNs. The emitting code is left intact; the parser now accepts both styles. 5. Add unit tests. Fixes #28350 Fixes: SCYLLADB-539 Fixes: #28142 Closes scylladb/scylladb#28187	2026-04-14 18:07:05 +03:00
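A sketch of emitting and parsing the new stream ARN layout in Python. The constants are the hardcoded values from point 3 of the commit; the regular expression is an assumption that encodes the KCL checks listed there, except that it also admits digits in the region, since the hardcoded `us-east-1` contains one. The function names are hypothetical.

```python
import re
from typing import Tuple

# Values hardcoded by the patch.
PARTITION, SERVICE, REGION, ACCOUNT = "aws", "dynamodb", "us-east-1", "000000000000"

# Assumed pattern: partition starts with "aws", service is "dynamodb",
# region has no underscores, account id is exactly 12 digits, and the
# resource part is table/<keyspace>@<table>/stream/<label>.
_STREAM_ARN = re.compile(
    r"^arn:(?P<partition>aws[a-z-]*):(?P<service>dynamodb):"
    r"(?P<region>[a-z0-9-]+):(?P<account>\d{12}):"
    r"table/(?P<keyspace>[^@/]+)@(?P<table>[^/]+)/stream/(?P<label>.+)$"
)


def make_stream_arn(keyspace: str, table: str, label: str) -> str:
    return (f"arn:{PARTITION}:{SERVICE}:{REGION}:{ACCOUNT}:"
            f"table/{keyspace}@{table}/stream/{label}")


def parse_stream_arn(arn: str) -> Tuple[str, str, str]:
    m = _STREAM_ARN.match(arn)
    if m is None:
        raise ValueError(f"not an Alternator stream ARN: {arn!r}")
    return m["keyspace"], m["table"], m["label"]
```

Because `@` cannot appear in the keyspace group, the split between keyspace and table stays unambiguous.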
Marcin Maliszkiewicz	de19714763	Merge 'cql3: prepare list statments metadta_id during prepare statement , send the correct metadata_id directly to the client ' from Alex Dathskovsky This series makes result metadata handling for auth LIST statements consistent and adds coverage for the driver-visible behavior. The first patch makes the result-column metadata construction shared across the affected statements, so the metadata shape used for PREPARE and EXECUTE stays uniform and easier to reason about. The second patch adds regression coverage for both sides of the metadata-id flow: - a Python auth-cluster test verifies that prepared LIST ROLES OF returns a non-empty result metadata id and that a later EXECUTE reuses it without METADATA_CHANGED - a Boost transport test covers the recovery path where the client sends an empty request metadata id and the server responds with METADATA_CHANGED and the full metadata Together these patches tighten the implementation and protect the prepared-metadata-id behavior exposed to drivers. Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-1218 backport: this change should be backported to all active branches to help the driver operation Closes scylladb/scylladb#29347	2026-04-14 16:09:49 +02:00
bitpathfinder	c1315f9f1e	commitlog: add test to verify segment replay order Add a boost test that verifies commitlog segments are replayed in ascending segment ID order within each shard. The test creates multiple segments, triggers replay via commitlog_replayer, and captures the "Replaying" debug log messages to verify the order. Correct segment ordering is required by the strongly consistent tables feature, particularly commitlog-based storage that relies on replayed raft items being stored in order. Ref SCYLLADB-1411.	2026-04-14 16:06:13 +02:00
bitpathfinder	c06adffd6a	commitlog: fix replay order by using ordered map per shard The commitlog replayer groups segments by shard using a std::unordered_multimap, then iterates per-shard segments via equal_range(). However, equal_range() does not guarantee iteration order for elements with the same key, so segments could be replayed out of order within a shard. This can increase memory and disk consumption during fragmented entry reconstruction, which accumulates fragments across segments and benefits from ascending ID order. This is also required by the strongly consistent tables feature, particularly commitlog-based storage that relies on replayed raft items being stored in order. Fix by changing the data structure from std::unordered_multimap<unsigned, commitlog::descriptor> to std::unordered_map<unsigned, utils::chunked_vector<commitlog::descriptor>> Since the descriptors are inserted from a std::set ordered by ID, the vector preserves insertion (and thus ID) order. The per-shard iteration now simply iterates the vector, guaranteeing correct replay order. Fixes SCYLLADB-1411.	2026-04-14 16:05:17 +02:00
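The grouping change can be illustrated in Python, with a `defaultdict(list)` standing in for `std::unordered_map<unsigned, utils::chunked_vector<commitlog::descriptor>>` and sorting by segment ID standing in for the ordered `std::set`. The `Descriptor` type and `group_by_shard()` are hypothetical stand-ins, not the replayer's code.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class Descriptor(NamedTuple):
    shard: int
    segment_id: int


def group_by_shard(descriptors: Iterable[Descriptor]) -> Dict[int, List[Descriptor]]:
    per_shard: Dict[int, List[Descriptor]] = defaultdict(list)
    # Iterate in ascending ID order (the std::set); appending to a per-shard
    # list preserves that order, unlike equal_range() on an unordered_multimap.
    for d in sorted(descriptors, key=lambda d: d.segment_id):
        per_shard[d.shard].append(d)
    return per_shard
```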
Anna Stuchlik	633297b15d	doc: remove an outdated troubleshooting page Fixes https://github.com/scylladb/scylladb/issues/29405 Closes scylladb/scylladb#29431	2026-04-14 15:14:32 +03:00
Ernest Zaslavsky	0eb6270c82	ci: add build system comparison workflow Add a GitHub Actions workflow that runs scripts/compare_build_systems.py on PRs touching build system files (configure.py, /CMakeLists.txt, cmake/). This prevents future deviations between the two build systems by catching mismatches early in the CI pipeline. Closes scylladb/scylladb#29426	2026-04-14 14:53:12 +03:00
Avi Kivity	4a9fdb17f0	build: cmake: fix -fno-sanitize-address-use-after-scope for CQL parser The CMake build had -fsanitize-address-use-after-scope (enable) when it should have been -fno-sanitize-address-use-after-scope (disable). The comment on lines 24-25 of cql3/CMakeLists.txt explains the intent: the use-after-scope sanitizer uses too much stack space on CqlParser and overflows the stack. The Python-ninja path in configure.py:2801-2802 correctly had -fno-sanitize-address-use-after-scope. Found by black-box comparison of compiler flags between the Python-ninja and CMake build paths (ninja -nv output, debug mode, CqlParser.o): Python-ninja: -fno-sanitize-address-use-after-scope (correct: disable) CMake: -fsanitize-address-use-after-scope (wrong: enable) Closes scylladb/scylladb#29439	2026-04-14 14:48:52 +03:00
Avi Kivity	ebdfa10c8f	test: fix flaky test_incremental_repair_race_window_promotes_unrepaired_data The test waited for two "Finished tablet repair" log messages on the coordinator, expecting one per tablet. But there are two log sources that emit messages matching this pattern: repair module (repair/repair.cc:2329): "Finished tablet repair for table=..." topology coordinator (topology_coordinator.cc:2083): "Finished tablet repair host=..." When the coordinator is also a repair replica (always the case with RF=3 and 3 nodes), both messages appear in the coordinator log for the same tablet within 1ms of each other. The test consumed both, thinking both tablets were done, while the second tablet repair was still running. From the CI failure logs: 04:08:09.658 Found: repair[...]: Finished tablet repair for table=... global_tablet_id=e42fd650-3542-11f1-9756-85403784a622:0 04:08:09.660 Found: raft_topology - Finished tablet repair host=... tablet=e42fd650-3542-11f1-9756-85403784a622:0 Both messages are for tablet :0. Tablet :1 repair had not finished yet. The test then wrote keys 20-29 while the second tablet repair was still in progress. That repair flushed the memtable (via prepare_sstables_for_incremental_repair), including keys 20-29 in the repair scan, and mark_sstable_as_repaired set repaired_at=2 on the resulting sstable. This caused the assertion failure on servers[0]: "should not have post-repair keys in repaired sstables, got: {20, 21, 22, 23, 24, 25, 26, 27, 28, 29}" Fix by matching "Finished tablet repair host=" which is unique to the topology coordinator message and avoids the ambiguity. Also fix an incorrect comment that said being_repaired=null when at that point in the test being_repaired is still set to the session_id (the delay_end_repair_update injection prevents end_repair from running). Fixes: SCYLLADB-1478 Closes scylladb/scylladb#29444	2026-04-14 13:32:51 +02:00
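A small Python illustration of the ambiguity, using abbreviated versions of the two coordinator log lines quoted above (the text here is shortened, not verbatim): the broad pattern counts one tablet twice, while the `host=` pattern matches only the topology coordinator message. `count_matches()` is a hypothetical helper.

```python
import re
from typing import Iterable

# Abbreviated forms of the two coordinator log lines for the same tablet.
COORDINATOR_LOG = [
    "repair - Finished tablet repair for table=ks.t global_tablet_id=e42f:0",
    "raft_topology - Finished tablet repair host=h1 tablet=e42f:0",
]


def count_matches(lines: Iterable[str], pattern: str) -> int:
    rx = re.compile(pattern)
    return sum(1 for line in lines if rx.search(line))
```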
Piotr Dulikowski	9fc2c65d18	Merge 'cql3: implement WRITETIME() and TTL() of individual elements of map, set, and UDT' from Nadav Har'El In commit `727f68e0f5` we added the ability to SELECT: * Individual elements of a map: `SELECT map_col[key]`. * Individual elements of a set: `SELECT set_col[key]` returns the key if it exists in the set, or null if it doesn't, which allows checking whether the element exists in the set. * Individual pieces of a UDT: `SELECT udt_col.field`. But at the time, we didn't provide any way to retrieve the metadata for this value, namely its timestamp and TTL. We did not support `SELECT WRITETIME(collection[key])`, or `SELECT WRITETIME(udt.field)`. Users have requested support for such SELECTs in the past (see issue #15427), and Cassandra 5.0 added support for this feature - for maps, sets and UDTs - so we also need this feature for compatibility. This feature was also requested recently by vector-search developers, who wanted to read Alternator columns - stored as map elements, not individual columns - with their WRITETIME information. The first four patches in this series add the feature (in four smaller patches instead of one big one); the fifth and sixth patches add tests (cqlpy and boost tests, respectively). The seventh patch adds documentation. All the new tests pass on Cassandra 5, fail on Scylla before the present fix, and pass with it. The fix was surprisingly difficult. Our existing implementation (from `727f68e0f5` building on earlier machinery) doesn't just "read" `map_col[key]` and allow us to return just its timestamp. Rather, the implementation reads the entire map, serializes it in some temporary format that does not include the timestamps and TTLs, and then takes the subscript key, at which point we no longer have the timestamp or TTL of the element. So the fix had to cross all these layers of the implementation. 
While adding support for UDT fields in a pre-existing grammar nonterminal "subscriptExpr", we unintentionally added support for UDT fields also in LWT expressions (which used this nonterminal). LWT missing support for UDT fields was a long-time known compatibility issue (#13624) so we unintentionally fixed it :-) Actually, to completely fix it we needed another small change in the expression implementation, so the eighth patch in this series does this. Fixes #15427 Fixes #13624 Closes scylladb/scylladb#29134 * github.com:scylladb/scylladb: cql3: support UDT fields in LWT expressions cql3: document WRITETIME() and TTL() for elements of map, set or UDT test/boost: test WRITETIME() and TTL() on map collection elements test/cqlpy: test WRITETIME() and TTL() on element of map, set or UDT cql3: prepare and evaluate WRITETIME/TTL on collection elements and UDT fields cql3: parse per-element timestamps/TTLs in the selection layer cql3: add extended wire format for per-element timestamps and TTLs cql3: extend WRITETIME/TTL grammar to accept collection and UDT elements	2026-04-14 12:35:46 +02:00
Dawid Pawlik	f40ab83d02	docs: document vector index metadata and duplicate handling Document the new vector index behavior in the user-facing and developer docs. Describe `index_version` as a creation timeuuid stored in `system_schema.indexes`, clarify that recreating an index changes it while ALTER TABLE does not, and document that Scylla allows multiple named vector indexes on the same column while still rejecting unnamed duplicates.	2026-04-14 12:21:38 +02:00
Dawid Pawlik	800dec2180	test/cqlpy: cover vector index duplicate creation rules Add cqlpy tests for the current CREATE INDEX behavior of vector indexes. Cover named and unnamed duplicates, IF NOT EXISTS, coexistence of multiple named vector indexes on the same column, interactions between named and unnamed indexes, and the same-name-on-different-table case.	2026-04-14 12:21:38 +02:00
Marcin Maliszkiewicz	db5e4f2cb8	test/cqlpy: add reproducer for BATCH prepared auth cache bypass An unprivileged user could bypass authorization checks by exploiting the BATCH prepared statement cache: 1. Prepare an INSERT on a table the user has no access to 2. Execute it inside a BATCH — gets Unauthorized 3. Execute the same prepared INSERT directly — succeeds	2026-04-14 10:37:42 +02:00
Marcin Maliszkiewicz	8401e9cbbd	test: filter benign errors in tests that grep logs during shutdown Apply filter_errors() to grep_for_errors() results in test_split_stopped_on_shutdown and test_group0_apply_while_node_is_being_shutdown. Without filtering, benign RPC errors like 'connection dropped: Semaphore broken' that occur during graceful shutdown cause spurious test failures.	2026-04-13 18:33:41 +02:00
Marcin Maliszkiewicz	e78e6cd584	test: filter_errors: support list[list[str]] error groups Accept both list[str] (from distinct_errors=True) and list[list[str]] (from distinct_errors=False) in filter_errors(), matching against the first line of each error group. This allows tests that call grep_for_errors() with default arguments to pipe results directly through filter_errors().	2026-04-13 18:33:29 +02:00
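A hedged sketch of how a `filter_errors()` that accepts both shapes could look; the signature, the `benign_patterns` parameter and the use of `re.search` are assumptions, not the actual helper in the test library.

```python
import re
from typing import List, Sequence, Union

ErrorGroups = Union[List[str], List[List[str]]]


def filter_errors(errors: ErrorGroups, benign_patterns: Sequence[str]) -> ErrorGroups:
    """Drop errors whose first line matches any benign pattern.

    Accepts a flat list[str] (distinct_errors=True) or a list[list[str]]
    (distinct_errors=False); for groups only the first line is matched.
    """
    def first_line(entry) -> str:
        if isinstance(entry, list):
            return entry[0] if entry else ""
        return entry

    return [e for e in errors
            if not any(re.search(p, first_line(e)) for p in benign_patterns)]
```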
Alex	fdce8824a5	test/cluster: cover prepared LIST metadata ids in one setup Precompute the expected metadata-id hashes for the prepared LIST auth and service-level statements and verify that PREPARE returns them while EXECUTE reuses the prepared metadata without METADATA_CHANGED. Run all cases in a single auth-cluster test after preparing the cluster, role, and service level once through the regular manager fixture.	2026-04-13 19:13:12 +03:00
Marcin Maliszkiewicz	4d3ca041bb	cql3: fix authorization bypass via BATCH prepared cache poisoning execute_batch_without_checking_exception_message() inserted entries into the authorized prepared cache before verifying that check_access() succeeded. A failed BATCH therefore left behind cached 'authorized' entries that later let a direct EXECUTE of the same prepared statement skip the authorization check entirely. Move the cache insertion after the access check so that entries are only cached on success. This matches the pattern already used by do_execute_prepared() for individual EXECUTE requests. Introduced in `98f5e49ea8` Fixes https://scylladb.atlassian.net/browse/SCYLLADB-1221	2026-04-13 17:57:22 +02:00
Alex	0f6d9ffd22	cql: expose stable result metadata for prepared LIST statements Prepared LIST statements did not compute result metadata in the PREPARE path and sent an empty hash to the client, so the metadata_id was not recalculated correctly. This patch moves metadata construction into get_result_metadata() for the affected LIST statements and reuses that metadata when building the result set. This gives PREPARE a stable metadata id for LIST ROLES, LIST USERS, LIST PERMISSIONS and the service-level variants. This patch also adds a new boost test that verifies that when an EXECUTE request carries an empty result metadata id while the server has a real metadata id for the result set, the response is marked METADATA_CHANGED and includes the full result metadata plus the server metadata id. This covers the recovery path for clients that send an empty or otherwise unusable metadata id instead of a matching cached one.	2026-04-13 17:49:27 +03:00
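A simplified Python model of the recovery path covered by the new boost test: any mismatch between the request's result metadata id and the server's, including an empty id, produces a METADATA_CHANGED response carrying the full metadata and the server's id. `Response` and `build_rows_response()` are hypothetical names, and `None` stands in for omitted metadata; protocol framing is not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Response:
    metadata_changed: bool
    metadata_id: bytes
    metadata: Optional[List[str]]  # None means metadata omitted


def build_rows_response(request_metadata_id: bytes, server_metadata_id: bytes,
                        columns: List[str]) -> Response:
    if request_metadata_id != server_metadata_id:
        # Empty or stale id from the client: send everything so it can resync.
        return Response(True, server_metadata_id, columns)
    return Response(False, server_metadata_id, None)
```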
Dawid Pawlik	63b782451e	vector_index: allow multiple named indexes on one column Allow creating multiple named vector indexes on the same column while still rejecting duplicate unnamed ones. `index_metadata::equals_noname()` now ignores `index_version`, which is unique for every vector index creation, so duplicate detection keeps working for unnamed vector indexes. CREATE INDEX keeps using structural duplicate detection for regular indexes and unnamed vector indexes, but named vector indexes are checked by name only. The explicit name check is also needed for IF NOT EXISTS when the same index name already exists on a different table in the same keyspace, because vector indexes have no backing view table to catch that case.	2026-04-13 15:04:59 +02:00
Ferenc Szili	e904e7a715	test: add test_split_emitted_during_truncate Add a regression test that reproduces the race between tablet split and truncation. The test: 1. Creates a single-tablet table and inserts data. 2. Triggers truncation and pauses it (via database_truncate_wait) after compaction is disabled but before discard_sstables() runs. 3. Triggers tablet split and pauses it (via tablet_split_monitor_wait) at the start of process_tablet_split_candidate(). 4. Releases split so set_split_mode() creates new compaction groups. 5. Waits for the set_split_mode log confirming the groups exist. 6. Releases truncation so discard_sstables() encounters the new groups. 7. Verifies truncation completes and split finishes. Adds a tablet_split_monitor_wait error injection point in process_tablet_split_candidate() to allow pausing the split monitor before it enters the split loop.	2026-04-13 11:05:03 +02:00
Ferenc Szili	13d9561398	table: fix race between tablet split and truncate Tablet split can call set_split_mode() between the point where truncate_table_on_all_shards() disables compaction on all existing compaction groups and the point where discard_sstables() checks that compaction is disabled. The new split-ready compaction groups created by set_split_mode() won't have compaction disabled, causing discard_sstables() to fire on_internal_error. Fix by preventing set_split_mode() from creating new compaction groups when compaction is disabled on the main group. If truncation has already disabled compaction, split will simply report not-ready rather than creating groups which have compaction enabled. This is safe because split will be retried once truncation completes and re-enables compaction.	2026-04-13 11:04:38 +02:00
Avi Kivity	0ae22a09d4	LICENSE: Update to version 1.1 Updated terms of non-commercial use (must be a never-customer).	2026-04-12 19:46:33 +03:00
Avi Kivity	22949bae52	Merge 'logstor: implement tablet split/merge and migration' from Michael Litvak implement tablet split, tablet merge and tablet migration for tables that use the experimental logstor storage engine. * tablet merge simply merges the histograms of segments of one compaction group with another. * for tablet split we take the segments from the source compaction group, read them and write all live records to separate segments according to the split classifier, and move separated segments to the target compaction groups. * for tablet migration we use stream_blob, similarly to file streaming of sstables. we add a new op type for streaming a logstor segment. on the source we take a snapshot of the segments with an input stream that reads the segment, and on the target we create a sink that allocates a new segment on the target shard and writes to it. * we also do some improvements for recovery and loading of segments. we add a segment header that contains useful information for non-mixed segments, such as the table and token range. Refs SCYLLADB-770 no backport - still a new and experimental feature Closes scylladb/scylladb#29207 * github.com:scylladb/scylladb: test: logstor: additional logstor tests docs/dev: add logstor on-disk format section logstor: add version and crc to buffer header test: logstor: tablet split/merge and migration logstor: enable tablet balancing logstor: streaming of logstor segments using stream_blob logstor: add take_logstor_snapshot logstor: segment input/output stream logstor: implement compaction_group::cleanup logstor: tablet split logstor: tablet merge logstor: add compaction reenabler logstor: add segment header logstor: serialize writes to active segment replica: extend compaction_group functions for logstor replica: add compaction_group_for_logstor_segment logstor: code cleanup	2026-04-12 16:11:12 +03:00
Israel Fruchter	79c736455e	cqlsh: update to v6.0.34-scylla Update cqlsh to version v6.0.34-scylla. Notable fix: - Fix vector type formatting error (scylladb/scylla-cqlsh#165) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Closes scylladb/scylladb#29401	2026-04-12 14:54:50 +03:00
Nadav Har'El	33dbb63aef	cql3: support UDT fields in LWT expressions In an earlier patch, we used the CQL grammar's "subscriptExpr" in the rule for WRITETIME() and TTL(). But since we also wanted these to support UDT fields (x.a), not just collection subscripts (x[3]), we expanded subscriptExpr to also support the field syntax. But LWT expressions already used this subscriptExpr, which meant that LWT expressions unintentionally gained support for UDT fields. Missing support for UDT fields in LWT is a long-standing known Cassandra-compatibility bug (#13624), and now our grammar finally supports the missing syntax. But supporting the syntax is not enough for correct implementation of this feature - we also need to fix the expression handling: Two bugs prevented expressions like `v.a = 0` from working in LWT IF clauses, where `v` is a column of user-defined type. The first bug was in get_lhs_receiver() in prepare_expr.cc: it lacked a handler for field_selection nodes, causing an "unexpected expression" internal error when preparing a condition like `IF v.a = 0`. The fix adds a handler that returns a column_specification whose type is taken from the prepared field_selection's type field. The second bug was in search_and_replace() in expression.cc: when recursing into a field_selection node it reconstructed it with only `structure` and `field`, silently dropping the `field_idx` and `type` fields that are set during preparation. As a result, any transformation that uses search_and_replace() on a prepared expression containing a field_selection — such as adjust_for_collection_as_maps() called from column_condition_prepare() — would zero out those fields. At evaluation time, type_of() on the field_selection returned a null data_type pointer, causing a segmentation fault when the comparison operator tried to call ->equal() through it. The fix preserves field_idx and type when reconstructing the node. Fixes #13624.	2026-04-12 14:28:01 +03:00
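The second bug is a common pitfall when rebuilding immutable nodes. A Python analogue, with a hypothetical `FieldSelection` dataclass mirroring the four fields named above, contrasts reconstructing the node from scratch with `dataclasses.replace()`, which copies every field that is not explicitly overridden.

```python
from dataclasses import dataclass, replace
from typing import Any, Optional


@dataclass(frozen=True)
class FieldSelection:
    structure: Any
    field: str
    field_idx: Optional[int] = None  # set during preparation
    type: Optional[str] = None       # set during preparation


def rebuild_buggy(node: FieldSelection, new_structure: Any) -> FieldSelection:
    # Mirrors the bug: only structure and field survive the rebuild.
    return FieldSelection(structure=new_structure, field=node.field)


def rebuild_fixed(node: FieldSelection, new_structure: Any) -> FieldSelection:
    # Mirrors the fix: every prepared field is carried over.
    return replace(node, structure=new_structure)
```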
Nadav Har'El	bb2fb810bb	cql3: document WRITETIME() and TTL() for elements of map, set or UDT Add to the SELECT documentation (docs/cql/dml/select.rst) documentation of the new ability to select WRITETIME() and TTL() of a single element of map, set or UDT. Also in the TTL documentation (docs/cql/time-to-live.rst), which already had a section on "TTL for a collection", add a mention of the ability to read a single element's TTL(), and an example. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2026-04-12 14:28:01 +03:00
Nadav Har'El	a544dae047	test/boost: test WRITETIME() and TTL() on map collection elements Add tests in test/boost/expr_test.cc for the low-level implementation of writetime() and ttl() on a map element. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2026-04-12 14:28:01 +03:00
Nadav Har'El	ccb94618cc	test/cqlpy: test WRITETIME() and TTL() on element of map, set or UDT This patch adds many tests verifying the behavior of WRITETIME() and TTL() on individual elements of maps, sets and UDTs, serving as a regression test for issue #15427. We also add tests verifying our understanding of related issues like WRITETIME() and TTL() of entire collections and of individual elements of frozen collections. All new tests pass on Cassandra 5.0, helping to verify that our implementation is compatible with Cassandra. They also pass on ScyllaDB after the previous patch (most didn't before that patch). Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2026-04-12 14:27:40 +03:00
Nadav Har'El	35e807a36c	cql3: prepare and evaluate WRITETIME/TTL on collection elements and UDT fields Complete the implementation of SELECT WRITETIME(col[key])/TTL(col[key]) and WRITETIME(col.field)/TTL(col.field), building on the grammar (commit 1), wire format (commit 2), and selection-layer (commit 3) changes in the preceding patches. * prepare_column_mutation_attribute() (prepare_expr.cc) now handles the subscript and field_selection nodes that the grammar produces: - For subscripts, it validates that the inner column is a non-frozen map or set and checks the 'writetime_ttl_individual_element' feature flag so the feature is rejected during rolling upgrades. - For field selections, it validates that the inner column is a non-frozen UDT, with the same feature-flag check. * do_evaluate(column_mutation_attribute) (expression.cc) handles the same two cases. For a field selection it serializes the field index as a key and looks it up in collection_element_metadata; for a subscript it evaluates the subscript key and looks it up in the same map. A missing key (element not found or expired) returns NULL, matching Cassandra behavior. Together with the preceding three patches, this finally fixes #15427. The next three patches will add tests and documentation for the new feature, and the final eighth patch will fix the implementation of UDT fields in LWT expressions - which the first patch made the grammar allow but is still not implemented correctly.	2026-04-12 13:28:28 +03:00
Nadav Har'El	4ac63de063	cql3: parse per-element timestamps/TTLs in the selection layer Wire up the selection and result-set infrastructure to consume the extended collection wire format introduced in the previous patch and expose per-element timestamps and TTLs to the expression evaluator. * Add collection_cell_metadata: maps from raw element-key bytes to timestamp and remaining TTL, one entry per collection or UDT cell. Add a corresponding collection_element_metadata span to evaluation_inputs so that evaluators can access it. * Add a flag _collect_collection_timestamps to selection (selection.hh/cc). When any selected expression contains a WRITETIME(col[key])/TTL(col[key]) or WRITETIME(col.field)/TTL(col.field) attribute, the flag is set and the send_collection_timestamps partition-slice option is enabled, causing storage nodes to use the extended wire format from the previous patch. * Implement result_set_builder::add_collection() (selection.cc): when _collect_collection_timestamps is set, parse the extended format, decode per-element timestamps and remaining TTLs (computed from the stored expiry time and the query time), and store them in _collection_element_metadata indexed by column position. When the flag is not set, the existing plain-bytes path is unchanged. After this patch, the new selection feature is still not available to the end-user because the prepare step still forbids it. The next patch will finally complete the expression preparation and evaluation. It will read the new collection_element_metadata and return the correct timestamp or TTL value.	2026-04-12 12:51:06 +03:00
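A hedged Python sketch of the per-element metadata described above and of the evaluator lookup, where a missing key yields NULL (`None`). It assumes the expiry is an absolute time in seconds with -1 meaning no TTL (the wire-format convention); `ElementMeta`, `element_metadata()` and `writetime()` are hypothetical names.

```python
from typing import Dict, NamedTuple, Optional, Tuple


class ElementMeta(NamedTuple):
    timestamp: int       # write timestamp as stored
    ttl: Optional[int]   # remaining TTL in seconds, None when the element has none


def element_metadata(entries: Dict[bytes, Tuple[int, int]],
                     query_time_s: int) -> Dict[bytes, ElementMeta]:
    """entries maps a raw element key to (timestamp, expiry_s)."""
    out = {}
    for key, (ts, expiry_s) in entries.items():
        # Remaining TTL = expiry - query time, clamped at 0 as a precaution.
        ttl = None if expiry_s == -1 else max(expiry_s - query_time_s, 0)
        out[key] = ElementMeta(ts, ttl)
    return out


def writetime(meta: Dict[bytes, ElementMeta], key: bytes) -> Optional[int]:
    m = meta.get(key)
    return None if m is None else m.timestamp  # missing element -> NULL
```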
Nadav Har'El	bb63db34e5	cql3: add extended wire format for per-element timestamps and TTLs Introduce the infrastructure needed to transport per-element timestamps and TTL expiry times from replicas to coordinators, required for WRITETIME(col[key]) / TTL(col[key]) and WRITETIME(col.field) / TTL(col.field). * Add a 'writetime_ttl_individual_element' cluster feature flag that guards usage of the new wire format during rolling upgrades: the extended format is only emitted and consumed when every node in the cluster supports it. * Implement serialize_for_cql_with_timestamps() (types/types.cc), a variant of serialize_for_cql() that appends a per-element section to the regular CQL bytes, listing each live element's serialized key, timestamp, and expiry. The format is: [uint32 cql_len][cql bytes] [int32 entry_count] [per entry: (int32 key_len)(key bytes)(int64 timestamp)(int64 expiry)] expiry is -1 when the element has no TTL. * Add partition_slice::option::send_collection_timestamps and modify write_cell() (mutation_partition.cc) to use the new function serialize_for_cql_with_timestamps() when this option is available. This commit stands alone with no user-visible effect: nothing yet sets the new partition-slice option. The next patch adds the selection-layer code that sets the option and parses the extended response.	2026-04-12 11:49:06 +03:00
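The layout above can be round-tripped with Python's `struct` module. This is a sketch under one stated assumption: the commit message gives field widths but not byte order, so big-endian (as in the CQL binary protocol) is assumed here. The function names are hypothetical.

```python
import struct
from typing import List, Tuple

Entry = Tuple[bytes, int, int]  # (key bytes, timestamp, expiry); expiry -1 = no TTL


def serialize_with_timestamps(cql: bytes, entries: List[Entry]) -> bytes:
    parts = [struct.pack(">I", len(cql)), cql, struct.pack(">i", len(entries))]
    for key, ts, expiry in entries:
        parts += [struct.pack(">i", len(key)), key, struct.pack(">qq", ts, expiry)]
    return b"".join(parts)


def parse_with_timestamps(buf: bytes) -> Tuple[bytes, List[Entry]]:
    (cql_len,) = struct.unpack_from(">I", buf, 0)
    off = 4
    cql = buf[off:off + cql_len]
    off += cql_len
    (count,) = struct.unpack_from(">i", buf, off)
    off += 4
    entries = []
    for _ in range(count):
        (key_len,) = struct.unpack_from(">i", buf, off)
        off += 4
        key = buf[off:off + key_len]
        off += key_len
        ts, expiry = struct.unpack_from(">qq", buf, off)
        off += 16
        entries.append((key, ts, expiry))
    return cql, entries
```

Each entry costs 4 + key length + 16 bytes on top of the regular CQL bytes.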
Nadav Har'El	38b675737d	cql3: extend WRITETIME/TTL grammar to accept collection and UDT elements Previously, WRITETIME() and TTL() only accepted a simple column name (cident), so WRITETIME(m['key']) or WRITETIME(x.a) was a syntax error. This patch begins to implement support for applying WRITETIME() and TTL() to individual elements of a non-frozen map, set or UDT, as requested in issue #15427. On its own this commit only changes the parser (Cql.g). The prepare step still rejects subscript and field-selection nodes with an invalid_request_exception, so there is no user-visible behavior change yet - just that a syntax error is replaced by a different error. Upcoming patches add the extended wire format for per-element timestamps (commit 2), the selection layer that consumes it (commit 3), and the prepare/evaluate logic that ties everything together (commit 4), after which WRITETIME(col[key]) and TTL(col[key]) for collection or UDT elements will finally be fully functional. The parser change in this patch expands the subscriptExpr rule to support the col.field syntax, not only col[key]. This change also allows the UDT field syntax to be used in LWT conditions, which is another long-standing missing feature (#13624), but to correctly support this feature we'll need an additional patch to fix a couple of remaining bugs; this will be the eighth commit in this series.	2026-04-12 11:10:23 +03:00
Benny Halevy	e4f0539acf	query: result_set: change row member to a chunked vector To prevent large memory allocations. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2026-04-12 10:00:49 +03:00
Benny Halevy	b433a5bcf8	query: result_set_row: make noexcept Remove const specifier from result_set_row._cells member to make the class nothrow_move_constructible and nothrow_move_assignable To be used later in query result_set and friends. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2026-04-12 10:00:39 +03:00
Benny Halevy	c0607110c4	query: non_null_data_value: assert is_nothrow_move_constructible and assignable To be used later in query result_set{row,} and friends. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2026-04-12 10:00:34 +03:00
Benny Halevy	afa438d60d	types: data_value: assert is_nothrow_move_constructible and assignable Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2026-04-12 10:00:13 +03:00
Avi Kivity	8ccee6803e	Merge 'Remove upgrade view builder' from Gleb Natapov Since we no longer support upgrading from versions that do not support v2 of the "view building status" code (where building status is managed by raft), we can remove the v1 code and the upgrade code, and make sure we do not boot with an old "builder status" version. The v2 version was introduced by `8d25a4d678`, which is included in scylla-2025.1.0. No backport needed since this is code removal. Closes scylladb/scylladb#29105 * github.com:scylladb/scylladb: view: drop unused v1 builder code view: remove upgrade to raft code	2026-04-12 00:39:26 +03:00
Botond Dénes	9770a4c081	test/cluster/test_encryption.py: use single-partition reads in read_verify_workload() Replace the range scan in read_verify_workload() with individual single-partition queries, using the keys returned by prepare_write_workload() instead of hard-coding them. The range scan was previously observed to time out in debug mode after a hard cluster restart. Single-partition reads are lighter on the cluster and less likely to time out under load. The new verification is also stricter: instead of merely checking that the expected number of rows is returned, it verifies that each written key is individually readable, catching any data-loss or key-identity mismatch that the old count-only check would have missed. This is the second attempt at stabilizing this test, after the recent `854c374ebf`. That fix made sure that the cluster has converged on topology and nodes see each other before running the verify workload. Fixes: SCYLLADB-1331 Closes scylladb/scylladb#29313	2026-04-12 00:38:20 +03:00
Avi Kivity	ca80ee8586	Merge 'Introduce maintenance scheduling supergroup and do initial population' from Pavel Emelyanov The supergroup replaces streaming (a.k.a. maintenance as well) group, inherits 200 shares from it and consists of four sub-groups (all have equal shares of 200 within the new supergroup) * maintenance_compaction. This group configures `compaction_manager::maintenance_sg()` group. User-triggered compaction runs in it * backup. This group configures `snapshot_ctl::config::backup_sched_group`. Native backup activity runs there * maintenance. It's a new "visible" name, everything that was called "maintenance" in the code ran in "streaming" group. Now it will run in "maintenance". The activities include those that don't communicate over RPC (see below why) * `tablet_allocator::balance_tablets()` * `sstables_manager::components_reclaim_reload_fiber()` * `tablet_storage_group_manager::merge_completion_fiber()` * metrics exporting http server altogether * streaming. This is the existing streaming group, which simply moves under the new supergroup. Everything else that ran there continues to do so, including * hints sender * all view building related components (update generator, builder, workers) * repair * stream_manager * messaging service (except for verb handlers that switch groups) * join_cluster() activity * REST API * ... something else I forgot The `--maintenance_io_throughput_mb_per_sec` option is introduced. It controls the IO throughput limit applied to the maintenance supergroup. If not set, the `--stream_io_throughput_mb_per_sec` option is used to preserve backward compatibility. All new sched groups inherit `request_class::maintenance` (however, "backup" seems not to make any requests yet). Moving more activities from "streaming" into "maintenance" (or its own group) is possible, but one will need to take care of RPC group switching.
The thing is that when a client makes an RPC call, the server may switch to one of the pre-negotiated scheduling groups. Verbs for existing activities that run in the "streaming" group are routed through an RPC index that negotiates the "streaming" group on the server side. If any of that client code moves to some other group, the server will still run the handlers in "streaming", which is not quite expected. That's one of the main reasons why only the selected fibers were moved to their own "maintenance" group. Similarly for backup: this code doesn't use RPC, so it can be moved. Restoring code uses load-and-stream and the corresponding RPCs, so it cannot simply be moved into its own new group. Fixes SCYLLADB-351 New feature, not backporting Closes scylladb/scylladb#28542 * github.com:scylladb/scylladb: code: Add maintenance/maintenance group backup: Add maintenance/backup group compaction: Add maintenance/maintenance_compaction group main: Introduce maintenance supergroup main: Move all maintenance sched group into streaming one database: Use local variable for current_scheduling_group code: Live-update IO throughputs from main	2026-04-12 00:34:48 +03:00
Botond Dénes	3289928679	repair: fix quadratic complexity when loading repair history shared_tombstone_gc_state::update_repair_time() uses copy-on-write semantics: each call copies the entire per_table_history_maps and the per-table repair_history_map. repair_service::load_history() called this once per history entry, making the load O(N²) in both time and memory. Introduce batch_update_repair_time() which performs a single copy-on-write for any number of entries belonging to the same table. Restructure load_history() to collect entries into batches of up to 1000 and flush each batch in one call, keeping peak memory bounded. The batch size limit is intentional: the repair history table currently has no bound on the number of entries and can grow large. Note that this does not cause a problem in the in-memory history map itself: entries are coalesced internally and only the latest repair time is kept per range. The unbounded entry count only makes the batched update during load expensive. Fixes: SCYLLADB-104 Closes scylladb/scylladb#29326	2026-04-11 23:54:26 +03:00
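The following is a simplified model of why per-entry copy-on-write makes loading O(N²) and how batching by table bounds the copying. It is a sketch under stated assumptions: one flat dict stands in for `per_table_history_maps`, and the class, method and helper names are hypothetical Python analogues, not Scylla's API. Only the batch size of 1000 and the "keep the latest repair time per range" rule come from the commit.

```python
class CowRepairHistory:
    """Hypothetical model: every update replaces an immutable snapshot with a
    modified copy (copy-on-write). `copied` counts entries copied so far."""

    def __init__(self):
        self.snapshot = {}  # (table, range) -> latest repair time
        self.copied = 0

    def _copy(self):
        self.copied += len(self.snapshot)
        return dict(self.snapshot)

    def update_repair_time(self, table, rng, t):
        # One full copy per entry: O(N) per call, O(N^2) for N calls.
        new = self._copy()
        key = (table, rng)
        new[key] = max(t, new.get(key, t))  # keep only the latest time per range
        self.snapshot = new

    def batch_update_repair_time(self, table, items):
        # One copy for the whole batch of (range, time) items.
        new = self._copy()
        for rng, t in items:
            key = (table, rng)
            new[key] = max(t, new.get(key, t))
        self.snapshot = new


def load_history(history, rows, batch_size=1000):
    """rows: iterable of (table, range, time). Flush when the batch is full
    or the table changes, so peak batch memory stays bounded."""
    batch, batch_table = [], None
    for table, rng, t in rows:
        if batch and (table != batch_table or len(batch) >= batch_size):
            history.batch_update_repair_time(batch_table, batch)
            batch = []
        batch_table = table
        batch.append((rng, t))
    if batch:
        history.batch_update_repair_time(batch_table, batch)
```

With 3000 distinct ranges, the per-entry path copies 0 + 1 + ... + 2999 = 4,498,500 entries. The batched path copies only 0 + 1000 + 2000 = 3000.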
Petr Gusev	8a16746e55	strong_consistency: fix crash when DROP TABLE races with in-flight DML When DROP TABLE races with an in-flight DML on a strongly-consistent table, the node aborts in groups_manager::acquire_server() because the raft group has already been erased from _raft_groups. A concurrent DROP TABLE may have already removed the table from database registries and erased the raft group via schedule_raft_group_deletion. The schema.table() in create_operation_ctx() might not fail though because someone might be holding lw_shared_ptr<table>, so that the table is dropped but the table object is still alive. Fix by accepting table_id in acquire_server and checking that the table still exists in the database via find_column_family before looking up the raft group. If the table has been dropped, find_column_family throws no_such_column_family instead of the node aborting via on_internal_error. When the table does exist, acquire_server proceeds to acquire state.gate; schedule_raft_group_deletion co_awaits gate::close, so it will wait for the DML operation to complete before erasing the group. Fixes SCYLLADB-1450	2026-04-10 22:56:16 +02:00
Petr Gusev	82460e7a38	test: add regression test for DROP TABLE racing with in-flight DML Add test_drop_table_during_insert that reproduces a crash when DROP TABLE races with an in-flight INSERT on a strongly-consistent table. The test uses an error injection to pause INSERT between obtaining the ERM and calling acquire_server, then drops the table (which destroys the raft group), then resumes the INSERT. Without a fix, the node aborts in acquire_server via on_internal_error. The test is marked as skipped until the fix is in place.	2026-04-10 22:56:16 +02:00
Michał Hudobski	7d648961ed	vector_search: forward non-primary key restrictions to Vector Store service Include non-primary key restrictions (e.g. regular column filters) in the filter JSON sent to the Vector Store service. Previously only partition key and clustering column restrictions were forwarded, so filtering on regular columns was silently ignored. Add get_nonprimary_key_restrictions() getter to statement_restrictions. Add unit tests for non-primary key equality, range, and bind marker restrictions in filter_test. Fixes: SCYLLADB-970 Closes scylladb/scylladb#29019	2026-04-10 17:16:29 +02:00
Piotr Smaron	477353b15c	auth: sanitize {USER} substitution in LDAP URL templates LDAPRoleManager interpolated usernames directly into ldap_url_template. That allowed LDAP filter metacharacters to change the query, and URL metacharacters such as %, ?, and # to change how ldap_url_parse() split the URL. Apply two layers of encoding when substituting {USER}: 1. RFC 4515 filter escaping -- neutralises filter operators. 2. URL percent-encoding -- prevents ldap_url_parse from misinterpreting %-sequences, ? delimiters, or # fragments. Add validate_query_template() (called from start()) which uses a sentinel round-trip through ldap_url_parse to reject templates that place {USER} outside the filter component. Templates that previously placed {USER} in the host or base DN were silently accepted; they are now rejected at startup with a descriptive error. Change parse_url() to take const sstring& instead of string_view to enforce the null-termination requirement of ldap_url_parse() at the type level. Add regression coverage for %2a, ?, #, and invalid {USER} placement in the base DN, host, attributes, and extensions. Update LDAP authorization docs to document the escaping behavior and the {USER} placement restriction. Fixes: SCYLLADB-1309	2026-04-10 14:00:47 +02:00
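Here is a minimal sketch of the two-layer `{USER}` substitution described above: RFC 4515 filter escaping first, then URL percent-encoding. It assumes Python's `urllib.parse.quote` as the percent-encoder with no characters left unencoded beyond the unreserved set. `escape_filter_value` and `substitute_user` are hypothetical names, and Scylla's exact set of encoded characters may differ.

```python
from urllib.parse import quote

# RFC 4515: these characters in a filter value must be written as a
# backslash followed by two hex digits.
_FILTER_ESCAPES = {
    "*": r"\2a",
    "(": r"\28",
    ")": r"\29",
    "\\": r"\5c",
    "\0": r"\00",
}


def escape_filter_value(value: str) -> str:
    """Layer 1: neutralise LDAP filter operators."""
    return "".join(_FILTER_ESCAPES.get(ch, ch) for ch in value)


def substitute_user(template: str, username: str) -> str:
    """Layer 2: percent-encode the escaped value so that %, ?, # and the
    backslashes introduced by layer 1 cannot change how the URL is split."""
    encoded = quote(escape_filter_value(username), safe="")
    return template.replace("{USER}", encoded)
```

For example, the username `a*b` becomes `a\2ab` after filter escaping and `a%5C2ab` after percent-encoding. The URL parser decodes that back to `a\2ab`, and the LDAP server then treats it as a literal `*`, not a wildcard.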
Dawid Pawlik	2dd8eef38c	vector_index: store `index_version` as creation timeuuid Vector indexes currently store the base table schema version in `index_version`. That value is name-based, not time-based, so it does not represent when the index was created. Store a timeuuid instead and change the relevant interfaces from `table_schema_version` to `utils::UUID`. This is a prerequisite for supporting multiple vector indexes on the same column where the oldest index must be selected deterministically via routing implemented in Vector Store. Update the cqlpy tests to check the new semantics directly: recreating the index changes `index_version`, while ALTER TABLE does not.	2026-04-10 13:05:21 +02:00
Piotr Dulikowski	3bd770d4d9	Merge 'counters: reuse counter IDs by rack' from Michael Litvak For counter updates, use a counter ID that is constructed from the node's rack instead of the node's host ID. A rack can have at most two active tablet replicas at a time: normally a single tablet replica, and during tablet migration two active replicas, the normal and the pending replica. Therefore we can have two unique counter IDs per rack that are reused by all replicas in the rack. We construct the counter ID from the rack UUID, which is constructed from the name "dc:rack". The pending replica uses a deterministic variation of the rack's counter ID by negating it. This improves the performance and size of counter cells by having fewer unique counter IDs and fewer counter shards in a counter cell. Previously the number of counter shards was the number of different host_ids that updated the counter, which is typically the number of nodes in the cluster and can keep growing indefinitely when nodes are replaced. With the rack-based counter ID, the number of counter shards will be at most twice the number of different racks (including removed racks, which should not be significant). Fixes SCYLLADB-356 backport not needed - an enhancement Closes scylladb/scylladb#28901 * github.com:scylladb/scylladb: docs/dev: add counters doc counters: reuse counter IDs by rack	2026-04-10 12:24:18 +02:00
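The sketch below illustrates why rack-based counter IDs bound the number of counter shards to at most two per rack. Two details are assumptions, not Scylla's code. First, the "dc:rack" name is turned into a UUID via an MD5-based, version-3-style hash. Second, "negating" is modeled as two's-complement negation of the 128-bit value. All function names are hypothetical.

```python
import hashlib
import uuid


def rack_counter_id(dc: str, rack: str) -> uuid.UUID:
    # Assumption: a name-based (MD5, version-3 style) UUID derived from "dc:rack".
    digest = hashlib.md5(f"{dc}:{rack}".encode()).digest()
    return uuid.UUID(bytes=digest, version=3)


def pending_counter_id(rack_id: uuid.UUID) -> uuid.UUID:
    # Assumption: "negating" = two's-complement negation of the 128-bit integer.
    # It is deterministic and its own inverse.
    return uuid.UUID(int=(-rack_id.int) % (1 << 128))


def counter_id_for(dc: str, rack: str, is_pending: bool = False) -> uuid.UUID:
    rid = rack_counter_id(dc, rack)
    return pending_counter_id(rid) if is_pending else rid
```

Every replica in a rack derives the same ID, so the set of distinct shard IDs a counter can accumulate is at most `2 * number_of_racks`. It does not grow with the number of hosts, even as nodes are replaced.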
Wojciech Mitros	163c6f71d6	transport: refactor result_message bounce interface Replace move_to_shard()/move_to_host() with as_bounce()/target_shard()/ target_host() to clarify the interface after bounce was extended to support cross-node bouncing. - Add virtual as_bounce() returning const bounce* to the base class (nullptr by default, overridden in bounce to return this), replacing the virtual move_to_shard() which conflated bounce detection with shard access - Rename move_to_shard() -> target_shard() (now non-virtual, returns unsigned directly) and move_to_host() -> target_host() on bounce - Replace dynamic_pointer_cast with static_pointer_cast at call sites that already checked as_bounce() - Move forward declarations of message types before the virtual methods so as_bounce() can reference bounce Fixes: SCYLLADB-1066 Closes scylladb/scylladb#29367	2026-04-10 12:17:43 +02:00
Piotr Dulikowski	32e3a01718	Merge 'service: strong_consistency: Allow for aborting operations' from Dawid Mędrek Motivation ---------- Since strongly consistent tables are based on the concept of Raft groups, operations on them can get stuck for indefinite amounts of time. That may be problematic, and so we'd like to implement a way to cancel those operations at suitable times. Description of solution ----------------------- The situations we focus on are the following: * Timed-out queries * Leader changes * Tablet migrations * Table drops * Node shutdowns We handle each of them and provide validation tests. Implementation strategy ----------------------- 1. Auxiliary commits. 2. Abort operations on timeout. 3. Abort operations on tablet removal. 4. Extend `client_state`. 5. Abort operation on shutdown. 6. Help `state_machine` be aborted as soon as possible. Tests ----- We provide tests that validate the correctness of the solution. The total time spent on `test_strong_consistency.py` (measured on my local machine, dev mode): Before: ``` real 0m31.809s user 1m3.048s sys 0m21.812s ``` After: ``` real 0m34.523s user 1m10.307s sys 0m27.223s ``` The incremental differences in time can be found in the commit messages. Fixes SCYLLADB-429 Backport: not needed. This is an enhancement to an experimental feature. 
Closes scylladb/scylladb#28526 * github.com:scylladb/scylladb: service: strong_consistency: Abort state_machine::apply when aborting server service: strong_consistency: Abort ongoing operations when shutting down service: client_state: Extend with abort_source service: strong_consistency: Handle abort when removing Raft group service: strong_consistency: Abort Raft operations on timeout service: strong_consistency: Use timeout when mutating service: strong_consistency: Fix indentation service: strong_consistency: Enclose coordinator methods with try-catch service: strong_consistency: Crash at unexpected exception test: cluster: Extract default config & cmdline in test_strong_consistency.py	2026-04-10 11:11:21 +02:00
Pavel Emelyanov	0b336da89d	Revert "cmake: add missing rolling_max_tracker_test and symmetric_key_test" This reverts commit `8b4a91982b`. Two commits independently added rolling_max_tracker_test to test/boost/CMakeLists.txt: `8b4a919` cmake: add missing rolling_max_tracker_test and symmetric_key_test `f3a91df` test/cmake: add missing tests to boost test suite The second was merged two days after the first. They didn't conflict at the code level and applied cleanly, resulting in duplicate add_scylla_test() entries that break the CMake build: CMake Error: add_executable cannot create target "test_boost_rolling_max_tracker_test" because another target with the same name already exists. Remove the duplicate. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Reported-by: Łukasz Paszkowski <lukasz.paszkowski@scylladb.com>	2026-04-10 11:19:43 +03:00
Patryk Jędrzejczak	751bf31273	Merge 'More gossiper cleanups' from Gleb Natapov The PR contains more code cleanups, mostly in the gossiper. It drops more gossiper states, leaving only NORMAL and SHUTDOWN; all other states are checked against the topology state. Those two are kept because the SHUTDOWN state is propagated only through the gossiper, and when the node is not in SHUTDOWN it has to be in some other state. No need to backport. Cleanups. Closes scylladb/scylladb#29129 * https://github.com/scylladb/scylladb: storage_service: cleanup unused code storage_service: simplify get_peer_info_for_update gossiper: send shutdown notifications in parallel gms: remove unused code virtual_tables: no need to call gossiper if we already know that the node is in shutdown gossiper: print node state from raft topology in the logs gossiper: use is_shutdown instead of code it manually gossiper: mark endpoint_state(inet_address ip) constructor as explicit gossiper: remove unused code gossiper: drop last use of LEFT state and drop the state gossiper: drop unused STATUS_BOOTSTRAPPING state gossiper: rename is_dead_state to is_left since this is all that the function checks now. gossiper: use raft topology state instead of gossiper one when checking node's state storage_service: drop check_for_endpoint_collision function storage_service: drop is_first_node function gossiper: remove unused REMOVED_TOKEN state gossiper: remove unused advertise_token_removed function	2026-04-10 09:56:20 +02:00
Nadav Har'El	6674aa29ca	Merge 'Add Cassandra SAI (StorageAttachedIndex) compatibility' from Szymon Wasik Cassandra's native vector index type is StorageAttachedIndex (SAI). Libraries such as CassIO, LangChain, and LlamaIndex generate `CREATE CUSTOM INDEX` statements using the SAI class name. Previously, ScyllaDB rejected these with "Non-supported custom class". This PR adds compatibility so that SAI-style CQL statements work on ScyllaDB without modification. 1. test: enable SAI_VECTOR_ALLOW_CUSTOM_PARAMETERS for Cassandra tests Enables the `SAI_VECTOR_ALLOW_CUSTOM_PARAMETERS` Cassandra system property so that `search_beam_width` tests pass against Cassandra 5.0.7. 2. test: modernize vector index test comments and fix xfail Updates test comments from "Reproduces" to "Validates fix for" for clarity, and converts the `test_ann_query_with_pk_restriction` xfail into a stripped-down CREATE INDEX syntax test (removing unused INSERT/SELECT lines). Removes the redundant `test_ann_query_with_non_pk_restriction` test. 3. cql: add Cassandra SAI (StorageAttachedIndex) compatibility Core implementation: the SAI class name is detected and translated to ScyllaDB's native `vector_index`. The fully-qualified class name (`org.apache.cassandra.index.sai.StorageAttachedIndex`) requires exact case; short names (`StorageAttachedIndex`, `sai`) are matched case-insensitively — matching Cassandra's behavior. Non-vector and multi-column SAI targets are rejected with clear errors. Adds `skip_on_scylla_vnodes` fixture, SAI compatibility docs, and the Cassandra compatibility table entry (split into "SAI general" vs "SAI for vector search"). 4. cql: accept source_model option for Cassandra SAI compatibility The `source_model` option is a Cassandra SAI property used by Cassandra libraries (e.g., CassIO) to tag vector indexes with the name of the embedding model. ScyllaDB accepts it for compatibility but does not use it — the validator is a no-op lambda. 
The option is preserved in index metadata and returned in DESCRIBE INDEX output. - `cql3/statements/create_index_statement.cc`: SAI class detection and rewriting logic - `index/secondary_index_manager.cc`: case-insensitive class name lookup (lowercasing restored before `classes.find()`) - `index/vector_index.cc`: `source_model` accepted as a valid option with no-op validator - `docs/cql/secondary-indexes.rst`: SAI compatibility documentation with `source_model` table row - `docs/using-scylla/cassandra-compatibility.rst`: SAI entry split into general (not supported) and vector search (supported) - `test/cqlpy/conftest.py`: `scylla_with_tablets` renamed to `skip_on_scylla_vnodes` - `test/cqlpy/test_vector_index.py`: SAI tests inlined (no constants), `check_bad_option()` helper for numeric validation, uppercase class name test, merged `source_model` tests with DESCRIBE check Test results: ScyllaDB (dev): 42 passed, 0 skipped, 0 failed; Cassandra 5.0.7: 16 passed, 26 skipped, 0 failed. Backport: none (new feature). Fixes: SCYLLADB-239 Closes scylladb/scylladb#28645 * github.com:scylladb/scylladb: cql: accept source_model option and show options in DESCRIBE cql: add Cassandra SAI (StorageAttachedIndex) compatibility test: modernize vector index test comments and fix xfail test: enable SAI_VECTOR_ALLOW_CUSTOM_PARAMETERS for Cassandra tests	2026-04-10 10:21:20 +03:00
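A minimal sketch of the class-name matching rule stated in this merge. The fully qualified SAI name must match exactly, while the short names `StorageAttachedIndex` and `sai` match case-insensitively, and SAI on a non-vector column is rejected. `is_sai_class_name` and `resolve_custom_index_class` are hypothetical Python stand-ins for the C++ rewriting logic, and the error text is illustrative only.

```python
SAI_FQCN = "org.apache.cassandra.index.sai.StorageAttachedIndex"
SAI_SHORT_NAMES = {"storageattachedindex", "sai"}  # compared in lowercase


def is_sai_class_name(class_name: str) -> bool:
    if class_name == SAI_FQCN:  # fully qualified name: exact case required
        return True
    return class_name.lower() in SAI_SHORT_NAMES  # short names: any case


def resolve_custom_index_class(class_name: str, column_is_vector: bool) -> str:
    """Rewrite an SAI class name to the native vector index, or pass through."""
    if not is_sai_class_name(class_name):
        return class_name
    if not column_is_vector:
        # Hypothetical message; the real error text may differ.
        raise ValueError("SAI is only supported on vector columns; use a secondary index instead")
    return "vector_index"
```

Checking the exact fully qualified name before lowercasing keeps the behavior aligned with Cassandra, which the merge message describes as case-sensitive for that form.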
Tomasz Grabiec	88bea5aaf3	cql: Include parallelized queries in the scylla_cql_select_partition_range_scan_no_bypass_cache metric This metric is used to catch execution of scans which go via row cache, which can have a bad effect on performance. Since `f344bd0aaa`, aggregate queries go through a new statement class: parallelized_select_statement. This class inherits from select_statement directly rather than from primary_key_select_statement. The range scan detection logic (_range_scan, _range_scan_no_bypass_cache) was only in primary_key_select_statement's constructor, so parallelized queries were not counted in select_partition_range_scan and select_partition_range_scan_no_bypass_cache metrics. Fix by moving the range scan detection into select_statement's constructor, so that all subclasses get it.	2026-04-10 02:12:48 +02:00
Tomasz Grabiec	dc95d26464	test: cluster: dtest: Fix double-counting of metrics get_node_metrics() in test/cluster/dtest/tools/metrics.py used re.search(metric_name, metric) to match Prometheus metric lines. The metric name select_partition_range_scan is a substring of select_partition_range_scan_no_bypass_cache. So when querying for select_partition_range_scan, the regex matched both Prometheus lines: scylla_cql_select_partition_range_scan{shard="0",...} 1 scylla_cql_select_partition_range_scan_no_bypass_cache{shard="0",...} 1 And because the code does metrics_res[metric_name] += val, it summed both values, making it look like the counter was incremented by 2 when it was actually incremented by 1. The fix appends r"[\s{]" to the regex so the metric name must be followed by { (labels) or whitespace (value), preventing substring matches.	2026-04-10 02:12:48 +02:00
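The substring problem and its fix can be reproduced in a few lines. This is a minimal sketch: `sum_metric` is a hypothetical stand-in for `get_node_metrics()`, and the sample lines are trimmed versions of the two Prometheus lines quoted in the commit. Using `re.escape` is an extra precaution, not part of the original fix.

```python
import re

SAMPLE = """\
scylla_cql_select_partition_range_scan{shard="0"} 1
scylla_cql_select_partition_range_scan_no_bypass_cache{shard="0"} 1
"""


def sum_metric(text: str, metric_name: str, anchored: bool = True) -> float:
    # The fix: require "{" (labels) or whitespace (value) right after the name,
    # so a metric name that is a prefix of another one no longer matches both.
    pattern = re.escape(metric_name) + (r"[\s{]" if anchored else "")
    total = 0.0
    for line in text.splitlines():
        if re.search(pattern, line):
            total += float(line.rsplit(" ", 1)[1])  # value is the last field
    return total
```

Without the anchor, querying `select_partition_range_scan` sums both lines and reports 2.0. With it, the result is the expected 1.0.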
Avi Kivity	f67d0739d0	test: user_function_test: adjust Lua error message tests Lua 5.5 changed the error message slightly ("?:-1" -> "?:?"). Relax the error message tests to avoid this unimportant fragment. Closes scylladb/scylladb#29414	2026-04-10 01:09:35 +03:00
Piotr Szymaniak	98d6edaa88	alternator: add comment explaining delta_mode::keys in add_stream_options() Clarify that cdc::delta_mode is ignored by Alternator, so we use the least expensive mode (keys) to reduce overhead. Fixes scylladb/scylladb#24812 Closes scylladb/scylladb#29408	2026-04-10 01:07:21 +03:00
Michał Hudobski	c8b9fde828	auth: allow VECTOR_SEARCH_INDEXING permission to access system.tablets Add system.tablets to the set of system resources that can be accessed with the VECTOR_SEARCH_INDEXING permission. Fixes: VECTOR-605 Closes scylladb/scylladb#29397	2026-04-09 21:53:07 +03:00
Pavel Emelyanov	5ffd3ccc8e	test_backup: Remove create_ks_and_cf helper Test Remove the create_ks_and_cf() helper function and its now-unused import of format_tuples(). All callers have been converted to use the new async patterns with new_test_keyspace() and cql.run_async(). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-09 20:40:13 +03:00
Pavel Emelyanov	2c81e54d6d	test_backup: Replace create_ks_and_cf with async patterns Test Replace all 6 calls to create_ks_and_cf() with new async patterns: - Use new_test_keyspace() context manager for keyspace creation - Use cql.run_async() for CREATE TABLE statement - Use asyncio.gather() with cql.run_async() for data insertion The test_restore_with_non_existing_sstable only needs the ks:table structure to exist; it doesn't use the pre-populated data. This change makes the code more explicit and maintains proper async semantics throughout. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-09 20:39:56 +03:00
Pavel Emelyanov	66d9f6e042	test_backup: Add if-True blocks for indentation Test Add if-True blocks to wrap code that uses create_ks_and_cf() in all 6 test functions. This is a mechanical change to set up the next step where the helper will be replaced with new async patterns. All code after the create_ks_and_cf() call until the end of each test is now indented under the if-True block. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-09 20:35:21 +03:00
Szymon Wasik	573def7cd8	cql: accept source_model option and show options in DESCRIBE Accept the Cassandra SAI 'source_model' option for vector indexes. This option is used by Cassandra libraries (e.g., CassIO, LangChain) to tag vector indexes with the name of the embedding model that produced the vectors. ScyllaDB does not use the source_model value but stores it and includes it in the DESCRIBE INDEX output for Cassandra compatibility. Additionally, extend vector_index::describe() to emit a WITH OPTIONS = {...} clause containing all user-provided index options (filtering out system keys: target, class_name, index_version). This makes options like similarity_function, source_model, etc. visible in DESCRIBE output.	2026-04-09 17:20:03 +02:00
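A minimal sketch of the `WITH OPTIONS = {...}` clause described above, which filters out the system keys `target`, `class_name` and `index_version`. Sorting the keys and the exact spacing and quoting of the output are assumptions, not Scylla's actual `describe()` format. `describe_options_clause` is a hypothetical name.

```python
SYSTEM_KEYS = {"target", "class_name", "index_version"}


def _cql_string(s: str) -> str:
    # CQL string literal: single quotes, with embedded quotes doubled.
    return "'" + s.replace("'", "''") + "'"


def describe_options_clause(options: dict) -> str:
    """Return a WITH OPTIONS clause for user-provided options, or ''."""
    user_opts = sorted((k, v) for k, v in options.items() if k not in SYSTEM_KEYS)
    if not user_opts:
        return ""
    body = ", ".join(f"{_cql_string(k)}: {_cql_string(v)}" for k, v in user_opts)
    return f"WITH OPTIONS = {{{body}}}"
```

Emitting nothing when only system keys are present keeps DESCRIBE output unchanged for indexes created without user options.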
Szymon Wasik	80a2e4a0ab	cql: add Cassandra SAI (StorageAttachedIndex) compatibility Libraries such as CassIO, LangChain, and LlamaIndex create vector indexes using Cassandra's StorageAttachedIndex (SAI) class name. This commit lets ScyllaDB accept these statements without modification. When a CREATE CUSTOM INDEX statement specifies an SAI class name on a vector column, ScyllaDB automatically rewrites it to the native vector_index implementation. Accepted class names (case-insensitive): - org.apache.cassandra.index.sai.StorageAttachedIndex - StorageAttachedIndex - sai SAI on non-vector columns is rejected with a clear error directing users to a secondary index instead. The SAI detection and rewriting logic is extracted into a dedicated static function (maybe_rewrite_sai_to_vector_index) to keep the already-long validate_while_executing method manageable. Multi-column (local index) targets and nonexistent columns are skipped with continue — the former are treated as filtering columns by vector_index::check_target(), and the latter are caught later by vector_index::validate(). Tests that exercise features common to both backends (basic creation, similarity_function, IF NOT EXISTS, bad options, etc.) now use the SAI class name with the skip_on_scylla_vnodes fixture so they run against both ScyllaDB and Cassandra. ScyllaDB-specific tests continue to use USING 'vector_index' with scylla_only.	2026-04-09 17:20:03 +02:00
Szymon Wasik	fa7edc627c	test: modernize vector index test comments and fix xfail - Change 'Reproduces' to 'Validates fix for' in test comments to reflect that the referenced issues are already fixed. - Condense the VECTOR-179 comment to two lines. - Replace the xfailed test_ann_query_with_restriction_works_only_on_pk with a focused test (test_ann_query_with_pk_restriction) that creates a vector index on a table with a PK column restriction, validating the VECTOR-374 fix.	2026-04-09 17:20:02 +02:00
Szymon Wasik	4eab050be4	test: enable SAI_VECTOR_ALLOW_CUSTOM_PARAMETERS for Cassandra tests	2026-04-09 17:20:02 +02:00
Andrzej Jackowski	23c386a27f	test: perf: add audit-unix-socket-path to perf-simple-query To allow performance benchmarking with custom syslog sinks. Example use case: -- Audit + default syslog: ~100k tps taskset -c 0,2,4 ./build/release/scylla perf-simple-query --smp 3 --write --duration 30 --audit "syslog" --audit-keyspace "ks" --audit-categories "DCL,DDL,AUTH,DML,QUERY" ``` 110263.72 tps ( 66.1 allocs/op, 16.0 logallocs/op, 25.7 tasks/op, 254900 insns/op, 144796 cycles/op, 0 errors) throughput: mean= 107137.48 standard-deviation=3142.98 median= 106665.00 median-absolute-deviation=1786.03 maximum=111435.19 minimum=97620.79 instructions_per_op: mean= 256311.36 standard-deviation=5037.13 median= 256288.09 median-absolute-deviation=2223.08 maximum=274220.89 minimum=248141.40 cpu_cycles_per_op: mean= 146443.47 standard-deviation=2844.19 median= 146001.85 median-absolute-deviation=1514.82 maximum=157177.54 minimum=142981.03 ``` -- Audit + custom syslog: ~400k tps socat -u UNIX-RECV:/tmp/audit-null.sock,type=2 OPEN:/dev/null taskset -c 0,2,4 ./build/release/scylla perf-simple-query --smp 3 --write --duration 30 --audit "syslog" --audit-keyspace "ks" --audit-categories "DCL,DDL,AUTH,DML,QUERY" --audit-unix-socket-path /tmp/audit-null.sock ``` 404929.62 tps ( 65.9 allocs/op, 16.0 logallocs/op, 25.5 tasks/op, 77406 insns/op, 35559 cycles/op, 0 errors) throughput: mean= 399868.39 standard-deviation=6232.88 median= 401770.65 median-absolute-deviation=3859.09 maximum=406126.79 minimum=383434.84 instructions_per_op: mean= 77481.26 standard-deviation=168.31 median= 77405.54 median-absolute-deviation=84.33 maximum=78081.46 minimum=77332.84 cpu_cycles_per_op: mean= 35871.32 standard-deviation=516.83 median= 35699.70 median-absolute-deviation=251.15 maximum=37454.86 minimum=35432.60 ``` -- No audit: ~800k tps taskset -c 0,2,4 ./build/release/scylla perf-simple-query --smp 3 --write --duration 30 ``` 808970.95 tps ( 53.3 allocs/op, 16.0 logallocs/op, 14.9 tasks/op, 49904 insns/op, 
20471 cycles/op, 0 errors) throughput: mean= 809065.31 standard-deviation=6222.39 median= 810507.10 median-absolute-deviation=1827.99 maximum=815213.41 minimum=782104.84 instructions_per_op: mean= 49905.50 standard-deviation=21.81 median= 49900.12 median-absolute-deviation=7.72 maximum=50010.97 minimum=49892.57 cpu_cycles_per_op: mean= 20429.00 standard-deviation=41.40 median= 20425.18 median-absolute-deviation=29.11 maximum=20530.74 minimum=20355.42 ``` Closes scylladb/scylladb#29396	2026-04-09 16:00:41 +03:00
Anna Stuchlik	c6587c6a70	doc: Fix malformed markdown link in alternator network docs Fixes https://github.com/scylladb/scylladb/issues/29400 Closes scylladb/scylladb#29402	2026-04-09 15:54:43 +03:00
Botond Dénes	5886d1841a	Merge 'cmake: align CMake build system with configure.py and add comparison script' from Ernest Zaslavsky Every time someone modifies the build system — adding a source file, changing a compilation flag, or wiring a new test — the change tends to land in only one of our two build systems (configure.py or CMake). Over time this causes three classes of problems: 1. CMake stops compiling entirely. Missing defines, wrong sanitizer flags, or misplaced subdirectory ordering cause hard build failures that are only discovered when someone tries to use CMake (e.g. for IDE integration). 2. Missing build targets. Tests or binaries present in configure.py are never added to CMake, so `cmake --build` silently skips them. This PR fixes several such cases (e.g. `symmetric_key_test`, `auth_cache_test`, `sstable_tablet_streaming`). 3. Missing compilation units in targets. A `.cc` file is added to a test binary in one system but not the other, causing link errors or silently omitted test coverage. To fix the existing drift and prevent future divergence, this series: Adds a build-system comparison script (`scripts/compare_build_systems.py`) that configures both systems into a temporary directory, parses their generated `build.ninja` files, and compares per-file compilation flags, link target sets, and per-target libraries. configure.py is treated as the baseline; CMake must match it. The script supports a `--ci` mode suitable for gating PRs that touch build files. Fixes all current mismatches found by the script: - Mode flag alignment in `mode.common.cmake` and `mode.Coverage.cmake` (sanitizer flags, `-fno-lto`, stack-usage warnings, coverage defines). - Global define alignment (`SEASTAR_NO_EXCEPTION_HACK`, `XXH_PRIVATE_API`, `BOOST_ALL_DYN_LINK`, `SEASTAR_TESTING_MAIN` placement). - Seastar build configuration (shared vs static per mode, coverage sanitizer link options). - Abseil sanitizer flags (`-fno-sanitize=vptr`). 
- Missing test targets in `test/boost/CMakeLists.txt`. - Redundant per-test flags now covered by global settings. - Lua library resolution via a custom `cmake/FindLua.cmake` using pkg-config, matching configure.py's approach. Adds documentation (`docs/dev/compare-build-systems.md`) describing how to run the script and interpret its output. No backport needed — this is build infrastructure improvement only. Closes scylladb/scylladb#29273 * github.com:scylladb/scylladb: scripts: remove lua library rename workaround from comparison script cmake: add custom FindLua using pkg-config to match configure.py test/cmake: add missing tests to boost test suite test/cmake: remove per-test LTO disable cmake: add BOOST_ALL_DYN_LINK and strip per-component defines cmake: move SEASTAR_TESTING_MAIN after seastar and abseil subdirs cmake: add -fno-sanitize=vptr for abseil sanitizer flags cmake: align Seastar build configuration with configure.py cmake: align global compile defines and options with configure.py cmake: fix Coverage mode in mode.Coverage.cmake cmake: align mode.common.cmake flags with configure.py configure.py: add sstable_tablet_streaming to combined_tests docs: add compare-build-systems.md scripts: add compare_build_systems.py to compare ninja build files	2026-04-09 15:46:09 +03:00
Yaniv Michael Kaul	13879b023f	tracing: set_skip_when_empty() for error-path metrics Add .set_skip_when_empty() to all error-path metrics in the tracing module. Tracing itself is not a commonly used feature, making all of these metrics almost always zero: Tier 1 (very rare - corruption/schema issues): - tracing_keyspace_helper::bad_column_family_errors: tracing schema missing or incompatible, should never happen post-bootstrap - tracing::trace_errors: internal error building trace parameters Tier 2 (overload - tracing backend saturated): - tracing::dropped_sessions: too many pending sessions - tracing::dropped_records: too many pending records Tier 3 (general tracing write errors): - tracing_keyspace_helper::tracing_errors: errors during writes to system_traces keyspace Since tracing is an opt-in feature that most deployments rarely use, all five metrics are almost always zero and create unnecessary reporting overhead. AI-Assisted: yes Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com> Closes scylladb/scylladb#29346	2026-04-09 14:28:16 +03:00
Michael Litvak	3964040008	docs/dev: add counters doc Add documentation of the counters feature implementation in docs/dev/counters.md. The documentation is taken from the wiki and updated according to the current state of the code - legacy details are removed, and a section about the counter id is added.	2026-04-09 13:08:02 +02:00
Michael Litvak	b71762d5da	counters: reuse counter IDs by rack For counter updates, use a counter ID that is constructed from the node's rack instead of the node's host ID. A rack can have at most two active tablet replicas at a time: a single normal tablet replica and, during tablet migration, an additional pending replica. Therefore we can have two unique counter IDs per rack that are reused by all replicas in the rack. We construct the counter ID from the rack UUID, which is constructed from the name "dc:rack". The pending replica uses a deterministic variation of the rack's counter ID, obtained by negating it. This improves the performance and size of counter cells by having fewer unique counter IDs and fewer counter shards in a counter cell. Previously, the number of counter shards was the number of distinct host IDs that had updated the counter, which is typically the number of nodes in the cluster and can keep growing indefinitely as nodes are replaced. With the rack-based counter ID, the number of counter shards will be at most twice the number of distinct racks (including removed racks, which should not be significant). Fixes SCYLLADB-356	2026-04-09 13:08:02 +02:00
Yaniv Michael Kaul	2c0076d3ef	replica: set_skip_when_empty() for rare error-path metrics Add .set_skip_when_empty() to four metrics in replica/database.cc that are only incremented on very rare error paths and are almost always zero: - database::dropped_view_updates: view updates dropped due to overload. NOTE: this metric appears to never be incremented in the current codebase and may be a candidate for removal. - database::multishard_query_failed_reader_stops: documented as a 'hard badness counter' that should always be zero. NOTE: no increment site was found in the current codebase; may be a candidate for removal. - database::multishard_query_failed_reader_saves: documented as a 'hard badness counter' that should always be zero. - database::total_writes_rejected_due_to_out_of_space_prevention: only fires when disk utilization is critical and user table writes are disabled, a very rare operational state. These metrics create unnecessary reporting overhead when they are perpetually zero. set_skip_when_empty() suppresses them from metrics output until they become non-zero. AI-Assisted: yes Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com> Closes scylladb/scylladb#29345	2026-04-09 14:07:28 +03:00
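The reporting rule described above (a metric flagged with `set_skip_when_empty()` stays out of the output until it becomes non-zero) can be modelled in a few lines of C++. This is a minimal sketch: `metric` and `scrape()` are hypothetical stand-ins, not the Seastar metrics API; only the suppression rule comes from the commit text.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, simplified metric; not the Seastar metrics API.
struct metric {
    std::string name;
    std::uint64_t value = 0;
    bool skip_when_empty = false;

    // Builder-style flag, mirroring the call name used in the commit.
    metric& set_skip_when_empty() {
        skip_when_empty = true;
        return *this;
    }
};

// Returns the lines a scrape would emit; flagged metrics are omitted
// for as long as their value is still zero.
inline std::vector<std::string> scrape(const std::vector<metric>& metrics) {
    std::vector<std::string> out;
    for (const auto& m : metrics) {
        if (m.skip_when_empty && m.value == 0) {
            continue;
        }
        out.push_back(m.name + " " + std::to_string(m.value));
    }
    return out;
}
```

A perpetually zero error counter therefore costs nothing at scrape time, yet it appears as soon as the rare error path increments it.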
Botond Dénes	86417d49de	Merge 'transport: improve memory accounting for big responses and slow network' from Marcin Maliszkiewicz After obtaining the CQL response, check if its actual size exceeds the initially acquired memory permit. If so, acquire additional semaphore units and adopt them into the permit, ensuring accurate memory accounting for large responses. Additionally, move the permit into a .then() continuation so that the semaphore units are kept alive until write_message finishes, preventing premature release of memory permit. This is especially important with slow networks and big responses when buffers can accumulate and deplete a node's memory. Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-1306 Related https://scylladb.atlassian.net/browse/SCYLLADB-740 Backport: all supported versions Closes scylladb/scylladb#29288 * github.com:scylladb/scylladb: transport: add per-service-level pending response memory metric transport: hold memory permit until response write completes transport: account for response size exceeding initial memory estimate	2026-04-09 13:36:31 +03:00
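The two changes described above (topping up the permit when the actual response exceeds the initial estimate, and keeping the permit alive until the write finishes) can be sketched in single-threaded C++. `memory_semaphore`, `memory_permit` and `adjust_to` are hypothetical names, not the transport code; moving the permit stands in for moving it into the `.then()` continuation.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical single-threaded stand-in for a memory semaphore; the count
// may go negative under pressure, like a consuming semaphore.
struct memory_semaphore {
    std::ptrdiff_t available;
    void consume(std::size_t n) { available -= static_cast<std::ptrdiff_t>(n); }
    void signal(std::size_t n) { available += static_cast<std::ptrdiff_t>(n); }
};

// RAII permit: its units return to the semaphore only when it is destroyed.
class memory_permit {
    memory_semaphore* _sem;
    std::size_t _units;
public:
    memory_permit(memory_semaphore& sem, std::size_t units) : _sem(&sem), _units(units) {
        sem.consume(units);
    }
    // Moving transfers ownership of the units (e.g. into the write stage).
    memory_permit(memory_permit&& o) noexcept
        : _sem(std::exchange(o._sem, nullptr)), _units(std::exchange(o._units, 0)) {}
    memory_permit(const memory_permit&) = delete;
    memory_permit& operator=(const memory_permit&) = delete;
    memory_permit& operator=(memory_permit&&) = delete;
    ~memory_permit() {
        if (_sem) {
            _sem->signal(_units);
        }
    }

    std::size_t units() const { return _units; }

    // If the real response is larger than the initial estimate, take the
    // difference from the semaphore and fold it into this permit.
    // Must not be called on a moved-from permit.
    void adjust_to(std::size_t actual_size) {
        if (actual_size > _units) {
            const std::size_t extra = actual_size - _units;
            _sem->consume(extra);
            _units += extra;
        }
    }
};
```

In this model the memory stays accounted for while the response sits in send buffers, and it is released only when the permit that owns it is destroyed.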
Yaniv Michael Kaul	5c8b4a003e	db: set_skip_when_empty() for rare error-path metrics Add .set_skip_when_empty() to four metrics in the db module that are only incremented on very rare error paths and are almost always zero: - cache::pinned_dirty_memory_overload: described as 'should sit constantly at 0, nonzero is indicative of a bug' - corrupt_data::entries_reported: only fires on actual data corruption - hints::corrupted_files: only fires on on-disk hint file corruption - rate_limiter::failed_allocations: only fires when the rate limiter hash table is completely full and gives up allocating, requiring extreme cardinality pressure These metrics create unnecessary reporting overhead when they are perpetually zero. set_skip_when_empty() suppresses them from metrics output until they become non-zero. AI-Assisted: yes Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com> Closes scylladb/scylladb#29344	2026-04-09 13:32:09 +03:00
Gleb Natapov	dbaba7ab8a	storage_service: cleanup unused code Remove an unused definition and duplicate includes.	2026-04-09 13:31:41 +03:00
Gleb Natapov	b050b593b3	storage_service: simplify get_peer_info_for_update It does nothing for fields managed in raft, so drop their processing.	2026-04-09 13:31:41 +03:00
Gleb Natapov	d0576c109f	gossiper: send shutdown notifications in parallel	2026-04-09 13:31:40 +03:00
Gleb Natapov	1586fa65af	gms: remove unused code Also moved version_string(...) and make_token_string(...) to private: — they are internal helpers used only by normal(), not part of the public API	2026-04-09 13:31:40 +03:00
Gleb Natapov	b2e35c538f	virtual_tables: no need to call gossiper if we already know that the node is in shutdown	2026-04-09 13:31:40 +03:00
Gleb Natapov	e17fc180a0	gossiper: print node state from raft topology in the logs Raft topology now holds the node's real state. Gossiper states are now set to NORMAL and SHUTDOWN only.	2026-04-09 13:31:40 +03:00
Gleb Natapov	8439154851	gossiper: use is_shutdown instead of coding it manually	2026-04-09 13:31:39 +03:00
Gleb Natapov	7d700d0377	gossiper: mark endpoint_state(inet_address ip) constructor as explicit The get_live_members function called is_shutdown with an inet_address argument, which caused a temporary endpoint_state to be created. Fix it by prohibiting the implicit conversion and calling the correct is_shutdown function instead.	2026-04-09 13:31:39 +03:00
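Why a non-explicit constructor causes this bug can be shown with a small C++ sketch. The types below are hypothetical, heavily simplified stand-ins for the gossiper types: with a converting constructor, passing an address silently builds an empty temporary state, while an `explicit` constructor turns the same call into a compile error.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical, heavily simplified stand-ins for the gossiper types.
struct inet_address {
    std::string ip;
};

// Before the fix: a converting (non-explicit) constructor.
struct implicit_endpoint_state {
    implicit_endpoint_state(inet_address a) : addr(std::move(a)) {}
    inet_address addr;
    bool shutdown = false;  // a freshly built temporary never reflects the tracked state
};

// After the fix: the same constructor, marked explicit.
struct endpoint_state {
    explicit endpoint_state(inet_address a) : addr(std::move(a)) {}
    inet_address addr;
    bool shutdown = false;
};

inline bool is_shutdown_before(const implicit_endpoint_state& s) { return s.shutdown; }
inline bool is_shutdown(const endpoint_state& s) { return s.shutdown; }

// An inet_address silently converts to the old type but not to the new one,
// so passing an address to is_shutdown no longer compiles.
static_assert(std::is_convertible_v<inet_address, implicit_endpoint_state>);
static_assert(!std::is_convertible_v<inet_address, endpoint_state>);
static_assert(std::is_invocable_v<decltype(&is_shutdown_before), inet_address>);
static_assert(!std::is_invocable_v<decltype(&is_shutdown), inet_address>);
```

The compile error is what forces a caller such as get_live_members to use the intended lookup instead of inspecting a throwaway object.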
Gleb Natapov	6df4f572d5	gossiper: remove unused code	2026-04-09 13:31:39 +03:00
Gleb Natapov	67102496c8	gossiper: drop last use of LEFT state and drop the state Decommission sets the LEFT gossiper state only to prevent the node from issuing a shutdown notification during shutdown. Since the notification code now checks the state in raft topology, this is no longer needed.	2026-04-09 13:31:39 +03:00
Gleb Natapov	54d2c95094	gossiper: drop unused STATUS_BOOTSTRAPPING state	2026-04-09 13:31:38 +03:00
Gleb Natapov	7c895ced19	gossiper: rename is_dead_state to is_left since this is all that the function checks now.	2026-04-09 13:31:38 +03:00
Gleb Natapov	7dfb0577b8	gossiper: use raft topology state instead of gossiper one when checking node's state Raft topology state is the source of truth for a node's state, so use it instead of the gossiper one.	2026-04-09 13:31:38 +03:00
Gleb Natapov	c17c4806a1	storage_service: drop check_for_endpoint_collision function All the checks that it does are also done by join coordinator and the join coordinator uses more reliable raft state instead of gossiper one.	2026-04-09 13:31:37 +03:00
Gleb Natapov	1ac8edb22b	storage_service: drop is_first_node function It makes no sense now, since the first node to bootstrap is determined by the discover_group0 algorithm.	2026-04-09 13:31:37 +03:00
Gleb Natapov	681aa9ebe1	gossiper: remove unused REMOVED_TOKEN state	2026-04-09 13:31:37 +03:00
Gleb Natapov	5af17aa578	gossiper: remove unused advertise_token_removed function	2026-04-09 13:31:36 +03:00
Dawid Mędrek	f0dfe29d88	service: strong_consistency: Abort state_machine::apply when aborting server The state machine used by strongly consistent tablets may block on a read barrier if the local schema is insufficient to resolve pending mutations [1]. To deal with that, we perform a read barrier that may block for a long time. When a strongly consistent tablet is being removed, we'd like to cancel all ongoing executions of `state_machine::apply`: the shard is no longer responsible for the tablet, so it doesn't matter what the outcome is. --- In the implementation, we abort the operations by simply throwing an exception from `state_machine::apply` and not doing anything. That's a red flag considering that it may lead to the instance being killed on the spot [2]. Fortunately for us, strongly consistent tables use the default Raft server implementation, i.e. `raft::server_impl`, which actually handles one type of an exception thrown by the method: namely, `abort_requested_exception`, which is the default exception thrown by `seastar::abort_source` [3]. We leverage this property. --- Unfortunately, `raft::server_impl::abort` isn't perfectly suited for us. If we look into its code, we'll see that the relevant portion of the procedure boils down to three steps: 1. Prevent scheduling adding new entries. 2. Wait for the applier fiber. 3. Abort the state machine. Since aborting the state machine happens only after the applier fiber has already finished, there will no longer be anything to abort. Either all executions of `state_machine::apply` have already finished, or they are hanging and we cannot do anything. That's a pre-existing problem that we won't be solving here (even though it's possible). We hope the problem will be solved, and it seems likely: the code suggests that the behavior is not intended. For more details, see e.g. [4]. --- We provide two validation tests. 
They simulate the abortion of `state_machine::apply` in two different scenarios: * when the table is dropped (which should also cover the case of tablet migration), * when the node is shutting down. The value of the tests isn't high since they don't ensure that the state of the group is still valid (though it should be), nor do they perform any other check. Instead, we rely on the testing framework to spot any anomalies or errors. That's probably the best we can do at the moment. Unfortunately, both tests are marked as skipped because of the current limitations of `raft::server_impl::abort` described above and in [4]. References: [1] `4c8dba1` [2] See the description of `raft::state_machine` in `raft/raft.hh`. [3] See `server_impl::applier_fiber` in `raft/server.cc`. [4] SCYLLADB-1056	2026-04-09 11:36:51 +02:00
Dawid Mędrek	ad8a263683	service: strong_consistency: Abort ongoing operations when shutting down These changes are complementary to those from a recent commit where we handled aborting ongoing operations during tablet events, such as tablet migration. In this commit, we consider the case of shutting down a node. When a node is shutting down, we eventually close the connections. When the client can no longer get a response from the server, it makes no sense to continue with the queries. We'd like to cancel them at that point. We leverage the abort source passed via `client_state` down to the strongly consistent coordinator. This way, the transport layer can communicate with it and signal that the queries should be canceled. The abort source is triggered by the CQL server (cf. `generic_server::server::{stop,shutdown}`). --- Note that this is not an optional change. In fact, if we don't abort those requests, we might hang for an indefinite amount of time when executing the following code in `main.cc`: ``` // Register at_exit last, so that storage_service::drain_on_shutdown will be called first auto do_drain = defer_verbose_shutdown("local storage", [&ss] { ss.local().drain_on_shutdown().get(); }); ``` The problem boils down to the fact that `generic_server::server::stop` will wait for all connections to be closed, but that won't happen until all ongoing operations (at least those to strongly consistent tables) are finished. It's important to highlight that even though we hang on this, the client can no longer get any response. Thus, it's crucial that at that point we simply abort ongoing operations to proceed with the rest of shutdown. --- Two tests are added to verify that the implementation is correct: one focusing on local operations, the other -- on a forwarded write. 
Difference in time spent on the whole test file `test_strong_consistency.py` on my local machine, in dev mode: Before: ``` real 0m31.775s user 1m4.475s sys 0m22.615s ``` After: ``` real 0m32.024s user 1m10.751s sys 0m23.871s ``` Individual runs of the added tests: test_queries_when_shutting_down: ``` real 0m12.818s user 0m36.726s sys 0m4.577s ``` test_abort_forwarded_write_upon_shutdown: ``` real 0m12.930s user 0m36.622s sys 0m4.752s ```	2026-04-09 11:36:17 +02:00
Dawid Mędrek	4a87bdc778	service: client_state: Extend with abort_source We make `client_state` store a pointer to an `abort_source`. This will be useful in the following commit that will implement aborting ongoing requests to strongly consistent tables upon connection shutdowns. It might also be useful in some other places in the code in the future. We set the abort source for client states in relevant places.	2026-04-09 11:35:35 +02:00
Dawid Mędrek	89c049b889	service: strong_consistency: Handle abort when removing Raft group When a strongly consistent Raft group is being removed, it means one of the following cases: (A) The node is shutting down and it's simply part of the shutdown procedure. (B) The tablet is somehow leaving the replica. For example, due to: - Tablet migration - Tablet split/merge - Tablet removal (e.g. because the table is dropped) In this commit, we focus on case (B). Case (A) will be handled in the following one. --- The changes in the code are literally none, and there's a reason for it. First, let's note that we've already implemented abortion of timed-out requests. There is a limit to how long a query can run and sooner or later it will finish, regardless of what we do. Second, we need to ask ourselves if the case we're considering in this commit (i.e. case (B)) is a situation where we'd like to speed up the process. The answer is no. Tablet migrations are effectively internal operations that are invisible to the users. User requests are, quite obviously, the opposite of that. Because of that, we want to patiently wait for the queries to finish or time out, even though it's technically possible to abort them earlier. Lastly, the changes in the code that actually appear in this commit are not completely irrelevant either. We consider the important case of the `leader_info_updater` fiber and argue that it's safe to not pass any abort source to the Raft methods used by it. --- Unfortunately, we don't have tablet migrations implemented yet [1], so our testing capabilities are limited. Still, we provide a new test that corresponds to case (B) described above. We simulate a tablet migration by dropping a table and observe how reads and writes behave in such a situation. There's no extremely careful validation involved there, but that's what we can have for the time being. 
Difference in time spent on the whole test file `test_strong_consistency.py` on my local machine, in dev mode: Before: ``` real 0m30.841s user 1m3.294s sys 0m21.091s ``` After: ``` real 0m31.775s user 1m4.475s sys 0m22.615s ``` The time spent on the new test only: ``` real 0m5.264s user 0m34.646s sys 0m3.374s ``` References: [1] SCYLLADB-868	2026-04-09 11:35:31 +02:00
Dawid Mędrek	7dcc3e85b9	service: strong_consistency: Abort Raft operations on timeout If a query, either a write, or a read to a strongly consistent table, times out, we immediately abort the operation and throw an exception. Unfortunately, due to the inconsistency in exception types thrown on timeout by the many methods we use in the code, it results in pretty messy `try-catch` clauses. Perhaps there's a better alternative to this, but it's beyond the scope of this work, so we leave it as-is. We provide a validation test that consists of three cases corresponding to reads, writes, and waiting for the leader. They verify that the code works as expected in all affected places. A comparison of time spent on the whole `test_strong_consistency.py` on my local machine, in dev mode: Before: ``` real 0m32.185s user 0m55.391s sys 0m15.745s ``` After: ``` real 0m30.841s user 1m3.294s sys 0m21.091s ``` The time spent on the new test only: ``` real 0m7.077s user 0m35.359s sys 0m3.717s ```	2026-04-09 11:35:04 +02:00
Piotr Szymaniak	65a1bdd368	docs: document Alternator auditing in the operator-facing auditing guide - Document Alternator (DynamoDB-compatible API) auditing support in the operator-facing auditing guide (docs/operating-scylla/security/auditing.rst) - Cover operation-to-category mapping, operation field format, keyspace/table filtering, and audit log examples - Document the audit_tables=alternator.<table> shorthand format - Minor wording improvements throughout (Scylla -> ScyllaDB, clarify default audit backend) Closes scylladb/scylladb#29231	2026-04-09 12:26:57 +03:00
Dawid Mędrek	2243e0ffea	service: strong_consistency: Use timeout when mutating We remove the inconsistency between reads and writes to strongly consistent tables. Before the commit, only reads used a timeout. Now, writes do as well. Although the parameter isn't used yet, that will change in the following commit. This is a prerequisite for it.	2026-04-09 11:25:57 +02:00
Dawid Mędrek	fd9c907be1	service: strong_consistency: Fix indentation	2026-04-09 11:25:57 +02:00
Dawid Mędrek	ca7f24516e	service: strong_consistency: Enclose coordinator methods with try-catch We enclose `coordinator::{mutate,query}` with `try-catch` clauses. They do nothing at the moment, but we'll use them later. We do this now to avoid noise in the upcoming commits. We'll fix the indentation in the following commit.	2026-04-09 11:25:57 +02:00
Dawid Mędrek	e9ea9e7259	service: strong_consistency: Crash at unexpected exception The loop shouldn't throw any other exception than the ones already covered by the `catch` clauses. Crash, at least when `abort_on_internal_error` is set, if we catch any other type since that may be a sign of a bug.	2026-04-09 11:25:57 +02:00
Dawid Mędrek	f499a629ab	test: cluster: Extract default config & cmdline in test_strong_consistency.py All used configs and cmdlines share the same values. Let's extract them to avoid repeating them every time a new test is written. Those options should be enabled for all tests in the file anyway.	2026-04-09 11:25:57 +02:00
Geoff Montee	7d7ec7025e	docs: Document system keyspaces for developers / internal usage Fixes #29043 with the following docs changes: - docs/dev/system-keyspaces.md: Added a new file that documents all keyspaces created internally Closes scylladb/scylladb#29044	2026-04-09 11:49:58 +03:00
Guy Shtub	40a861016a	docs/faq.rst: Fixing small spelling mistake Closes scylladb/scylladb#29131	2026-04-09 11:48:46 +03:00
Pavel Emelyanov	78f5bab7cf	table: Add formatter for group_id argument in tablet merge exception message Fixes: SCYLLADB-1432 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#29143	2026-04-09 11:45:57 +03:00
Botond Dénes	fbbe2bdce8	Merge 'Introduce repair_service::config and cut dependency from db::config' from Pavel Emelyanov Spreading db::config around and making all services depend on it is not nice. Most other services that need configuration provide their own config that's populated from db::config in main.cc/cql_test_env.cc and use it, not the global config. This PR does the same for repair_service. Improving component dependencies, not backporting Closes scylladb/scylladb#29153 * github.com:scylladb/scylladb: repair: Remove db/config.hh from repair/*.cc files repair: Move repair_multishard_reader options onto repair_service::config repair: Move critical_disk_utilization_level onto repair_service::config repair: Move repair_partition_count_estimation_ratio onto repair_service::config repair: Move repair_hints_batchlog_flush_cache_time_in_ms onto repair_service::config repair: Move enable_small_table_optimization_for_rbno onto repair_service::config repair: Introduce repair_service::config	2026-04-09 11:44:25 +03:00
Botond Dénes	76c8794f4f	Merge 'Strong consistency: allow taking snapshots (but not transfer) and make them less likely' from Piotr Dulikowski While working on benchmarks for strong consistency we noticed that the raft logic attempted to take snapshots during the benchmark. Snapshot transfer is not implemented for strong consistency yet and the methods that take or transfer snapshots throw exceptions. This causes the raft groups to stop working completely. While implementing snapshot transfers is out of scope, we can implement some mitigations now to stop the tests from breaking: - The first commit adjusts the configuration options. First, it disables periodic snapshotting (i.e. creating a snapshot every X log entries). Second, it increases the memory threshold for the raft log before which a snapshot is created from 2MB to 10MB. - The second commit relaxes the take snapshot / drop snapshot methods and makes it possible to actually use them - they are no-ops. It is still forbidden to transfer snapshots. I am including both commits because applying only the first one didn't completely prevent the issue from occurring when testing locally. Refs: SCYLLADB-1115 Strong consistency is experimental, no need for backport. Closes scylladb/scylladb#29189 * github.com:scylladb/scylladb: strong_consistency: fake taking and dropping snapshots strong_consistency: adjust limits for snapshots	2026-04-09 11:44:03 +03:00
Anna Stuchlik	dd34d2afb4	doc: remove references to old versions from Docker Hub docs This commit removes references to ScyllaDB versions ("Since x.y") from the ScyllaDB documentation on Docker Hub, as they are redundant and confusing (some versions are super ancient). Fixes SCYLLADB-1212 Closes scylladb/scylladb#29204	2026-04-09 11:43:40 +03:00
Botond Dénes	c162277b28	Merge 'Perform full connection set-up for CertificateAuthorization in process_startup()' from Pavel Emelyanov The code responds early with a READY message but lacks some necessary setup, namely: * update_scheduling_group(): without it, the connection runs under the default scheduling group instead of the one mapped to the user's service level. * on_connection_ready(): without it, the connection never releases its slot in the uninitialized-connections concurrency semaphore (acquired at connection creation), leaking one unit per cert-authenticated connection for the lifetime of the connection. * _authenticating = false / _ready = true: without them, system.clients reports connection_stage = AUTHENTICATING forever instead of READY (not critical, but not nice either) The PR fixes it and adds a regression test that (for sanity) also covers the AllowAll and Password authenticators Fixes SCYLLADB-1226 Present since 2025.1, probably worth backporting Closes scylladb/scylladb#29220 * github.com:scylladb/scylladb: transport: fix process_startup cert-auth path missing connection-ready setup transport: test that connection_stage is READY after auth via all process_startup paths	2026-04-09 11:43:02 +03:00
Raphael S. Carvalho	16e387d5f9	repair/replica: Fix race window where post-repair data is wrongly promoted to repaired During incremental repair, each tablet replica holds three SSTable views: UNREPAIRED, REPAIRING, and REPAIRED. The repair lifecycle is: 1. Replicas snapshot unrepaired SSTables and mark them REPAIRING. 2. Row-level repair streams missing rows between replicas. 3. mark_sstable_as_repaired() runs on all replicas, rewriting the SSTables with repaired_at = sstables_repaired_at + 1 (e.g. N+1). 4. The coordinator atomically commits sstables_repaired_at=N+1 and the end_repair stage to Raft, then broadcasts repair_update_compaction_ctrl which calls clear_being_repaired(). The bug lives in the window between steps 3 and 4. After step 3, each replica has on-disk SSTables with repaired_at=N+1, but sstables_repaired_at in Raft is still N. The classifier therefore sees: is_repaired(N, sst{repaired_at=N+1}) == false sst->being_repaired == null (lost on restart, or not yet set) and puts them in the UNREPAIRED view. If a new write arrives and is flushed (repaired_at=0), STCS minor compaction can fire immediately and merge the two SSTables. The output gets repaired_at = max(N+1, 0) = N+1 because compaction preserves the maximum repaired_at of its inputs. Once step 4 commits sstables_repaired_at=N+1, the compacted output is classified REPAIRED on the affected replica even though it contains data that was never part of the repair scan. Other replicas, which did not experience this compaction, classify the same rows as UNREPAIRED. This divergence is never healed by future repairs because the repaired set is considered authoritative. The result is data resurrection: deleted rows can reappear after the next compaction that merges unrepaired data with the wrongly-promoted repaired SSTable. The fix has two layers: Layer 1 (in-memory, fast path): mark_sstable_as_repaired() now also calls mark_as_being_repaired(session) on the new SSTables it writes. 
This keeps them in the REPAIRING view from the moment they are created until repair_update_compaction_ctrl clears the flag after step 4, covering the race window in the normal (no-restart) case. Layer 2 (durable, restart-safe): a new is_being_repaired() helper on tablet_storage_group_manager detects the race window even after a node restart, when being_repaired has been lost from memory. It checks: sst.repaired_at == sstables_repaired_at + 1 AND tablet transition kind == tablet_transition_kind::repair Both conditions survive restarts: repaired_at is on-disk in SSTable metadata, and the tablet transition is persisted in Raft. Once the coordinator commits sstables_repaired_at=N+1 (step 4), is_repaired() returns true and the SSTable naturally moves to the REPAIRED view. The classifier in make_repair_sstable_classifier_func() is updated to call is_being_repaired(sst, sstables_repaired_at) in place of the previous sst->being_repaired.uuid().is_null() check. A new test, test_incremental_repair_race_window_promotes_unrepaired_data, reproduces the bug by: - Running repair round 1 to establish sstables_repaired_at=1. - Injecting delay_end_repair_update to hold the race window open. - Running repair round 2 so all replicas complete mark_sstable_as_repaired (repaired_at=2) but the coordinator has not yet committed step 4. - Writing post-repair keys to all replicas and flushing servers[1] to create an SSTable with repaired_at=0 on disk. - Restarting servers[1] so being_repaired is lost from memory. - Waiting for autocompaction to merge the two SSTables on servers[1]. - Asserting that the merged SSTable contains post-repair keys (the bug) and that servers[0] and servers[2] do not see those keys as repaired. NOTE FOR MAINTAINER: Copilot initially only implemented Layer 1 (the in-memory being_repaired guard), missing the restart scenario entirely. I pointed out that being_repaired is lost on restart and guided Copilot to add the durable Layer 2 check. 
I also polished the implementation: moving is_being_repaired into tablet_storage_group_manager so it can reuse the already-held _tablet_map (avoiding an ERM lookup and try/catch), passing sstables_repaired_at in from the classifier to avoid re-reading it, and using compaction_group_for_sstable inside the function rather than threading a tablet_id parameter through the classifier. Fixes https://scylladb.atlassian.net/browse/SCYLLADB-1239. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Closes scylladb/scylladb#29244	2026-04-09 11:42:28 +03:00
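A minimal C++ sketch of the two-layer classifier described above. The names `is_repaired`, `is_being_repaired`, `repaired_at` and `sstables_repaired_at` follow the commit text, but the types, the `classify_old` and `compact` helpers, and the exact `is_repaired` rule (0 means never repaired; anything up to the committed value counts as repaired) are assumptions for illustration, not ScyllaDB's implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

enum class view { unrepaired, repairing, repaired };
enum class transition_kind { none, migration, repair };

// Hypothetical SSTable: repaired_at is persisted on disk,
// being_repaired is in-memory only and is lost on restart.
struct sstable {
    std::int64_t repaired_at = 0;
    bool being_repaired = false;
};

// Assumed rule: 0 means never repaired; up to the committed value is repaired.
inline bool is_repaired(std::int64_t sstables_repaired_at, const sstable& sst) {
    return sst.repaired_at > 0 && sst.repaired_at <= sstables_repaired_at;
}

// Layer 1: the in-memory flag. Layer 2 (durable): the on-disk repaired_at is
// one ahead of the committed value while the tablet is in a repair transition.
inline bool is_being_repaired(const sstable& sst, std::int64_t sstables_repaired_at,
                              transition_kind t) {
    return sst.being_repaired
        || (sst.repaired_at == sstables_repaired_at + 1 && t == transition_kind::repair);
}

inline view classify(const sstable& sst, std::int64_t sstables_repaired_at, transition_kind t) {
    if (is_repaired(sstables_repaired_at, sst)) {
        return view::repaired;
    }
    if (is_being_repaired(sst, sstables_repaired_at, t)) {
        return view::repairing;
    }
    return view::unrepaired;
}

// Pre-fix behaviour for comparison: relies on the in-memory flag only.
inline view classify_old(const sstable& sst, std::int64_t sstables_repaired_at) {
    if (is_repaired(sstables_repaired_at, sst)) {
        return view::repaired;
    }
    return sst.being_repaired ? view::repairing : view::unrepaired;
}

// Compaction keeps the maximum repaired_at of its inputs.
inline sstable compact(const sstable& a, const sstable& b) {
    return {std::max(a.repaired_at, b.repaired_at), false};
}
```

Replaying the race in this model: with `sstables_repaired_at = 1`, a round-2 SSTable (`repaired_at = 2`, flag lost on restart) and a fresh flush (`repaired_at = 0`) both land in UNREPAIRED under the old rule, so they may be compacted into an SSTable that turns REPAIRED once 2 is committed. Under the new rule the round-2 SSTable stays in REPAIRING and is kept apart from the flush.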
Dawid Mędrek	a8bc90a375	Merge 'cql3: fix DESCRIBE INDEX WITH INTERNALS name' from Piotr Smaron This series fixes two related inconsistencies around secondary-index names. 1. `DESCRIBE INDEX ... WITH INTERNALS` returned the backing materialized-view name in the `name` column instead of the logical index name. 2. The snapshot REST API accepted backing table names for MV-backed secondary indexes, but not the logical index names exposed to users. The snapshot side now resolves logical secondary-index names to backing table names where applicable, reports logical index names in snapshot details, rejects vector index names with HTTP 400, and keeps multi-keyspace DELETE atomic by resolving all keyspaces before deleting anything. The tests were also extended accordingly, and the snapshot test helper was fixed to clean up multi-table snapshots using one DELETE per table. Fixes: SCYLLADB-1122 Minor bugfix, no need to backport. Closes scylladb/scylladb#29083 * github.com:scylladb/scylladb: cql3: fix DESCRIBE INDEX WITH INTERNALS name test: add snapshot REST API tests for logical index names test: fix snapshot cleanup helper api: clarify snapshot REST parameter descriptions api: surface no_such_column_family as HTTP 400 db: fix clear_snapshot() atomicity and use C++23 lambda form db: normalize index names in get_snapshot_details() db: add resolve_table_name() to snapshot_ctl	2026-04-09 08:37:51 +03:00
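The atomicity rule for multi-keyspace DELETE (resolve all keyspaces before deleting anything) is a two-phase pattern that can be sketched in C++. Everything below is hypothetical: `catalog`, `catalog_entry`, `bad_request` and `delete_snapshots_atomically` are illustrative names, and the "deletion" only records target names. It is not the snapshot_ctl code.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical catalog entry: what a user-visible name resolves to.
struct catalog_entry {
    std::string backing_table;  // e.g. the view backing a secondary index
    bool is_vector_index = false;
};

using keyspace_catalog = std::map<std::string, catalog_entry>;  // name -> entry
using catalog = std::map<std::string, keyspace_catalog>;         // keyspace -> names

// Stands in for an error surfaced as HTTP 400.
struct bad_request : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Phase 1 resolves every keyspace without side effects; phase 2 deletes only
// if all of them resolved, so one bad keyspace cannot cause a partial removal.
inline void delete_snapshots_atomically(const catalog& cat,
                                        const std::vector<std::string>& keyspaces,
                                        const std::string& name,
                                        std::vector<std::string>& deleted) {
    std::vector<std::string> targets;
    for (const auto& ks : keyspaces) {
        const auto ks_it = cat.find(ks);
        if (ks_it == cat.end()) {
            throw bad_request("no such keyspace: " + ks);
        }
        const auto it = ks_it->second.find(name);
        if (it == ks_it->second.end()) {
            throw bad_request("no such table or index: " + ks + "." + name);
        }
        if (it->second.is_vector_index) {
            throw bad_request("vector indexes are rejected: " + ks + "." + name);
        }
        targets.push_back(ks + "." + it->second.backing_table);
    }
    for (const auto& t : targets) {
        deleted.push_back(t);  // stand-in for actually removing the snapshot
    }
}
```

If one keyspace names a vector index, the request fails before any keyspace is touched, even those that resolved successfully.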
Piotr Dulikowski	ec0231c36c	Merge 'db/view/view_building_worker: lock staging sstables mutex for all necessary shards when creating tasks' from Michał Jadwiszczak To create `process_staging` view building tasks, we first need to collect information about them on shard0, create the necessary mutations, commit them to group0 and move the staging sstables objects to their original shards. But there is a possible race after committing the group0 command and before moving the staging sstables to their shards. Between those two events, the coordinator may schedule freshly created tasks and dispatch them to the worker, but the worker won't have the sstables objects because they weren't moved yet. This patch fixes the race by holding `_staging_sstables_mutex` locks from all necessary shards when executing `create_staging_sstable_tasks()`. With this, even if the task is scheduled and dispatched quickly, the worker will wait to execute it until the sstables objects are moved and the locks are released. Fixes SCYLLADB-816 This PR should be backported to all versions containing view building coordinator (2025.4 and newer). Closes scylladb/scylladb#29174 * github.com:scylladb/scylladb: db/view/view_building_worker: fix indentation db/view/view_building_worker: lock staging sstables mutex for necessary shards when creating tasks	2026-04-09 08:37:51 +03:00
Piotr Smaron	ecc3bcabd4	test/ldap: add LDAP filter-injection reproducers Add tests that reproduce LDAP filter injection via unescaped {USER} substitution (SCYLLADB-1309). A wildcard username ('*') matches every group entry, and a parenthesis payload (")(uid=*") breaks the search filter. Extend the LDAP test fixture (ldap_server.py, slapd.conf) with memberUid attributes and the NIS schema so the new tests can exercise direct filter-value substitution.	2026-04-08 13:53:49 +02:00
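The reproducers target the unescaped `{USER}` substitution: an unescaped `*` turns an equality match into a wildcard, and `)(` injects a new filter component. One standard mitigation, not necessarily the one ScyllaDB adopts, is to escape the value as RFC 4515 requires before substituting it. The sketch below assumes a hypothetical group query on the NIS-schema `memberUid` attribute; the function names are illustrative only.

```python
# Escapes required by RFC 4515, section 3, for values inside a search filter.
_FILTER_ESCAPES = {
    "\\": r"\5c",
    "*": r"\2a",
    "(": r"\28",
    ")": r"\29",
    "\0": r"\00",
}

def escape_filter_value(value: str) -> str:
    """Escape one assertion value character by character (no double escaping)."""
    return "".join(_FILTER_ESCAPES.get(ch, ch) for ch in value)

def build_group_filter(template: str, user: str) -> str:
    """Substitute an escaped user name for the {USER} placeholder."""
    return template.replace("{USER}", escape_filter_value(user))

template = "(&(objectClass=posixGroup)(memberUid={USER}))"  # hypothetical query
print(build_group_filter(template, "*"))        # (&(objectClass=posixGroup)(memberUid=\2a))
print(build_group_filter(template, ")(uid=*"))  # (&(objectClass=posixGroup)(memberUid=\29\28uid=\2a))
```

Escaping per character avoids the classic bug of escaping `\` after the other characters and double-escaping their backslashes.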
Piotr Smaron	d458ff50b0	cql3: fix DESCRIBE INDEX WITH INTERNALS name DESCRIBE INDEX ... WITH INTERNALS returned the name of the backing materialized view in the name column instead of the logical index name. Return the logical index name from schema::describe() for index schemas so all callers observe the user-facing name consistently. Fixes: SCYLLADB-1122	2026-04-08 13:38:17 +02:00
Piotr Smaron	04837ba20f	test: add snapshot REST API tests for logical index names Add focused REST coverage for logical secondary-index names in snapshot creation, deletion, and details output. Also cover vector-index rejection and verify multi-keyspace delete resolves all keyspaces before deleting anything so mixed index kinds cannot cause partial removal.	2026-04-08 13:38:17 +02:00
Piotr Smaron	6b85da3ce3	test: fix snapshot cleanup helper The snapshot REST helper cleaned up multi-table snapshots with a single DELETE request that passed a comma-separated cf filter, but the API accepts only one table name there. Delete each table snapshot separately so existing tests that snapshot multiple tables use the API as documented.	2026-04-08 13:36:27 +02:00
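A minimal sketch of the corrected cleanup pattern: one DELETE per table instead of a single request with a comma-separated `cf` value. It only builds the request URLs. `snapshot_delete_urls()` is a hypothetical helper, not the actual test helper, and the `tag` parameter name and the `127.0.0.1:10000` base URL (the usual REST API port) are assumptions about the test environment.

```python
from urllib.parse import urlencode

def snapshot_delete_urls(base_url, tag, keyspace, tables):
    """Return one /storage_service/snapshots URL per table, for separate DELETEs.

    `cf` accepts a single table name, so tables are never joined with commas.
    The `tag` parameter name is an assumption about the REST API.
    """
    return [
        f"{base_url}/storage_service/snapshots?"
        + urlencode({"tag": tag, "kn": keyspace, "cf": table})
        for table in tables
    ]

for url in snapshot_delete_urls("http://127.0.0.1:10000", "snap1", "ks", ["t1", "t2"]):
    print(url)  # each URL would be sent as its own DELETE request
```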
Piotr Smaron	3090684dad	api: clarify snapshot REST parameter descriptions Document the current /storage_service/snapshots behavior more accurately. For DELETE, cf is a table filter applied independently in each keyspace listed in kn. If cf is omitted or empty, snapshots for all tables are eligible, and secondary indexes can be addressed by their logical index name.	2026-04-08 13:36:27 +02:00
Piotr Smaron	6ee75c74bd	api: surface no_such_column_family as HTTP 400 Snapshot requests that name a non-existent table or a non-snapshotable logical index currently surface an internal server error. Translate no_such_column_family into a bad request so callers get a client-facing error that matches the invalid input.	2026-04-08 13:36:27 +02:00
Piotr Smaron	7d83a264ac	db: fix clear_snapshot() atomicity and use C++23 lambda form clear_snapshot() applies a table filter independently in each keyspace, so logical index names must be resolved per keyspace on the delete path as well. Resolve all keyspaces before deleting anything so a later failure cannot partially remove a snapshot, and use the explicit-object-parameter coroutine lambda form for the asynchronous implementation.	2026-04-08 13:36:27 +02:00
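The atomicity rule (resolve the filter in every keyspace first, delete only afterwards) can be sketched as a two-phase operation. This Python model is illustrative only: ScyllaDB's `clear_snapshot()` is C++ and asynchronous, and `make_resolver()`, the in-memory snapshot set and the `<name>_index` backing-view name are assumptions chosen for the example.

```python
class NoSuchColumnFamily(Exception):
    """Raised when a name resolves to no snapshotable table in a keyspace."""

def make_resolver(tables, indexes):
    """Hypothetical resolver: tables/indexes map keyspace -> set of names."""
    def resolve(keyspace, name):
        if name in indexes.get(keyspace, set()):
            return f"{name}_index"  # assumed backing-view naming, for illustration
        if name in tables.get(keyspace, set()):
            return name
        raise NoSuchColumnFamily(f"{keyspace}.{name}")
    return resolve

def clear_snapshot(snapshots, tag, keyspaces, table_filter, resolve):
    """Remove matching (keyspace, table, tag) entries from the `snapshots` set.

    Phase 1 resolves the filter in every keyspace; phase 2 deletes. A failed
    lookup in any keyspace therefore raises before anything is removed.
    """
    plan = [(ks, resolve(ks, table_filter) if table_filter else None) for ks in keyspaces]
    doomed = {
        entry
        for entry in snapshots
        for ks, table in plan
        if entry[0] == ks and entry[2] == tag and (table is None or entry[1] == table)
    }
    snapshots.difference_update(doomed)

resolve = make_resolver(tables={"ks1": {"t"}, "ks2": {"t"}}, indexes={"ks1": {"idx"}})
snaps = {("ks1", "idx_index", "s1"), ("ks2", "t", "s1")}
try:
    clear_snapshot(snaps, "s1", ["ks1", "ks2"], "idx", resolve)  # "idx" unknown in ks2
except NoSuchColumnFamily:
    pass
print(len(snaps))  # 2: nothing was removed
```

An empty or omitted filter (`None` here) makes every table in the listed keyspaces eligible, matching the documented `cf` behavior.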
Piotr Smaron	39baa1870e	db: normalize index names in get_snapshot_details() Snapshot details exposed backing secondary-index view names instead of logical index names. Normalize index entries in get_snapshot_details() so the REST API reports the user-facing name, and update the existing REST test to assert that behavior directly.	2026-04-08 13:36:27 +02:00
Piotr Smaron	9c37f1def2	db: add resolve_table_name() to snapshot_ctl The snapshot REST API accepted backing secondary-index table names, but not logical index names. Introduce resolve_table_name() so snapshot creation can translate a logical index name to the backing table when the index is materialized as a view.	2026-04-08 13:36:27 +02:00
Petr Gusev	7750d5737c	strong consistency: replace local consistency with global Currently we don't support 'local' consistency, which would imply maintaining a separate raft group for each dc. What we support is actually 'global' consistency -- one raft group per tablet replica set. We don't plan to support local consistency for the first GA. Closes scylladb/scylladb#29221	2026-04-08 12:52:32 +02:00
Patryk Jędrzejczak	850db950f8	Merge 'raft: include demoted voters in read barrier during joint config' from Qian Cheng Hi, thanks for Scylla! We found a small issue in tracker::set_configuration() during joint consensus and put together a fix. When a server is demoted from voter to non-voter, set_configuration processes the current config first (can_vote=false), then the previous config. But when it finds the server already in the progress map (tracker.cc:118), it hits `continue` without updating can_vote. So the server's follower_progress::can_vote stays false even though it's still a voter in the previous config. This causes broadcast_read_quorum (fsm.cc:1055) to skip the demoted server, reducing the pool of responders. Since committed() correctly includes the server in _previous_voters for quorum calculation, read barriers can stall if other servers are slow. The fix is to use configuration::can_vote() in tracker::set_configuration. We included a reproduction unit test (test_tracker_voter_demotion_joint_config) that extracts the set_configuration algorithm and demonstrates the mismatch. We weren't able to build the full Scylla test suite to add an in-tree test, so we kept it as a standalone file for reference. No backport: the bug is non-critical and the change needs some soak time in master. Closes scylladb/scylladb#29226 * https://github.com/scylladb/scylladb: fix: use is_voter::yes instead of true in test assertions test: add tracker voter demotion test to fsm_test.cc fix: use configuration::can_vote() in tracker::set_configuration	2026-04-08 12:37:27 +02:00
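The bug and the fix can be reproduced with a small Python model of the loop described above. It is not the C++ `tracker::set_configuration()`; `Member` and the three functions are hypothetical, and `can_vote()` assumes that, in a joint configuration, `configuration::can_vote()` treats a server as a voter if it votes in either the current or the previous configuration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Member:
    id: str
    can_vote: bool

def can_vote(current, previous, server_id):
    """Model of configuration::can_vote(): voter in either half of a joint config."""
    return any(m.id == server_id and m.can_vote for m in (*current, *previous))

def set_configuration_buggy(current, previous):
    progress = {}
    for m in (*current, *previous):      # current config is processed first
        if m.id in progress:
            continue                     # bug: voter bit from `previous` never applied
        progress[m.id] = m.can_vote
    return progress

def set_configuration_fixed(current, previous):
    progress = {}
    for m in (*current, *previous):
        if m.id not in progress:
            progress[m.id] = can_vote(current, previous, m.id)
    return progress

current = [Member("A", True), Member("B", True), Member("C", False)]   # C demoted
previous = [Member("A", True), Member("B", True), Member("C", True)]
print(set_configuration_buggy(current, previous)["C"])  # False
print(set_configuration_fixed(current, previous)["C"])  # True
```

In the buggy version C is excluded from read-quorum broadcasts while the commit rule still counts it among the previous voters, which is the mismatch that can stall read barriers.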
Qian-Cheng-nju	a416238155	test: add tracker voter demotion test to fsm_test.cc	2026-04-08 12:37:19 +02:00
Qian-Cheng-nju	f72528c759	raft: use configuration::can_vote() in tracker::set_configuration	2026-04-08 12:37:16 +02:00
Michał Jadwiszczak	568f20396a	test: fix flaky test_create_index_synchronous_updates trace event race The test_create_index_synchronous_updates test in test_secondary_index_properties.py was intermittently failing with 'assert found_wanted_trace' because the expected trace event 'Forcing ... view update to be synchronous' was missing from the trace events returned by get_query_trace(). Root cause: trace events are written asynchronously to system_traces.events. The Python driver's populate() method considers a trace complete once the session row in system_traces.sessions has duration IS NOT NULL, then reads events exactly once. Since the session row and event rows are written as separate mutations with no transactional guarantee, the driver can read an incomplete set of events. Evidence from the failed CI run logs: - The entire test (CREATE TABLE through DROP TABLE) completed in ~300ms (01:38:54,859 - 01:38:55,157) - The INSERT with tracing happened in a ~50ms window between the second CREATE INDEX completing (01:38:55,108) and DROP TABLE starting (01:38:55,157) - The 'Forcing ... synchronous' trace message is generated during the INSERT write path (db/view/view.cc:2061), so it was produced, but not yet flushed to system_traces.events when the driver read them - This matches the known limitation documented in test/alternator/ test_tracing.py: 'we have no way to know whether the tracing events returned is the entire trace' Fix: replace the single-shot trace.events read with a retry loop that directly queries system_traces.events until the expected event appears (with a 30s timeout). Use ConsistencyLevel.ONE since system_traces has RF=2 and cqlpy tests run on a single-node cluster. The same race condition pattern exists in test_mv_synchronous_updates in test_materialized_view.py (which this test was modeled after), so the same fix is proactively applied there as well. Fixes SCYLLADB-1314 Closes scylladb/scylladb#29374	2026-04-08 12:35:10 +02:00
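The retry loop can be sketched independently of the driver. `wait_for_trace_event()` and `fetch_activities` are hypothetical names; in the real test, the callable would presumably wrap a query such as `SELECT activity FROM system_traces.events WHERE session_id = %s` issued as a `SimpleStatement` with `consistency_level=ConsistencyLevel.ONE`, but that wiring is an assumption and is not shown. `clock` and `sleep` are injectable so the loop can be exercised without waiting.

```python
import time

def wait_for_trace_event(fetch_activities, wanted, timeout=30.0, interval=0.1,
                         clock=time.monotonic, sleep=time.sleep):
    """Re-read trace events until one contains `wanted`; return all activities.

    `fetch_activities` must query system_traces.events afresh on every call,
    because events may land after the session row is already marked complete.
    """
    deadline = clock() + timeout
    while True:
        activities = fetch_activities()
        if any(wanted in a for a in activities):
            return activities
        if clock() >= deadline:
            raise TimeoutError(f"no trace event containing {wanted!r} within {timeout}s")
        sleep(interval)

# Example with a fake source whose wanted event "arrives" on the third read.
reads = iter([[], ["Parsing a statement"],
              ["Parsing a statement", "Forcing ... view update to be synchronous"]])
print(wait_for_trace_event(lambda: next(reads), "synchronous", sleep=lambda s: None))
```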
Raphael S. Carvalho	f941a77867	scripts/base36-uuid: dump date in UTC Previously, the timestamp decoded from a timeuuid was printed using the local timezone via datetime.fromtimestamp(), which produces different output depending on the machine's locale settings. ScyllaDB logs are emitted in UTC by default. Printing the decoded date in UTC makes it straightforward to correlate SSTable identifiers with log entries without having to mentally convert timezones. Also fix the embedded pytest assertion, which was accidentally correct only on machines in UTC+8 — it now uses an explicit UTC-aware datetime. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Closes scylladb/scylladb#29253	2026-04-08 12:19:55 +03:00
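The UTC conversion itself is small. A version-1 (time-based) UUID stores 100 ns intervals since 1582-10-15 00:00:00 UTC, so adding that count to a timezone-aware epoch yields a UTC datetime regardless of the machine's local timezone, unlike a naive `datetime.fromtimestamp()` call. This sketch does not reproduce the script's base36 decoding of SSTable identifiers, and `make_timeuuid()` is a hypothetical helper used only to build a test input.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Start of the Gregorian calendar, the epoch of version-1 UUID timestamps.
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)

def timeuuid_to_utc(u: uuid.UUID) -> datetime:
    """Decode the embedded timestamp as a timezone-aware UTC datetime."""
    return GREGORIAN_EPOCH + timedelta(microseconds=u.time // 10)

def make_timeuuid(dt: datetime) -> uuid.UUID:
    """Hypothetical helper: build a version-1 UUID for a given instant."""
    ticks = (dt - GREGORIAN_EPOCH) // timedelta(microseconds=1) * 10
    return uuid.UUID(fields=(
        ticks & 0xFFFFFFFF,                  # time_low
        (ticks >> 32) & 0xFFFF,              # time_mid
        ((ticks >> 48) & 0x0FFF) | 0x1000,   # time_hi_version (version 1)
        0x80,                                # clock_seq_hi_variant (RFC 4122 variant)
        0x00,                                # clock_seq_low
        0,                                   # node
    ))

when = datetime(2026, 4, 8, 9, 19, 55, tzinfo=timezone.utc)
print(timeuuid_to_utc(make_timeuuid(when)).isoformat())  # 2026-04-08T09:19:55+00:00
```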
Yaniv Michael Kaul	c385c0bdf9	.github/workflows/call_validate_pr_author_email.yml: add missing workflow permissions Add explicit permissions block (contents: read, pull-requests: write, statuses: write) matching the requirements of the called reusable workflow which checks out code, posts PR comments, and sets commit statuses. Fixes code scanning alert #172. Closes scylladb/scylladb#29183	2026-04-08 12:19:55 +03:00
Pavel Emelyanov	788ecaa682	api: Fix enable_injection to accept case-insensitive bool parameter Replace strict case-sensitive '== "True"' check with strcasecmp(..., "true") so that Python's str(True) -> "True" is properly recognized. Accepts any case variation of "true" ("True", "TRUE", etc.), with empty string defaulting to false. Maintains backward compatibility with out-of-tree tests that rely on Python's bool stringification. The goal is to reduce the number of distinct ways API handlers use to convert string http query parameters into bool variables. This place is the only one that simply compares param to "True". Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Closes scylladb/scylladb#29236	2026-04-08 12:19:55 +03:00
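The accepted values can be summarized with a Python model of the new check. The handler itself is C++ using `strcasecmp()`; `parse_enable_flag()` is a hypothetical name, and the model assumes ASCII input.

```python
def parse_enable_flag(value: str) -> bool:
    """Model of strcasecmp(value, "true") == 0 for ASCII input.

    Any casing of "true" yields True; anything else, including "", yields False.
    """
    return value.lower() == "true"

print([parse_enable_flag(v) for v in (str(True), "true", "TRUE", "", "1")])
# [True, True, True, False, False]
```

Note that values such as "1" or "yes" remain false under this check.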
Avi Kivity	0fd9ea9701	abseil: update to lts_2026_01_07 Switch to branch lts_2026_01_07, which is exactly equal to upstream now. There were no notable changes in the release notes, but the new versions are more friendly to newer compilers (specifically, in include hygiene). configure.py needs a few library updates; cmake works without change. scylla-gdb.py updated for new hash table layout (by Claude Opus 4.6). * abseil d7aaad83...255c84da (1179): > Abseil LTS branch, Jan 2026, Patch 1 (#2007) > Cherry-picks for LTS 20260107 (#1990) > Apply LTS transformations for 20260107 LTS branch (#1989) > Mark legacy Mutex methods and MutexLock pointer constructors as deprecated > `cleanup`: specify that it's safe to use the class in a signal handler. > Suppress bugprone-use-after-move in benign cases > StrFormat: format scientific notation without heap allocation > Introduce a legacy copy of GetDebugStackTraceHook API. > Report 1ns instead of 0ns for probe_benchmarks. Some tools incorrectly assume that benchmark was not run if 0ns reported. > Add absl::chunked_queue > `CRC32` version of `CombineContiguous` for length <= 32. > Add `absl::down_cast` > Fix FixedArray iterator constructor, which should require input_iterator, not forward_iterator > Add a latency benchmark for hashing a pair of integers. > Delete absl::strings_internal::STLStringReserveAmortized() > As IsAtLeastInputIterator helper > Use StringAppendAndOverwrite() in CEscapeAndAppendInternal() > Add support for absl::(u)int128 in FastIntToBuffer() > absl/strings: Prepare helper for printing objects to string representations. > Use SimpleAtob() for parsing bool flags > No-op changes to relative timeout support code. > Adjust visibility of heterogeneous_lookup_testing.h > Remove -DUNORDERED_SET_CXX17 since the macro no longer exists > [log] Prepare helper for streaming container contents to strings. 
> Restrict the visibility of some internal testing utilities > Add absl::linked_hash_set and absl::linked_hash_map > [meta] Add constexpr testing helper. > BUILD file reformatting. > `absl/meta`: Add C++17 port of C++20 `requires` expression for internal use > Remove the implementation of `absl::string_view`, which was only needed prior to C++17. `absl::string_view` is now an alias for `std::string_view`. It is recommended that clients simply use `std::string_view`. > No public description > absl::flags: Stop echoing file content in flagfile parsing errors Modified ArgsList::ReadFromFlagfile to redact the content of unexpected lines from error messages. > Refactor the declaration of `raw_hash_set`/`btree` to omit default template parameters from the subclasses. > Import of CCTZ from GitHub. > Add ABSL_ATTRIBUTE_LIFETIME_BOUND to Flag help generator > Correct `Mix4x16Vectors` comment. > Special implementation for string hash with sizes greater than 64. > Reorder function parameters so that hash state is the first argument. > Search more aggressively for open slots in absl::internal_stacktrace::BorrowedFixupBuffer > Implement SpinLockHolder in terms of std::lock_guard. > No public description > Avoid discarding test matchers. > Import of CCTZ from GitHub. > Automated rollback of commit 9f40d6d6f3cfc1fb0325dd8637eb65f8299a4b00. > Enable clang-specific warnings on the clang-cl build instead of just trying to be MSVC > Enable clang-specific warnings on the clang-cl build instead of just trying to be MSVC > Make AnyInvocable remember more information > Add further diagnostics under clang for string_view(nullptr) > Import of CCTZ from GitHub. > Document the differing trimming behavior of absl::Span::subspan() and std::span::subspan() > Special implementation for string hash with sizes in range [33, 64]. 
> Add the deleted string_view(std::nullptr_t) constructor from C++23 > CI: Use a cached copy of GoogleTest in CMake builds if possible to minimize the possibility of errors downloading from GitHub > CI: Enable libc++ hardening in the ASAN build for even more checks https://libcxx.llvm.org/Hardening.html > Call the common case of AllocateBackingArray directly instead of through the function pointer. > Change AlignedType to have a void* array member so that swisstable backing arrays end up in the pointer-containing partition for heap partitioning. > base: Discourage use of ABSL_ATTRIBUTE_PACKED > Revert: Add an attribute to HashtablezInfo which performs a bitwise XOR on all hashes. The purposes of this attribute is to identify if identical hash tables are being created. If we see a large number of identical tables, it's likely the code can be improved by using a common table as opposed to keep rebuilding the same one. > Import of CCTZ from GitHub. > Record insert misses in hashtable profiling. > Add absl::StatusCodeToStringView. > Add a missing dependency on str_format that was being pulled in transitively > Pico-optimize `SkipWhitespace` to use `StripLeadingAsciiWhitespace`. > absl::string_view: Upgrade the debug assert on the single argument char* constructor to ABSL_HARDENING_ASSERT > Use non-stack storage for stack trace buffers > Fixed incorrect include for ABSL_NAMESPACE_BEGIN > Add ABSL_REFACTOR_INLINE to separate the inliner directive from the deprecated directive so that we can give users a custom deprecation message. > Reduce stack usage when unwinding without fixups > Reduce stack usage when unwinding from 170 to 128 on x64 > Rename RecordInsert -> RecordInsertMiss. > PR #1968: Use std::move_backward within InlinedVector's Storage::Insert > Use the new absl::StringResizeAndOverwrite() in CUnescape() > Explicitly instantiate common `raw_hash_set` backing array functions. > Rollback reduction of maximum load factor. Now it is back to 28/32. 
> Export Mutex::Dtor from shared libraries in NDEBUG mode > Allow `IsOkAndHolds` to rely on duck typing for matching `StatusOr` like types instead of uniquely `absl::StatusOr`, e.g. `google::cloud::StatusOr`. > Fix typo in macro and add missing static_cast for WASM builds. > windows(cmake): add abseil_test_dll to target link libraries when required > Handle empty strings in `SimpleAtof` after stripping whitespace > Avoid using a thread_local in an inline function since this causes issues on some platforms. > (Roll forward) Change Abseil's SpinLock adaptive_spin_count to a class static variable that can be set by tcmalloc friend classes. > Change Abseil's SpinLock adaptive_spin_count to a class static variable that can be set by tcmalloc friend classes. > Change Abseil's SpinLock adaptive_spin_count to a class static variable that can be set by tcmalloc friend classes. > Fixes for String{Resize|Append}AndOverwrite - StringAppendAndOverwrite() should always call StringResizeAndOverwrite() with at least capacity() in case the standard library decides to shrink the buffer (Fixes #1965) - Small refactor to make the minimum growth an addition for clarity and to make it easier to test 1.5x growth in the future - Turn an ABSL_HARDENING_ASSERT into a ThrowStdLengthError - Add a missing std::move > Correct the supported features of Status Matchers > absl/time: Use "memory order acquire" for loads, which would allow for the safe removal of the data memory barrier. > Use the new absl::StringResizeAndOverwrite() in string escaping utilities > Add an internal-only helper StringAppendAndOverwrite() similar to StringResizeAndOverwrite() but optimized for repeated appends, using exponential growth to ensure amortized complexity of increasing a string size by a small amount is O(1). > Release `ABSL_EXPECT_OK` and `ABSL_ASSERT_OK`. > Fix the CHECK_XX family of macros to not print `char` arguments as C-strings if the comparison happened as pointers. 
Printing as pointers is more relevant to the result of the comparison. > Rollback StringAppendAndOverwrite() - the problem is that StringResizeAndOverwrite has MSAN testing of the entire string. This causes quadratic MSAN verification on small appends. > Add an internal-only helper StringAppendAndOverwrite() similar to StringResizeAndOverwrite() but optimized for repeated appends, using exponential growth to ensure amortized complexity of increasing a string size by a small amount is O(1). > PR #1961: Fix Clang warnings on powerpc > Use the new absl::StringResizeAndOverwrite() in string escaping utilities > Use the new absl::StringResizeAndOverwrite() in string escaping utilities > macOS CI: Move the Bazel vendor_dir to ${HOME} to workaround a Bazel issue where it does not work when it is in ${TMP} and also fix the quoting which was causing it to incorrectly receive the argument > Use __msan_check_mem_is_initialized for detailed MSan report > Optimize stack unwinding by reducing `AddressIsReadable` calls. > Add internal API to allow bypassing stack trace fixups when needed > absl::StrFormat: improve test coverage with scientific exponent test cases > Add throughput and latency benchmarks for `absl::ToDoubleXYZ` functions. > CordzInfo: Use absl::NoDestructor to remove a global destructor. Chromium requires no global destructors. > string_view: Enable std::view and std::borrowed_range > cleanup: s/logging_internal/log_internal/ig for consistency > Use the new absl::StringResizeAndOverwrite() in string escaping utilities > Use the new absl::StringResizeAndOverwrite() in string escaping utilities > Use the new absl::StringResizeAndOverwrite() in absl::AsciiStrTo{Lower|Upper} > Use the new absl::StringResizeAndOverwrite() in absl::StrJoin() > Use the new absl::StringResizeAndOverwrite() in absl::StrCat() > string_view: Fix include order > Don't pass nullptr as the 1st arg of `from_chars` > absl/types: format code with clang-format. 
> Validate absl::StringResizeAndOverwrite op has written bytes as expected. > Skip the ShortStringCollision test on WASM. > Rollback `absl/types`: format code with clang-format. > Remove usage of the WasmOffsetConverter for Wasm / Emscripten stack-traces. > Use the new absl::StringResizeAndOverwrite() in absl::CordCopyToString() > Remove an undocumented behavior of --vmodule and absl::SetVLogLevel that could set a module_pattern to defer to the global vlog threshold. > Update to rules_cc 0.2.9 > Avoid redefine warnings with ntstatus constants > PR #1944: Use same element-width for non-temporal loads and stores on Arm > absl::StringResizeAndOverwrite(): Add the requirement that the only value that can be written to buf[size] is the terminator character. > absl/types: format code with clang-format. > Minor formatting changes. > Remove `IntIdentity` and `PtrIdentity` from `raw_hash_set_probe_benchmark`. > Automated rollback of commit cad60580dba861d36ed813564026d9774d9e4e2b. > FlagStateInterface implementors need only support being restored once. > Clarify the post-condition of `reserve()` in Abseil hash containers. > Clarify the post-condition of `reserve()` in Abseil hash containers. > Represent dropped samples in hashtable profile. > Add lifetimebound to absl::implicit_cast and make it work for rvalue references as it already does with lvalue references > Clean up a doc example where we had `absl_nonnull` and `= nullptr;` > Change Cordz to synchronize tracked cords with Snapshots / DeleteQueue > Minor refactor to `num_threads` in deadlock test > Rename VLOG macro parameter to match other uses of this pseudo type. > `time`: Fix indentation > Automated Code Change > Adds `absl::StringResizeAndOverwrite` as a polyfill for C++23's `std::basic_string<CharT,Traits,Allocator>::resize_and_overwrite` > Internal-only change > absl/time: format code with clang-format. > No public description > Expose typed releasers of externally appended memory. 
> Fix __declspec support for ABSL_DECLARE_FLAG() > Annotate absl::AnyInvocable as an owner type via [[gsl::Owner]] and absl_internal_is_view = std::false_type > Annotate absl::FunctionRef as a view type via [[gsl::Pointer]] and absl_internal_is_view > Remove unnecessary dep on `core_headers` from the `nullability` cc_library > type_traits: Add type_identity and type_traits_t backfills > Refactor raw_hash_set range insertion to call private insert_range function. > Fix bug in absl::FunctionRef conversions from non-const to const > PR #1937: Simplify ConvertSpecialToEmptyAndFullToDeleted > Improve absl::FunctionRef compatibility with C++26 > Add a workaround for unused variable warnings inside of not-taken if-constexpr codepaths in older versions of GCC > Annotate ABSL_DIE_IF_NULL's return type with `absl_nonnull` > Move insert index computation into `PrepareInsertLarge` in order to reduce inlined part of insert/emplace operations. > Automated Code Change > PR #1939: Add missing rules_cc loads > Expose (internally) a LogMessage constructor taking file as a string_view for (internal, upcoming) FFI integration. > Fixed up some #includes in mutex.h > Make absl::FunctionRef support non-const callables, aligning it with std::function_ref from C++26 > Move capacity update in `Grow1To3AndPrepareInsert` after accessing `common.infoz()` to prevent assertion failure in `control()`. > Fix check_op(s) compilation failures on gcc 8 which eagerly tries to instantiate std::underlying_type for non-num types. > Use `ABSL_ATTRIBUTE_ALWAYS_INLINE`for lambda in `find_or_prepare_insert_large`. > Mark the implicit floating operators as constexpr for `absl::int128` and `absl::uint128` > PR #1931: raw_hash_set: fix instantiation for recursive types on MSVC with /Zc:__cplusplus > Add std::pair specializations for IsOwner and IsView > Cast ABSL_MIN_LOG_LEVEL to absl::LogSeverityAtLeast instead of absl::LogSeverity. 
> Fix a corner case in the aarch64 unwinder > Fix inconsistent nullability annotation in ReleasableMutexLock > Remove support for Native Client > Rollback f040e96b93dba46e8ed3ca59c0444cbd6c0a0955 > When printing CHECK_XX failures and both types are unprintable, don't bother printing " (UNPRINTABLE vs. UNPRINTABLE)". > PR #1929: Fix shorten-64-to-32 warning in stacktrace_riscv-inl.inc > Refactor `find_or_prepare_insert_large` to use a single return statement using a lambda. > Use possible CPUs to identify NumCPUs() on Linux. > Fix incorrect nullability annotation of `absl::Cord::InlineRep::set_data()`. > Move SetCtrl family of functions to cc file. > Change absl::InlinedVector::clear() so that it does not deallocate any allocated space. This allows allocations to be reused and matches the behavior specification of std::vector::clear(). > Mark Abseil container algorithms as `constexpr` for C++20. > Fix `CHECK_<OP>` ambiguous overload for `operator<<` in older versions of GCC when C-style strings are compared > stacktrace_test: avoid spoiling errno in the test signal handler. > Optimize `CRC32AcceleratedX86ARMCombinedMultipleStreams::Extend` by interleaving the `CRC32_u64` calls at a lower level. > stacktrace_test: avoid spoiling errno in the test signal handler. > stacktrace_test: avoid spoiling errno in the test signal handler. > std::multimap::find() is not guaranteed to return the first entry with the requested key. Any may be returned if many exist. > Mark `/`, `%`, and `*` operators as constexpr when intrinsics are available. > Add the C++20 string_view constructor that uses iterators > Implement absl::erase_if for absl::InlinedVector > Adjust software prefetch to fetch 5 cachelines ahead, as benchmarking suggests this should perform better. > Reduce maximum load factor to 27/32 (from 28/32). 
> Remove unused include > Remove unused include statement > PR #1921: Fix ABSL_BUILD_DLL mode (absl_make_dll) with mingw > PR #1922: Enable mmap for WASI if it supports the mman header > Rollback C++20 string_view constructor that uses iterators due to broken builds > Add the C++20 string_view constructor that uses iterators > Bump versions of dependencies in MODULE.bazel > Automated Code Change > PR #1918: base: add musl + ppc64le fallback for UnscaledCycleClock::Frequency > Optimize crc32 Extend by removing obsolete length alignment. > Fix typo in comment of `ABSL_ATTRIBUTE_UNUSED`. > Mark AnyInvocable as being nullability compatible. > Ensure stack usage remains low when unwinding the stack, to prevent stack overflows > Shrink #if ABSL_HAVE_ATTRIBUTE_WEAK region sizes in stacktrace_test.cc > <filesystem> is not supported for XTENSA. Disable it in //absl/hash/internal/hash.h. > Use signal-safe dynamic memory allocation for stack traces when necessary > PR #1915: Fix SYCL Build Compatibility with Intel LLVM Compiler on Windows for abseil > Import of CCTZ from GitHub. > Tag tests that currently fail on ios_sim_arm64 with "no_test_ios_sim_arm64" > Automated Code Change > Automated Code Change > Import of CCTZ from GitHub. > Move comment specific to pointer-taking MutexLock variant to its definition. > Add lifetime annotations to MutexLock, SpinLockHolder, etc. > Add lifetimebound annotations to absl::MakeSpan and absl::MakeConstSpan to detect dangling references > Remove comment mentioning dereferenceability. > Add referenceful MutexLock with Condition overload. > Mark SpinLock camel-cased methods as ready for inlining. > Whitespace change > In logging tests that write expectations against `ScopedMockLog::Send`, suppress the default behavior that forwards to `ScopedMockLog::Log` so that unexpected logs are printed with full metadata. Many of these tests are poking at those metadata, and a failure message that doesn't include them is unhelpful. 
> Add ABSL_ATTRIBUTE_LIFETIME_BOUND to absl::ClippedSubstr > Inline internal usages of Mutex::Lock, etc. in favor of lock. > Inline internal usages of pointerful SpinLockHolder/MutexLock. > Remove wrong comment in Cord::Unref > Update the crc32 dynamic dispatch table with newer platforms. > PR #1914: absl/base/internal/poison.cc: Minor build fix > Accept references on SpinLockHolder/MutexLock > Import of CCTZ from GitHub. > Fix typos in comments. > Inline SpinLock Lock->lock, Unlock->unlock internal to Abseil. > Rename Mutex methods to use the typical C++ lower case names. > Rename SpinLock methods to use the typical C++ lower case names. > Add an assert that absl::StrSplit is not called with a null char argument. > Fix sign conversion warning > PR #1911: Fix absl_demangle_test on ppc64 > Disallow using a hash function whose return type is smaller than size_t. > Optimize CRC-32C extension by zeroes > Deduplicate stack trace implementations in stacktrace.cc > Align types of location_table_ and mapping_table_ keys (-Wshorten-64-to-32). > Move SigSafeArena() out to absl/base/internal/low_level_alloc.h > Allow CHECK_<OP> variants to be used with unprintable types. > Import of CCTZ from GitHub. > Adds required load statements for C++ rules to BUILD and bzl files. > Disable sanitizer bounds checking in ComputeZeroConstant. > Roll back NDK weak symbol mode for backtrace() due to internal test breakage > Add converter for extracting SwissMap profile information into a https://github.com/google/pprof suitable format for inspection. > Allocate memory for frames and sizes during stack trace fix-up when no memory is provided > Support NDK weak symbol mode for backtrace() on Android. > Change skip_empty_or_deleted to not use groups. > Fix bug of dereferencing invalidated iterator in test case. > Refactor: split erase_meta_only into large and small versions. > Fix a TODO to use std::is_nothrow_swappable when it became available. 
> Clean up the testing of alternate options that were removed in previous changes > Only use generic stacktrace when ABSL_HAVE_THREAD_LOCAL. > Automated Code Change > Add triviality tests for absl::Span > Loosen the PointerAlignment test to allow up to 5 stuck bits to avoid flakiness. > Prevent conversion constructions from absl::Span to itself > Skip flaky expectations in waiter_test for MSVC. > Refactor: call AssertIsFull from iterator::assert_is_full to avoid passing the same arguments repeatedly. > In AssertSameContainer, remove the logic checking for whether the iterators are from SOO tables or not since we don't use it to generate a more informative debug message. > Remove unused NonIterableBitMask::HighestBitSet function. > Refactor: move iterator unchecked_* members before data members to comply with Google C++ style guide. > Mix pointers once instead of twice now that we've improved mixing on 32-bit platforms and improved the kMul constant. > Remove unused utility functions/constants. > Revert a change for breaking downstream third party libs > Remove unneeded include from cord_rep_btree_navigator.h > Refactor: move find_first_non_full into raw_hash_set.cc. > Perform stronger mixing on 32-bit platforms and enable the LowEntropyStrings test. > Include deallocated caller-provided size in delete hooks. > Roll back one more time: In debug mode, assert that the probe sequence isn't excessively long. > Allow a `std::move` of `delimiter_` to happen in `ByString::ByString(ByString&&)`. Right now the move ctor is making a copy because the source object is `const`. > Assume that control bytes don't alias CommonFields. > Consistently use [[maybe_unused]] in raw_hash_set.h for better compiler warning compatibility. > Roll forward: In debug mode, assert that the probe sequence isn't excessively long. > Add a new test for hash collisions for short strings when PrecombineLengthMix has low quality. 
> Refactor: define CombineRawImpl for repeated `Mix(state ^ value, kMul)` operations. > Automated Code Change > Mark hash_test as large so that the timeout is increased. > Change the value of kMul to have higher entropy and prevent collisions when keys are aligned integers or pointers. > Fix LIFETIME annotations for op/op->/value operators for reference types. > Update StatusOr to support lvalue reference value types. > Rollback debug assertion that the probe sequence isn't excessively long. > AnyInvocable: Fix operator==/!= comments > In debug mode, assert that the probe sequence isn't excessively long. > Improve NaN handling in absl::Duration arithmetic. > Change PrecombineLengthMix to sample data from kStaticRandomData. > Fix includes and fuse constructors of SpinLock. > Enable `operator==` for `StatusOr` only if the contained type is equality-comparable > Enable SIMD memcpy-crc on ARM cores. > Improve mixing on 32-bit platforms. > Change DurationFromDouble to return -InfiniteDuration() for all NaNs. > Change return type of hash internal `Seed` to `size_t` from `uint64_t` > CMake: Add a fatal error when the compiler defaults to or is set to a C++ language standard prior to C++17. > Make bool true hash be ~size_t{} instead of 1 so that all bits are different between true/false instead of only one. > Automated Code Change > Pass swisstable seed as seed to absl::Hash so we can save an XOR in H1. > Add support for scoped enumerations in CHECK_XX(). > Revert no-inline on Voidify::operator&&() -- caused unexpected binary size growth > Mark Voidify::operator&&() as no-inline. This improves stack trace for `LOG(FATAL)` with optimization on. > Refactor long strings hash computations and move `len <= PiecewiseChunkSize()` out of the line to keep only one function call in the inlined hash code. 
> rotr/rotl: Fix undefined behavior when passing INT_MIN as the number of positions to rotate by > Reorder members of MixingHashState to comply with Google C++ style guide ordering of type declarations, static constants, ctors, non-ctor functions. > Delete unused function ShouldSampleHashtablezInfoOnResize. > Remove redundant comments that just name the following symbol without providing additional information. > Remove unnecessary modification of growth info in small table case. > Suppress CFI violation on VDSO call. > Replace WeakMix usage with Mix and change H2 to use the most significant 7 bits - saving 1 cycle in H1. > Fix -Wundef warning > Fix conditional constexpr in ToInt64{Nano|Micro|Milli}seconds under GCC7 and GCC8 using an else clause as a workaround > Enable CompressedTupleTest.NestedEbo test case. > Lift restriction on using EBCO[1] for nested CompressedTuples. The current implementation of CompressedTuple explicitly disallows EBCO for cases where CompressedTuples are nested. This is because the implementation for a tuple with EBCO-compatible element T inherits from Storage<T, I>, where I is the index of T in the tuple, and > absl::string_view: assert against (data() == nullptr && size() != 0) > Fix a false nullability warning in [Q]CHECK_OK by replacing nullptr with an empty char > Make `combine_contiguous` mix length in a weak way by adding `size << 24`, so that we can avoid a separate mixing of size later. The empty range mixes in a 0x57 byte. > Add a test case that -1.0 and 1.0 have different hashes. > Update CI to a more recent Clang on Linux x86-64 > `absl::string_view`: Add a debug assert to the single-argument constructor that the argument is not `nullptr`. > Fix CI on macOS Sequoia > Use Xcode 16.3 for testing > Use a proper fix instead of a workaround for a parameter annotated absl_nonnull since the latest Clang can see through the workaround > Assert that SetCtrl isn't called on small tables - there are no control bytes in such cases.
> Use `MaskFullOrSentinel` in `skip_empty_or_deleted`. > Reduce flakiness in MockDistributions.Examples test case. > Rename PrepareInsertNonSoo to PrepareInsertLarge now that it's no longer used in all non-SOO cases. > PR #1895: use c++17 in podspec > Avoid hashing the key in prefetch() for small tables. > Remove template alias nullability annotations. > Add `Group::MaskFullOrSentinel` implementation without usage. > Move `hashtable_control_bytes` tests into their own file. > Simplify calls to `EqualElement` by introducing `equal_to` helper function. > Do `common.increment_size()` directly in SmallNonSooPrepareInsert if inserting into a reserved 1-element table. > Import of CCTZ from GitHub. > Small cleanup of `infoz` processing to get the logic out of the line or removed. > Extract the entire PrepareInsert to Small non SOO table out of the line. > Take `get_hash` implementation out of the SwissTable class to minimize number of instantiations. > Change kEmptyGroup to kDefaultIterControl now that it's only used for default-constructed iterators. > [bits] Add tests for return types > Avoid allocating control bytes in capacity==1 swisstables. > PR #1888: Adjust Table.GrowExtremelyLargeTable to avoid OOM on i386 > Avoid mixing after `Hash64` calls for long strings by passing `state` instead of `Seed` to low level hash. > Indent absl container examples consistently > Revert: doesn't actually work because SWIG doesn't use the full preprocessor > Add tags to skip some tests under UBSAN. > Avoid subtracting `it.control()` and `table.control()` in single element table during erase. > Remove the `salt` parameter from low level hash and use a global constant. That may potentially remove some loads. > In SwissTable, don't hash the key when capacity<=1 on insertions. > Remove the "small" size designation for thread_identity_test, which causes the test to time out after 60s. > Add comment explaining math behind expressions.
> Exclude SWIG from ABSL_DEPRECATED and ABSL_DEPRECATE_AND_INLINE > stacktrace_x86: Handle nested signals on altstack > Import of CCTZ from GitHub. > Simplify MixingHashState::Read9To16 to not depend on endianness. > Delete deprecated `absl::Cord::Get` and its remaining call sites. > PR #1884: Remove duplicate dependency > Remove relocatability test that is no longer useful > Import of CCTZ from GitHub. > Fix a bug of casting sizeof(slot_type) to uint16_t instead of uint32_t. > Rewrite `WideToUtf8` for improved readability. > Avoid requiring default-constructability of iterator type in algorithms that use ContainerIterPairType > Add test cases for invalid surrogate sequences. > Use __builtin_is_cpp_trivially_relocatable to implement absl::is_trivially_relocatable in a way that is compatible with PR2786 in the upcoming C++26. > Remove dependency on `wcsnlen` for string length calculation. > Stop being strict about validating the "clone" part of mangled names > Add support for logging wide strings in `absl::log`. > Deprecate `ABSL_HAVE_STD_STRING_VIEW`. > Change some nullability annotations in absl::Span to absl_nullability_unknown to work around a bug that makes nullability checks trigger in foreach loops, while still fixing the -Wnullability-completeness warnings. > Linux CI update > Fix new -Wnullability-completeness warnings found after upgrading the Clang version used in the Linux ARM CI to Clang 19. > Add __restrict for uses of PolicyFunctions. > Use Bazel vendor mode to cache external dependencies on Windows and macOS > Move PrepareInsertCommon from header file to cc file. > Remove `explicit` from the constructor of a test allocator in hash_policy_testing.h. This is rejected by Clang when using the libstdc++ that ships with GCC15 > Extract `WideToUtf8` helper to `utf8.h`. > Update the documentation for `CHECK` to make it more explicit that it is used to require that a condition is true.
> Add PolicyFunctions::soo_capacity() so that the compiler knows that soo_capacity() is always 0 or 1. > Expect different representations of pointers from the Windows toolchain. > Add set_no_seed_for_testing for use in GrowExtremelyLargeTable test. > Update GoogleTest dependency to 1.17.0 to support GCC15 > Assume that frame pointers inside known stack bounds are readable. > Remove fallback code in absl/algorithm/container.h > Fix GCC15 warning that <ciso646> is deprecated in C++17 > Fix misplaced closing brace > Remove unused include. > Automated Code Change > Type erase copy constructor. > Refactor to use hash_of(key) instead of hash_ref()(key). > Create Table.Prefetch test to make sure that it works. > Remove NOINLINE on the constructor with buckets. > In SwissTable, don't hash the key in find when capacity<=1. > Use 0x57 instead of Seed() for weak mixing of size. > Use absl::InsecureBitGen in place of std::random_device in Abseil tests. > Remove unused include. > Use the large 64-bit kMul on 32-bit platforms as well. > Import of CCTZ from GitHub. > Define `combine_weakly_mixed_integer` in HashSelect::State in order to allow `friend auto AbslHashValue` instead of `friend H AbslHashValue`. > PR #1878: Fix typos in comments > Update Abseil dependencies in preparation for release > Use weaker mixing for absl::Hash for types that mix their sizes. > Update comments on UnscaledCycleClock::Now. > Use alignas instead of the manual alignment for the Randen entropy pool. > Document nullability annotation syntax for array declarations (not many people may know the syntax). > Import of CCTZ from GitHub. > Release tests for ABSL_RAW_DCHECK and ABSL_RAW_DLOG. > Adjust threshold for stuck bits to avoid flaky failures. > Deprecate template type alias nullability annotations. > Add more probe benchmarks > PR #1874: Simplify detection of the powerpc64 ELFv1 ABI > Make `absl::FunctionRef` copy-assignable. This brings it more in line with `std::function_ref`.
> Remove unused #includes from absl/base/internal/nullability_impl.h > PR #1870: Retry SymInitialize on STATUS_INFO_LENGTH_MISMATCH > Prefetch from slots in parallel with reading from control. > Migrate template alias nullability annotations to macros. > Improve dependency graph in `TryFindNewIndexWithoutProbing` hot path evaluation. > Add latency benchmarks for Hash for strings with size 3, 5 and 17. > Exclude UnwindImpl etc. from thread sanitizer due to false positives. > Use `GroupFullEmptyOrDeleted` inside of `transfer_unprobed_elements_to_next_capacity_fn`. > PR #1863: [minor] Avoid variable shadowing for absl btree > Extend stack-frame walking functionality to allow dynamic fixup > Fix "unsafe narrowing" in absl for Emscripten > Roll back change to address breakage > Extend stack-frame walking functionality to allow dynamic fixup > Introduce `absl::Cord::Distance()` > Avoid aliasing issues in growth information initialization. > Make `GrowSooTableToNextCapacityAndPrepareInsert` in order to initialize control bytes all at once and avoid two function calls on growth right after SOO. > Simplify `SingleGroupTableH1` since we do not need to mix all bits anymore. The per-table seed has a good last-bit distribution. > Use `NextSeed` instead of `NextSeedBaseNumber` and make the result type `uint16_t`. That avoids unnecessary bit twiddling and simplifies the code. > Optimize `GrowthToLowerBoundCapacity` in order to avoid division. > [base] Make :endian internal to absl > Fully qualify absl names in check macros to avoid invalid name resolution when the user scope has those names defined. > Fix memory sanitization in `GrowToNextCapacityAndPrepareInsert`. > Define and use `ABSL_SWISSTABLE_ASSERT` in cc file since a lot of logic moved there. > Remove `ShouldInsertBackwards` functionality. It was used for additional order randomness in debug mode. It is not necessary anymore with the introduction of a separate per-table `seed`.
> Fast growing to the next capacity based on Carbon hash table ideas. > Automated Code Change > Refactor CombinePiecewiseBuffer test case to (a) call PiecewiseChunkSize() to get the chunk size and (b) use ASSERT for expectation in a loop. > PR #1867: Remove global static in stacktrace_win32-inl.inc > Mark Abseil hardening assert in AssertIsValidForComparison as slow. > Roll back a problematic change. > Add absl::FastTypeId<T>() > Automated Code Change > Update TestIntrinsicInt128 test to print the indices with the conflicting hashes. > Code simplification: we don't need XOR and kMul when mixing large string hashes into hash state. > Refactor absl::CUnescape() to use direct string output instead of pointer/size. > Rename `policy.transfer` to `policy.transfer_n`. > Optimize `ResetCtrl` for small tables with `capacity < Group::KWidth * 2` (<32 if SSE enabled and <16 if not). > Use 16 bits of per-table-seed so that we can save an `and` instruction in H1. > Fully annotate nullability in headers where it is partially annotated. > Add note about sparse containers to (flat|node)_hash_(set|map). > Make low_level_alloc compatible with -Wthread-safety-pointer > Add missing direct includes to enable the removal of unused includes from absl/base/internal/nullability_impl.h. > Add tests for macro nullability annotations analogous to existing tests for type alias annotations. > Add functionality to return stack frame pointers during stack walking, in addition to code addresses > Use even faster reduction algorithm in FinalizePclmulStream() > Add nullability annotations to some very-commonly-used APIs. > PR #1860: Add `unsigned` to character buffers to ensure they can provide storage (https://eel.is/c++draft/intro.object#3) > Release benchmarks for absl::Status and absl::StatusOr > Use more efficient reduction algorithm in FinalizePclmulStream() > Add a test case to make it clear that `--vmodule=foo/=1` does match any children and grandchildren and so on under `foo/`.
> Gate use of clang nullability qualifiers through absl nullability macros on `nullability_on_classes`. > Mark `absl::StatusOr::status()` as ABSL_MUST_USE_RESULT > Cleanups related to benchmarks: fix many benchmarks to be cc_binary instead of cc_test; add a few benchmarks for StrFormat; add benchmarks for Substitute; add benchmarks for Damerau-Levenshtein distance used in flags > Add a log severity alias `DO_NOT_$UBMIT` intended for logging during development > Avoid relying on true and false tokens in the preprocessor macros used in any_invocable.h > Avoid relying on true and false tokens in the preprocessor macros used in absl/container > Refactor to make it clear that H2 computation is not repeated in each iteration of the probe loop. > Turn on C++23 testing for GCC and Clang on Linux > Fix overflow of kSeedMask on 32-bit platforms in `generate_new_seed`. > Add a workaround for std::pair not being trivially copyable in C++23 in some standard library versions > Refactor WeakMix to include the XOR of the state with the input value. > Migrate ClearPacBits() to a more generic implementation and location > Annotate more Abseil container methods with [[clang::lifetime_capture_by(...)]] and make them all forward to the non-captured overload > Make PolicyFunctions always be the second argument (after CommonFields) for type-erased functions. > Move GrowFullSooTableToNextCapacity implementation with some dependencies to cc file. > Optimize btree_iterator increment/decrement to avoid aliasing issues by using local variables instead of repeatedly writing to `this`. > Add constexpr conversions from absl::Duration to int64_t > PR #1853: Add support for QCC compiler > Fix documentation for key requirements of flat_hash_set > Use `extern template` for `GrowFullSooTableToNextCapacity` since we know the most common set of parameters. > C++23: Fix log_format_test to match the stream format for volatile pointers > C++23: Fix compressed_tuple_test.
> Implement `btree::iterator::+=` and `-=`. > Stop calling `ABSL_ANNOTATE_MEMORY_IS_INITIALIZED` for the thread-local counter. > Automated Code Change > Introduce seed stored in the hash table inside of the size. > Replace ABSL_ATTRIBUTE_UNUSED with [[maybe_unused]] > Minor consistency cleanups to absl::BitGen mocking. > Restore the empty CMake targets for bad_any_cast, bad_optional_access, and bad_variant_access to allow clients to migrate. > bits.h: Add absl::endian and absl::byteswap polyfills > Use absl::NoDestructor for an absl::Mutex instance in the flags library to prevent some exit-time destructor warnings > Add thread GetEntropyFromRandenPool test > Update nullability annotation documentation to focus on macro annotations. > Simplify some random/internal types; expose one function to acquire entropy. > Remove pre-C++17 workarounds for lack of std::launder > UBSAN: Use -fno-sanitize-recover > int128_test: Avoid testing signed integer overflow > Remove leading commas in `Describe` methods of `StatusIs` matcher. > absl::StrFormat: Avoid passing null to memcpy > str_cat_test: Avoid using invalid enum values > hash_generator_testing: Avoid using invalid enum values > absl::Cord: Avoid passing null to memcpy and memset > graphcycles_test: Avoid applying a non-zero offset to a null pointer > Make warning about wrapping empty std::function in AnyInvocable stronger. > absl/random: Convert absl::BitGen / absl::InsecureBitGen to classes from aliases. > Fix buffer overflow in the internal demangling function > Avoid calling `ShouldRehashForBugDetection` on the first two inserts to the table. > Remove the polyfill implementations for many type traits and alias them to their std equivalents. It is recommended that clients now simply use the std equivalents. > ROLLBACK: Limit slot_size to 2^16-1 and maximum table size to 2^43-1. > Limit `slot_size` to `2^16-1` and maximum table size to `2^43-1`.
> Use C++17 [[nodiscard]] instead of the deprecated ABSL_MUST_USE_RESULT > Remove the polyfills for absl::apply and absl::make_from_tuple, which were only needed prior to C++17. It is recommended that clients simply use std::apply and std::make_from_tuple. > PR #1846: Fix build on big endian > Bazel: Move environment variables to --action_env > Remove the implementation of `absl::variant`, which was only needed prior to C++17. `absl::variant` is now an alias for `std::variant`. It is recommended that clients simply use `std::variant`. > MSVC: Fix warnings c4244 and c4267 in the main library code > Update LowLevelHashLenGt16 to be LowLevelHashLenGt32 now that the input is guaranteed to be >32 in length. > Xtensa does not support thread_local. Disable it in absl/base/config.h. > Add support for 8-bit and 16-bit integers to absl::SimpleAtoi > CI: Update Linux ARM latest container > Add time hash tests > `any_invocable`: Update comments that refer to C++17 and C++11 > `check_test_impl.inc`: Use C++17 features unconditionally > Remove the implementation of `absl::optional`, which was only needed prior to C++17. `absl::optional` is now an alias for `std::optional`. It is recommended that clients simply use `std::optional`. > Move hashtable control bytes manipulation to a separate file. > Fix a use-after-free bug in which the string passed to `AtLocation` may be referenced after it is destroyed. While the string does live until the end of the full statement, logging previously occurred in the destructor of the `LogMessage`, which may be constructed before the temporary string (and thus destroyed after the temporary string's destructor). > `internal/layout`: Delete pre-C++17 out-of-line definition of constexpr class member > Extract slow path for PrepareInsertNonSoo to a separate function `PrepareInsertNonSooSlow`.
> Minor code cleanups > `internal/log_message`: Use `if constexpr` instead of SFINAE for `operator<<` > [absl] Use `std::min` in `constexpr` contexts in `absl::string_view` > Remove the implementation of `absl::any`, which was only needed prior to C++17. `absl::any` is now an alias for `std::any`. It is recommended that clients simply use `std::any`. > Remove ABSL_INTERNAL_NEED_REDUNDANT_CONSTEXPR_DECL, which is no longer needed with the C++17 floor > Make `OptimalMemcpySizeForSooSlotTransfer` ready to work with MaxSooSlotSize up to `3 * sizeof(size_t)`. > `internal/layout`: Replace SFINAE with `if constexpr` > PR #1830: C++17 improvement: use if constexpr in internal/hash.h > `absl`: Deprecate `ABSL_HAVE_CLASS_TEMPLATE_ARGUMENT_DEDUCTION` > Add verification against access to a table that is being destroyed. Also enable the access-after-destroy check in ASAN optimized mode. > Store `CharAlloc` in SwissTable in order to simplify type erasure of functions accepting allocator as `void*`. > Introduce and use `SetCtrlInLargeTable`, when we know that table is at least one group. Similarly to `SetCtrlInSingleGroupTable`, we can save some operations. > Make raw_hash_set::slot_type private. > Delete absl/utility/internal/if_constexpr.h > `internal/any_invocable`: Use `if constexpr` instead of SFINAE when initializing storage accessor > Depend on string_view directly > Optimize and slightly simplify `PrepareInsertNonSoo`. > PR #1833: Make ABSL_INTERNAL_STEP_n macros consistent in crc code > `internal/any_invocable`: Use alias `RawT` consistently in `InitializeStorage` > Move the implementation of absl::ComputeCrc32c to the header file, to facilitate inlining. > Delete absl/base/internal/inline_variable.h > Add lifetimebound to absl::StripAsciiWhitespace > Revert: Random: Use target attribute instead of -march > Add return for opt mode in AssertNotDebugCapacity to make sure that code is not evaluated in opt mode.
> `internal/any_invocable`: Delete TODO, improve comment and simplify pragma in constructor > Split resizing routines and type erase similar instructions. > Random: Use target attribute instead of -march > `internal/any_invocable`: Use `std::launder` unconditionally > `internal/any_invocable`: Remove suppression of false positive -Wmaybe-uninitialized on GCC 12 > Fix feature test for ABSL_HAVE_STD_OPTIONAL > Support C++20 iterators in raw_hash_map's random-access iterator detection > Fix mis-located test dependency > Disable the DestroyedCallsFail test on GCC due to flakiness. > `internal/any_invocable`: Implement invocation using `if constexpr` instead of SFINAE > PR #1835: Bump deployment_target version and add visionos to podspec > PR #1828: Fix spelling of pseudorandom in README.md > Make raw_hash_map::key_arg private. > `overload`: Delete obsolete macros for undefining `absl::Overload` when C++ < 17 > `absl/base`: Delete `internal/invoke.h` and `invoke_test.cc` > Remove `WORKSPACE.bazel` > `absl`: Replace `base_internal::{invoke,invoke_result_t,is_invocable_r}` with `std` equivalents > Allow C++20 forward iterators to use fast paths > Factor out some iterator traits detection code > Type erase IterateOverFullSlots to decrease code size. > `any_invocable`: Delete pre-C++17 workarounds for `noexcept` and guaranteed copy elision > Make raw_hash_set::key_arg private. > Rename nullability macros to use new lowercase spelling. > Fix bug where ABSL_REQUIRE_EXPLICIT_INIT did not actually result in a linker error > Make Randen benchmark program use runtime CPU detection. > Add CI for the C++20/Clang/libstdc++ combination > Move Abseil to GoogleTest 1.16.0 > `internal/any_invocable`: Use `if constexpr` instead of SFINAE in `InitializeStorage` > More type-erasing of InitializeSlots by removing the Alloc and AlignOfSlot template parameters.
> Actually use the hint space instruction to strip PAC bits for return addresses in stack traces as the comment says > `log/internal`: Replace `..._ATTRIBUTE_UNUSED_IF_STRIP_LOG` with C++17 `[[maybe_unused]]` > `attributes`: Document `ABSL_ATTRIBUTE_UNUSED` as deprecated > `internal/any_invocable`: Initialize using `if constexpr` instead of ternary operator, enum, and templates > Fix flaky tests due to sampling by introducing utility to refresh sampling counters for the current thread. > Minor reformatting in raw_hash_set: - Add a clear_backing_array member to declutter calls to ClearBackingArray. - Remove some unnecessary `inline` keywords on functions. - Make PoisonSingleGroupEmptySlots static. > Update CI for linux_gcc-floor to use GCC9, Bazel 7.5, and CMake 3.31.5. > `internal/any_invocable`: Rewrite `IsStoredLocally` type trait into a simpler constexpr function > Add ABSL_REQUIRE_EXPLICIT_INIT to Abseil to enable enforcing explicit field initializations > Require C++17 > Minimize number of `InitializeSlots` with respect to SizeOfSlot. > Leave the call to `SampleSlow` only in type erased InitializeSlots. > Update comments for Read4To8 and Read1To3. > PR #1819: fix compilation with AppleClang > Move SOO processing inside of InitializeSlots and move it once. > PR #1816: Random: use getauxval() via <sys/auxv.h> > Optimize `InitControlBytesAfterSoo` to have less writes and make them with compile time known size. > Remove stray plus operator in cleanup_internal::Storage > Include <cerrno> to fix compilation error in chromium build. > Adjust internal logging namespacing for consistency s/ABSL_LOGGING_INTERNAL_/ABSL_LOG_INTERNAL_/ > Rewrite LOG_EVERY_N (et al) docs to clarify that the first instance is logged. Also, deliberately avoid giving exact numbers or examples since IRL behavior is not so exact. > ABSL_ASSUME: Use a ternary operator instead of do-while in the implementations that use a branch marked unreachable so that it is usable in more contexts. 
> Simplify the comment for raw_hash_set::erase. > Remove preprocessor checks for now-unsupported compilers. > `absl::ScopedMockLog`: Explicitly document that it captures logs emitted by all threads > Fix potential integer overflow in hash container create/resize > Add lifetimebound to StripPrefix/StripSuffix. > Random: Roll forward support for runtime dispatch on AArch64 macOS > Crc: Only test non_temporal_store_memcpy_avx on AVX targets > Provide information about types of all flags. > Deprecate the precomputed hash find() API in swisstable. > Import of CCTZ from GitHub. > Adjust whitespace > Expand documentation for absl::raw_hash_set::erase to include an idiomatic example of iterator post-increment. > Performance improvement for absl::AsciiStrToUpper() and absl::AsciiStrToLower() > Crc: Remove the __builtin_cpu_supports path for SupportsArmCRC32PMULL > Use absl::NoDestructor for some absl::Mutex instances in the flags library to prevent some exit-time destructor warnings > Update the WORKSPACE dependency of rules_cc to 0.1.0 > Roll back support for runtime dispatch on AArch64 macOS because it broke some builds > Downgrade to rules_cc 0.0.17 because 0.1.0 was yanked > Use unused set in testing. > Random: Support runtime dispatch on AArch64 macOS > crc: Use absl::nullopt when returning absl::optional > Annotate absl::FixedArray to warn when unused. > PR #1806: Fix undefined symbol: __android_log_write > Move ABSL_HAVE_PTHREAD_CPU_NUMBER_NP to the file where it is needed > Use rbit instruction on ARM rather than rev. > Debugging: Report the CPU we are running on under Darwin > Add a microbenchmark for very long int/string tuples. > Crc: Detect support for pmull and crc instructions on Apple AArch64. With a newer Clang, we can use __builtin_cpu_supports, which caches all the feature bits. > Add special handling for hashing integral types so that we can optimize Read1To3 and Read4To8 for the strings case. > Use unused FixedArray instances.
> Minor reformatting > Avoid flaky expectation in WaitDurationWoken test case in MSVC. > Use Bazel rules_cc for many compiler-specific rules instead of our custom ones from before the Bazel rules existed. > Mix pointers twice in absl::Hash. > New internal-use-only classes `AsStructuredLiteralImpl` and `AsStructuredValueImpl` > Annotate some Abseil container methods with [[clang::lifetime_capture_by(...)]] > Faster copy from inline Cords to inline Strings > Add new benchmark cases for hashing string lengths 1, 2, 4, and 8. > Move the Arm implementation of UnscaledCycleClock::Now() into the header file, like the x86 implementation, so it can be more easily inlined. > Minor include cleanup in absl/random/internal > Import of CCTZ from GitHub. > Use Bazel Platforms to support AES-NI compile options for Randen > In HashState::Create, require that T is a subclass of HashStateBase in order to discourage users from defining their own HashState types. > PR #1801: Remove unnecessary <iostream> includes > New class StructuredProtoField > Mix pointers twice in TSan and MSVC to avoid flakes in the PointerAlignment test. > Add a test case that type-erased absl::HashState is consistent with absl::HashOf. > Mix pointers twice in build modes in which the PointerAlignment test is flaky if we mix once. > Increase threshold for stuck bits in PointerAlignment test on Android. > Use hashing ideas from Carbon's hashtable in absl hashing: - Use byte swap instead of mixing pointers twice. - Change order of branches to check for len<=8 first. - In len<=16 case, do one multiply to mix the data instead of using the logic from go/absl-hash-rl (reinforcement learning was used to optimize the instruction sequence). - Add special handling for len<=32 cases in 64-bit architectures. > Test that using a table that was moved-to from a moved-from table fails in sanitizer mode. > Remove a trailing comma causing an issue for an OSS user > Add missing includes in hash.h.
> Use the public implementation rule for "@bazel_tools//tools/cpp:clang-cl" > Import of CCTZ from GitHub. > Change the definition of is_trivially_relocatable to be a bit less conservative. > Updates to CI to support newer versions of tools > Check if ABSL_HAVE_INTRINSIC_INT128 is defined > Print hash expansions in the hash_testing error messages. > Avoid flakiness in notification_test on MSVC. > Roll back: Add more debug capacity validation checks on moves. > Add more debug capacity validation checks on moves. > Add macro versions of nullability annotations. > Improve fork-safety by opening files with `O_CLOEXEC`. > Move ABSL_HARDENING_ASSERTs in constexpr methods to their own lines. > Add test cases for absl::Hash: - That hashes are consistent for the same int value across different int types. - That hashes of vectors of strings are unequal even when their concatenations are equal. - That FragmentedCord hashing works as intended for small Cords. > Skip the IterationOrderChangesOnRehash test case in ASan mode because it's flaky. > Add missing includes in absl hash. > Try to use file descriptors in the 2000+ range to avoid interference from misbehaving clients. > Add a weak implementation of __lsan_is_turned_off in Leak Checker > Fix a bug where EOF resulted in an infinite loop. > static_assert that absl::Time and absl::Duration are trivially destructible. > Move Duration ToInt64<unit> functions to be inline. > string_view: Add defaulted copy constructor and assignment > Use `#ifdef` to avoid errors when `-Wundef` is used. > Strip PAC bits for return addresses in stack traces > PR #1794: Update cpu_detect.cc fix hw crc32 and AES capability check, fix undefined > PR #1790: Respect the allocator's .destroy method in ~InlinedVector > Cast away nullability in the guts of CHECK_EQ (et al) where Clang doesn't see that the nullable string returned by Check_EQImpl is statically nonnull inside the loop.
> string_view: Correct string_view(const char*, size_type) docs > Add support for std::string_view in StrCat even when absl::string_view != std::string_view. > Misc. adjustments to unit tests for logging. > Use local_config_cc from rules_cc and make it a dev dependency > Add additional iteration order tests with reservation. Reserved tables have a different way of iteration randomization compared to gradually resized tables (at least for small tables). > Use all the bits (`popcount`) in `FindFirstNonFullAfterResize` and `PrepareInsertAfterSoo`. > Mark ConsumePrefix, ConsumeSuffix, StripPrefix, and StripSuffix as constexpr since they are all pure functions. > PR #1789: Add missing #ifdef pp directive to the TypeName() function in layout.h > PR #1788: Fix warning for sign-conversion on riscv > Make StartsWith and EndsWith constexpr. > Simplify logic for growing a single-group table. > Document that absl::Time and absl::Duration are trivially destructible. > Change some C-arrays to std::array as this enables bounds checking in some hardened standard library builds > Replace outdated select() on --cpu with platform API equivalent. > Take failure_message as const char* instead of string_view in LogMessageFatal and friends. > Mention `c_any_of` in the function comment of `absl::c_linear_search`. > Import of CCTZ from GitHub. > Rewrite some string_view methods to avoid a -Wunreachable-code warning > IWYU: Update includes and fix minor spelling mistakes. > Add comment on how to get next element after using erase. > Add ABSL_ATTRIBUTE_LIFETIME_BOUND and a doc note about absl::LogAsLiteral to clarify its intended use. > Import of CCTZ from GitHub. > Reduce memory consumption of structured logging proto encoding by passing tag value > Remove usage of _LIBCPP_HAS_NO_FILESYSTEM_LIBRARY. > Make Span's relational operators constexpr since C++20. > distributions: support a zero max value in Zipf. > PR #1786: Fix typo in test case. > absl/random: run clang-format.
> Add some nullability annotations in logging and tidy up some NOLINTs and comments. > CMake: Change the default for ABSL_PROPAGATE_CXX_STD to ON > Delete UnvalidatedMockingBitGen > PR #1783: [riscv][debugging] Fix a few warnings in RISC-V inlines > Add conversion operator to std::array for StrSplit. > Add a comment explaining the extra comparison in raw_hash_set::operator==. Also add a small optimization to avoid the extra comparison in sets that use hash_default_eq as the key_equal functor. > Add benchmark for absl::HexStringToBytes > Avoid installing options.h with the other headers > Add ABSL_ATTRIBUTE_LIFETIME_BOUND to absl::Span constructors. > Annotate absl::InlinedVector to warn when unused. > Make `c_find_first_of`'s `options` parameter a const reference to allow temporaries. > Disable ELF symbols for Xtensa > PR #1775: Support symbolize only on WINAPI_PARTITION_DESKTOP > Require through an internal presubmit that .h|.cc|.inc files contain either the string ABSL_NAMESPACE_BEGIN or SKIP_ABSL_INLINE_NAMESPACE_CHECK > Xtensa supports mmap; enable it in absl/base/config.h > PR #1777: Avoid std::ldexp in `operator double(int128)`. > Mark absl::Span as view and borrowed_range, like std::span. > Mark inline functions with only a simple comparison in strings/ascii.h as constexpr. > Add missing Abseil inline namespace and fix includes > Fix bug where the high bits of `__int128_t`/`__uint128_t` might go unused in the hash function. This fix increases the hash quality of these types. > Add a test to verify bit casting between signed and unsigned int128 works as expected > Add suggestions to enable sanitizers for asserts when doing so may be helpful. > Add nullability attributes to nullability type aliases. > Refactor swisstable moves. > Improve ABSL_ASSERT performance by guaranteeing it is optimized away under NDEBUG in C++20 > Mark Abseil hardening assert in AssertSameContainer as slow.
> Add workaround for q++ 8.3.0 (QNX 7.1) compiler by making sure MaskedPointer is trivially copyable and copy constructible. > Small Mutex::Unlock optimization > Optimize `CEscape` and `CEscapeAndAppend` by up to 40%. > Fix the conditional compilation of non_temporal_store_memcpy_avx to verify that AVX can be forced via `gnu::target`. > Delete TODOs to move functors when moving hashtables and add a test that fails when we do so. > Fix benchmarks in `escaping_benchmark.cc` by properly calling `benchmark::DoNotOptimize` on both inputs and outputs and by removing the unnecessary and wrong `ABSL_RAW_CHECK` condition (`check != 0`) of `BM_ByteStringFromAscii_Fail` benchmark. > It seems like commit abc9b916a94ebbf251f0934048295a07ecdbf32a did not work as intended. > Fix a bug in `absl::SetVLogLevel` where a less generic pattern incorrectly removed a more generic one. > Remove the side effects between tests in vlog_is_on_test.cc > Attempt to fix flaky Abseil waiter/sleep tests > Add an explicit tag for non-SOO CommonFields (removing default ctor) and add a small optimization for early return in AssertNotDebugCapacity. > Make moved-from swisstables behave the same as empty tables. Note that we may change this in the future. > Tag tests that currently fail on darwin_arm64 with "no_test_darwin_arm64" > add gmock to cmake defs for no_destructor_test > Optimize raw_hash_set moves by allowing some members of CommonFields to be uninitialized when moved-from. > Add more debug capacity validation checks on iteration/size. > Add more debug capacity validation checks on copies. > constinit -> constexpr for DisplayUnits > LSC: Fix null safety issues diagnosed by Clang’s `-Wnonnull` and `-Wnullability`. > Remove the extraneous variable creation in Match(). > Import of CCTZ from GitHub. > Add more debug capacity validation checks on merge/swap. 
> Add `absl::` namespace to c_linear_search implementation in order to avoid ADL > Distinguish the debug message for the case of self-move-assigned swiss tables. > Update LowLevelHash comment regarding number of hash state variables. > Add an example for the `--vmodule` flag. > Remove first prefetch. > Add moved-from validation for the case of self-move-assignment. > Allow slow and fast abseil hardening checks to be enabled independently. > Update `ABSL_RETIRED_FLAG` comment to reflect `default_value` is no longer used. > Add validation against use of moved-from hash tables. > Provide file-scoped pragma behind macro ABSL_POINTERS_DEFAULT_NONNULL to indicate the default nullability. This is a no-op for now (not understood by checkers), but does communicate intention to human readers. > Add stacktrace config for android using the generic implementation > Fix nullability annotations in ABSL code. > Replace CHECKs with ASSERTs and EXPECTs -- no reason to crash on failure. > Remove ABSL_INTERNAL_ATTRIBUTE_OWNER and ABSL_INTERNAL_ATTRIBUTE_VIEW > Migrate ABSL_INTERNAL_ATTRIBUTE_OWNER and ABSL_INTERNAL_ATTRIBUTE_VIEW to ABSL_ATTRIBUTE_OWNER and ABSL_ATTRIBUTE_VIEW > Disable ABSL_ATTRIBUTE_OWNER and ABSL_ATTRIBUTE_VIEW prior to Clang-13 due to false positives. > Make ABSL_ATTRIBUTE_VIEW and ABSL_ATTRIBUTE_OWNER public > Optimize raw_hash_set::AssertHashEqConsistent a bit to avoid having as much runtime overhead. > PR #1728: Workaround broken compilation against NDK r25 > Add validation against use of destroyed hash tables. > Do not truncate `ABSL_RAW_LOG` output at null bytes > Use several unused cord instances in tests and benchmarks. > Add comments about ThreadIdentity struct allocation behavior. > Refactoring followup for reentrancy validation in swisstable. > Add debug mode checks that element constructors/destructors don't make reentrant calls to raw_hash_set member functions. 
> Add tagging for cc_tests that are incompatible with Fuchsia > Add GetTID() implementation for Fuchsia > PR #1738: Fix shell option group handling in pkgconfig files > Disable weak attribute when absl compiled as windows DLL > Remove `CharIterator::operator->`. > Mark non-modifying container algorithms as constexpr for C++20. > PR #1739: container/internal: Explicitly include <cstdint> > Don't match -Wnon-virtual-dtor in the "flags are needed to suppress warnings in headers". It should fall through to the "don't impose our warnings on others" case. Do this by matching on "-Wno-" instead of "-Wno". > PR #1732: Fix build on NVIDIA Jetson board. Fix #1665 > Update GoogleTest dependency to 1.15.2 > Enable AsciiStrToLower and AsciiStrToUpper overloads for rvalue references. > PR #1735: Avoid `int` to `bool` conversion warning > Add `absl::swap` functions for `_hash_` to avoid calling `std::swap` > Change internal visibility > Remove resolved issue. > Increase test timeouts to support running on Fuchsia emulators > Add tracing annotations to absl::Notification > Suppress compiler optimizations which may break container poisoning. > Disable ABSL_INTERNAL_HAVE_DEBUGGING_STACK_CONSUMPTION for Fuchsia > Add tracing annotations to absl::BlockingCounter > Add absl_vlog_is_on and vlog_is_on to ABSL_INTERNAL_DLL_TARGETS > Update swisstable swap API comments to no longer guarantee that we don't move/swap individual elements. 
> PR #1726: cmake: Fix RUNPATH when using BUILD_WITH_INSTALL_RPATH=True > Avoid unnecessary copying when upper-casing or lower-casing ASCII string_view > Add weak internal tracing API > Fix LINT.IfChange syntax > PR #1720: Fix spelling mistake: occurrance -> occurrence > Add missing include for Windows ASAN configuration in poison.cc > Delete absl/strings/internal/has_absl_stringify.h now that the GoogleTest version we depend on uses the public file > Update versions of dependencies in preparation for release > PR #1699: Add option to build with MSVC static runtime > Remove unneeded 'be' from comment. > PR #1715: Generate options.h using CMake only once > Small type fix in absl/log/internal/log_impl.h > PR #1709: Handle RPATH CMake configuration > PR #1710: fixup! PR #1707: Fixup absl_random compile breakage in Apple ARM64 targets > PR #1695: Fix time library build for Apple platforms > Remove cyclic cmake dependency that breaks in cmake 3.30.0 > Roll forward poisoned pointer API and fix portability issues. > Use GetStatus in IsOkAndHoldsMatcher > PR #1707: Fixup absl_random compile breakage in Apple ARM64 targets > PR #1706: Require CMake version 3.16 > Add an MSVC implementation of ABSL_ATTRIBUTE_LIFETIME_BOUND > Mark c_min_element, c_max_element, and c_minmax_element as constexpr in C++17. > Optimize the absl::GetFlag cost for most non built-in flag types (including string). > Encode some additional metadata when writing protobuf-encoded logs. > Replace signed integer overflow, since that's undefined behavior, with unsigned integer overflow. > Make mutable CompressedTuple::get() constexpr. > vdso_support: support DT_GNU_HASH > Make c_begin, c_end, and c_distance conditionally constexpr. > Add operator<=> comparison to absl::Time and absl::Duration. 
> Deprecate `ABSL_ATTRIBUTE_NORETURN` in favor of the `[[noreturn]]` standardized in C++11 > Rollback new poisoned pointer API > Static cast instead of reinterpret cast raw hash set slots as casting from void* to T* is well defined > Fix absl::NoDestructor documentation about its use as a global > Declare Rust demangling feature-complete. > Split demangle_internal into a tree of smaller libraries. > Decode Rust Punycode when it's not too long. > Add assertions to detect reentrance in `IterateOverFullSlots` and `absl::erase_if`. > Decoder for Rust-style Punycode encodings of bounded length. > Add `c_contains()` and `c_contains_subrange()` to `absl/algorithm/container.h`. > Three-way comparison spaceship <=> operators for Cord. > internal-only change > Remove erroneous preprocessor branch on SGX_SIM. > Add an internal API to get a poisoned pointer. > optimization.h: Add missing <utility> header for C++ > Add a compile test for headers that require C compatibility > Fix comment typo > Expand documentation for SetGlobalVLogLevel and SetVLogLevel. > Roll back 6f972e239f668fa29cab43d7968692cd285997a9 > PR #1692: Add missing `<utility>` include > Remove NOLINT for `#include <new>` for __cpp_lib_launder > Remove not used after all kAllowRemoveReentrance parameter from IterateOverFullSlots. > Create `absl::container_internal::c_for_each_fast` for SwissTable. > Disable flaky test cases in kernel_timeout_internal_test. > Document that swisstable and b-tree containers are not exception-safe. > Add `ABSL_NULLABILITY_COMPATIBLE` attribute. > LSC: Move expensive variables on their last use to avoid copies. > Add ABSL_INTERNAL_ATTRIBUTE_VIEW and ABSL_INTERNAL_ATTRIBUTE_OWNER attributes to more types in Abseil > Drop std:: qualification from integer types like uint64_t. > Increase slop time on MSVC in PerThreadSemTest.Timeouts again due to continued flakiness. 
> Turn on validation for out of bounds MockUniform in MockingBitGen > Use ABSL_UNREACHABLE() instead of equivalent > If so configured, report which part of a C++ mangled name didn't parse. > Sequence of 1-to-4 values with prefix sum to support Punycode decoding. > Add the missing inline namespace to the nullability files > Add ABSL_INTERNAL_ATTRIBUTE_VIEW and ABSL_INTERNAL_ATTRIBUTE_OWNER attributes to types in Abseil > Disallow reentrance removal in `absl::erase_if`. > Fix implicit conversion of temporary bitgen to BitGenRef > Use `IterateOverFullSlots` in `absl::erase_if` for hash table. > UTF-8 encoding library to support Rust Punycode decoding. > Disable negative NaN float ostream format checking on RISC-V > PR #1689: Minor: Add missing quotes in CMake string view library definition > Demangle template parameter object names, TA <template-arg>. > Demangle sr St <simple-id> <simple-id>, a dubious encoding found in the wild. > Try not to lose easy type combinators in S::operator const int() and the like. > Demangle fixed-width floating-point types, DF.... > Demangle _BitInt types DB..., DU.... > Demangle complex floating-point literals. > Demangle <extended-qualifier> in types, e.g., U5AS128 for address_space(128). > Demangle operator co_await (aw). > Demangle fully general vendor extended types (any <template-args>). > Demangle transaction-safety notations GTt and Dx. > Demangle C++11 user-defined literal operator functions. > Demangle C++20 constrained friend names, F (<source-name> \| <operator-name>). > Demangle dependent GNU vector extension types, Dv <expression> _ <type>. > Demangle elaborated type names, (Ts \| Tu \| Te) <name>. > Add validation that hash/eq functors are consistent, meaning that `eq(k1, k2) -> hash(k1) == hash(k2)`. > Demangle delete-expressions with the global-scope operator, gs (dl \| da) .... > Demangle new-expressions with braced-init-lists. > Demangle array new-expressions, [gs] na .... > Demangle object new-expressions, [gs] nw .... 
> Demangle preincrement and predecrement, pp_... and mm_.... > Demangle throw and rethrow (tw... and tr). > Remove redundant check of is_soo() while prefetching heap blocks. > Demangle ti... and te... expressions (typeid). > Demangle nx... syntax for noexcept(e) as an expression in a dependent signature. > Demangle alignof expressions, at... and az.... > Demangle C++17 structured bindings, DC...E. > Demangle modern _ZGR..._ symbols. > Remove redundant check of is_soo() while prefetching heap blocks. > Demangle sizeof...(pack captured from an alias template), sP ... E. > Demangle types nested under vendor extended types. > Demangle il ... E syntax (braced list other than direct-list-initialization). > Avoid signed overflow for Ed <number> _ manglings with large <number>s. > Remove redundant check of is_soo() while prefetching heap blocks. > Remove obsolete TODO > Clarify function comment for `erase` by stating that this idiom only works for "some" standard containers. > Move SOVERSION to global CMakeLists, apply SOVERSION to DLL > Set ABSL_HAVE_THREAD_LOCAL to 1 on all platforms > Demangle constrained auto types (Dk <type-constraint>). > Parse <discriminator> more accurately. > Demangle lambdas in class member functions' default arguments. > Demangle unofficial <unresolved-qualifier-level> encodings like S0_IT_E. > Do not make std::filesystem::path hash available for macOS <10.15 > Include flags in DLL build (non-Windows only) > Enable building monolithic shared library on macOS and Linux. > Demangle Clang's last-resort notation _SUBSTPACK_. > Demangle C++ requires-expressions with parameters (rQ ... E). > Demangle Clang's encoding of __attribute__((enable_if(condition, "message"))). > Demangle static_cast and friends. > Demangle decltype(expr)::nested_type (NDT...E). > Optimize GrowIntoSingleGroupShuffleControlBytes. > Demangle C++17 fold-expressions. > Demangle thread_local helper functions. 
> Demangle lambdas with explicit template arguments (UlTy and similar forms). > Demangle &-qualified function types. > Demangle valueless literals LDnE (nullptr) and LA<number>_<type>E ("foo"). > Correctly demangle the <unresolved-name> at the end of dt and pt (x.y, x->y). > Add missing targets to ABSL_INTERNAL_DLL_TARGETS > Build abseil_test_dll with ABSL_BUILD_TESTING > Demangle C++ requires-expressions without parameters (rq ... E). > overload: make the constructor constexpr > Update Abseil CI Docker image to use Clang 19, GCC 14, and CMake 3.29.3 > Workaround symbol resolution bug in Clang 19 > Workaround bogus GCC14 -Wmaybe-uninitialized warning > Silence a bogus GCC14 -Warray-bounds warning > Forbid absl::Uniform<absl::int128>(gen) > Use IN_LIST to replace list(FIND) + > -1 > Recognize C++ vendor extended expressions (e.g., u9__is_same...E). > `overload_test`: Remove a few unnecessary trailing return types > Demangle the C++ this pointer (fpT). > Stop eating an extra E in ParseTemplateArg for some L<type><value>E literals. > Add ABSL_INTERNAL_ATTRIBUTE_VIEW and ABSL_INTERNAL_ATTRIBUTE_OWNER attributes to Abseil. > Demangle C++ direct-list-initialization (T{1, 2, 3}, tl ... E). > Demangle the C++ spaceship operator (ss, operator<=>). > Demangle C++ sZ encodings (sizeof...(pack)). > Demangle C++ so ... E encodings (typically array-to-pointer decay). > Recognize dyn-trait-type in Rust demangling. > Rework casting in raw_hash_set's IsFull(). > Remove test references to absl::SharedBitGen, which was never part of the open source release. This was only used in tests that never ran as part in the open source release. > Recognize fn-type and lifetimes in Rust demangling. > Support int128/uint128 in validated MockingBitGen > Recognize inherent-impl and trait-impl in Rust demangling. > Recognize const and array-type in Rust mangled names. > Remove Asylo from absl. > Recognize generic arguments containing only types in Rust mangled names. 
> Fix missing #include <random> for std::uniform_int_distribution > Move `prepare_insert` out of the line as type erased `PrepareInsertNonSoo`. > Revert: Add -Wdead-code-aggressive to ABSL_LLVM_FLAGS > Add (unused) validation to absl::MockingBitGen > Support `AbslStringify` with `DCHECK_EQ`. > PR #1672: Optimize StrJoin with tuple without user defined formatter > Give ReturnAddresses and N<uppercase> namespaces separate stacks for clarity. > Demangle Rust backrefs. > Use Nt for struct and trait names in Rust demangler test inputs. > Allow __cxa_demangle on MIPS > Add a `string_view` overload to `absl::StrJoin` > Demangle Rust's Y<type><path> production for passably simple <type>s. > `convert_test`: Delete obsolete condition around ASSERT_EQ in TestWithMultipleFormatsHelper > `any_invocable`: Clean up #includes > Resynchronize absl/functional/CMakeLists.txt with BUILD.bazel > `any_invocable`: Add public documentation for undefined behavior when invoking an empty AnyInvocable > `any_invocable`: Delete obsolete reference to proposed standard type > PR #1662: Replace shift with addition in crc multiply > Doc fix. > `convert_test`: Extract loop over tested floats from helper function > Recognize some simple Rust mangled names in Demangle. > Use __builtin_ctzg and __builtin_clzg in the implementations of CountTrailingZeroesNonzero16 and CountLeadingZeroes16 when they are available. > Remove the forked absl::Status matchers implementation in statusor_test > Add comment hack to fix copybara reversibility > Add GoogleTest matchers for absl::Status > [random] LogUniform: Document as a discrete distribution > Enable Cord tests with Crc. > Fix order of qualifiers in `absl::AnyInvocable` documentation. > Guard against null pointer dereference in DumpNode. > Apply ABSL_MUST_USE_RESULT to try lock functions. > Add public aliases for default hash/eq types in hash-based containers > Import of CCTZ from GitHub. 
> Remove the hand-rolled CordLeaker and replace with absl::NoDestructor to test the after-exit behavior > `convert_test`: Delete obsolete `skip_verify` parameter in test helper > overload: allow using the underlying type with CTAD directly. > PR #1653: Remove unnecessary casts when calling CRC32_u64 > PR #1652: Avoid C++23 deprecation warnings from float_denorm_style > Minor cleanup for `absl::Cord` > PR #1651: Implement ABSL_INTERNAL_DISABLE_DEPRECATED_DECLARATION_WARNING for MSVC compiler > Add `operator<=>` support to `absl::int128` and `absl::uint128` > [absl] Re-use the existing `std::type_identity` backfill instead of redefining it again > Add `absl::AppendCordToString` > `str_format/convert_test`: Delete workaround for [glibc bug](https://sourceware.org/bugzilla/show_bug.cgi?id=22142) > `absl/log/internal`: Document conditional ABSL_ATTRIBUTE_UNUSED, add C++17 TODO > `log/internal/check_op`: Add ABSL_ATTRIBUTE_UNUSED to CHECK macros when STRIP_LOG is enabled > log_benchmark: Add VLOG_IS_ON benchmark > Restore string_view detection check > Remove an unnecessary ABSL_ATTRIBUTE_UNUSED from a logging macro < Abseil LTS Branch, Jan 2024, Patch 2 (#1650) > In example code, add missing template parameter. > Optimize crc32 V128_From2x64 on Arm > Annotate that Mutex should warn when unused. > Add ABSL_ATTRIBUTE_LIFETIME_BOUND to Cord::Flatten/TryFlat > Deprecate `absl::exchange`, `absl::forward` and `absl::move`, which were only useful before C++14. > Temporarily revert dangling std::string_view detection until dependent is fixed > Use _decimal_ literals for the CivilDay example. > Fix bug in BM_EraseIf. 
> Add internal traits to absl::string_view for lifetimebound detection > Add internal traits to absl::StatusOr for lifetimebound detection > Add internal traits to absl::Span for lifetimebound detection > Add missing dependency for log test build target > Add internal traits for lifetimebound detection > Use local decoding buffer in HexStringToBytes > Only check if the frame pointer is inside a signal stack with known bounds > Roll forward: enable small object optimization in swisstable. > Optimize LowLevelHash by breaking dependency between final loads and previous len/ptr updates. > Fix the wrong link. > Optimize InsertMiss for tables without kDeleted slots. > Use GrowthInfo without applying any optimizations based on it. > Disable small object optimization while debugging some failing tests. > Adjust conditonal compilation in non_temporal_memcpy.h > Reformat log/internal/BUILD > Remove deprecated errno constants from the absl::Status mapping > Introduce GrowthInfo with tests, but without usage. > Enable small object optimization in swisstable. > Refactor the GCC unintialized memory warning suppression in raw_hash_set.h. > Respect `NDEBUG_SANITIZER` > Revert integer-to-string conversion optimizations pending more thorough analysis > Fix a bug in `Cord::{Append,Prepend}(CordBuffer)`: call `MaybeRemoveEmptyCrcNode()`. Otherwise appending a `CordBuffer` an empty Cord with a CRC node crashes (`RemoveCrcNode()` which increases the refcount of a nullptr child). > Add `BM_EraseIf` benchmark. > Record sizeof(key_type), sizeof(value_type) in hashtable profiles. > Fix ClangTidy warnings in btree.h. > LSC: Move expensive variables on their last use to avoid copies. > PR #1644: unscaledcycleclock: remove RISC-V support > Reland: Make DLOG(FATAL) not understood as [[noreturn]] > Separate out absl::StatusOr constraints into statusor_internal.h > Use Layout::WithStaticSizes in btree. 
> `layout`: Delete outdated comments about ElementType alias not being used because of MSVC > Performance improvement for absl::AsciiStrToUpper() and absl::AsciiStrToLower() > `layout_benchmark`: Replace leftover comment with intended call to MyAlign > Remove absl::aligned_storage_t > Delete ABSL_ANNOTATE_MEMORY_IS_INITIALIZED under Thread Sanitizer > Remove vestigial variables in the DumpNode() helper in absl::Cord > Do hashtablez sampling on the first insertion into an empty SOO hashtable. > Add explicit #include directives for <tuple>, "absl/base/config.h", and "absl/strings/string_view.h". > Add a note about the cost of `VLOG` in non-debug builds. > Fix flaky test failures on MSVC. > Add template keyword to example comment for Layout::WithStaticSizes. > PR #1643: add xcprivacy to all subspecs > Record sampling stride in cord profiling to facilitate unsampling. > Fix a typo in a comment. > [log] Correct SetVLOGLevel to SetVLogLevel in comments > Add a feature to container_internal::Layout that lets you specify some array sizes at compile-time as template parameters. This can make offset and size calculations faster. > `layout`: Mark parameter of Slices with ABSL_ATTRIBUTE_UNUSED, remove old workaround > `layout`: Use auto return type for functions that explicitly instantiate std::tuple in return statements > Remove redundant semicolons introduced by macros > [log] Make :vlog_is_on/:absl_vlog_is_on public in BUILD.bazel > Add additional checks for size_t overflows > Replace //visibility:private with :__pkg__ for certain targets > PR #1603: Disable -Wnon-virtual-dtor warning for CommandLineFlag implementations > Add several missing includes in crc/internal > Roll back extern template instatiations in swisstable due to binary size increases in shared libraries. > Add nodiscard to SpinLockHolder. > Test that rehash(0) reduces capacity to minimum. > Add extern templates for common swisstable types. 
> Disable ubsan for benign unaligned access in crc_memcpy > Make swisstable SOO support GDB pretty printing and still compile in OSS. > Fix OSX support with CocoaPods and Xcode 15 > Fix GCC7 C++17 build > Use UnixEpoch and ZeroDuration > Make flaky failures much less likely in BasicMocking.MocksNotTriggeredForIncorrectTypes test. > Delete a stray comment > Move GCC uninitialized memory warning suppression into MaybeInitializedPtr. > Replace usages of absl::move, absl::forward, and absl::exchange with their std:: equivalents > Fix the move to itself > Work around an implicit conversion signedness compiler warning > Avoid MSan: use-of-uninitialized-value error in find_non_soo. > Fix flaky MSVC test failures by using longer slop time. > Add ABSL_ATTRIBUTE_UNUSED to variables used in an ABSL_ASSUME. > Implement small object optimization in swisstable - disabled for now. > Document and test ability to use absl::Overload with generic lambdas. > Extract `InsertPosition` function to be able to reuse it. > Increase GraphCycles::PointerMap size > PR #1632: inlined_vector: Use trivial relocation for `erase` > Create `BM_GroupPortable_Match`. > [absl] Mark `absl::NoDestructor` methods with `absl::Nonnull` as appropriate > Automated Code Change > Rework casting in raw_hash_set's `IsFull()`. > Adds ABSL_ATTRIBUTE_LIFETIME_BOUND to absl::BitGenRef > Workaround for NVIDIA C++ compiler being unable to parse variadic expansions in range of range-based for loop > Rollback: Make DLOG(FATAL) not understood as [[noreturn]] > Make DLOG(FATAL) not understood as [[noreturn]] > Optimize `absl::Duration` division and modulo: Avoid repeated redundant comparisons in `IDivFastPath`. > Optimize `absl::Duration` division and modulo: Allow the compiler to inline `time_internal::IDivDuration`, by splitting the slow path to a separate function. > Fix typo in example code snippet. > Automated Code Change > Add braces for conditional statements in raw_hash_map functions. 
> Optimize `prepare_insert`, when resize happens. It removes single unnecessary probing before resize that is beneficial for small tables the most. > Add noexcept to move assignment operator and swap function > Import of CCTZ from GitHub. > Minor documentation updates. > Change find_or_prepare_insert to return std::pair<iterator, bool> to match return type of insert. > PR #1618: inlined_vector: Use trivial relocation for `SwapInlinedElements` > Improve raw_hash_set tests. > Performance improvement for absl::AsciiStrToUpper() and absl::AsciiStrToLower() > Use const_cast to avoid duplicating the implementation of raw_hash_set::find(key). > Import of CCTZ from GitHub. > Performance improvement for absl::AsciiStrToUpper() and absl::AsciiStrToLower() > Annotate that SpinLock should warn when unused. > PR #1625: absl::is_trivially_relocatable now respects assignment operators > Introduce `Group::MaskNonFull` without usage. > `demangle`: Parse template template and C++20 lambda template param substitutions > PR #1617: fix MSVC 32-bit build with -arch:AVX > Minor documentation fix for `absl::StrSplit()` > Prevent overflow in `absl::CEscape()` > `demangle`: Parse optional single template argument for built-in types > PR #1412: Filter out `-Xarch_` flags from pkg-config files > `demangle`: Add complexity guard to `ParseQRequiresExpr` < Prepare 20240116.1 patch for Apple Privacy Manifest (#1623) > Remove deprecated symbol absl::kuint128max > Add ABSL_ATTRIBUTE_WARN_UNUSED. > `demangle`: Parse `requires` clauses on template params, before function return type > On Apple, implement absl::is_trivially_relocatable with the fallback. > `demangle`: Parse `requires` clauses on functions > Make `begin()` to return `end()` on empty tables. > `demangle`: Parse C++20-compatible template param declarations, except those with `requires` expressions > Add the ABSL_DEPRECATE_AND_INLINE() macro > Span: Fixed comment referencing std::span as_writable_bytes() as as_mutable_bytes(). 
> Switch rank structs to be consistent with written guidance in go/ranked-overloads > Avoid hash computation and `Group::Match` in small tables copy and use `IterateOverFullSlots` for iterating for all tables. > Optimize `absl::Hash` by making `LowLevelHash` faster. > Add -Wdead-code-aggressive to ABSL_LLVM_FLAGS < Backport Apple Privacy Manifest (#1613) > Stop using `std::basic_string<uint8_t>` which relies on a non-standard generic `char_traits<>` implementation, recently removed from `libc++`. > Add absl_container_hash-based HashEq specialization > `demangle`: Implement parsing for simplest constrained template arguments > Roll forward 9d8588bfc4566531c4053b5001e2952308255f44 (which was rolled back in 146169f9ad357635b9cd988f976b38bcf83476e3) with fix. > Add a version of absl::HexStringToBytes() that returns a bool to validate that the input was actually valid hexadecimal data. > Enable StringLikeTest in hash_function_defaults_test > Fix a typo. > Minor changes to the BUILD file for absl/synchronization > Avoid static initializers in case of ABSL_FLAGS_STRIP_NAMES=1 > Rollback 9d8588bfc4566531c4053b5001e2952308255f44 for breaking the build > No public description > Decrease the precision of absl::Now in x86-64 debug builds > Optimize raw_hash_set destructor. > Add ABSL_ATTRIBUTE_UNINITIALIZED macros for use with clang and GCC's `uninitialized` > Optimize `Cord::Swap()` for missed compiler optimization in clang. > Type erased hash_slot_fn that depends only on key types (and hash function). > Replace `testonly = 1` with `testonly = True` in abseil BUILD files. > Avoid extra `& msbs` on every iteration over the mask for GroupPortableImpl. > Missing parenthesis. > Early return from destroy_slots for trivially destructible types in flat_hash_{}. > Avoid export of testonly target absl::test_allocator in CMake builds > Use absl::NoDestructor for cordz global queue. > Add empty WORKSPACE.bzlmod > Introduce `RawHashSetLayout` helper class. 
> Fix a corner case in SpyHashState for exact boundaries. > Add nullability annotations > Use absl::NoDestructor for global HashtablezSampler. > Always check if the new frame pointer is readable. > PR #1604: Add privacy manifest < Disable ABSL_ATTRIBUTE_TRIVIAL_ABI in open-source builds (#1606) > Remove code pieces for no longer supported GCC versions. > Disable ABSL_ATTRIBUTE_TRIVIAL_ABI in open-source builds > Prevent brace initialization of AlphaNum > Remove code pieces for no longer supported MSVC versions. > Added benchmarks for smaller size copy constructors. > Migrate empty CrcCordState to absl::NoDestructor. > Add protected copy ctor+assign to absl::LogSink, and clarify thread-safety requirements to apply to the interface methods. < Apply LTS transformations for 20240116 LTS branch (#1599) Closes scylladb/scylladb#28756	2026-04-08 12:19:54 +03:00
Liapkovich	4f17cc6d83	docs: add missing rack value for internode_compression parameter The rack option was fully implemented in the code but omitted from both docs/operating-scylla/admin.rst and conf/scylla.yaml comments. Closes scylladb/scylladb#29239	2026-04-08 12:19:54 +03:00
Pavel Emelyanov	0ea76a468f	schema: Avoid copies in column_mapping::operator== In a multi-declarator declaration, the & ref-qualifier is part of each individual declarator, not the shared type specifier. So: const auto& a = x(), b = y(); declares 'a' as a reference but 'b' as a value, silently copying y(). The same applies to: const T& a = v[i], b = v[j]; Both operator== lines had this pattern, causing an unnecessary copy of the column vector and an unnecessary copy of each entry on every call. Fix by repeating & on the second declarator in both lines. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Closes scylladb/scylladb#29213	2026-04-08 12:19:54 +03:00
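The declarator pitfall described in this commit can be reproduced in isolation. The sketch below is illustrative only and is not the actual column_mapping::operator== code; the function names are hypothetical. It uses `decltype` to show that only the first declarator is a reference, and checks addresses to show that the second variable is a separate copy until `&` is repeated.

```cpp
#include <type_traits>
#include <vector>

// Hypothetical functions illustrating the multi-declarator pitfall.

// Pitfall: '&' belongs to the declarator 'a' only, so 'b' is a separate
// 'const int' object copied from v[1].
bool pitfall_second_is_alias(const std::vector<int>& v) {
    const auto& a = v[0], b = v[1];
    (void)a;
    static_assert(std::is_reference_v<decltype(a)>, "a is a reference");
    static_assert(!std::is_reference_v<decltype(b)>, "b is a copy");
    return &b == &v[1];  // false: b has its own storage
}

// Fix: repeat '&' on every declarator that should be a reference.
bool fixed_second_is_alias(const std::vector<int>& v) {
    const auto& a = v[0], &b = v[1];
    (void)a;
    static_assert(std::is_reference_v<decltype(b)>, "b is now a reference");
    return &b == &v[1];  // true: b refers to v[1]
}
```

For a vector of `int` the copy is cheap, but with element types such as a column vector or a column entry, as in the commit, every call pays for a full copy.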
Pavel Emelyanov	b7c14c6d29	token_metadata: Clear _topology_change_info gently clear_gently() (introduced in `322aa2f8b5`) clears all token_metadata_impl members using co_await to avoid reactor stalls on large data structures. _topology_change_info (introduced in `10bf8c7901`) was added later and not included in clear_gently(). update_topology_change_info() already uses utils::clear_gently() when replacing the value, so it looks reasonable to apply the same pattern in clear_gently(). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Closes scylladb/scylladb#29210	2026-04-08 12:19:54 +03:00
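The idea behind clear_gently() can be sketched in plain C++: destroy a large container in bounded batches and give control back between batches, so no single step runs long enough to stall the reactor. This is a hypothetical illustration and not utils::clear_gently(). The name `clear_in_batches` and the `yield` callback are assumptions. In Seastar coroutine code, the callback would roughly correspond to yielding with co_await.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical sketch, NOT utils::clear_gently(): destroys the elements of
// `v` in batches of at most `batch` and calls `yield` between batches.
template <typename T>
void clear_in_batches(std::vector<T>& v, std::size_t batch,
                      const std::function<void()>& yield) {
    if (batch == 0) {
        batch = 1;  // avoid an infinite loop on a zero batch size
    }
    while (!v.empty()) {
        const std::size_t n = std::min(batch, v.size());
        // Destroy one batch from the tail; no elements need to be shifted.
        v.erase(v.end() - static_cast<std::ptrdiff_t>(n), v.end());
        if (!v.empty()) {
            yield();  // let other work run before the next batch
        }
    }
    v.shrink_to_fit();  // also release the now-unused buffer
}
```

Erasing from the tail keeps each batch O(batch) work. A real implementation would also have to handle nested containers, whose elements may themselves be expensive to destroy.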
Pavel Emelyanov	54fbbf0410	locator/tablets: Fix missing selector value in error messages Some on_internal_error() calls pass the selector as a format argument, but their format strings have no placeholder for it, so the value never appears in the message. "While at it", disambiguate the selector type in the message text. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#29208	2026-04-08 12:19:54 +03:00
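The bug fixed here is an argument with no matching replacement field. Formatting libraries in the fmt family generally ignore surplus arguments instead of rejecting them, which is why the selector could silently vanish from the message. The sketch below is a hypothetical consistency check. Neither function exists in fmt or Scylla, and it only handles automatically numbered `{}` fields and the `{{` escape.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: counts "{...}" replacement fields in a format
// string, treating "{{" as an escaped literal brace.
constexpr std::size_t count_replacement_fields(std::string_view fmt) {
    std::size_t n = 0;
    for (std::size_t i = 0; i < fmt.size(); ++i) {
        if (fmt[i] != '{') {
            continue;
        }
        if (i + 1 < fmt.size() && fmt[i + 1] == '{') {
            ++i;  // skip the escaped "{{"
            continue;
        }
        ++n;
    }
    return n;
}

// True when the number of fields matches the number of arguments.
// The bug described above corresponds to this returning false.
template <typename... Args>
constexpr bool every_arg_has_field(std::string_view fmt, const Args&...) {
    return count_replacement_fields(fmt) == sizeof...(Args);
}

static_assert(count_replacement_fields("tablet {} of table {}") == 2);
static_assert(count_replacement_fields("literal {{braces}}") == 0);
```

For example, `every_arg_has_field("no tablet for {}", table, selector)` would flag a call shaped like the ones this commit fixes.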
Botond Dénes	418141ec08	Merge 'Drop create_dataset() helper from object_store tests' from Pavel Emelyanov There's only one test left that uses it, and it can be patched to use the standard ks/cf creation helpers from pylib. This patch does so and drops the lengthy create_dataset() helper. Test improvements, no need to backport Closes scylladb/scylladb#29176 * github.com:scylladb/scylladb: test/backup: drop create_dataset helper test/backup: use new_test_keyspace in test_restore_primary_replica	2026-04-08 12:19:54 +03:00
Petr Gusev	1e3c8c5a87	test_mutation_schema_change: use tablets The enable_tablets(false) attribute was added when LWT was not supported with tablets. It is supported now, so the attribute is no longer needed. The test covers behavior that should work the same way for both vnodes and tablets, so running it in both enable_tablets(true) and enable_tablets(false) modes would add little value. Closes scylladb/scylladb#29167	2026-04-08 12:19:54 +03:00
Pavel Emelyanov	7f854c0255	hints: Use shorter fault-injection overload To apply a fault-injected delay, there is the inject(duration) overload. Using it results in shorter code. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#29168	2026-04-08 10:51:37 +03:00
Botond Dénes	aeefbda304	Merge 'Simplify and improve API describe_ring code flow' from Pavel Emelyanov The endpoint in question has some places worth fixing, in particular: - the keyspace parameter is not validated - the validated table name is resolved into a table_id, but the id is unused - two ugly static helpers stream the obtained token ranges into JSON Improving the API code flow, not backporting Closes scylladb/scylladb#29154 * github.com:scylladb/scylladb: api: Inline describe_ring JSON handling storage_service: Make describe_ring_for_table() take table_id	2026-04-08 10:50:07 +03:00
Artsiom Mishuta	b1e9c0b867	test/pylib: add typed skip markers plugin Add skip_reason_plugin.py — a framework-agnostic pytest plugin that provides typed skip markers (skip_bug, skip_not_implemented, skip_slow, skip_env) so that the reason a test is skipped is machine-readable in JUnit XML and Allure reports. Bare untyped pytest.mark.skip now triggers a warning (to become an error after full migration). Runtime skips via skip() are also enriched by parsing the [type] prefix from the skip message. The plugin is a class (SkipReasonPlugin) that receives the concrete SkipType enum and an optional report_callback from conftest.py, keeping it decoupled from allure and project-specific types. Extract SkipType enum and convenience runtime skip wrappers (skip_bug, skip_env, etc.) into test/pylib/skip_types.py so callers only need a single import instead of importing both SkipType and skip() separately. conftest.py imports SkipType from the new module and registers the plugin instance unconditionally (for all test runners). 
New files: - test/pylib/skip_reason_plugin.py: core plugin — typed marker processing, bare-skip warnings, JUnit/Allure report enrichment (including runtime skip() parsing via _parse_skip_type helper) - test/pylib/skip_types.py: SkipType enum and convenience wrappers (skip_bug, skip_not_implemented, skip_slow, skip_env) - test/pylib_test/test_skip_reason_plugin.py: 17 pytester-based test functions (51 cases across 3 build modes) covering markers, warnings, reports, callbacks, and skip_mode interaction Infrastructure changes: - test/conftest.py: import SkipType from skip_types, register SkipReasonPlugin with allure report callback - test/pylib/runner.py: set SKIP_TYPE_KEY/SKIP_REASON_KEY stash keys for skip_mode so the report hook can enrich JUnit/Allure with skip_type=mode without longrepr parsing - test/pytest.ini: register typed marker definitions (required for --strict-markers even when plugin is not loaded) Migrated test files (representative samples): - test/cluster/test_tablet_repair_scheduler.py: skip -> skip_bug (#26844), skip -> skip_not_implemented - test/cqlpy/.../timestamp_test.py: skip -> skip_slow - test/cluster/dtest/schema_management_test.py: skip -> skip_not_implemented - test/cluster/test_change_replication_factor_1_to_0.py: skip -> skip_bug (#20282) - test/alternator/conftest.py: skip -> skip_env - test/alternator/test_https.py: use skip_env() wrapper Fixes SCYLLADB-79 Closes scylladb/scylladb#29235	2026-04-08 10:38:56 +03:00
Pavel Emelyanov	e0fa9ee332	Merge 'storage: implement sstable clone for object storage' from Ernest Zaslavsky This patch series implements `object_storage_base::clone`, which was previously a stub that aborted at runtime. Clone creates a copy of an sstable under a new generation and is used during compaction. The implementation uses server-side object copies (S3 CopyObject / GCS Objects: rewrite) and mirrors the filesystem clone semantics: TemporaryTOC is written first to mark the operation as in-progress, component objects are copied, and TemporaryTOC is removed to commit (unless the caller requested the destination be left unsealed). The first two patches fix pre-existing bugs in the underlying storage clients that were exposed by the new clone code path: - GCS `copy_object` used the wrong HTTP method (PUT instead of POST) and sent an invalid empty request body. - S3 `copy_object` silently ignored the abort_source parameter. 1. gcp_client: fix copy_object request method and body — Fix two bugs in the GCS rewrite API call. 2. s3_client: pass through abort_source in copy_object — Stop ignoring the abort_source parameter. 3. object_storage: add copy_object to object_storage_client — New interface method with S3 and GCS implementations. 4. storage: add make_object_name overload with generation — Helper for building destination object names with a different generation. 5. storage: make delete_object const — Needed by the const clone method. 6. storage: implement object_storage_base::clone — The actual clone implementation plus a copy_object wrapper. 7. test/boost: enable sstable clone tests for S3 and GCS — Re-enable the previously skipped tests. A test similar to `sstable_clone_leaving_unsealed_dest_sstable` was added to properly test the sealed/unsealed states for object storage. Works for both S3 and GCS. 
Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-1045 Prerequisite: https://github.com/scylladb/scylladb/pull/28790 No need to backport since this code targets future feature Closes scylladb/scylladb#29166 * github.com:scylladb/scylladb: compaction_test: enable sstable clone tests for S3 and GCS storage: implement object_storage_base::clone storage: make delete_object const in object_storage_base storage: add make_object_name overload with generation sstables: add get_format() accessor to sstable object_storage: add copy_object to object_storage_client s3_client: pass through abort_source in copy_object gcp_client: fix copy_object request method and body	2026-04-08 09:35:10 +03:00
Nadav Har'El	4eeb9f4120	lwt, vector: write to CDC when vector index is enabled. The vector-search feature introduced the somewhat confusing behavior of enabling CDC without the user explicitly enabling it: when a vector index is enabled on a table, CDC is "enabled" for it even if the user didn't ask to enable CDC. For this, write-path code began to use a new cdc_enabled() function instead of checking schema.cdc_options.enabled() directly. This cdc_enabled() function returns true if either cdc_options.enabled() or has_vector_index() is true. Unfortunately, LWT writes continued to use cdc_options.enabled() instead of the new cdc_enabled(). This means that if a table has a vector index and a vector is written using an LWT write, the new value is not indexed. This patch fixes this bug. It also adds a regression test that fails before this patch and passes afterwards - the new test verifies that when a table has a vector index (but no explicit CDC enabled), the CDC log is updated both after regular writes and after successful LWT writes. This patch was also tested in the context of the upcoming vector-search-for-Alternator pull request, which has a test reproducing this bug (Alternator uses LWT frequently, so this is very important there). It will also be tested by the vector-store test suite ("validator"). Fixes SCYLLADB-1342 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#29300	2026-04-08 07:55:05 +03:00
Marcin Maliszkiewicz	1bf3110adb	Merge 'test: add test_upgrade_preserves_ddl_audit_for_tables' from Andrzej Jackowski Verify that upgrading from 2025.1 to master does not silently drop DDL auditing for table-scoped audit configurations ([SCYLLADB-1155](https://scylladb.atlassian.net/browse/SCYLLADB-1155)). Test time in dev: 4s Refs: SCYLLADB-1155 Fixes: SCYLLADB-1305 No backport, test for bug on master [SCYLLADB-1155]: https://scylladb.atlassian.net/browse/SCYLLADB-1155?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ Closes scylladb/scylladb#29223 * github.com:scylladb/scylladb: test: add test_upgrade_preserves_ddl_audit_for_tables test: audit: split validate helper so callers need not pass audit_settings test: audit: declare manager attribute in AuditTester base class	2026-04-07 17:29:11 +02:00
Marcin Maliszkiewicz	895fdb6d29	Merge 'ldap: fix double-free of LDAPMessage in poll_results()' from Andrzej Jackowski In the unregistered-ID branch, ldap_msgfree() was called on a result already owned by an RAII ldap_msg_ptr, causing a double-free on scope exit. Remove the redundant manual free. Fixes: SCYLLADB-1344 Backport: 2026.1, 2025.4, 2025.1 - it's a memory corruption, with a one-line fix, so better backport it everywhere. Closes scylladb/scylladb#29302 * github.com:scylladb/scylladb: test: ldap: add regression test for double-free on unregistered message ID ldap: fix double-free of LDAPMessage in poll_results()	2026-04-07 17:27:43 +02:00
Ernest Zaslavsky	422f107122	compaction_test: enable sstable clone tests for S3 and GCS Now that object_storage_base::clone is implemented, remove the early-return skips and re-enable the sstable_clone_leaving_unsealed_dest_sstable tests for both S3 and GCS storage backends.	2026-04-07 18:16:52 +03:00
Ernest Zaslavsky	7cd9bbb010	storage: implement object_storage_base::clone Implement the clone method for object_storage_base, which creates a copy of an sstable with a new generation using server-side object copies. Also add a const copy_object convenience wrapper, similar to the existing put_object and delete_object wrappers. A dedicated test for the new object storage clone path will be added in the following commit. The preexisting local-filesystem clone is already covered by the sstable_clone_leaving_unsealed_dest_sstable test.	2026-04-07 18:16:52 +03:00
Ernest Zaslavsky	8fa82e6b6f	storage: make delete_object const in object_storage_base The method doesn't modify any member state. Making it const is needed for calling it from the const clone method.	2026-04-07 18:16:52 +03:00
Ernest Zaslavsky	47387341bb	storage: add make_object_name overload with generation Add a make_object_name overload that accepts a target generation parameter for constructing object names with a generation different from the source sstable's own. Refactor the original make_object_name to delegate to the new overload, eliminating code duplication. This is needed by clone to build destination object names for the new generation.	2026-04-07 18:16:52 +03:00
Ernest Zaslavsky	8bd891c6ed	sstables: add get_format() accessor to sstable Add a public get_format() accessor for the _format member, following the same pattern as the existing get_version(). This allows storage implementations to access the sstable format without reaching into private members, and is needed by the upcoming object_storage_base::clone to construct entry_descriptor for the sstables registry.	2026-04-07 18:16:52 +03:00
Ernest Zaslavsky	3d23490615	object_storage: add copy_object to object_storage_client Add a copy_object method to the object_storage_client interface for server-side object copies, with implementations for both S3 and GCS wrappers. The S3 wrapper delegates to s3::client::copy_object. The GCS wrapper delegates to gcp::storage::client's cross-bucket copy_object overload. This is a prerequisite for implementing sstable clone on object storage.	2026-04-07 18:16:52 +03:00
Ernest Zaslavsky	1702d6e6d4	s3_client: pass through abort_source in copy_object The abort_source parameter in s3::client::copy_object was ignored — the function accepted it but always passed nullptr to the underlying copy_s3_object. Forward it properly so callers can cancel in-progress copies.	2026-04-07 18:16:52 +03:00
Ernest Zaslavsky	bfdc1e5267	gcp_client: fix copy_object request method and body The GCP copy_object (rewrite API) had two bugs: 1. The request body was an empty string, but the GCP rewrite endpoint always parses it as JSON metadata. An empty string is not valid JSON, resulting in 400 "Metadata in the request couldn't decode". Fix: send "{}" (empty JSON object) as the body. 2. The HTTP method was PUT, but the GCP Objects: rewrite API requires POST per the documentation. Fix: use POST. Test coverage in a follow-up patch	2026-04-07 18:16:52 +03:00
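A minimal Python sketch of the corrected request shape described in the gcp_client fix, built with the standard library's `urllib.request` but never sent. The URL follows the GCS JSON API `rewriteTo` path; the helper name `build_rewrite_request` and the bucket/object names are hypothetical and are not taken from Scylla's gcp_client.

```python
from urllib.parse import quote
from urllib.request import Request


def build_rewrite_request(src_bucket: str, src_obj: str,
                          dst_bucket: str, dst_obj: str) -> Request:
    """Hypothetical helper: build (but do not send) a GCS Objects: rewrite request."""
    # Object names may contain '/', so every path segment is fully percent-encoded.
    url = ("https://storage.googleapis.com/storage/v1"
           f"/b/{quote(src_bucket, safe='')}/o/{quote(src_obj, safe='')}"
           f"/rewriteTo/b/{quote(dst_bucket, safe='')}/o/{quote(dst_obj, safe='')}")
    # Fix 1: the body is parsed as JSON metadata, so send "{}" rather than "".
    # Fix 2: the rewrite API expects POST, not PUT.
    return Request(url, data=b"{}", method="POST",
                   headers={"Content-Type": "application/json"})


req = build_rewrite_request("src-bucket", "ks/cf/me-1-big-Data.db",
                            "dst-bucket", "ks/cf/me-2-big-Data.db")
print(req.get_method(), req.data)  # POST b'{}'
```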
Nadav Har'El	a0e79f391f	Merge 'alternator: fix batch write item squashing cdc entries' from Radosław Cybulski When `BatchWriteItem` operates on multiple items sharing the same partition key in `always_use_lwt` write isolation mode, all CDC log entries are emitted under a single timestamp. The previous `get_records` parsing algorithm in `alternator/streams.cc` assumed that all CDC log entries sharing the same timestamp correspond to a single DynamoDB item change. As a result, it would incorrectly squash multiple distinct item changes into a single Streams record, producing wrong event data (e.g., one INSERT instead of four, with mismatched key/attribute values). Note: the bug is specific to `always_use_lwt` mode because only in LWT mode does the entire batch share a single timestamp. In non-LWT modes, each item in the batch receives a separate timestamp, so the entries naturally stay separate. Commit 1: alternator: add BatchWriteItem Streams test - Adds new tests `test_streams_batchwrite_no_clustering_deletes_non_existing_items` and `test_streams_batchwrite_no_clustering_deletes_existing_items` that cover the corner cases of batch-deleting an existing and a non-existing item in a table without a clustering key. CDC tables without clustering keys are handled differently, and this path was previously untested for delete operations. - Adds a new test `test_streams_batchwrite_into_the_same_partition_will_report_wrong_stream_data`, which provides a simple way to trigger the bug. - Adds a new test `test_streams_batchwrite_into_the_same_partition_deletes_existing_items`, which validates various combinations of puts and deletes in a single BatchWrite against the same partition. - Adds a new `test_table_ss_new_and_old_images_write_isolation_always` fixture and extends `create_table_ss` to accept `additional_tags`, enabling tests with a specific write isolation mode. 
Commit 2: alternator: fix BatchWriteItem squashed Streams entries The core fix rewrites the CDC log entry parsing in `get_records` to distinguish items by their clustering key: - Introduces `managed_bytes_ptr_hash` and `managed_bytes_ptr_equal` helper structs for pointer-based hash map lookups on `managed_bytes`. - Replaces the single `record`/`dynamodb` pair with a `std::unordered_map<const managed_bytes, Record, ...>` (`records_map`) keyed by the base table's clustering key value from each CDC log row. For tables without a clustering key, all entries map to a single sentinel key. - Adds a validation that Alternator tables have at most one clustering key column (as required by the DynamoDB data model). - On end-of-record (`eor`), flushes all accumulated per-clustering-key records into the output, each with a unique `eventID` (the `event_id` format now includes an index suffix). - Adjusts the limit check: since a single CDC timestamp bucket can now produce multiple output records, the limit may be slightly exceeded to avoid breaking mid-batch. Fixes #28439 Fixes: SCYLLADB-540 Closes scylladb/scylladb#28452 github.com:scylladb/scylladb: alternator/test: explain why 'always' write isolation mode is used in tests alternator/test: add scylla_only to always write isolation fixture alternator: fix BatchWriteItem squashed Streams entries alternator: add BatchWriteItem test (failing)	2026-04-07 17:49:23 +03:00
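The core idea of the BatchWriteItem fix can be sketched in Python: CDC log rows sharing one timestamp are bucketed by the base table's clustering-key value, and each bucket is flushed as its own Streams record with an indexed event ID. The row tuples, the `NO_CK` sentinel, and the `"{ts}/{i}"` event-ID format are hypothetical simplifications, not the code in `alternator/streams.cc`.

```python
from itertools import groupby

NO_CK = b""  # hypothetical sentinel key for tables without a clustering key


def group_records(rows):
    """rows: iterable of (timestamp, clustering_key_or_None, change), in log order.

    Returns a list of (event_id, clustering_key, [changes]) records.
    """
    records = []
    # Rows are assumed to be ordered by timestamp, so groupby sees each
    # timestamp as one contiguous run.
    for ts, same_ts in groupby(rows, key=lambda r: r[0]):
        # One bucket per distinct clustering key within this timestamp;
        # dicts preserve insertion order, so output order is stable.
        by_ck = {}
        for _, ck, change in same_ts:
            by_ck.setdefault(NO_CK if ck is None else ck, []).append(change)
        # End of the timestamp run: flush every bucket with a unique event ID.
        for i, (ck, changes) in enumerate(by_ck.items()):
            records.append((f"{ts}/{i}", ck, changes))
    return records


# Four puts into one partition under a single LWT timestamp.
rows = [(100, b"a", "INSERT"), (100, b"b", "INSERT"),
        (100, b"c", "INSERT"), (100, b"d", "INSERT")]
for rec in group_records(rows):
    print(rec)  # four separate records, one per clustering key
```

Keying only on the timestamp would collapse these four rows into one record, which is the squashing described above.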
Nadav Har'El	22e7ef46a7	Merge 'vector_search: fix SELECT on local vector index' from Karol Nowacki Queries against local vector indexes were failing with the error: ```ANN ordering by vector requires the column to be indexed using 'vector_index'``` This was a regression introduced by `15788c3734`, which incorrectly assumed the first column in the targets list is always the vector column. For local vector indexes, the first column is the partition key, causing the failure. Previously, serialization logic for the target index option was shared between vector and secondary indexes. This is no longer viable due to the introduction of local vector indexes and vector indexes with filtering columns, which have a different target format. This commit introduces a dedicated JSON-based serialization format for vector index targets, identifying the target column (tc), filtering columns (fc), and partition key columns (pk). This ensures unambiguous serialization and deserialization for all vector index types. This change is backward compatible for regular vector indexes. However, it breaks compatibility for local vector indexes and vector indexes with filtering columns created in version 2026.1.0. To mitigate this, usage of these specific index types will be blocked in the 2026.1.0 release by failing ANN queries against them in the vector-store service. Fixes: SCYLLADB-895 Backport to 2026.1 is required as this issue also occurs on that branch. Closes scylladb/scylladb#28862 * github.com:scylladb/scylladb: index: fix DESC INDEX for vector index vector_search: test: refactor boilerplate setup vector_search: fix SELECT on local vector index index: test: vector index target option serialization test index: test: secondary index target option serialization test	2026-04-07 17:43:35 +03:00
Michał Jadwiszczak	9cf94116c2	db/view/view_building_worker: fix indentation	2026-04-07 16:12:04 +02:00
Michał Jadwiszczak	c9aa5bb09c	db/view/view_building_worker: lock staging sstables mutex for necessary shards when creating tasks To create `process_staging` view building tasks, we first need to collect information about them on shard0, create the necessary mutations, commit them to group0, and move the staging sstable objects to their original shards. But there is a possible race after committing the group0 command and before moving the staging sstables to their shards. Between those two events, the coordinator may schedule the freshly created tasks and dispatch them to the worker, but the worker won't have the sstable objects because they weren't moved yet. This patch fixes the race by holding the `_staging_sstables_mutex` locks of the necessary shards while executing `create_staging_sstable_tasks()`. With this, even if a task is scheduled and dispatched quickly, the worker waits to execute it until the sstable objects are moved and the locks are released. Fixes SCYLLADB-816	2026-04-07 16:11:45 +02:00
Pavel Emelyanov	58e59e8c0d	Merge 'test: add test_sstable_clone_preserves_staging_state' from Benny Halevy Add a test that verifies filesystem_storage::clone preserves the sstable state: an sstable in staging is cloned to a new generation, the clone is re-loaded from the staging directory, and its state is asserted to still be staging. The change proves that https://scylladb.atlassian.net/browse/SCYLLADB-1205 is invalid, and can be closed. * No functional change and no backport needed Closes scylladb/scylladb#29209 * github.com:scylladb/scylladb: test: add test_sstable_clone_preserves_staging_state test: derive sstable state from directory in test_env::make_sstable sstables: log debug message in filesystem_storage::clone	2026-04-07 17:02:04 +03:00
Botond Dénes	816f2bf163	Merge 'cql3: fix null handling in data_value formatting' from Dario Mirovic `data_value::to_parsable_string()` crashes with a null pointer dereference when called on a `null` data_value. Return `"null"` instead. Added tests after the fix. Manually checked that tests fail without the fix. Fixes SCYLLADB-1350 This is a fix that prevents format crash. No known occurrence in production, but backport is desirable. Closes scylladb/scylladb#29262 * github.com:scylladb/scylladb: test: boost: test null data value to_parsable_string cql3: fix null handling in data_value formatting	2026-04-07 16:35:31 +03:00
Dimitrios Symonidis	701808d7aa	test/object_store: parametrize test_basic over replication factor Extend test_basic to run with both RF=1 and RF=3 to verify that object storage works correctly with multiple replicas. The test now starts one server per replica (each on its own rack), flushes all nodes, validates tablet replica counts for RF>1, and restarts all servers before verifying data is still readable. Fixes: SCYLLADB-546 Closes scylladb/scylladb#28583	2026-04-07 16:27:44 +03:00
Nadav Har'El	f642db0693	test/alternator: tests for missing support of ReturnConsumedCapacity As noted in issue #5027 and issue #29138, Alternator's support for ReturnConsumedCapacity is lacking in two areas: 1. While ReturnConsumedCapacity is supported for most relevant operations, it's not supported in two operations: Query and Scan. 2. While ReturnConsumedCapacity=TOTAL is supported, INDEXES is not supported at all. This patch adds extensive tests for all these cases. All these tests pass on DynamoDB but fail on Alternator, so they are marked with "xfail". The tests for ReturnConsumedCapacity=INDEXES are deliberately split into two: First, we test the case where the table has no indexes, so INDEXES is almost the same as TOTAL and should be very easy to implement. A second test checks the cases where there are indexes, and different operations increment the capacity of the base table and/or indexes differently - it will require significantly more work to make the second test pass. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#29188	2026-04-07 16:07:41 +03:00
Nadav Har'El	f590ee2b7e	cdc, vector: fix CDC result tracker for vector indexes When a table has a vector index, cdc::cdc_enabled() returns true because vector index writes are implemented via the CDC augmentation path. However, register_cdc_operation_result_tracker() was checking only cdc_options().enabled(), which is false for tables that have a vector index but not traditional CDC. As a result, the operation_result_tracker was never attached to write response handlers for vector-indexed tables. This tracker was added in commit `1b92cbe`, and its job is to update metrics of CDC operations, and since vector search really does use CDC under the hood, these metrics could be useful when diagnosing problems. Fix by using cdc::cdc_enabled() instead of cdc_options().enabled(), which covers both traditional CDC and vector-indexed tables. Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#29343	2026-04-07 15:54:51 +03:00
Avi Kivity	8c629d55b0	test: vector_search: check [[nodiscard]] return values of expected<> types Clang 22 verifies [[nodiscard]] for co_await, causing compilation failures where return values of expected<> were silently discarded. These call sites were discarding the return value of client::request() and vector_store_client::ann(), both of which return expected<> types marked [[nodiscard]]. Rather than suppressing the warning with (void) casts, properly check the return values using the established test patterns: BOOST_CHECK(result) where the call is expected to succeed, and BOOST_CHECK(!result) where the call is expected to fail. Closes scylladb/scylladb#29297	2026-04-07 15:25:08 +03:00
Anna Stuchlik	176f6fb59e	doc: add the 2026.x patch release upgrade guide-from-2025 This commit adds the upgrade guide for all patch releases within the 2026.x major release. In addition, it fixes the link to Upgrade Policy in the 2025.x-to-2026.1 upgrade guide. Fixes SCYLLADB-1247 Closes scylladb/scylladb#29307	2026-04-07 13:52:16 +02:00
Anna Stuchlik	d329c91f9e	doc: remove About Upgrade and redirect to Upgrade Policy While fixing https://github.com/scylladb/scylladb/issues/28997, we added a new page about upgrade policy: https://docs.scylladb.com/stable/versioning/upgrade-policy.html This commit removes the old page and adds redirections to the new Upgrade Policy page in the unversioned documentation set. Closes scylladb/scylladb#29251	2026-04-07 13:44:10 +02:00
Andrei Chekun	93583bf193	test.py: use safe_drive_shutdown in the tests These methods for closing the driver were missed during the original fix. Fixes: SCYLLADB-900 Closes scylladb/scylladb#29093	2026-04-07 14:35:18 +03:00
Avi Kivity	00409b61f1	Merge 'Add Vnodes to Tablets Migration Procedure' from Nikos Dragazis This PR introduces the vnodes-to-tablets migration procedure, which enables converting an existing vnode-based keyspace to tablets. The migration is implemented as a manual, operator-driven process executed in several stages. The core idea is to first create tablet maps with the same token boundaries and replica hosts as the vnodes, and then incrementally convert the storage of each node to the tablets layout. At a high level, the procedure is the following: 1. Create tablet maps for all tables in the keyspace. 2. Sequentially upgrade all nodes from vnodes to tablets: 1. Mark a node for upgrade in the topology state. 2. Restart the node. During startup, while the node is offline, it reshards the SSTables on vnode boundaries and switches to a tablet ERM. 3. Wait for the node to return online before proceeding to the next node. 4. Finalize the migration: 1. Update the keyspace schema to mark it as tablet-based. 2. Clear the group0 state related to the migration. From the client's perspective, the migration is online; the cluster can still serve requests on that keyspace, although performance may be temporarily degraded. During the migration, some nodes use vnode ERMs while others use tablet ERMs. Cluster-level algorithms such as load balancing will treat the keyspace's tables as vnode-based. Once migration is finalized, the keyspace is permanently switched to tablets and cannot be reverted back to vnodes. However, a rollback procedure is available before finalization. The patch series consists of: * Load balancer adjustments to ignore tablets belonging to a migrating keyspace. * A new vnode-based resharding mode, where SSTables are segregated on vnode boundaries rather than with the static sharder. * A new per-node `intended_storage_mode` column in `system.topology`. Represents migration intent (whether migration should occur on restart) and direction. 
* Four new REST endpoints for driving the migration (start, node upgrade/downgrade, finalize, status), along with `nodetool` wrappers. The finalization is implemented as a global topology request. * Wiring of the migration process into the startup logic: the `distributed_loader` determines a migrating table's ERM flavor from the `intended_storage_mode` and the ERM flavor determines the `table_populator`'s resharding mode. Token metadata changes have been adjusted to preserve the ERM flavor. * Cluster tests for the migration process. Fixes SCYLLADB-722. Fixes SCYLLADB-723. Fixes SCYLLADB-725. Fixes SCYLLADB-779. Fixes SCYLLADB-948. New feature, no backport is needed. Closes scylladb/scylladb#29065 * github.com:scylladb/scylladb: docs: Add ops guide for vnodes-to-tablets migration test: cluster: Add test for migration of multiple keyspaces test: cluster: Add test for error conditions test: cluster: Add vnodes->tablets migration test (rollback) test: cluster: Add vnodes->tablets migration test (1 table, 3 nodes) test: cluster: Add vnodes->tablets migration test (1 table, 1 node) scylla-nodetool: Add migrate-to-tablets subcommand api: Add REST endpoint for vnode-to-tablet migration status api: Add REST endpoint for migration finalization topology_coordinator: Add `finalize_migration` request database: Construct migrating tables with tablet ERMs api: Add REST endpoint for upgrading nodes to tablets api: Add REST endpoint for starting vnodes-to-tablets migration topology_state_machine: Add intended_storage_mode to system.topology distributed_loader: Wire vnode-based resharding into table populator replica: Pick any compaction group for resharding compaction: resharding_compaction: add vnodes_resharding option storage_service: Preserve ERM flavor of migrating tables tablet_allocator: Exclude migrating tables from load balancing feature_service: Add vnodes_to_tablets_migrations feature	2026-04-07 14:32:22 +03:00
Łukasz Paszkowski	6f364fd3b7	db: fix system.size_estimates to aggregate sstable estimates across all shards The estimate() function in the size_estimates virtual reader only considered sstables local to the shard that happened to own the keyspace's partition key token. Since sstables are distributed across shards, this caused partition count estimates to be approximately 1/smp_count of the actual value. This bug has been present since the virtual reader was introduced in `225648780d`. Use db.container().map_reduce0() to aggregate sstable estimates across all shards. Each shard contributes its local count and estimated_histogram, which are then merged to produce the correct total. Also fix the `test_partitions_estimate_full_overlap` test which becomes flaky (xpassing ~1% of runs) because autocompaction could merge the two overlapping sstables before the size estimate was read. Wrap the test body in nodetool.no_autocompaction_context to prevent this race. Fixes https://scylladb.atlassian.net/browse/SCYLLADB-1179 Refs https://github.com/scylladb/scylladb/issues/9083 Closes scylladb/scylladb#29286	2026-04-07 14:13:26 +03:00
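The shape of the size_estimates fix, where each shard contributes its local count plus histogram and the results are merged, can be sketched as a plain map-reduce in Python. The per-shard data, the fixed 4-bucket histogram, and the bucket-wise merge are hypothetical stand-ins for Seastar's `map_reduce0()` and Scylla's `estimated_histogram`.

```python
from functools import reduce

BUCKETS = 4  # hypothetical fixed histogram size


def shard_estimate(shard_sstables):
    """Map step: one shard's local partition count and histogram."""
    count = sum(s["partitions"] for s in shard_sstables)
    hist = [0] * BUCKETS
    for s in shard_sstables:
        for i, v in enumerate(s["histogram"]):
            hist[i] += v
    return count, hist


def merge(a, b):
    """Reduce step: add counts and merge histograms bucket by bucket."""
    return a[0] + b[0], [x + y for x, y in zip(a[1], b[1])]


# Hypothetical layout: one sstable on each of 3 shards.
shards = [
    [{"partitions": 20, "histogram": [1, 2, 3, 4]}],
    [{"partitions": 20, "histogram": [0, 1, 0, 1]}],
    [{"partitions": 20, "histogram": [2, 0, 2, 0]}],
]
total, hist = reduce(merge, map(shard_estimate, shards), (0, [0] * BUCKETS))
print(total, hist)  # 60 [3, 3, 5, 5]
```

Reading only the owning shard's estimate (`shard_estimate(shards[0])`) yields 20 here, one third of the real total, which matches the roughly 1/smp_count undercount the commit describes.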
Piotr Smaron	7d449a307c	docs: remove old audit design doc As discussed with @ScyllaPiotr in https://github.com/scylladb/scylladb/pull/29232, the doc about to be removed is just: > Looking at history, I think this audit.md is a design doc: scylladb/scylla-enterprise@87a5c19, for which the feature has been implemented differently, eventually, and was created around the time when design docs, apparently, were stored within the repository itself. So for me it's some trash (sorry for strong language) that can be safely removed. Closes scylladb/scylladb#29316	2026-04-07 14:11:53 +03:00
Avi Kivity	8b4a91982b	cmake: add missing rolling_max_tracker_test and symmetric_key_test Added in `5b2a07b408` and `c596ae6eb1` without cmake integration. Closes scylladb/scylladb#29328	2026-04-07 14:09:00 +03:00
Avi Kivity	d01c9a425f	test: test_out_of_storage_prevention: fix invalid escape in regex Python warns that the sequence "\(" is an invalid escape and might be rejected in the future. Protect against that by using a raw string. Closes scylladb/scylladb#29334	2026-04-07 14:06:32 +03:00
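A minimal illustration of the raw-string fix; the log line and pattern are hypothetical, not the ones in `test_out_of_storage_prevention`. In an ordinary string literal `"\("` is an unrecognized escape (CPython emits a `DeprecationWarning`, and a `SyntaxWarning` since 3.12), whereas a raw string hands the backslash to `re` unchanged.

```python
import re

# r"..." keeps "\(" as a backslash followed by "(", which re then reads
# as a literal parenthesis rather than the start of a group.
pattern = re.compile(r"disk usage \((\d+)%\)")

match = pattern.search("node1: disk usage (97%) exceeds threshold")
print(match.group(1))  # 97
```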
Pavel Emelyanov	0ae781c008	Merge 'test: auth_test: coroutinize' from Avi Kivity Convert auth_test.cc to coroutines for improved readability. Each test is converted in its own commit. Some are trivial. Indentation is left broken in some commits to reduce the diff, then fixed up in the last commit. Code cleanup, so no backport. Closes scylladb/scylladb#29336 * github.com:scylladb/scylladb: auth_test: fix whitespace auth_test: coroutinize test_try_describe_schema_with_internals_and_passwords_as_anonymous_user auth_test: coroutinize test_try_login_after_creating_roles_with_hashed_password auth_test: coroutinize test_create_roles_with_hashed_password_and_log_in auth_test: coroutinize test_try_create_role_with_hashed_password_as_anonymous_user auth_test: coroutinize test_try_to_create_role_with_password_and_hashed_password auth_test: coroutinize test_try_to_create_role_with_hashed_password_and_password auth_test: coroutinize test_alter_with_workload_type auth_test: coroutinize test_alter_with_timeouts auth_test: coroutinize role_permissions_table_is_protected auth_test: coroutinize role_members_table_is_protected auth_test: coroutinize roles_table_is_protected auth_test: coroutinize test_password_authenticator_operations auth_test: coroutinize test_password_authenticator_attributes auth_test: coroutinize test_default_authenticator	2026-04-07 14:05:32 +03:00
Botond Dénes	513af59130	encryption: improve error message when KMS host is not configured When an SSTable was encrypted with a KMS host that is not present in scylla.yaml, the error thrown was: std::invalid_argument (No such host: <host-name>) This message is very obscure in general, and especially confusing when encountered while using the scylla-sstable tool: it gives no indication that the SSTable is encrypted, that a KMS host lookup is involved, or what the user needs to do to fix the problem. Replace it with a message that names the missing host and points directly to the relevant scylla.yaml section: Encryption host "<host-name>" is not defined in scylla.yaml. Make sure it is listed under the "kmip_hosts" section. The wording is intentionally kept neutral (not framed as an SSTable tool problem) because the same code path is exercised by production ScyllaDB when a node's configuration no longer contains a host referenced by an existing data file (e.g. after a config rollback or when restoring data from a different cluster). The production use-case takes precedence, but the message is equally actionable from the tool. Closes scylladb/scylladb#29228	2026-04-07 14:00:27 +03:00
Botond Dénes	7344c05494	scylla-gdb.py: fix small_vector.__len__() `start - end` yields a negative length, which the Python runtime rejects. Use the correct `end - start` to calculate the length. Closes scylladb/scylladb#29249	2026-04-07 13:57:21 +03:00
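A minimal Python sketch of the length computation. It assumes a hypothetical wrapper that stores the begin and end addresses as plain integers; the real scylla-gdb.py works on gdb.Value pointers, where pointer subtraction already yields an element count. It also shows why the reversed subtraction fails: CPython's len() raises ValueError when __len__() returns a negative number.

```python
class SmallVectorView:
    """Hypothetical stand-in for a gdb wrapper around small_vector.

    begin/end model the addresses of the first element and one past the
    last element; elem_size is the element size in bytes.
    """

    def __init__(self, begin: int, end: int, elem_size: int) -> None:
        self.begin = begin
        self.end = end
        self.elem_size = elem_size

    def __len__(self) -> int:
        # Correct order: end - begin is >= 0 for a well-formed vector.
        return (self.end - self.begin) // self.elem_size


class BuggySmallVectorView(SmallVectorView):
    def __len__(self) -> int:
        # Reversed order: produces a negative count for a non-empty vector.
        return (self.begin - self.end) // self.elem_size


v = SmallVectorView(begin=0x1000, end=0x1010, elem_size=4)
print(len(v))  # 4 elements: 16 bytes / 4 bytes each

try:
    len(BuggySmallVectorView(begin=0x1000, end=0x1010, elem_size=4))
except ValueError as exc:
    print("rejected:", exc)
```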
Botond Dénes	f71d2e78d8	tombstone_gc: don't use real-db for validation and determining default data_dictionary::database was converted to replica::database in two places, just to call find_keyspace() and then get_replication_strategy() on the returned keyspace. This is not necessary: data_dictionary::database already has find_keyspace(), and the returned data_dictionary::keyspace also has get_replication_strategy(). This patch removes a small layering violation but, more importantly, it is necessary for the sstable tool to be able to load schemas from disk when said schema has tombstone_gc props. Closes scylladb/scylladb#29279	2026-04-07 13:56:24 +03:00
Pavel Emelyanov	d6df5ef60a	Merge 'compaction_test: Make compaction tests backend‑agnostic and add S3/GCS support' from Ernest Zaslavsky This series updates the storage abstraction and extends the compaction tests to support object‑storage backends (S3 and GCS), while tightening several parts of the test environment. The changes include: - New exists/object_exists helpers across storage backends and clock fixes in the S3 client to make signature generation stable under test conditions. - A new get_storage_for_tests accessor and adjustments to the test environment to avoid premature teardown of the sstable registry. - Refactoring of compaction tests to remove direct sstable access, ensure proper schema setup, and avoid use of moved‑from objects. - Extraction of test_env‑based logic into reusable functions and addition of S3/GCS variants of the compaction tests. Not all tests have been converted to be backend‑agnostic yet, and a few require further investigation before they can run cleanly against S3/GCS backends. These will be addressed in follow‑up work. Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-704 (however, follow-up is needed) No backport needed since this change targets a future feature. Closes scylladb/scylladb#28790 * github.com:scylladb/scylladb: compaction_test: fix formatting after previous patches compaction_test: add S3/GCS variations to tests compaction_test: extract test_env-based tests into functions compaction_test: replace file_exists with storage::exists compaction_test: initialize tables with schema via make_table_for_tests compaction_test: use sstable APIs to manipulate component files compaction_test: fix use-after-move issue sstable_utils: add `get_storage` and `open_file` helpers test_env: delay unplugging sstable registry storage: add `exists` method to storage abstraction s3_client: use lowres_system_clock for aws_sigv4 s3_client: add `object_exists` helper gcs_client: add `object_exists` helper	2026-04-07 13:53:48 +03:00
Piotr Dulikowski	4161273b4c	Merge 'view_building_worker: fix race during draining procedure' from Michał Jadwiszczak View building worker was breaking semaphores without holding their locks. This led to races like SCYLLADB-844 and SCYLLADB-543, where a new batch was started after `view_building_worker::state` was cleared in the `drain()` process. This patch fixes the race by: - locking the mutex before breaking it - distinguishing between `state::clear()` (can happen multiple times) and `state::drain()` (can be called only once during shutdown) - asserting that the state is not doing any new work after it was drained Fixes SCYLLADB-844 Fixes SCYLLADB-543 This PR should be backported to all versions containing the view building coordinator (2025.4 and newer). Closes scylladb/scylladb#29303 * github.com:scylladb/scylladb: view_building_worker: extract starting a new batch to state's method view_building_worker: distinguish between state's `clear()` and `drain()` view_building_worker: lock mutexes before breaking them in `drain()` view_building_worker: execute drain() once	2026-04-07 12:13:51 +02:00
Avi Kivity	bc10e1a171	test: fix flaky test_login by not retrying authentication failures The fix for SCYLLADB-1373 (`b4f652b7c1`) changed get_session() to use the default timeout=30 for the retry loop in patient_*_cql_connection (previously timeout=0.1). This correctly allowed retrying transient NoHostAvailable errors during node startup, but introduced a new flakiness in test_login and other auth tests. The failure chain: 1. test_login connects with bad credentials (e.g. user="doesntexist") 2. get_session() calls patient_exclusive_cql_connection(), which calls retry_till_success() with bypassed_exception=NoHostAvailable 3. The first attempt correctly fails: the server rejects the credentials with AuthenticationFailed, wrapped in NoHostAvailable 4. retry_till_success() catches NoHostAvailable indiscriminately and retries, not distinguishing between transient errors (node not ready) and permanent errors (bad credentials) 5. A subsequent retry attempt times out (connect_timeout=5), producing OperationTimedOut wrapped in NoHostAvailable 6. After 30 seconds, the last NoHostAvailable is raised -- now wrapping OperationTimedOut instead of the original AuthenticationFailed 7. The assertion `isinstance(..., AuthenticationFailed)` fails With the old timeout=0.1, the deadline was already exceeded after the first attempt, so the original AuthenticationFailed propagated. Fix: Add a `should_retry` predicate parameter to retry_till_success() and use it in patient_cql_connection() and patient_exclusive_cql_connection() to immediately re-raise NoHostAvailable when it wraps AuthenticationFailed. Retrying authentication failures is never useful since the credentials won't change between attempts. Fixes: SCYLLADB-1382 Closes scylladb/scylladb#29348	2026-04-07 10:17:31 +03:00
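A minimal sketch of the predicate-based retry loop described above. The function signature, the sleep parameter and the not_auth_failure helper are hypothetical illustrations, not the actual test code. The two exception classes are local stand-ins that mimic the Python driver's NoHostAvailable, whose errors attribute maps each tried host to the exception it raised.

```python
import time
from typing import Callable, Optional, Type, TypeVar

T = TypeVar("T")


class AuthenticationFailed(Exception):
    """Local stand-in for the driver's authentication error."""


class NoHostAvailable(Exception):
    """Local stand-in for the driver's NoHostAvailable; errors maps host -> exception."""

    def __init__(self, message: str, errors: dict) -> None:
        super().__init__(message, errors)
        self.errors = errors


def retry_till_success(fun: Callable[[], T], timeout: float = 30,
                       bypassed_exception: Type[BaseException] = Exception,
                       should_retry: Optional[Callable[[BaseException], bool]] = None,
                       sleep: float = 0.1) -> T:
    deadline = time.monotonic() + timeout
    while True:
        try:
            return fun()
        except bypassed_exception as exc:
            # Permanent errors are re-raised immediately instead of retried.
            if should_retry is not None and not should_retry(exc):
                raise
            # Transient errors are retried until the deadline passes.
            if time.monotonic() >= deadline:
                raise
            time.sleep(sleep)


def not_auth_failure(exc: BaseException) -> bool:
    # Retry only if no host reported an authentication failure.
    errors = getattr(exc, "errors", {}) or {}
    return not any(isinstance(e, AuthenticationFailed) for e in errors.values())
```

With should_retry=not_auth_failure, bad credentials surface after a single attempt with the original AuthenticationFailed still attached, while "node not ready" errors keep being retried for the full timeout.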
Michał Jadwiszczak	51c164c8d2	view_building_worker: extract starting a new batch to state's method Following the previous commit, a new batch cannot be started if the state was already drained. This commit also adds a check that only one batch is running at a time.	2026-04-07 08:39:05 +02:00
Michał Jadwiszczak	639aa223f3	view_building_worker: distinguish between state's `clear()` and `drain()` While both of these methods do the same thing (abort the current batch, clear data), we can clear the state multiple times during the view_building_worker lifetime (for instance, when the processed base table changes), but `view_building_worker::state::drain()` should be called only once, and after that no other work on the state should be done.	2026-04-07 08:39:05 +02:00
Michał Jadwiszczak	7aea524f52	view_building_worker: lock mutexes before breaking them in `drain()` Not doing this may lead to races like SCYLLADB-844. If some consumer is holding a lock on a mutex and `drain()` simply breaks the mutex without locking it beforehand, the consumer may continue executing code that should have been aborted. An example of the race is SCYLLADB-844, where `work_on_tasks()` is holding `_state._mutex` while it is broken by `drain()`. This causes a new batch to be started after `_state` is cleared.	2026-04-07 08:39:00 +02:00
Michał Jadwiszczak	91c7ac1fb2	view_building_worker: execute drain() once Future changes will require that the view building worker is drained only once per its lifetime.	2026-04-07 08:35:02 +02:00
Avi Kivity	b4f652b7c1	test: fix flaky test_create_ks_auth by removing bad retry timeout get_session() was passing timeout=0.1 to patient_exclusive_cql_connection and patient_cql_connection, leaving only 0.1 seconds for the retry loop in retry_till_success(). Since each connection attempt can take up to 5 seconds (connect_timeout=5), the retry loop effectively got only one attempt with no chance to retry on transient NoHostAvailable errors. Use the default timeout=30 seconds, consistent with all other callers. Fixes: SCYLLADB-1373 Closes scylladb/scylladb#29332	2026-04-05 19:13:15 +03:00
Avi Kivity	2f0d178510	auth_test: fix whitespace Fix over-indented lines inside do_with_mc lambda bodies introduced during coroutinization.	2026-04-05 18:28:23 +03:00
Avi Kivity	7a24da9e88	auth_test: coroutinize test_try_describe_schema_with_internals_and_passwords_as_anonymous_user Use co_await instead of return for improved readability.	2026-04-05 18:26:30 +03:00
Avi Kivity	e1b52cf337	auth_test: coroutinize test_try_login_after_creating_roles_with_hashed_password Use co_await instead of return for improved readability.	2026-04-05 18:26:30 +03:00
Avi Kivity	24d36ad459	auth_test: coroutinize test_create_roles_with_hashed_password_and_log_in Use co_await instead of return for improved readability.	2026-04-05 18:26:30 +03:00
Avi Kivity	6f20129eec	auth_test: coroutinize test_try_create_role_with_hashed_password_as_anonymous_user Use co_await instead of return for improved readability.	2026-04-05 18:26:30 +03:00
Avi Kivity	cece181113	auth_test: coroutinize test_try_to_create_role_with_password_and_hashed_password Use co_await instead of return for improved readability.	2026-04-05 18:26:30 +03:00
Avi Kivity	752391f757	auth_test: coroutinize test_try_to_create_role_with_hashed_password_and_password Use co_await instead of return for improved readability.	2026-04-05 18:26:30 +03:00
Avi Kivity	287625b297	auth_test: coroutinize test_alter_with_workload_type Use co_await instead of return for improved readability.	2026-04-05 18:26:30 +03:00
Avi Kivity	4eeb5ef54d	auth_test: coroutinize test_alter_with_timeouts Use co_await instead of return for improved readability.	2026-04-05 18:26:30 +03:00
Avi Kivity	170c71b25d	auth_test: coroutinize role_permissions_table_is_protected Use co_await for improved readability.	2026-04-05 18:26:30 +03:00
Avi Kivity	13eccf519f	auth_test: coroutinize role_members_table_is_protected Use co_await for improved readability.	2026-04-05 18:26:30 +03:00
Avi Kivity	43ff3798ad	auth_test: coroutinize roles_table_is_protected Use co_await for improved readability.	2026-04-05 18:26:30 +03:00
Avi Kivity	c586eeb003	auth_test: coroutinize test_password_authenticator_operations Flatten continuation chains (.then()) into linear thread-style code with .get() calls for improved readability. Remove the now-unused require_throws helper template.	2026-04-05 18:26:25 +03:00
Avi Kivity	fbccfe5c9d	auth_test: coroutinize test_password_authenticator_attributes Use co_await instead of return+do_with_cql_env+make_ready_future for improved readability.	2026-04-05 17:28:09 +03:00
Avi Kivity	e3dee64003	auth_test: coroutinize test_default_authenticator Use co_await instead of return+do_with_cql_env+make_ready_future for improved readability.	2026-04-05 17:27:45 +03:00
Jenkins Promoter	ab4a2cdde2	Update pgo profiles - aarch64	2026-04-05 16:58:02 +03:00
Jenkins Promoter	b97cf0083c	Update pgo profiles - x86_64	2026-04-05 16:00:15 +03:00
Nikos Dragazis	6d50e67bd2	scylla_swap_setup: Remove Before=swap.target dependency from swap unit When a Scylla node starts, the scylla-image-setup.service invokes the `scylla_swap_setup` script to provision swap. This script allocates a swap file and creates a swap systemd unit to delegate control to systemd. By default, systemd injects a Before=swap.target dependency into every swap unit, allowing other services to use swap.target to wait for swap to be enabled. On Azure, this doesn't work so well because we store the swap file on the ephemeral disk [1] which has network dependencies (`_netdev` mount option, configured by cloud-init [2]). This makes the swap.target indirectly depend on the network, leading to dependency cycles such as: swap.target -> mnt-swapfile.swap -> mnt.mount -> network-online.target -> network.target -> systemd-resolved.service -> tmp.mount -> swap.target This patch breaks the cycle by removing the swap unit from swap.target using DefaultDependencies=no. The swap unit will still be activated via WantedBy=multi-user.target, just not during early boot. Although this problem is specific to Azure, this patch applies the fix to all clouds to keep the code simple. Fixes #26519. Fixes SCYLLADB-1257 [1] https://github.com/scylladb/scylla-machine-image/pull/426 [2] https://github.com/canonical/cloud-init/pull/1213#issuecomment-1026065501 Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com> Closes scylladb/scylladb#28504	2026-04-05 15:07:50 +03:00
Tomasz Grabiec	74542be5aa	test: pylib: Ignore exceptions in wait_for() ManagerClient::get_ready_cql() calls server_sees_others(), which waits for servers to see each other as alive in gossip. If one of the servers is still early in boot, a RESTful API call to "gossiper/endpoint/live" may fail. It throws an exception, which currently terminates the wait_for() and propagates up, failing the test. Fix this by ignoring errors when polling inside wait_for. In case of timeout, we log the last exception. This should fix the problem not only in this case but for all uses of wait_for(). Example output: ``` pred = <function ManagerClient.server_sees_others.<locals>._sees_min_others at 0x7f022af9a140> deadline = 1775218828.9172852, period = 1.0, before_retry = None backoff_factor = 1.5, max_period = 1.0, label = None async def wait_for( pred: Callable[[], Awaitable[Optional[T]]], deadline: float, period: float = 0.1, before_retry: Optional[Callable[[], Any]] = None, backoff_factor: float = 1.5, max_period: float = 1.0, label: Optional[str] = None) -> T: tag = label or getattr(pred, '__name__', 'unlabeled') start = time.time() retries = 0 last_exception: Exception | None = None while True: elapsed = time.time() - start if time.time() >= deadline: timeout_msg = f"wait_for({tag}) timed out after {elapsed:.2f}s ({retries} retries)" if last_exception is not None: timeout_msg += ( f"; last exception: {type(last_exception).__name__}: {last_exception}" ) raise AssertionError(timeout_msg) from last_exception raise AssertionError(timeout_msg) try: > res = await pred() test/pylib/util.py:80: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ async def _sees_min_others(): > raise Exception("asd") E Exception: asd test/pylib/manager_client.py:802: Exception The above exception was the direct cause of the following exception: manager = <test.pylib.manager_client.ManagerClient object at 0x7f022af7e7b0> @pytest.mark.asyncio async def test_auth_after_reset(manager: 
ManagerClient) -> None: servers = await manager.servers_add(3, config=auth_config, auto_rack_dc="dc1") > cql, _ = await manager.get_ready_cql(servers) test/cluster/auth_cluster/test_auth_after_reset.py:33: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ test/pylib/manager_client.py:137: in get_ready_cql await self.servers_see_each_other(servers) test/pylib/manager_client.py:820: in servers_see_each_other await asyncio.gather(others) test/pylib/manager_client.py:806: in server_sees_others await wait_for(_sees_min_others, time() + interval, period=.5) _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ pred = <function ManagerClient.server_sees_others.<locals>._sees_min_others at 0x7f022af9a140> deadline = 1775218828.9172852, period = 1.0, before_retry = None backoff_factor = 1.5, max_period = 1.0, label = None async def wait_for( pred: Callable[[], Awaitable[Optional[T]]], deadline: float, period: float = 0.1, before_retry: Optional[Callable[[], Any]] = None, backoff_factor: float = 1.5, max_period: float = 1.0, label: Optional[str] = None) -> T: tag = label or getattr(pred, '__name__', 'unlabeled') start = time.time() retries = 0 last_exception: Exception | None = None while True: elapsed = time.time() - start if time.time() >= deadline: timeout_msg = f"wait_for({tag}) timed out after {elapsed:.2f}s ({retries} retries)" if last_exception is not None: timeout_msg += ( f"; last exception: {type(last_exception).__name__}: {last_exception}" ) > raise AssertionError(timeout_msg) from last_exception E AssertionError: wait_for(_sees_min_others) timed out after 45.30s (46 retries); last exception: Exception: asd test/pylib/util.py:76: AssertionError ``` Fixes a failure observed in test_auth_after_reset: ``` manager = <test.pylib.manager_client.ManagerClient object at 0x7fb3740e1630> @pytest.mark.asyncio async def test_auth_after_reset(manager: ManagerClient) -> None: servers = await manager.servers_add(3, 
config=auth_config, auto_rack_dc="dc1") cql, _ = await manager.get_ready_cql(servers) await cql.run_async("ALTER ROLE cassandra WITH PASSWORD = 'forgotten_pwd'") logging.info("Stopping cluster") await asyncio.gather([manager.server_stop_gracefully(server.server_id) for server in servers]) logging.info("Deleting sstables") for table in ["roles", "role_members", "role_attributes", "role_permissions"]: await asyncio.gather([manager.server_wipe_sstables(server.server_id, "system", table) for server in servers]) logging.info("Starting cluster") # Don't try connect to the servers yet, with deleted superuser it will be possible only after # quorum is reached. await asyncio.gather([manager.server_start(server.server_id, connect_driver=False) for server in servers]) logging.info("Waiting for CQL connection") await repeat_until_success(lambda: manager.driver_connect(auth_provider=PlainTextAuthProvider(username="cassandra", password="cassandra"))) > await manager.get_ready_cql(servers) test/cluster/auth_cluster/test_auth_after_reset.py:50: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ test/pylib/manager_client.py:137: in get_ready_cql await self.servers_see_each_other(servers) test/pylib/manager_client.py:819: in servers_see_each_other await asyncio.gather(*others) test/pylib/manager_client.py:805: in server_sees_others await wait_for(_sees_min_others, time() + interval, period=.5) test/pylib/util.py:71: in wait_for res = await pred() test/pylib/manager_client.py:802: in _sees_min_others alive_nodes = await self.api.get_alive_endpoints(server_ip) test/pylib/rest_client.py:243: in get_alive_endpoints data = await self.client.get_json(f"/gossiper/endpoint/live", host=node_ip) test/pylib/rest_client.py:99: in get_json ret = await self._fetch("GET", resource_uri, response_type = "json", host = host, _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ self = <test.pylib.rest_client.TCPRESTClient object at 
0x7fb2404a0650> method = 'GET', resource = '/gossiper/endpoint/live', response_type = 'json' host = '127.15.252.8', port = 10000, params = None, json = None, timeout = None allow_failed = False async def _fetch(self, method: str, resource: str, response_type: Optional[str] = None, host: Optional[str] = None, port: Optional[int] = None, params: Optional[Mapping[str, str]] = None, json: Optional[Mapping] = None, timeout: Optional[float] = None, allow_failed: bool = False) -> Any: # Can raise exception. See https://docs.aiohttp.org/en/latest/web_exceptions.html assert method in ["GET", "POST", "PUT", "DELETE"], f"Invalid HTTP request method {method}" assert response_type is None or response_type in ["text", "json"], \ f"Invalid response type requested {response_type} (expected 'text' or 'json')" # Build the URI port = port if port else self.default_port if hasattr(self, "default_port") else None port_str = f":{port}" if port else "" assert host is not None or hasattr(self, "default_host"), "_fetch: missing host for " \ "{method} {resource}" host_str = host if host is not None else self.default_host uri = self.uri_scheme + "://" + host_str + port_str + resource logging.debug(f"RESTClient fetching {method} {uri}") client_timeout = ClientTimeout(total = timeout if timeout is not None else 300) async with request(method, uri, connector = self.connector if hasattr(self, "connector") else None, params = params, json = json, timeout = client_timeout) as resp: if allow_failed: return await resp.json() if resp.status != 200: text = await resp.text() > raise HTTPError(uri, resp.status, params, json, text) E test.pylib.rest_client.HTTPError: HTTP error 404, uri: http://127.15.252.8:10000/gossiper/endpoint/live, params: None, json: None, body: E {"message": "Not found", "code": 404} test/pylib/rest_client.py:77: HTTPError ``` Fixes: SCYLLADB-1367 Closes scylladb/scylladb#29323	2026-04-05 13:52:26 +03:00
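A simplified sketch of the polling loop after this fix, modeled on the wait_for() visible in the trace above but with the label and before_retry parameters omitted; it illustrates the technique and is not the exact test/pylib/util.py code. Exceptions raised by the predicate are remembered and the predicate is polled again; only when the deadline passes is the last exception attached to the timeout error.

```python
import asyncio
import time
from typing import Awaitable, Callable, Optional, TypeVar

T = TypeVar("T")


async def wait_for(pred: Callable[[], Awaitable[Optional[T]]], deadline: float,
                   period: float = 0.1, backoff_factor: float = 1.5,
                   max_period: float = 1.0) -> T:
    start = time.time()
    retries = 0
    last_exception: Optional[Exception] = None
    while True:
        elapsed = time.time() - start
        if time.time() >= deadline:
            msg = f"wait_for timed out after {elapsed:.2f}s ({retries} retries)"
            if last_exception is not None:
                msg += f"; last exception: {type(last_exception).__name__}: {last_exception}"
                raise AssertionError(msg) from last_exception
            raise AssertionError(msg)
        try:
            res = await pred()
            if res is not None:
                return res
        except Exception as exc:
            # E.g. an HTTP 404 from a node that is still booting: keep polling.
            last_exception = exc
        retries += 1
        await asyncio.sleep(period)
        # Exponential backoff, capped at max_period.
        period = min(period * backoff_factor, max_period)
```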
Ernest Zaslavsky	c7a74237b3	compaction_test: fix formatting after previous patches	2026-04-05 11:07:17 +03:00
Ernest Zaslavsky	101b4ad7fa	compaction_test: add S3/GCS variations to tests Add S3 and GCS variants of the compaction tests to expand coverage for keyspaces configured to use object_storage backends.	2026-04-05 11:07:17 +03:00
Ernest Zaslavsky	03bd3010bf	compaction_test: extract test_env-based tests into functions Move all test code that relies on test_env into standalone free functions so they can be reused by upcoming S3 and GCS test suites.	2026-04-05 11:07:17 +03:00
Ernest Zaslavsky	b18528e97e	compaction_test: replace file_exists with storage::exists Replace direct filesystem checks (file_exists) with the storage-agnostic exists() method in unsealed_sstable_compaction, sstable_clone_leaving_unsealed_dest_sstable, and failure_when_adding_new_sstable tests, making them compatible with object-storage backends (S3, GCS).	2026-04-05 11:07:17 +03:00
Ernest Zaslavsky	98492e4ea8	compaction_test: initialize tables with schema via make_table_for_tests Start using `table_for_tests::make_default_schema` so test tables are created with a real schema. This is required for object-storage backends, which cannot operate correctly without proper schema initialization.	2026-04-05 11:07:17 +03:00
Ernest Zaslavsky	5ba79e2ed4	compaction_test: use sstable APIs to manipulate component files Switch tests to use sstable member functions for file manipulation instead of opening files directly on the filesystem. This affects the helpers that emulate sstable corruption: we now overwrite the entire component file rather than just the first few kilobytes, which is sufficient for producing a corrupted sstable.	2026-04-05 11:07:17 +03:00
Ernest Zaslavsky	405c032f48	compaction_test: fix use-after-move issue We were moving `compaction_type_options` inside a loop, so on the second iteration the test received an already moved-from instance.	2026-04-05 11:07:17 +03:00
Ernest Zaslavsky	437a581b04	sstable_utils: add `get_storage` and `open_file` helpers Add a non-const `get_storage` accessor to expose underlying storage, and an `open_file` helper to access sstable component files directly. These are needed so compaction tests can read and write sstable components.	2026-04-05 11:07:17 +03:00
Ernest Zaslavsky	2ad2dbae03	test_env: delay unplugging sstable registry Unplugging the mock sstable_registry happened too early in the test environment. During sstable destruction, components may still need access to the registry, so the unplugging is moved to a later stage.	2026-04-05 11:07:17 +03:00
Ernest Zaslavsky	8f6630e9cd	storage: add `exists` method to storage abstraction Add an `exists` method to the storage abstraction to allow S3, GCS, and local storage implementations to check whether an sstable component is present.	2026-04-05 11:07:17 +03:00
Ernest Zaslavsky	ba785f6cab	s3_client: use lowres_system_clock for aws_sigv4 Switch aws_sigv4 to lowres_system_clock since it is not affected by the time offsets often introduced in tests, which can skew db_clock. S3 rejects requests whose signing time differs from the server time by more than 15 minutes, so a stable clock is required.	2026-04-05 11:07:17 +03:00
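A small Python sketch of why a skewed test clock breaks signing. It formats a timestamp the way SigV4 does (basic ISO 8601 in UTC, as used in X-Amz-Date) and checks it against the 15-minute window mentioned above. The helper names are hypothetical, and treating exactly 15 minutes as still accepted is an assumption.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the server accepts a signed request only if its timestamp is
# within 15 minutes of the server clock (boundary treated as inclusive here).
MAX_SKEW = timedelta(minutes=15)


def amz_date(t: datetime) -> str:
    # SigV4 timestamps use the basic ISO 8601 form in UTC, e.g. 20260405T110717Z.
    return t.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")


def within_allowed_skew(signing_time: datetime, server_time: datetime) -> bool:
    return abs(signing_time - server_time) <= MAX_SKEW
```

A test that shifts db_clock forward by an hour would therefore sign requests with a timestamp outside the window, whereas a clock unaffected by the test offset stays within it.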
Ernest Zaslavsky	e08d779922	s3_client: add `object_exists` helper Introduce `object_exists` to the S3 client to check whether an object exists. This is primarily useful for test scenarios.	2026-04-05 11:07:16 +03:00
Ernest Zaslavsky	016b344a8a	gcs_client: add `object_exists` helper Introduce `object_exists` to the GCS client to check whether an object exists. This is primarily useful for test scenarios.	2026-04-05 11:07:16 +03:00
Andrzej Jackowski	8c0920202b	test: protect populate_range in row_cache_test from bad_alloc When test_exception_safety_of_update_from_memtable was converted from manual fail_after()/catch to with_allocation_failures() in `74db08165d`, the populate_range() call ended up inside the failure injection scope without a scoped_critical_alloc_section guard. The other two tests converted in the same commit (test_exception_safety_of_transitioning... and test_exception_safety_of_partition_scan) were correctly guarded. Without the guard, the allocation failure injector can sometimes target an allocation point inside the cleanup path of populate_range(). In a rare corner case, this triggers a bad_alloc in a noexcept context (reader_concurrency_semaphore::stop()), causing std::terminate. Fixes SCYLLADB-1346 Closes scylladb/scylladb#29321	2026-04-04 21:13:26 +03:00
Andrzej Jackowski	ec274cf7b6	test: add test_upgrade_preserves_ddl_audit_for_tables Verify that upgrading from 2025.1 to master does not silently drop DDL auditing for table-scoped audit configurations (SCYLLADB-1155). Test time in dev: 4s Refs: SCYLLADB-1155 Fixes: SCYLLADB-1305	2026-04-03 13:53:28 +02:00
Andrzej Jackowski	9c7b7ac3e3	test: audit: split validate helper so callers need not pass audit_settings The old execute_and_validate_audit_entry required every caller to pass audit_settings so it could decide internally whether to expect an entry. A test added later in this series needs to simply assert an entry was produced, without specifying audit_settings at all. Split into two methods: - execute_and_validate_new_audit_entry: unconditionally expects an audit entry. - execute_and_validate_if_category_enabled: checks audit_settings to decide whether to expect an entry or assert absence. Local wrapper functions and **kwargs forwarding are removed in favor of explicit arguments at each call site, and expected-error cases are handled inline with assert_invalid + assert_entries_were_added.	2026-04-03 13:52:47 +02:00
Andrzej Jackowski	189bff1d5c	test: audit: declare manager attribute in AuditTester base class AuditTester uses self.manager throughout but never declares it. The attribute is only assigned in the CQLAuditTester subclass __init__, so the type checker reports 'Attribute "manager" is unknown' on every self.manager reference in the base class. Add an __init__ to AuditTester that accepts and stores the manager instance, and update CQLAuditTester to forward it via super().__init__ instead of assigning self.manager directly.	2026-04-03 13:52:47 +02:00
Botond Dénes	2c22d69793	Merge 'Pytest: fix variable handling in GSServer (mock) and ensure docker service logs go to test log as well' from Calle Wilund Fixes: SCYLLADB-1106 * Small fix in scylla_cluster - remove debug print * Fix GSServer::unpublish so it does not throw if publish was not called beforehand * Improve dockerized_server so mock server logs echo to the test log to help diagnose CI failures (because we don't collect log files from mocks etc., and in any case correlation will be much easier). No backport needed. Closes scylladb/scylladb#29112 * github.com:scylladb/scylladb: dockerized_service: Convert log reader to pipes and push to test log test::cluster::conftest::GSServer: Fix unpublish for when publish was not called scylla_cluster: Use thread safe future signalling scylla_cluster: Remove left-over debug printout	2026-04-03 06:38:05 +03:00
Raphael S. Carvalho	b6ebbbf036	test/cluster/test_tablets2: Fix test_split_stopped_on_shutdown race with stale log messages The test was failing because the call to: await log.wait_for('Stopping.ongoing compactions') was missing the 'from_mark=log_mark' argument. The log mark was updated (line: log_mark = await log.mark()) immediately after detecting 'splitting_mutation_writer_switch_wait: waiting', and just before launching the shutdown task. However, the wait_for call on the following line was scanning from the beginning of the log, not from that mark. As a result, the search immediately matched old 'Stopping N tasks for N ongoing compactions for table system.X due to table removal' messages emitted during initial server bootstrap (for system.large_partitions, system.large_rows, system.large_cells), rather than waiting for the shutdown to actually stop the user-table split compaction. This caused the test to prematurely send the message to the 'splitting_mutation_writer_switch_wait' injection. The split compaction was unblocked before the shutdown had aborted it, so it completed successfully. Since the split succeeded, 'Failed to complete splitting of table' was never logged. Meanwhile, 'storage_service_drain_wait' was blocking do_drain() waiting for a message. With the split already done, the test was stuck waiting for the expected failure log that would never come (600s timeout). At the same time, after 60s the 'storage_service_drain_wait' injection timed out internally, triggering on_internal_error() which -- with --abort-on-internal-error=1 -- crashed the server (exit code -6). Fix: pass from_mark=log_mark to the wait_for('Stopping.ongoing compactions') call so it only matches messages that appear after the shutdown has started, ensuring the test correctly synchronizes with the shutdown aborting the user-table split compaction before releasing the injection. Fixes https://scylladb.atlassian.net/browse/SCYLLADB-1319. 
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Closes scylladb/scylladb#29311	2026-04-03 06:28:51 +03:00
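The mark/from_mark mechanism can be illustrated with a minimal Python sketch. LogScanner is a hypothetical in-memory stand-in, not the test framework's log watcher (whose wait_for is asynchronous and blocks until a match appears), and the search pattern here is an illustrative choice.

```python
import re
from typing import List, Optional


class LogScanner:
    """Hypothetical in-memory stand-in for the test's server log watcher."""

    def __init__(self) -> None:
        self.lines: List[str] = []

    def append(self, line: str) -> None:
        self.lines.append(line)

    def mark(self) -> int:
        # A mark is simply the index of the next line that will be written.
        return len(self.lines)

    def find(self, pattern: str, from_mark: int = 0) -> Optional[str]:
        # Only lines at or after from_mark are considered, so messages logged
        # before the mark (e.g. during bootstrap) cannot match.
        regex = re.compile(pattern)
        for line in self.lines[from_mark:]:
            if regex.search(line):
                return line
        return None


log = LogScanner()
log.append("Stopping 1 tasks for 1 ongoing compactions for table system.large_rows due to table removal")
log_mark = log.mark()
print(log.find("ongoing compactions"))                      # matches the stale bootstrap message
print(log.find("ongoing compactions", from_mark=log_mark))  # None
```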
Andrei Chekun	6526a78334	test.py: fix nodetool mock server port collision Replace the random port selection with an OS-assigned port. We open a temporary TCP socket, bind it to (ip, 0) with SO_REUSEADDR, read back the port number the OS selected, then close the socket before launching rest_api_mock.py. Add reuse_address=True and reuse_port=True to TCPSite in rest_api_mock.py so the server itself can also reclaim a TIME_WAIT port if needed. Fixes: SCYLLADB-1275 Closes scylladb/scylladb#29314	2026-04-02 16:24:07 +02:00
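A minimal sketch of the OS-assigned port technique using only the standard socket module; pick_free_port is a hypothetical helper name, not the test.py code. Note the small race that remains: another process could grab the port between closing this socket and the mock server binding it, which is one reason the server side also enables address reuse.

```python
import socket


def pick_free_port(ip: str = "127.0.0.1") -> int:
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        # Port 0 asks the OS to pick an unused ephemeral port.
        s.bind((ip, 0))
        # Read back the port the OS actually assigned; the socket is closed
        # when the with-block exits, freeing the port for the real server.
        return s.getsockname()[1]
```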
Botond Dénes	eb78498e07	test: fix flaky test_timeout_is_applied_on_lookup by using eventually_true On slow/overloaded CI machines the lowres_clock timer may not have fired after the fixed 2x sleep, causing the assertion on get_abort_exception() to fail. Replace the fixed sleep with sleep(1x) + eventually_true() which retries with exponential backoff, matching the pattern already used in test_time_based_cache_eviction. Fixes: SCYLLADB-1311 Closes scylladb/scylladb#29299	2026-04-01 18:20:11 +03:00
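The idea behind replacing a fixed sleep with a retrying check can be sketched in Python. The real eventually_true() is a C++ test utility whose exact parameters are not given here, so the names attempts, initial_delay and factor are hypothetical.

```python
import time
from typing import Callable


def eventually_true(cond: Callable[[], bool], attempts: int = 10,
                    initial_delay: float = 0.01, factor: float = 2.0) -> bool:
    delay = initial_delay
    for _ in range(attempts):
        if cond():
            return True
        # Back off exponentially so a slow machine gets progressively more time.
        time.sleep(delay)
        delay *= factor
    # One final check after the last sleep.
    return cond()
```

Compared with a single fixed sleep, this tolerates a late timer on an overloaded machine while still returning quickly when the condition is already met.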
Marcin Maliszkiewicz	a74665b300	transport: add per-service-level pending response memory metric Track the total memory consumed by responses waiting to be written to the socket, exposed as a per-scheduling-group gauge (cql_pending_response_memory). This complements the response memory accounting added in the previous commits by giving visibility into how much memory each service level is holding in unsent response buffers.	2026-04-01 17:15:28 +02:00
Robert Bindar	e7527392c4	test: close clients if cluster teardown throws Make sure the driver is stopped even if cluster teardown throws, to avoid stale driver connections entering infinite reconnect loops that exhaust CPU resources. Fixes: SCYLLADB-1189 Signed-off-by: Robert Bindar <robert.bindar@scylladb.com> Closes scylladb/scylladb#29230	2026-04-01 17:22:19 +03:00
Tomasz Grabiec	2ec47a8a21	tests: address_map_test: Fix flakiness in debug mode due to task reordering Debug mode shuffles task positions in the queue, so the following is possible: 1) shard 1 calls manual_clock::advance(). This expires timers on shard 1 and queues a background smp call to shard 0 which will expire timers there 2) the smp::submit_to(0, ...) from shard 1 called by the test submits the call 3) shard 0 creates tasks for both calls, but (2) is run first, and preempts the reactor 4) shard 1 sees the completion, completes m_svc.invoke_on(1, ..) 5) shard 0 inserts the completion from (4) before the task from (1) 6) the check on shard 0: m.find(id1) fails because the timer has not expired yet To fix that, wait for timer expiration on shard 0, so that the test doesn't depend on task execution order. Note: I was not able to reproduce the problem locally using test.py --mode debug --repeat 1000. It happens in Jenkins very rarely, which is expected since the scenario that leads to it is quite unlikely. Fixes SCYLLADB-1265 Closes scylladb/scylladb#29290	2026-04-01 17:17:35 +03:00
Aleksandra Martyniuk	4d4ce074bb	test: node_ops_tasks_tree: reconnect driver after topology changes The test exercises all five node operations (bootstrap, replace, rebuild, removenode, decommission) and by the end only one node out of four remains alive. The CQL driver session, however, still holds stale references to the dead hosts in its connection pool and load-balancing policy state. When the new_test_keyspace context manager exits and attempts DROP KEYSPACE, the driver routes the query to the dead hosts first, gets ConnectionShutdown from each, and throws NoHostAvailable before ever trying the single live node. Fix by calling driver_connect() after the decommission step, which closes the old session and creates a fresh one connected only to the servers the test manager reports as running. Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-1313. Closes scylladb/scylladb#29306	2026-04-01 17:13:11 +03:00
Dario Mirovic	85127fded8	test: boost: test null data value to_parsable_string Add tests for null value in data_type::to_parsable_string(). We now explicitly return "null". Refs SCYLLADB-1350	2026-04-01 14:15:25 +02:00
Dario Mirovic	fc705dfb4b	cql3: fix null handling in data_value formatting data_value::to_parsable_string() crashed with a null pointer dereference when called on a null data_value. Return "null" instead. Fixes SCYLLADB-1350	2026-04-01 14:15:18 +02:00
Andrzej Jackowski	cccb014747	test: ldap: add regression test for double-free on unregistered message ID Sends a search via the raw LDAP handle (bypassing _msgid_to_promise registration), then triggers poll_results() through the public API to exercise the unregistered-ID branch. Refs: SCYLLADB-1344	2026-04-01 12:57:50 +02:00
Botond Dénes	0351756b15	Merge 'test: fix fuzzy_test timeout in release mode' from Piotr Smaron The multishard_query_test/fuzzy_test was timing out (SIGKILL after 15 minutes) in release mode CI. In release mode the test generates up to 64 partitions with up to 1000 clustering rows and 1000 range tombstones each. With deeply nested randomly-generated types (e.g. frozen<map<varint, frozen<map<frozen<tuple<...>>>>>>), this volume of data can exceed the 15-minute CI timeout. Reduce the release-mode clustering-row and range-tombstone distributions from 0-1000 to 0-200. This caps the worst case at ~12,800 rows -- still 2x the devel-mode maximum (0-100) and sufficient to exercise multi-partition paged scanning with many pages. Fixes: SCYLLADB-1270 No need to backport for now, only appeared on master. Closes scylladb/scylladb#29293 * github.com:scylladb/scylladb: test: clean up fuzzy_test_config and add comments test: fix fuzzy_test timeout in release mode	2026-04-01 11:50:15 +03:00
Andrzej Jackowski	f0028c06dc	ldap: fix double-free of LDAPMessage in poll_results() In the unregistered-ID branch, ldap_msgfree() was called on a result already owned by an RAII ldap_msg_ptr, causing a double-free on scope exit. Remove the redundant manual free. Fixes: SCYLLADB-1344	2026-04-01 10:35:13 +02:00
Andrei Chekun	18f41dcd71	test.py: introduce new scheduler for choosing job count This commit improves how test.py chooses the default number of parallel jobs. This update keeps the logic of deriving the job count from memory and CPU limits, but simplifies the heuristic so it is smoother and easier to reason about. This avoids discontinuities such as neighboring machine sizes producing unexpectedly different job counts, and behaves more predictably on asymmetric machines where CPU and RAM do not scale together. Compared to the current threshold-based version, this approach: - avoids hard jumps around memory cutoffs - avoids bucketed debug scaling based on CPU count - keeps CPU and memory as separate constraints and combines them in one place - avoids double-penalizing debug mode - is easier to tune later by adjusting a few constants instead of rewriting branching logic Closes scylladb/scylladb#28904	2026-04-01 11:11:15 +03:00
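The commit describes keeping CPU and memory as separate constraints and combining them in one place instead of using bucketed thresholds. A minimal Python sketch of that idea follows; the function name `default_jobs` and the per-mode GiB-per-job constants are hypothetical and are not taken from test.py.

```python
import math

# Hypothetical per-job memory cost in GiB for each build mode (assumed values).
GIB_PER_JOB = {"release": 2.0, "dev": 2.0, "debug": 4.0}

def default_jobs(cpus: int, mem_gib: float, mode: str) -> int:
    """Pick a parallel job count from two independent constraints."""
    cpu_limit = cpus                                      # at most one job per CPU
    mem_limit = math.floor(mem_gib / GIB_PER_JOB[mode])   # continuous, no cutoffs
    # Combine both constraints in one place. Debug mode is penalized only
    # through its higher per-job memory cost, never a second time via CPU buckets.
    return max(1, min(cpu_limit, mem_limit))
```

Because the memory limit is a plain division rather than a set of cutoffs, adding 1 GiB never changes the result by more than one job, which is the kind of smoothness the commit is after.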
Avi Kivity	d438e35cdd	test/cluster: fix race in test_insert_failure_standalone audit log query get_audit_partitions_for_operation() returns None when no audit log rows are found. In _test_insert_failure_doesnt_report_success_assign_nodes, this None is passed to set(), causing TypeError: 'NoneType' object is not iterable. The audit log entry may not yet be visible immediately after executing the INSERT, so use wait_for() from test.pylib.util with exponential backoff to poll until the entry appears. Import it as wait_for_async to avoid shadowing the existing wait_for from test.cluster.dtest.dtest_class, which has a different signature (timeout vs deadline). Fixes SCYLLADB-1330 Closes scylladb/scylladb#29289	2026-04-01 10:59:02 +03:00
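Polling with exponential backoff against an absolute deadline, as described above, can be sketched as follows. `poll_until` is a hypothetical helper, not the actual `test.pylib.util.wait_for` (whose exact signature is not shown here); it only illustrates returning once the query stops yielding `None`.

```python
import asyncio
import time

async def poll_until(fetch, deadline, initial_delay=0.01, max_delay=1.0):
    """Await fetch() until it returns a non-None value or the deadline passes.

    `deadline` is an absolute time.monotonic() value, not a relative timeout.
    """
    delay = initial_delay
    while True:
        result = await fetch()
        if result is not None:
            return result
        if time.monotonic() + delay > deadline:
            raise TimeoutError("condition not met before deadline")
        await asyncio.sleep(delay)
        delay = min(delay * 2, max_delay)  # exponential backoff, capped
```

Passing an absolute deadline rather than a timeout is what distinguishes the two `wait_for` variants the commit mentions, which is why it imports one under a different name.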
Michael Litvak	35547bfb6e	test: logstor: additional logstor tests	2026-03-31 18:45:08 +02:00
Michael Litvak	5b3e2a4ca2	docs/dev: add logstor on-disk format section	2026-03-31 18:45:08 +02:00
Michael Litvak	39baa573d2	logstor: add version and crc to buffer header add a basic crc and validation to the buffer header. also add a version field that indicates the version of the on-disk format.	2026-03-31 18:45:08 +02:00
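A header carrying a version and a checksum lets a reader reject buffers written in an unknown format or with corrupted contents. The sketch below is purely illustrative: the field layout, the magic number, and the choice to checksum only the payload are assumptions, not logstor's actual on-disk format.

```python
import struct
import zlib

# Hypothetical header layout (little-endian, no padding):
# magic (u32), format version (u16), payload length (u32), CRC32 of payload (u32).
HEADER = struct.Struct("<IHII")
MAGIC = 0x4C4F4753      # hypothetical magic number
FORMAT_VERSION = 1

def encode_buffer(payload: bytes) -> bytes:
    crc = zlib.crc32(payload)
    return HEADER.pack(MAGIC, FORMAT_VERSION, len(payload), crc) + payload

def decode_buffer(data: bytes) -> bytes:
    if len(data) < HEADER.size:
        raise ValueError("truncated header")
    magic, version, length, crc = HEADER.unpack_from(data)
    if magic != MAGIC:
        raise ValueError("bad magic")
    if version != FORMAT_VERSION:
        raise ValueError(f"unsupported format version {version}")
    payload = data[HEADER.size:HEADER.size + length]
    if len(payload) != length or zlib.crc32(payload) != crc:
        raise ValueError("checksum mismatch")
    return payload
```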
Michael Litvak	6ace823ee4	test: logstor: tablet split/merge and migration add basic logstor tests for tablet split/merge and migration to verify it works as expected	2026-03-31 18:45:08 +02:00
Michael Litvak	996d623ab4	logstor: enable tablet balancing enable tablet balancing with the logstor feature now that it works	2026-03-31 18:45:08 +02:00
Michael Litvak	b02349d755	logstor: streaming of logstor segments using stream_blob implement tablet migration for logstor tables by streaming segments using stream_blob, similar to file streaming of sstables. take a snapshot of the logstor segments and create a stream_blob_info vector with an entry for each segment, holding the input stream that reads the segment and an op of type file_ops::stream_logstor_segments. the stream_blob_handler creates a logstor sink that allocates a segment on the target shard and creates an output stream that writes to it. when the sink is closed it loads the segment.	2026-03-31 18:45:08 +02:00
Michael Litvak	78426ae31b	logstor: add take_logstor_snapshot add the function table::take_logstor_snapshot that is similar to take_storage_snapshot for sstables. given a token range, for each storage group in the range, it flushes the separator buffers and then makes a snapshot of all segments in the sg's compaction groups while disabling compaction. the segment snapshot holds a reference to the segment so that it won't be freed by compaction, and it provides an input stream for reading the segment. this will be used for tablet migration to stream the segments.	2026-03-31 18:45:08 +02:00
Michael Litvak	754c1b83bd	logstor: segment input/output stream add functions for creating segment input and output streams that will be used for segment streaming. the segment input stream creates a file input stream that reads a given segment. the segment output stream allocates a new local segment and creates an output stream that writes to the segment, and when closed it loads the segment and adds it to the compaction group.	2026-03-31 18:45:08 +02:00
Michael Litvak	17cab4181b	logstor: implement compaction_group::cleanup implement compaction group cleanup by clearing the range in the index and discarding the segments of the compaction group. segments are discarded by overwriting the segment header to indicate the segment is empty while preserving the segment generation number in order to not resurrect old data in the segment.	2026-03-31 18:45:08 +02:00
Michael Litvak	9fd6dace72	logstor: tablet split implement tablet split for logstor. flush the separator and then perform split as a new type of compaction: take a batch of segments from the source compaction group, read them and write all live records into left/right write buffers according to the split classifier, flush them to the compaction group, and free the old segments. segments that fit in a single target compaction group are removed from the source and added to the correct target group.	2026-03-31 18:45:08 +02:00
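The split step can be pictured as classifying records by token against a split point. This Python sketch assumes tokens at or below `split_token` belong to the left group and models segments as plain lists of records; `Record`, `Segment`, and `split_segments` are hypothetical names, and the real implementation writes through left/right write buffers and then frees the old segments.

```python
from dataclasses import dataclass, field

@dataclass
class Record:
    token: int
    key: str
    live: bool = True

@dataclass
class Segment:
    records: list = field(default_factory=list)

def split_segments(segments, split_token):
    """Split a compaction group's segments around split_token."""
    left, right = [], []          # segments moved as-is, plus rewritten ones
    left_buf, right_buf = [], []  # live records from segments that straddle the split
    for seg in segments:
        tokens = [r.token for r in seg.records]
        if tokens and max(tokens) <= split_token:
            left.append(seg)      # fits entirely in the left group: move it
        elif tokens and min(tokens) > split_token:
            right.append(seg)     # fits entirely in the right group: move it
        else:
            # Rewrite only live records; dead ones are dropped with the old segment.
            for r in seg.records:
                if r.live:
                    (left_buf if r.token <= split_token else right_buf).append(r)
    if left_buf:
        left.append(Segment(left_buf))
    if right_buf:
        right.append(Segment(right_buf))
    return left, right
```

Moving whole segments when they already fall on one side avoids rewriting data that the split does not affect.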
Michael Litvak	5de39afc24	logstor: tablet merge implement tablet merge with logstor. disable compaction for the new compaction group, then merge the compaction groups being merged into the new cg by merging their logstor segment sets, which amounts to merging their segment histograms.	2026-03-31 18:40:57 +02:00
Michael Litvak	684ce8de71	logstor: add compaction reenabler add a function that stops and disables compaction for a compaction group and returns a compaction reenabler object, similar to the normal compaction manager. this will be useful for disabling compaction while doing operations on the compaction group's logstor segment set.	2026-03-31 18:40:56 +02:00
Michael Litvak	1d7c2e4f52	logstor: add segment header we have two types of segments. the active segment is "mixed" because we can write multiple write_buffers to it, each write buffer having records from different tables and tablets. in contrast, the separator and compaction write "full" segments - they write a single write_buffer that has records from a single tablet and storage group. for "full" segments, we add a segment header that contains additional useful metadata such as the table and token range in the segment. the write buffer header contains the type of the buffer, mixed or full. if it's full then it has a segment header placed after the write buffer header.	2026-03-31 18:40:56 +02:00
Michael Litvak	8615f68657	logstor: serialize writes to active segment previously when writing to the active segment, the allocation was serialized but multiple writes could proceed concurrently to different offsets. change it instead to serialize the entire write. we prefer to write larger buffers sequentially instead of multiple buffers concurrently. it is also better that we don't have "holes" in the segment. we also change the buffered_writer to send a single flushing buffer at a time. it has a ring of buffers, new writes are written to the head buffer, and a single consumer flushes the tail buffer.	2026-03-31 18:40:56 +02:00
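The ring-of-buffers scheme (writers fill the head buffer, a single consumer flushes the tail buffer) can be sketched synchronously in Python. `RingBufferedWriter` is hypothetical; the real buffered_writer flushes asynchronously on Seastar, and this sketch assumes every write fits in one buffer.

```python
from collections import deque

class RingBufferedWriter:
    """Hypothetical sketch: writes go to the head, one consumer flushes the tail."""

    def __init__(self, buffer_size, num_buffers, sink):
        self.buffer_size = buffer_size
        self.num_buffers = num_buffers
        self.sink = sink                  # receives one buffer at a time, in order
        self.ring = deque([bytearray()])  # ring[-1] is the head, ring[0] the tail

    def write(self, data: bytes):
        # Assumes len(data) <= buffer_size; a real writer would split or reject.
        if len(self.ring[-1]) + len(data) > self.buffer_size:
            if len(self.ring) == self.num_buffers:
                self.flush_one()          # ring full: free the tail first
            self.ring.append(bytearray()) # start a new head buffer
        self.ring[-1].extend(data)

    def flush_one(self):
        # The single consumer: always flushes the oldest (tail) buffer.
        self.sink(bytes(self.ring.popleft()))
        if not self.ring:
            self.ring.append(bytearray())

    def close(self):
        while self.ring[0]:
            self.flush_one()
```

Since only one buffer is ever being flushed, data reaches the segment sequentially and without holes, which is the property the commit asks for.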
Michael Litvak	e791823874	replica: extend compaction_group functions for logstor extend compaction_group functions such as disk size calculation and empty() to account also for the logstor segments that the compaction group owns. reuse the sstable_add_gate while a write to a compaction group is in progress, so that the compaction group is considered not empty.	2026-03-31 18:40:56 +02:00
Michael Litvak	d3db967802	replica: add compaction_group_for_logstor_segment add the function table::compaction_group_for_logstor_segment that we use when recovering a segment to find the compaction group for a segment based on its token range, similarly to compaction_group_for_sstable for sstables. extract the common logic from compaction_group_for_sstable to a common function compaction_group_for_token_range that finds a compaction group for a token range.	2026-03-31 18:40:56 +02:00
Michael Litvak	bf7bc5b410	logstor: code cleanup misc code cleanup and small changes	2026-03-31 18:40:56 +02:00
Botond Dénes	2d2ff4fbda	sstables: use chunked_managed_vector for promoted indexes in partition_index_page Switch _promoted_indexes storage in partition_index_page from managed_vector to chunked_managed_vector to avoid large contiguous allocations. Avoid allocation failure (or crashes with --abort-on-internal-error) when large partitions have enough promoted index entries to trigger a large allocation with managed_vector. Fixes: SCYLLADB-1315 Closes scylladb/scylladb#29283	2026-03-31 18:43:57 +03:00
Piotr Smaron	2ce409dca0	test: clean up fuzzy_test_config and add comments Remove the unused timeout field from fuzzy_test_config. It was declared, initialized per build mode, and logged, but never actually enforced anywhere. Document the intentionally small max_size (1024 bytes) passed to read_partitions_with_paged_scan in run_fuzzy_test_scan: it forces many pages per scan to stress the paging and result-merging logic.	2026-03-31 17:13:26 +02:00
Piotr Smaron	df2924b2a3	test: fix fuzzy_test timeout in release mode The multishard_query_test/fuzzy_test was timing out (SIGKILL after 15 minutes) in release mode CI. In release mode the test generates up to 64 partitions with up to 1000 clustering rows and 1000 range tombstones each. With deeply nested randomly-generated types (e.g. frozen<map<varint, frozen<map<frozen<tuple<...>>>>>>), this volume of data can exceed the 15-minute CI timeout. Reduce the release-mode clustering-row and range-tombstone distributions from 0-1000 to 0-200. This caps the worst case at ~12,800 rows -- still 2x the devel-mode maximum (0-100) and sufficient to exercise multi-partition paged scanning with many pages. Fixes: SCYLLADB-1270	2026-03-31 17:13:06 +02:00
Piotr Szymaniak	6d8ec8a0c0	alternator: fix flaky test_update_condition_unused_entries_short_circuit The test was flaky because it stopped dc2_node immediately after an LWT write, before cross-DC replication could complete. The LWT commit uses LOCAL_QUORUM, which only guarantees persistence in the coordinator's DC. Replication to the remote DC is async background work, and CAS mutations don't store hints. Stopping dc2_node could drop in-flight RPCs, leaving DC1 without the mutation. Fix by polling both live DC1 nodes after the write to confirm cross-DC replication completed before stopping dc2_node. Both nodes must have the data so that the later ConsistentRead=True (LOCAL_QUORUM) read on restarted node1 is guaranteed to succeed. Fixes SCYLLADB-1267 Closes scylladb/scylladb#29287	2026-03-31 16:50:51 +03:00
Dawid Mędrek	f040f1b703	Merge 'raft: remake the read barrier optimization' from Patryk Jędrzejczak The approach taken in `1ae2ae50a6` turned out to be incorrect. The Raft member requesting a read barrier could incorrectly advance its commit_idx and break linearizability. We revert that commit in this PR. We also remake the read barrier optimization with a completely new approach. We make the leader replicate to the non-voting requester of a read barrier if its `commit_idx` is behind. Fixes https://scylladb.atlassian.net/browse/SCYLLADB-998 No backport: the issue is present only in master. Closes scylladb/scylladb#29216 * github.com:scylladb/scylladb: raft: speed up read barrier requested by non-voters Revert "raft: read_barrier: update local commit_idx to read_idx when it's safe"	2026-03-31 15:11:56 +02:00
Marcin Maliszkiewicz	a26ca0f5f7	transport: hold memory permit until response write completes Capture the memory permit in the leave lambda's .finally() continuation so that the semaphore units are kept alive until write_response finishes, preventing premature release of memory accounting. This is especially important with slow network and big responses when buffers can accumulate and deplete node's memory.	2026-03-31 14:05:00 +02:00
Avi Kivity	216d39883a	Merge 'test: audit: fix audit test syslog race' from Dario Mirovic Fix two independent race conditions in the syslog audit test that cause intermittent `assert 2 <= 1` failures in `assert_entries_were_added`. Datagram ordering race: `UnixSockerListener` used `ThreadingUnixDatagramServer`, where each datagram spawns a new thread. The notification barrier in `get_lines()` assumes FIFO handling, but the notification thread can win the lock before an audit entry thread, so `clear_audit_logs()` misses entries that arrive moments later. Fix: switch to sequential `UnixDatagramServer`. Config reload race: The live-update path used `wait_for_config` (REST API poll on shard 0) which can return before `broadcast_to_all_shards()` completes. Fix: wait for `"completed re-reading configuration file"` in the server log after each SIGHUP, which guarantees all shards have the new config. Fixes SCYLLADB-1277 This is CI improvement for the latest code. No need for backport. Closes scylladb/scylladb#29282 * github.com:scylladb/scylladb: test: cluster: wait for full config reload in audit live-update path test: cluster: fix syslog listener datagram ordering race	2026-03-31 13:53:01 +03:00
Tomasz Grabiec	b355bb70c2	dtest/alternator: stop concurrent-requests test when workers hit limit `test_limit_concurrent_requests` could create far more tables than intended because worker threads looped indefinitely and only the probe path terminated the test. In practice, workers often hit `RequestLimitExceeded` first, but the test kept running and creating tables, increasing memory pressure and causing flakiness due to bad_alloc errors in logs. Fix by replacing the old probe-driven termination with worker-driven termination. Workers now run until any worker sees `RequestLimitExceeded`. Fixes SCYLLADB-1181 Closes scylladb/scylladb#29270	2026-03-31 13:35:50 +03:00
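Worker-driven termination amounts to a shared stop flag that any worker can set. A minimal threading sketch follows; `RequestLimitExceeded` is a local stand-in for the real error, and `do_request` is a hypothetical callable representing one table-creation request.

```python
import threading

class RequestLimitExceeded(Exception):
    """Local stand-in for the error the real test observes."""

def run_workers(num_workers, do_request):
    stop = threading.Event()
    counts = [0] * num_workers   # each worker writes only its own slot

    def worker(i):
        while not stop.is_set():
            try:
                do_request()
                counts[i] += 1
            except RequestLimitExceeded:
                stop.set()       # the first worker to hit the limit stops everyone

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(counts)           # number of successful requests
```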
Patryk Jędrzejczak	b9f82f6f23	raft_group0: join_group0: fix join hang when node joins group 0 before post_server_start A joining node hung forever if the topology coordinator added it to the group 0 configuration before the node reached `post_server_start`. In that case, `server->get_configuration().contains(my_id)` returned true and the node broke out of the join loop early, skipping `post_server_start`. `_join_node_group0_started` was therefore never set, so the node's `join_node_response` RPC handler blocked indefinitely. Meanwhile the topology coordinator's `respond_to_joining_node` call (which has no timeout) hung forever waiting for the reply that never came. Fix by only taking the early-break path when not starting as a follower (i.e. when the node is the discovery leader or is restarting). A joining node must always reach `post_server_start`. We also provide a regression test. It takes 6s in dev mode. Fixes SCYLLADB-959 Closes scylladb/scylladb#29266	2026-03-31 12:33:56 +02:00
Marcin Maliszkiewicz	2645b95888	transport: account for response size exceeding initial memory estimate After obtaining the CQL response, check whether its actual size exceeds the initially acquired memory permit. If so, take semaphore units and adopt them into the permit (non-blocking). This doesn't fully prevent allocating too much memory, since the size is only known once the buffer has already been allocated, but it improves memory accounting for big responses.	2026-03-31 11:57:41 +02:00
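The non-blocking top-up can be illustrated with a byte-counting semaphore whose available count may go negative. `MemorySemaphore` and `Permit` below are hypothetical Python stand-ins and do not mirror Seastar's semaphore API.

```python
class MemorySemaphore:
    """Hypothetical byte-counting semaphore (not Seastar's API)."""

    def __init__(self, capacity):
        self.available = capacity

    def try_acquire(self, n):
        # Waiting is elided: this sketch simply refuses when memory is short.
        if n > self.available:
            return False
        self.available -= n
        return True

    def consume(self, n):
        # Non-blocking: always succeeds and may drive the count negative.
        self.available -= n

    def release(self, n):
        self.available += n

class Permit:
    def __init__(self, sem, units):
        self.sem, self.units = sem, units

    def adopt_extra(self, actual_size):
        # The response turned out larger than the estimate: account for the difference.
        if actual_size > self.units:
            self.sem.consume(actual_size - self.units)
            self.units = actual_size

    def release(self):
        # Intended to run only after the response has been written to the socket.
        self.sem.release(self.units)
        self.units = 0
```

Letting the count go negative records the overshoot without stalling a response that already exists; new requests are then refused (or, in a real semaphore, delayed) until the permit is released.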
Ferenc Szili	7b308f3aa0	test: verify hints are delivered during tablet RF reduction Add test_hint_to_leaving_when_reducing_rf which verifies that mutations stored as hints are delivered to the correct replicas when a tablet is removed due to RF reduction. The test sets up a 3-node cluster with RF=2, drops the hint for one replica via error injection, then reduces RF to 1 while hints are pending. It asserts that the mutation is readable after the topology change completes. Also adds a "drop_hint_for_host" error injection point in hint_endpoint_manager to selectively drop hints for a specific host.	2026-03-31 09:18:42 +02:00
Dario Mirovic	0cb63fb669	test: cluster: wait for full config reload in audit live-update path _apply_config_to_running_servers used wait_for_config (REST API poll) to confirm live config updates. The REST API reads from shard 0 only, so it can return before broadcast_to_all_shards() completes — other shards may still have stale audit config, generating unexpected entries. Additionally, server_remove_config_option for absent keys sent separate SIGHUPs before server_update_config, and the single wait_for_config at the end could match a completion from an earlier SIGHUP. Wait for "completed re-reading configuration file" in the server log after each SIGHUP-producing operation. This message is logged only after both read_config() and broadcast_to_all_shards() finish, guaranteeing all shards have the new config. Each operation gets its own mark+wait so no stale completion is matched. Fixes SCYLLADB-1277	2026-03-31 02:27:11 +02:00
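The mark-then-wait pattern can be sketched as remembering the log's byte offset before sending SIGHUP and searching only the text written after it. `log_mark` and `wait_for_log` are hypothetical helpers, not the test framework's API.

```python
import os
import time

def log_mark(path):
    """Remember the current end of the log as a byte offset."""
    return os.path.getsize(path)

def wait_for_log(path, mark, needle, deadline, interval=0.05):
    """Wait until `needle` appears in the log after `mark`.

    Starting from the mark means a completion message left by an
    earlier SIGHUP cannot satisfy this wait.
    """
    while True:
        with open(path, "rb") as f:
            f.seek(mark)
            if needle.encode() in f.read():
                return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{needle!r} not logged after offset {mark}")
        time.sleep(interval)
```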
Dario Mirovic	1d623196eb	test: cluster: fix syslog listener datagram ordering race UnixSockerListener used ThreadingUnixDatagramServer, which spawns a new thread per datagram. The notification barrier in get_lines() relies on all prior datagrams being handled before the notification. With threading, the notification handler can win the lock before an audit entry handler, so get_lines() returns before the entry is appended. clear_audit_logs() then clears an incomplete buffer, and the late entry leaks into the next test's before/after diff. Switch to sequential UnixDatagramServer. The server thread now handles datagrams in kernel FIFO order, so the notification is always processed after all preceding audit entries. Refs SCYLLADB-1277	2026-03-31 02:27:11 +02:00
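The ordering guarantee of the sequential server can be checked with Python's standard `socketserver.UnixDatagramServer` (Unix only). In this sketch, `collect_until_notify` and the `NOTIFY` sentinel are hypothetical; the handler runs on the single server thread, so every entry sent before the notification has been appended by the time the notification is handled.

```python
import os
import socket
import socketserver
import tempfile
import threading

NOTIFY = b"__notify__"   # hypothetical barrier message

def collect_until_notify(messages):
    """Send messages and then a notification; return what was recorded."""
    received = []
    notified = threading.Event()

    class Handler(socketserver.BaseRequestHandler):
        def handle(self):
            data, _sock = self.request   # datagram servers pass (data, socket)
            if data == NOTIFY:
                notified.set()
            else:
                received.append(data)

    path = os.path.join(tempfile.mkdtemp(), "audit.sock")
    server = socketserver.UnixDatagramServer(path, Handler)   # sequential
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as client:
            for msg in list(messages) + [NOTIFY]:
                client.sendto(msg, path)
        notified.wait(timeout=5)
        return list(received)            # snapshot taken right after the barrier
    finally:
        server.shutdown()
        server.server_close()
        os.unlink(path)
        os.rmdir(os.path.dirname(path))
```

With `ThreadingUnixDatagramServer` each datagram gets its own thread, so the notification handler could run before an earlier entry's handler and the snapshot could miss it.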
Karol Nowacki	493a4433e7	index: fix DESC INDEX for vector index The `DESC INDEX` command returned incorrect results for local vector indexes and for vector indexes that included filtering columns. This patch corrects the implementation to ensure `DESCRIBE INDEX` accurately reflects the index configuration. This was a pre-existing issue, not a regression from recent serialization schema changes for vector index target options.	2026-03-30 16:46:48 +02:00
Karol Nowacki	a32e4bb9f4	vector_search: test: refactor boilerplate setup The test boilerplate setup for some vector store client tests has been extracted to a common function.	2026-03-30 16:46:48 +02:00
Karol Nowacki	6bc88e817f	vector_search: fix SELECT on local vector index Queries against local vector indexes were failing with the error: "ANN ordering by vector requires the column to be indexed using 'vector_index'" This was a regression introduced by `15788c3734`, which incorrectly assumed the first column in the targets list is always the vector column. For local vector indexes, the first column is the partition key, causing the failure. Previously, serialization logic for the target index option was shared between vector and secondary indexes. This is no longer viable due to the introduction of local vector indexes and vector indexes with filtering columns, which have a different target format. This commit introduces a dedicated JSON-based serialization format for vector index targets, identifying the target column (tc), filtering columns (fc), and partition key columns (pk). This ensures unambiguous serialization and deserialization for all vector index types. This change is backward compatible for regular vector indexes. However, it breaks compatibility for local vector indexes and vector indexes with filtering columns created in version 2026.1.0. To mitigate this, usage of these specific index types will be blocked in the 2026.1.0 release by failing ANN queries against them in the vector-store service. Fixes: SCYLLADB-895	2026-03-30 16:46:48 +02:00
Karol Nowacki	c0b78477a5	index: test: vector index target option serialization test This test ensures that the serialization format for vector index target options remains stable. Maintaining backward compatibility is critical because the index is restored from this property on startup. Any unintended changes to the serialization schema could break existing indexes after an upgrade. This option is also an interface for the vector-store service, which uses it to identify the indexed column.	2026-03-30 16:46:48 +02:00
Karol Nowacki	4dc28dfa52	index: test: secondary index target option serialization test Target option serialization must remain stable for backward compatibility. The index is restored from this property on startup, so unintentional changes to the serialization schema can break indexes after upgrade.	2026-03-30 16:46:47 +02:00
Patryk Jędrzejczak	ba54b2272b	raft: speed up read barrier requested by non-voters We achieve this by making the leader replicate to the non-voting requester of a read barrier if its commit_idx is behind. There are some corner cases where the new `replicate_to(*opt_progress, true);` call will be a no-op, while the corresponding call in `tick_leader()` would result in sending the AppendEntries RPC to the follower. These cases are: - `progress.state == follower_progress::state::PROBE && progress.probe_sent`, - `progress.state == follower_progress::state::PIPELINE && progress.in_flight == follower_progress::max_in_flight`. We could try to improve the optimization by including some of the cases above, but it would only complicate the code without noticeable benefits (at least for group0). Note: this is the second attempt for this optimization. The first approach turned out to be incorrect and was reverted in the previous commit. The performance improvement is the same as in the previous case.	2026-03-30 15:56:24 +02:00
Patryk Jędrzejczak	4913acd742	Revert "raft: read_barrier: update local commit_idx to read_idx when it's safe" This reverts commit `1ae2ae50a6`. The reverted change turned out to be incorrect. The Raft member requesting a read barrier could incorrectly advance its commit_idx and break linearizability. More details in https://scylladb.atlassian.net/browse/SCYLLADB-998?focusedCommentId=42935	2026-03-30 15:56:24 +02:00
Ferenc Szili	1d64ddbdd3	hint_sender: use per-tablet is_leaving() to avoid losing hints on RF reduction hint_sender decides whether to send a hint directly to its destination or to re-mutate from scratch based on token_metadata::is_leaving(), which only checks whether the host is leaving the cluster. When a tablet is dropped from a host due to RF reduction (RF--), the host is still alive and is_leaving() returns false, so hint_sender sends directly to a replica that will no longer own the data -- effectively losing the hint. Switch to the new ermp->is_leaving(host, token) which is tablet-aware. When the destination's tablet is being migrated away and there are pending endpoints, send directly (the pending endpoints will receive the data via streaming); otherwise fall through to the re-mutate path so all current replicas receive the mutation.	2026-03-30 15:49:59 +02:00
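The decision described above can be summarized as a small function. This is a sketch under assumptions: tablets are identified by an integer id, a transition records one leaving replica and a list of pending ones, and a destination that is not leaving keeps the direct-send path. `Transition`, `is_leaving`, and `hint_action` are hypothetical names.

```python
from typing import Dict, List, NamedTuple, Optional

class Transition(NamedTuple):
    leaving: Optional[str]   # replica losing the tablet, if any
    pending: List[str]       # replicas gaining the tablet, if any

def is_leaving(transitions: Dict[int, Transition], host: str, tablet: int) -> bool:
    """Tablet-aware check: is `host` losing the tablet that owns the token?"""
    t = transitions.get(tablet)
    return t is not None and t.leaving == host

def hint_action(transitions: Dict[int, Transition], host: str, tablet: int) -> str:
    if not is_leaving(transitions, host, tablet):
        return "send_directly"   # assumed normal path: destination still owns the data
    if transitions[tablet].pending:
        return "send_directly"   # pending replicas will receive the data via streaming
    return "re_mutate"           # e.g. RF reduction: replay to all current replicas
```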
Ferenc Szili	7db239b2ed	erm: add is_leaving() to effective_replication_map token_metadata::is_leaving() only knows whether a host is leaving the cluster, which is insufficient for tablets -- a tablet can be migrated away from a host (e.g. during RF reduction) without the host itself leaving. Add a pure virtual is_leaving(host, token) to effective_replication_map so callers can ask per-token questions. The vnode implementation delegates to token_metadata::is_leaving() (host-level, as before). The tablet implementation checks whether the tablet owning the token has a transition whose leaving replica matches the given host.	2026-03-30 15:49:01 +02:00
Andrzej Jackowski	ab43420d30	test: use exclusive driver connection in test_limited_concurrency_of_writes Use get_cql_exclusive(node1) so the driver only connects to node1 and never attempts to contact the stopped node2. The test was flaky because the driver received `Host has been marked down or removed` from node2. Fixes: SCYLLADB-1227 Closes scylladb/scylladb#29268	2026-03-30 11:50:44 +02:00
Botond Dénes	068a7894aa	test/cluster: fix flaky test_cleanup_stop by using asyncio.sleep The test was using time.sleep(1) (a blocking call) to wait after scheduling the stop_compaction task, intending to let it register on the server before releasing the sstable_cleanup_wait injection point. However, time.sleep() blocks the asyncio event loop entirely, so the asyncio.create_task(stop_compaction) task never gets to run during the sleep. After the sleep, the directly-awaited message_injection() runs first, releasing the injection point before stop_compaction is even sent. By the time stop_compaction reaches Scylla, the cleanup has already completed successfully -- no exception is raised and the test fails. Fix by replacing time.sleep(1) with await asyncio.sleep(1), which yields control to the event loop and allows the stop_compaction task to actually send its HTTP request before message_injection is called. Fixes: SCYLLADB-834 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Closes scylladb/scylladb#29202	2026-03-30 11:40:47 +03:00
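The difference between the two sleeps can be reproduced in isolation. In this sketch, `stop_compaction` merely records an event instead of sending an HTTP request, and `run` is a hypothetical driver.

```python
import asyncio
import time

async def run(sleep_kind):
    events = []

    async def stop_compaction():
        # Stands in for sending the stop_compaction HTTP request.
        events.append("stop_compaction")

    task = asyncio.create_task(stop_compaction())
    if sleep_kind == "blocking":
        time.sleep(0.1)            # blocks the event loop: the task cannot run
    else:
        await asyncio.sleep(0.1)   # yields to the loop: the task runs now
    events.append("message_injection")
    await task
    return events
```

With the blocking sleep, the injection point is released before the stop request is even sent, which matches the failure the commit describes.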
Nikos Dragazis	3b3b02b15a	docs: Add ops guide for vnodes-to-tablets migration The vnodes-to-tablets migration is a manual procedure, so instructions need to be provided to the users. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2026-03-29 22:18:46 +03:00
Ernest Zaslavsky	1d779804a0	scripts: remove lua library rename workaround from comparison script Now that cmake/FindLua.cmake uses pkg-config (matching configure.py), both build systems resolve to the same 'lua' library name. Remove the lua/lua-5.4 entries from _KNOWN_LIB_ASYMMETRIES and add 'm' (math library) as a known transitive dependency that configure.py gets via pkg-config for lua.	2026-03-29 16:17:45 +03:00
Ernest Zaslavsky	c32851b102	cmake: add custom FindLua using pkg-config to match configure.py CMake's built-in FindLua resolves to the versioned library file (e.g. liblua-5.4.so) instead of the unversioned symlink (liblua.so), causing a library name mismatch between the two build systems. Add a custom cmake/FindLua.cmake that uses pkg-config — matching configure.py's approach — and find_library(NAMES lua) to find the unversioned symlink. This also mirrors the pattern used by other Find modules in cmake/ (FindxxHash, Findlz4, etc.).	2026-03-29 16:17:45 +03:00
Ernest Zaslavsky	f3a91df0b4	test/cmake: add missing tests to boost test suite Add symmetric_key_test (standalone, links encryption library) and auth_cache_test to the combined_tests binary. These tests already exist in configure.py; this aligns the CMake build.	2026-03-29 16:17:45 +03:00
Ernest Zaslavsky	de606cc17a	test/cmake: remove per-test LTO disable The per-test -fno-lto link option is now redundant since -fno-lto was added globally in mode.common.cmake. LTO-enabled targets (the scylla binary in RelWithDebInfo) override it via enable_lto().	2026-03-29 16:17:45 +03:00
Ernest Zaslavsky	38ba58567a	cmake: add BOOST_ALL_DYN_LINK and strip per-component defines Match configure.py's Boost handling: - Add BOOST_ALL_DYN_LINK when using shared Boost libraries. - Strip per-component defines (BOOST_UNIT_TEST_FRAMEWORK_DYN_LINK, BOOST_REGEX_DYN_LINK, etc.) that CMake's Boost package config adds on imported targets. configure.py only uses the umbrella BOOST_ALL_DYN_LINK define.	2026-03-29 16:17:45 +03:00
Ernest Zaslavsky	7e72898150	cmake: move SEASTAR_TESTING_MAIN after seastar and abseil subdirs Place add_compile_definitions(SEASTAR_TESTING_MAIN) after both add_subdirectory(seastar) and add_subdirectory(abseil) are processed. This matches configure.py's global define without leaking into seastar's subdirectory build (which would cause a duplicate main symbol in seastar_testing). Remove the now-redundant per-test SEASTAR_TESTING_MAIN compile definition from test/CMakeLists.txt.	2026-03-29 16:17:45 +03:00
Ernest Zaslavsky	b0837ead3e	cmake: add -fno-sanitize=vptr for abseil sanitizer flags Match configure.py line 2192: abseil gets sanitizer flags with -fno-sanitize=vptr to exclude vptr checks which are incompatible with abseil's usage of type-punning patterns.	2026-03-29 16:17:45 +03:00
Ernest Zaslavsky	dd829fa69c	cmake: align Seastar build configuration with configure.py - Set BUILD_SHARED_LIBS based on build type to match configure.py's build_seastar_shared_libs: Debug and Dev build Seastar as a shared library, all other modes build it static. - Add sanitizer link options on the seastar target for Coverage mode. Seastar's CMake only activates sanitizer targets for Debug/Sanitize configs, but Coverage mode needs them too since configure.py's seastar_libs_coverage carries -fsanitize flags.	2026-03-29 16:17:45 +03:00
Ernest Zaslavsky	52e4d44a75	cmake: align global compile defines and options with configure.py - Disable CMake's automatic -fcolor-diagnostics injection for Clang+Ninja (CMake 3.24+), matching configure.py which does not add any color diagnostics flags. - Add SEASTAR_NO_EXCEPTION_HACK and XXH_PRIVATE_API as global defines (previously SEASTAR_NO_EXCEPTION_HACK was only on the seastar target as PRIVATE; it needs to be project-wide). - Add -fpch-validate-input-files-content to check precompiled header content when timestamps don't match.	2026-03-29 16:17:44 +03:00
Ernest Zaslavsky	6f2fe3c2fc	cmake: fix Coverage mode in mode.Coverage.cmake Fix multiple deviations from configure.py's coverage mode: - Remove -fprofile-list from CMAKE_CXX_FLAGS_COVERAGE. That flag belongs in COVERAGE_INST_FLAGS applied to other modes, not to coverage mode itself. - Replace incorrect defines (DEBUG, SANITIZE, DEBUG_LSA_SANITIZER, SCYLLA_ENABLE_ERROR_INJECTION) with the correct Seastar debug defines (SEASTAR_DEBUG, SEASTAR_DEFAULT_ALLOCATOR, etc.) that configure.py's pkg-config query produces for coverage mode. - Add sanitizer and stack-clash-protection compile flags for Coverage config, matching the flags that Seastar's pkg-config --cflags output includes for debug builds. - Change CMAKE_STATIC_LINKER_FLAGS_COVERAGE to CMAKE_EXE_LINKER_FLAGS_COVERAGE. Coverage flags need to reach the executable linker, not the static archiver.	2026-03-29 16:17:44 +03:00
Ernest Zaslavsky	7d23ba7dc8	cmake: align mode.common.cmake flags with configure.py Add three flag-alignment changes: - -Wno-error=stack-usage= alongside the stack-usage threshold flag, preventing hard errors from stack-usage warnings (matching configure.py behavior). - -fno-lto global link option. configure.py adds -fno-lto to all binaries; LTO-enabled targets override it via enable_lto(). - Sanitizer link flags (-fsanitize=address, -fsanitize=undefined) for Debug/Sanitize configs, matching configure.py's cxx_ld_flags.	2026-03-29 16:17:44 +03:00
Ernest Zaslavsky	38088a8a94	configure.py: add sstable_tablet_streaming to combined_tests	2026-03-29 16:17:44 +03:00
Ernest Zaslavsky	33bca2428a	docs: add compare-build-systems.md Document the purpose, usage, and examples for scripts/compare_build_systems.py which compares the configure.py and CMake build systems by parsing their ninja build files.	2026-03-29 16:17:44 +03:00
Ernest Zaslavsky	d3972369a0	scripts: add compare_build_systems.py to compare ninja build files Add a script that compares configure.py and CMake build systems by parsing their generated build.ninja files. The script checks: - Per-file compilation flags (defines, warnings, optimization) - Link target sets (detect missing/extra targets) - Per-target linker flags and libraries configure.py is treated as the baseline. CMake should match it. Both systems are always configured into a temporary directory so the user's build tree is never touched. Usage: scripts/compare_build_systems.py -m dev # single mode scripts/compare_build_systems.py # all modes scripts/compare_build_systems.py --ci # CI mode (strict)	2026-03-29 16:17:44 +03:00
Nadav Har'El	d32fe72252	Merge 'alternator: check concurrency limit before memory acquisition' from Łukasz Paszkowski Fix the ordering of the concurrency limit check in the Alternator HTTP server so it happens before memory acquisition, and reduce test pressure to avoid LSA exhaustion on the memory-constrained test node. The patch moves the concurrency check to right after the content-length early-out, before any memory acquisition or I/O. The check was originally placed before memory acquisition but was inadvertently moved after it during a refactoring. This allowed unlimited requests to pile up consuming memory, reading bodies, verifying signatures, and decompressing — all before being rejected. Restores the original ordering and mirrors the CQL transport (`transport/server.cc`). Lowers `concurrent_requests_limit` from 5 to 3 and the thread multiplier from 5 to 2 (6 threads instead of 25). This is still sufficient to reliably trigger RequestLimitExceeded, while keeping flush pressure within what 512MB per shard can sustain. Fixes https://scylladb.atlassian.net/browse/SCYLLADB-1248 Fixes https://scylladb.atlassian.net/browse/SCYLLADB-1181 The test started to fail quite recently. It affects master only. No backport is needed. We might want to consider backporting a commit moving the concurrency check earlier. Closes scylladb/scylladb#29272 * github.com:scylladb/scylladb: test: reduce concurrent-request-limit test pressure to avoid LSA exhaustion alternator: check concurrency limit before memory acquisition	2026-03-29 11:08:28 +03:00
Łukasz Paszkowski	b8e3ef0c64	test: reduce concurrent-request-limit test pressure to avoid LSA exhaustion The test_limit_concurrent_requests dtest uses concurrent CreateTable requests to verify Alternator's concurrency limiting. Each admitted CreateTable triggers Raft consensus, schema mutations, and memtable flushes—all of which consume LSA memory. On the 1 GB test node (2 SMP × 512 MB), the original settings (limit=5, 25 threads) created enough flush pressure to exhaust the LSA emergency reserve, producing logalloc::bad_alloc errors in the node log. The test was always marginal under these settings and became flaky as new system tables increased baseline LSA usage over time. Lower concurrent_requests_limit from 5 to 3 and the thread multiplier from 5 to 2 (6 threads total). This is still well above the limit and sufficient to reliably trigger RequestLimitExceeded, while keeping flush pressure within what 512 MB per shard can sustain.	2026-03-28 20:40:33 +01:00
Łukasz Paszkowski	a86928caa1	alternator: check concurrency limit before memory acquisition The concurrency limit check in the Alternator server was positioned after memory acquisition (get_units), request body reading (read_entire_stream), signature verification, and decompression. This allowed unlimited requests to pile up consuming memory before being rejected, exhausting LSA memory and causing logalloc::bad_alloc errors that cascade into Raft applier and topology coordinator failures, breaking subsequent operations. Without this fix, test_limit_concurrent_requests on a 1GB node produces 50 logalloc::bad_alloc errors and cascading failures: reads from system.scylla_local fail, the Raft applier fiber stops, the topology coordinator stops, and all subsequent CreateTable operations fail with InternalServerError (500). With this fix, the cascade is eliminated -- admitted requests may still cause LSA pressure on a memory-constrained node, but the server remains functional. Move the concurrency check to right after the content-length early-out, before any memory acquisition or I/O. This mirrors the CQL transport which correctly checks concurrency before memory acquisition (transport/server.cc). The concurrency check was originally added in `1b8c946ad7` (Sep 2020) before memory acquisition, which at the time lived inside with_gate (after the concurrency gate). The ordering was inverted by `f41dac2a3a` (Mar 2021, "avoid large contiguous allocation for request body"), which moved get_units() earlier in the function to reserve memory before reading the newly-introduced content stream -- but inadvertently also moved it before the concurrency check. `c3593462a4` (Mar 2025) further worsened the situation by adding a 16MB fallback reservation for requests without Content-Length and ungzip/deflate decompression steps -- all before the concurrency check -- greatly increasing the memory consumed by requests that would ultimately be rejected.	2026-03-28 20:40:33 +01:00
Aleksandra Martyniuk	166b293d06	test: add test_failed_tablet_rebuild_is_retried_on_alter Test whether an ALTER KEYSPACE statement with the current RF values fixes the state of replicas.	2026-03-27 17:29:31 +01:00
Aleksandra Martyniuk	9ec54a8207	test: add a test to ensure that failed rebuilds are retried	2026-03-27 17:29:31 +01:00
Aleksandra Martyniuk	200dc084c5	service: fail ALTER KEYSPACE if replicas do not satisfy the replication RF change of tablet keyspace starts tablet rebuilds. Even if any of the rebuilds is rolled back (because the pending replica was excluded), the RF change request finishes successfully. Yet, we are left with too few replicas. Then, the next RF change request handler would generate a rebuild of two replicas of the same tablet. Such a transition would not be applied, as we don't allow multiple pending replicas. An exception would be thrown and the request would be retried indefinitely, blocking the topology coordinator. Throw and fail the RF change request if there are not enough replicas. The request should be retried later, after the issue is fixed by the mechanism introduced in previous changes.	2026-03-27 17:29:26 +01:00
Aleksandra Martyniuk	7951f92270	service: retry failed tablet rebuilds RF change of tablet keyspace starts tablet rebuilds. Even if any of the rebuilds is rolled back (because the pending replica was excluded), the RF change request finishes successfully. In this case we end up with a replica state that isn't compatible with the expected keyspace replication. After this change, if topology_coordinator has nothing to do, it proceeds to check if the state of replicas reflects the keyspace replication. If there are any mismatches, the tablet rebuilds are scheduled. All required rebuilds of a single keyspace are scheduled together without respecting the node's load (just as happens in the case of a keyspace RF change).	2026-03-27 17:26:45 +01:00
Aleksandra Martyniuk	6f1bba8faf	service: maybe_start_tablet_migration returns std::optional<group0_guard> maybe_start_tablet_migration takes ownership of group0_guard and does not give it back, even if no work was done. In the following patches, we will proceed with different operations if there are no migrations to be started. Thus, the guard would be needed. Return group0_guard from maybe_start_tablet_migration if no work was done.	2026-03-27 17:26:45 +01:00
Emil Maskovsky	9dad68e58d	raft: abort stale snapshot transfers when term changes The Bug: Assertion failure `SCYLLA_ASSERT(res.second)` in `raft/server.cc` when creating a snapshot transfer for a destination that already had a stale in-flight transfer. Root Cause: If a node loses leadership and later becomes leader again before the next `io_fiber` iteration, the old transfer from the previous term can remain in `_snapshot_transfers` while `become_leader()` resets progress state. When the new term emits `install_snapshot(dst)`, `send_snapshot(dst)` tries to create a new entry for the same destination and can hit the assertion. The Fix: Abort all in-flight snapshot transfers in `process_fsm_output()` when `term_and_vote` is persisted. A term/vote change marks existing transfers as stale, so we clean them up before dispatching messages from that batch and before any new snapshot transfer is started. With cross-term cleanup moved to the term-change path, `send_snapshot()` now asserts the within-term invariant that there is at most one in-flight transfer per destination. Fixes: SCYLLADB-862 Backport: The issue is reproducible in master, but is present in all active branches. Closes scylladb/scylladb#29092	2026-03-27 10:00:15 +01:00
Andrzej Jackowski	181ad9f476	Revert "audit: disable DDL by default" This reverts commit `c30607d80b`. With the default configuration, enabling DDL has no effect because no `audit_keyspaces` or `audit_tables` are specified. Including DDL in the default categories can be misleading for some customers, and ideally we would like to avoid it. However, DDL has been one of the default audit categories for years, and removing it risks silently breaking existing deployments that depend on it. Therefore, the recent change to disable DDL by default is reverted. Fixes: SCYLLADB-1155 Closes scylladb/scylladb#29169	2026-03-27 09:55:11 +01:00
Botond Dénes	854c374ebf	test/encryption: wait for topology convergence after abrupt restart test_reboot uses a custom restart function that SIGKILLs and restarts nodes sequentially. After all nodes are back up, the test proceeded directly to reads after wait_for_cql_and_get_hosts(), which only confirms CQL reachability. While a node is restarted, other nodes might execute global token metadata barriers, which advance the topology fence version. The restarted node has to learn about the new version before it can send reads/writes to the other nodes. The test issues reads as soon as the CQL port is opened, which might happen before the last restarted node learns of the latest topology version. If this node acts as a coordinator for reads/writes before this happens, these will fail as the other nodes will reject the ops with the outdated topology fence version. Fix this by replacing wait_for_cql_and_get_hosts() on the abrupt-restart path with the more robust get_ready_cql(), which makes sure servers see each other before refreshing the CQL connection. This should ensure that nodes have exchanged gossip and converged on topology state before any reads are executed. The rolling_restart() path is unaffected as it handles this internally. Fixes: SCYLLADB-557 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Closes scylladb/scylladb#29211	2026-03-27 09:52:27 +01:00
Avi Kivity	b708e5d7c9	Merge 'test: fix race condition in test_crashed_node_substitution' from Sergey Zolotukhin `test_crashed_node_substitution` intermittently failed: ```python assert len(gossiper_eps) == (len(server_eps) + 1) ``` The test crashed the node right after a single ACK2 handshake (`finished do_send_ack2_msg`), assuming the node state was visible to all peers. However, since gossip is eventually consistent, the update may not have propagated yet, so some nodes did not see the failed node. This change waits until the gossiper state is visible on peers before continuing the test and asserting. Fixes: [SCYLLADB-1256](https://scylladb.atlassian.net/browse/SCYLLADB-1256). Backport: this issue may affect CI for all branches, so it should be backported to all versions. Closes scylladb/scylladb#29254 * github.com:scylladb/scylladb: test: test_crashed_node_substitution: add docstring and fix whitespace test: fix race condition in test_crashed_node_substitution	2026-03-26 21:40:33 +02:00
Petr Gusev	c38e312321	test_lwt_fencing_upgrade: fix quorum failure due to gossip lag If lwt_workload() sends an update immediately after a rolling restart, the coordinator might still see a replica as down due to gossip lagging behind. Concurrently restarting another node leaves only one available replica, failing the LOCAL_QUORUM requirement for learn or eventually consistent sp::query() in sp::cas() and resulting in a mutation_write_failure_exception. We fix this problem by waiting for the restarted server to see 2 other peers. The server_change_version doesn't do that by default -- it passes wait_others=0 to server_start(). Fixes SCYLLADB-1136 Closes scylladb/scylladb#29234	2026-03-26 21:25:53 +02:00
bitpathfinder	627a8294ed	test: test_crashed_node_substitution: add docstring and fix whitespace Add a description of the test's intent and scenario; remove extra blanks.	2026-03-26 18:40:17 +01:00
bitpathfinder	5a086ae9b7	test: fix race condition in test_crashed_node_substitution `test_crashed_node_substitution` intermittently failed: ``` assert len(gossiper_eps) == (len(server_eps) + 1) ``` The test crashed the node right after a single ACK2 handshake ("finished do_send_ack2_msg"), assuming the node state was visible to all peers. However, since gossip is eventually consistent, the update may not have propagated yet, so some nodes did not see the failed node. This change: Wait until the gossiper state is visible on peers before continuing the test and asserting. Fixes: SCYLLADB-1256.	2026-03-26 18:25:05 +01:00
Robert Bindar	c575bbf1e8	test_refresh_deletes_uploaded_sstables should wait for sstables to get deleted SSTable unlinking is async, so in some cases it may happen that the upload dir is not empty immediately after refresh is done. This patch adjusts test_refresh_deletes_uploaded_sstables so it waits with a timeout till the upload dir becomes empty instead of just assuming the API will sync on sstables being gone. Fixes SCYLLADB-1190 Signed-off-by: Robert Bindar <robert.bindar@scylladb.com> Closes scylladb/scylladb#29215	2026-03-26 08:43:14 +03:00
Nikos Dragazis	8789c95a85	test: cluster: Add test for migration of multiple keyspaces Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2026-03-25 19:11:29 +02:00
Nikos Dragazis	25af8bdc24	test: cluster: Add test for error conditions Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2026-03-25 19:11:29 +02:00
Nikos Dragazis	01a51817c4	test: cluster: Add vnodes->tablets migration test (rollback) Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2026-03-25 19:11:29 +02:00
Nikos Dragazis	56ec33d3e0	test: cluster: Add vnodes->tablets migration test (1 table, 3 nodes) Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2026-03-25 19:11:29 +02:00
Nikos Dragazis	58e930c490	test: cluster: Add vnodes->tablets migration test (1 table, 1 node) This test runs the vnodes-to-tablets migration for a single table on a single-node cluster. The node has multiple shards and multiple power-of-two aligned vnodes, so resharding is triggered. More details in the docstring. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2026-03-25 19:11:29 +02:00
Nikos Dragazis	8837dac2f9	scylla-nodetool: Add migrate-to-tablets subcommand The vnodes-to-tablets migration is a manual procedure, so orchestration must be done via nodetool. This patch adds the following new commands: * nodetool migrate-to-tablets start {ks} * nodetool migrate-to-tablets upgrade * nodetool migrate-to-tablets downgrade * nodetool migrate-to-tablets status {ks} * nodetool migrate-to-tablets finalize {ks} The commands are just wrappers over the REST API. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2026-03-25 19:11:29 +02:00
Nikos Dragazis	2a5e6b832a	api: Add REST endpoint for vnode-to-tablet migration status If the keyspace is migrating, it reports the intended and actual storage mode for each node. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2026-03-25 19:11:24 +02:00
Marcin Maliszkiewicz	7fdd650009	Merge 'test: audit: clean up test helper class naming' from Dario Mirovic Remove unused `pytest.mark.single_node` marker from `TestCQLAudit`. Rename `TestCQLAudit` to `CQLAuditTester` to reflect that it is a test helper, not a test class. This avoids accidental pytest collection and subsequent warning about `__init__`. Logs before the fixes: ``` test/cluster/test_audit.py:514: 14 warnings /home/dario/dev/scylladb/test/cluster/test_audit.py:514: PytestCollectionWarning: cannot collect test class 'TestCQLAudit' because it has a __init__ constructor (from: cluster/test_audit.py) @pytest.mark.single_node ``` Fixes SCYLLADB-1237 This is an addition to the latest master code. No backport needed. Closes scylladb/scylladb#29237 * github.com:scylladb/scylladb: test: audit: rename TestCQLAudit to CQLAuditTester test: audit: remove unused pytest.mark.single_node	2026-03-25 15:30:16 +01:00
Radosław Cybulski	1dc20cc8f9	alternator/test: explain why 'always' write isolation mode is used in tests Improve test comments for test_streams_batchwrite_into_the_same_partition_deletes_existing_items and test_streams_batchwrite_into_the_same_partition_will_report_wrong_stream_data to explain why 'always' write isolation mode is required: in always_use_lwt mode all items in a batch get the same CDC timestamp, which triggers the squashing bug. In other modes each item gets a separate timestamp so the bug doesn't manifest. Also fix the example in the second test comment to use cleaner key values and correct event type (INSERT, not MODIFY, since items are inserted into an empty table), and fix the issue reference from #28452 (the PR) to #28439 (the issue).	2026-03-25 15:15:20 +01:00
Dario Mirovic	552a2d0995	test: audit: rename TestCQLAudit to CQLAuditTester pytest tries to collect tests for execution in several ways. One is to pick all classes that start with 'Test'. Those classes must not have custom '__init__' constructor. TestCQLAudit does. TestCQLAudit after migration from test/cluster/dtest is not a test class anymore, but rather a helper class. There are two ways to fix this: 1. Add __init__ = False to the TestCQLAudit class 2. Rename it to not start with 'Test' Option 2 feels better because the new name itself does not convey the wrong message about its role. Fixes SCYLLADB-1237	2026-03-25 13:21:08 +01:00
Dario Mirovic	73de865ca3	test: audit: remove unused pytest.mark.single_node Remove unused pytest.mark.single_node in TestCQLAudit class. This is a leftover from audit tests migration from test/cluster/dtest to test/cluster. Refs SCYLLADB-1237	2026-03-25 13:18:37 +01:00
Radosław Cybulski	ded62b2c5e	alternator/test: add scylla_only to always write isolation fixture Add scylla_only fixture dependency to the test_table_ss_new_and_old_images_write_isolation_always fixture. This ensures all tests using the 'always' write isolation mode are skipped when running against DynamoDB (--aws), since the system:write_isolation tag is a Scylla-only feature.	2026-03-25 12:38:09 +01:00
Radosław Cybulski	7d404cdd51	alternator: fix BatchWriteItem squashed Streams entries BatchWriteItem with items for the same partition (and write isolation set to always) will trigger LWT and run a different CDC code path, which results in wrong Streams data being returned to the user: changes are randomly squashed together. For example, the batch write: batch.put_item(Item={'p': 'p', 'c': 'c0'}) batch.put_item(Item={'p': 'p', 'c': 'c1'}) batch.put_item(Item={'p': 'p', 'c': 'c2'}) instead of producing 3 MODIFY/INSERT events, will produce one: type=INSERT, key={'c': {'S': 'c0'}, 'p': {'S': 'p'}}, old_image=None, new_image={'c': {'S': 'c2'}, 'p': {'S': 'p'}} with `new_image` having a different `c` key from the `key` field. This happens because BatchWriteItem (when using LWT) emits its changes to CDC under the same timestamp. This results in all log entries being put in a single CDC "bucket" (under the same cdc$timestamp key). The previous parsing algorithm would interpret those changes as a change to a single item and squash them together. The patch rewrites the algorithm to use `std::unordered_map` for records keyed on the value of the clustering key, which is added to every CDC log entry. This allows rebuilding all item modifications. Fixes #28439 Fixes: SCYLLADB-540	2026-03-25 11:40:53 +01:00
Radosław Cybulski	85da03c88d	alternator: add BatchWriteItem test (failing) Add additional BatchWriteItem tests (some failing): - `test_streams_batchwrite_no_clustering_deletes_non_existing_items` `test_streams_batchwrite_no_clustering_deletes_existing_items` - these tests pass; we add them here for completeness, as non-clustering tables trigger different paths. - `test_streams_batchwrite_into_the_same_partition_deletes_existing_items` - failing test that checks combinations of puts and deletes in a single batch write (for example, 3 items: 2 puts and 1 delete). - `test_streams_batchwrite_into_the_same_partition_will_report_wrong_stream_data` - failing simple test. The tests fail because the current implementation, when writing CDC log entries, squashes all changes done to the same partition together. The data is still there, but when GetRecords is called and we parse the CDC log entries, we don't recover it correctly (see issue #28439 for more details).	2026-03-25 11:40:53 +01:00
Marcin Maliszkiewicz	f988ec18cb	test/lib: fix port in-use detection in start_docker_service Previously, the result of when_all was discarded. when_all stores exceptions in the returned futures rather than throwing, so the outer catch(in_use&) could never trigger. Now we capture the when_all result and inspect each future individually to properly detect in_use from either stream. Fixes https://scylladb.atlassian.net/browse/SCYLLADB-1216 Closes scylladb/scylladb#29219	2026-03-25 11:45:53 +02:00
Artsiom Mishuta	cd1679934c	test/pylib: use exponential backoff in wait_for() Change wait_for() defaults from period=1s/no backoff to period=0.1s with 1.5x backoff capped at 1.0s. This catches fast conditions in 100ms instead of 1000ms, benefiting ~100 call sites automatically. Add completion logging with elapsed time and iteration count. Tested locally with test/cluster/test_fencing.py::test_fence_hints (dev mode), log output: wait_for(at_least_one_hint_failed) completed in 0.83s (4 iterations) wait_for(exactly_one_hint_sent) completed in 1.34s (5 iterations) Fixes SCYLLADB-738 Closes scylladb/scylladb#29173	2026-03-24 23:49:49 +02:00
Botond Dénes	d52fbf7ada	Merge 'test: cluster: Deflake test_startup_with_keyspaces_violating_rf_rack_valid_keyspaces' from Dawid Mędrek The test was flaky. The scenario looked like this: 1. Stop server 1. 2. Set its rf_rack_valid_keyspaces configuration option to true. 3. Create an RF-rack-invalid keyspace. 4. Start server 1 and expect a failure during start-up. It was wrong. We cannot predict when the Raft mutation corresponding to the newly created keyspace will arrive at the node or when it will be processed. If the check of the RF-rack-valid keyspaces we perform at start-up was done before that, it won't include the keyspace. This will lead to a test failure. Unfortunately, it's not feasible to perform a read barrier during start-up. What's more, although it would help the test, it wouldn't be useful otherwise. Because of that, we simply fix the test, at least for now. The new scenario looks like this: 1. Disable the rf_rack_valid_keyspaces configuration option on server 1. 2. Start the server. 3. Create an RF-rack-invalid keyspace. 4. Perform a read barrier on server 1. This will ensure that it has observed all Raft mutations, and we won't run into the same problem. 5. Stop the node. 6. Set its rf_rack_valid_keyspaces configuration option to true. 7. Try to start the node and observe a failure. This will make the test perform consistently. --- I ran the test (in dev mode, on my local machine) three times before these changes, and three times with them. I include the time results below. Before: ``` real 0m47.570s user 0m41.631s sys 0m8.634s real 0m50.495s user 0m42.499s sys 0m8.607s real 0m50.375s user 0m41.832s sys 0m8.789s ``` After: ``` real 0m50.509s user 0m43.535s sys 0m9.715s real 0m50.857s user 0m44.185s sys 0m9.811s real 0m50.873s user 0m44.289s sys 0m9.737s ``` Fixes SCYLLADB-1137 Backport: The test is present on all supported branches, and so we should backport these changes to them. Closes scylladb/scylladb#29218 * github.com:scylladb/scylladb: test: cluster: Deflake test_startup_with_keyspaces_violating_rf_rack_valid_keyspaces test: cluster: Mark test with @pytest.mark.asyncio in test_multidc.py	2026-03-24 21:09:19 +02:00
Patryk Jędrzejczak	141aa2d696	Merge 'test/cluster/test_incremental_repair.py: fix typo + enable compaction DEBUG logs' from Botond Dénes This PR contains two small improvements to `test_incremental_repair.py` motivated by the sporadic failure of `test_tablet_incremental_repair_and_scrubsstables_abort`. The test fails with `assert 3 == 2` on `len(sst_add)` in the second repair round. The extra SSTable has `repaired_at=0`, meaning scrub unexpectedly produced more unrepaired SSTables than anticipated. Since scrub (and compaction in general) logs at DEBUG level and the test did not enable debug logging, the existing logs do not contain enough information to determine the root cause. Commit 1 fixes a long-standing typo in the helper function name (`preapre` -> `prepare`). Commit 2 enables `compaction=debug` for the Scylla nodes started by `do_tablet_incremental_repair_and_ops`, which covers all `test_tablet_incremental_repair_and_` variants. This will capture full compaction/scrub activity on the next reproduction, making the failure diagnosable. Refs: SCYLLADB-1086 Backport: test improvement, no backport Closes scylladb/scylladb#29175 https://github.com/scylladb/scylladb: test/cluster/test_incremental_repair.py: enable compaction DEBUG logs in do_tablet_incremental_repair_and_ops test/cluster/test_incremental_repair.py: fix typo preapre -> prepare	2026-03-24 16:27:01 +01:00
Pavel Emelyanov	2d8540f1ee	transport: fix process_startup cert-auth path missing connection-ready setup When authenticate() returns a user directly (certificate-based auth, introduced in `20e9619bb1`), process_startup was missing the same post-authentication bookkeeping that the no-auth and SASL paths perform: - update_scheduling_group(): without it, the connection runs under the default scheduling group instead of the one mapped to the user's service level. - _authenticating = false / _ready = true: without them, system.clients reports connection_stage = AUTHENTICATING forever instead of READY. - on_connection_ready(): without it, the connection never releases its slot in the uninitialized-connections concurrency semaphore (acquired at connection creation), leaking one unit per cert-authenticated connection for the lifetime of the connection. The omission was introduced when on_connection_ready() was added to the else and SASL branches in `474e84199c` but the cert-auth branch was missed. Fixes: `20e9619bb1` ("auth: support certificate-based authentication") Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-24 18:02:46 +03:00
Pavel Emelyanov	da6fe14035	transport: test that connection_stage is READY after auth via all process_startup paths The cert-auth path in process_startup (introduced in `20e9619bb1`) was missing _ready = true, _authenticating = false, update_scheduling_group() and on_connection_ready(). The result is that connections authenticated via certificate show connection_stage = AUTHENTICATING in system.clients forever, run under the wrong service-level scheduling group, and hold the uninitialized-connections semaphore slot for the lifetime of the connection. Add a parametrized cluster test that verifies all three process_startup branches result in connection_stage = READY: - allow_all: AllowAllAuthenticator (no-auth path) - password: PasswordAuthenticator (SASL/process_auth_response path) - cert_bypass: CertificateAuthenticator with transport_early_auth_bypass error injection (cert-auth path -- the buggy one) The injection is added to certificate_authenticator::authenticate() so tests can bypass actual TLS certificate parsing while still exercising the cert-auth code path in process_startup. The cert_bypass case is marked xfail until the bug is fixed. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-24 18:01:28 +03:00
Benny Halevy	1a7b013377	test: add test_sstable_clone_preserves_staging_state	2026-03-24 16:48:01 +02:00
Benny Halevy	22f2010477	test: derive sstable state from directory in test_env::make_sstable Instead of always passing sstable_state::normal, infer the state from the last component of the directory path by comparing against the known state subdirectory constants (staging_dir, upload_dir, quarantine_dir). Any unrecognized path component (the common case for normal-state sstables) maps to sstable_state::normal. When a non-normal state is detected, strip the state subdirectory from dir so that the base table directory is passed to storage.	2026-03-24 16:48:01 +02:00
Ernest Zaslavsky	c670183be8	cmake: fix precompiled header (PCH) creation Two issues prevented the precompiled header from compiling successfully when using CMake directly (rather than the configure.py + ninja build system): a) Propagate build flags to Rust binding targets reusing the PCH. The wasmtime_bindings and inc targets reuse the PCH from scylla-precompiled-header, which is compiled with Seastar's flags (including sanitizer flags in Debug/Sanitize modes). Without matching compile options, the compiler rejects the PCH due to flag mismatch (e.g., -fsanitize=address). Link these targets against Seastar::seastar to inherit the required compile options. Closes scylladb/scylladb#28941	2026-03-24 15:53:40 +02:00
Dawid Mędrek	e639dcda0b	test: cluster: Deflake test_startup_with_keyspaces_violating_rf_rack_valid_keyspaces The test was flaky. The scenario looked like this: 1. Stop server 1. 2. Set its rf_rack_valid_keyspaces configuration option to true. 3. Create an RF-rack-invalid keyspace. 4. Start server 1 and expect a failure during start-up. It was wrong. We cannot predict when the Raft mutation corresponding to the newly created keyspace will arrive at the node or when it will be processed. If the check of the RF-rack-valid keyspaces we perform at start-up was done before that, it won't include the keyspace. This will lead to a test failure. Unfortunately, it's not feasible to perform a read barrier during start-up. What's more, although it would help the test, it wouldn't be useful otherwise. Because of that, we simply fix the test, at least for now. The new scenario looks like this: 1. Disable the rf_rack_valid_keyspaces configuration option on server 1. 2. Start the server. 3. Create an RF-rack-invalid keyspace. 4. Perform a read barrier on server 1. This will ensure that it has observed all Raft mutations, and we won't run into the same problem. 5. Stop the node. 6. Set its rf_rack_valid_keyspaces configuration option to true. 7. Try to start the node and observe a failure. This will make the test perform consistently. --- I ran the test (in dev mode, on my local machine) three times before these changes, and three times with them. I include the time results below. Before: ``` real 0m47.570s user 0m41.631s sys 0m8.634s real 0m50.495s user 0m42.499s sys 0m8.607s real 0m50.375s user 0m41.832s sys 0m8.789s ``` After: ``` real 0m50.509s user 0m43.535s sys 0m9.715s real 0m50.857s user 0m44.185s sys 0m9.811s real 0m50.873s user 0m44.289s sys 0m9.737s ``` Fixes SCYLLADB-1137	2026-03-24 14:27:36 +01:00
Patryk Jędrzejczak	503a6e2d7e	locator: everywhere_replication_strategy: fix sanity_check_read_replicas when read_new is true ERMs created in `calculate_vnode_effective_replication_map` have RF computed based on the old token metadata during a topology change. The reading replicas, however, are computed based on the new token metadata (`target_token_metadata`) when `read_new` is true. That can create a mismatch for EverywhereStrategy during some topology changes - RF can be equal to the number of reading replicas +-1. During bootstrap, this can cause the `everywhere_replication_strategy::sanity_check_read_replicas` check to fail in debug mode. We fix the check in this commit by allowing one more reading replica when `read_new` is true. Fixes https://scylladb.atlassian.net/browse/SCYLLADB-1147 Closes scylladb/scylladb#29150	2026-03-24 13:43:39 +01:00
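The relaxed check in the commit above can be modeled in a few lines. This is a minimal Python sketch, not the C++ code in `everywhere_replication_strategy`. It assumes the check only bounds the number of reading replicas from above, and the function name is hypothetical.

```python
def sanity_check_read_replicas(rf: int, read_replicas: int, read_new: bool) -> None:
    """Hypothetical model: raise if there are more reading replicas than allowed.

    RF is computed from the old token metadata, while the reading replicas come
    from the new (target) token metadata when read_new is true, so one extra
    reading replica is tolerated in that case.
    """
    allowed_extra = 1 if read_new else 0
    if read_replicas > rf + allowed_extra:
        raise RuntimeError(
            f"too many read replicas: {read_replicas} > {rf} + {allowed_extra}")
```

For example, during a bootstrap with RF=3 and `read_new` set, four reading replicas pass. Without `read_new`, the same count fails.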
Jenkins Promoter	0f02c0d6fa	Update pgo profiles - x86_64	2026-03-24 14:11:38 +02:00
Dawid Mędrek	4fead4baae	test: cluster: Mark test with @pytest.mark.asyncio in test_multidc.py One of the tests, test_startup_with_keyspaces_violating_rf_rack_valid_keyspaces, didn't have the marker. Let's add it now.	2026-03-24 12:52:00 +01:00
Botond Dénes	ffd58ca1f0	Merge 'test: cluster: Deflake test_write_cl_any_to_dead_node_generates_hints' from Dawid Mędrek Before these changes, we would send mutations to the node and immediately query the metrics to see how many hints had been written. However, that could lead to random failures of the test: even if the mutations have finished executing, hints are stored asynchronously, so we don't have a guarantee they have already been processed. To prevent such failures, we rewrite the check: we will perform multiple checks against the metrics until we have confirmed that the hints have indeed been written or we hit the timeout. We're generous with the timeout: we give the test 60 seconds. That should be enough time to avoid flakiness even on super slow machines, and if the test does fail, we will know something is really wrong. As a bonus, we improve the test in general too. We explicitly express the preconditions we rely on, as well as bump the log level. If the test fails in the future, it might be very difficult to debug it without this additional information. Fixes SCYLLADB-1133 Backport: The test is present on all supported branches. To avoid running into more failures, we should backport these changes to them. Closes scylladb/scylladb#29191 * github.com:scylladb/scylladb: test: cluster: Increase log level in test_write_cl_any_to_dead_node_generates_hints test: cluster: Await all mutations concurrently in test_write_cl_any_to_dead_node_generates_hints test: cluster: Specify min_tablet_count in test_write_cl_any_to_dead_node_generates_hints test: cluster: Use new_test_table in test_write_cl_any_to_dead_node_generates_hints test: cluster: Introduce auxiliary function keyspace_has_tablets test: cluster: Deflake test_write_cl_any_to_dead_node_generates_hints	2026-03-24 13:39:56 +02:00
Calle Wilund	f1b3bff4a5	dockerized_service: Convert log reader to pipes and push to test log Refs: SCYLLADB-1106 Ensures any stderr logs from mock services will echo to the test log regardless of the log file we write. To help debug failed CI.	2026-03-24 12:35:42 +01:00
Calle Wilund	38aaed1ed4	test::cluster::conftest::GSServer: Fix unpublish for when publish was not called Use checked dict access to check the set vars. Fixes: SCYLLADB-1106	2026-03-24 12:33:56 +01:00
Calle Wilund	b382f3593c	scylla_cluster: Use thread safe future signalling	2026-03-24 12:33:56 +01:00
Nikos Dragazis	d09196068c	api: Add REST endpoint for migration finalization The endpoint is the following: POST /storage_service/vnode_tablet_migrations/keyspaces/{keyspace}/finalization When called, it issues a `finalize_migration` topology request and waits for its completion. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2026-03-24 13:21:12 +02:00
Nikos Dragazis	c88ddecfca	topology_coordinator: Add `finalize_migration` request Vnodes-to-tablets migration needs a finalization step to finish or roll back the migration. Finishing the migration involves switching the keyspace schema to tablets and clearing the `intended_storage_mode` from system.topology. Rolling back the migration involves deleting the tablet maps and clearing the `intended_storage_mode`. The finalization needs to be done as a topology request so that it is mutually exclusive with other operations such as repair and TRUNCATE. This patch introduces the `finalize_migration` global topology request for this purpose. The request takes a keyspace name as an argument. The direction of the finalization (i.e., forward path vs rollback) is inferred from the `intended_storage_mode` of all nodes (not ideal, should be made explicit). Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2026-03-24 13:20:39 +02:00
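The commit says only that the finalization direction is inferred from the `intended_storage_mode` of all nodes. It does not give the exact rule. The minimal sketch below assumes one plausible rule: every node must agree, "tablets" means finish, "vnodes" means roll back, and anything else (including a missing value) is an error. The function name is hypothetical.

```python
from typing import Iterable, Optional


def infer_finalization_direction(modes: Iterable[Optional[str]]) -> str:
    """Hypothetical rule: map the nodes' intended_storage_mode values to a direction."""
    distinct = set(modes)
    if distinct == {"tablets"}:
        return "forward"   # switch the keyspace schema to tablets, clear the column
    if distinct == {"vnodes"}:
        return "rollback"  # delete the tablet maps, clear the column
    raise ValueError(
        f"cannot infer direction from modes: {sorted(map(str, distinct))}")
```

Requiring unanimity is a conservative assumption: a mixed cluster means some nodes have not been switched (or restarted) yet.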
Nikos Dragazis	0e1e6ebdc5	database: Construct migrating tables with tablet ERMs Extend `database::add_column_family()` with a `storage_mode` argument. If the table is under vnodes-to-tablets migration and the storage mode is "tablets", create a tablet ERM. Make the distributed loader determine the storage mode from topology (`intended_storage_mode` column in system.topology). Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2026-03-24 13:20:39 +02:00
Nikos Dragazis	2f93ab281b	api: Add REST endpoint for upgrading nodes to tablets The endpoint is the following: POST /storage_service/vnode_tablet_migrations/node/storage_mode?intended_mode={tablets,vnodes} This endpoint is part of the vnodes-to-tablets migration process and controls a node's intended_storage_mode in system.topology. The storage mode represents the node-local data distribution model, i.e., how data are organized across shards. The node will apply the intended storage mode to migrating tables upon next restart by resharding their SSTables (either on vnode boundaries if intended_mode=tablets, or with the static sharder if intended_mode=vnodes). Note that this endpoint controls the intended_storage_mode of the local node only. This has the nice benefit that once the API call returns, the change has not only been committed to group0 but also applied to the local node's state machine. This guarantees that the change is part of the node's local copy upon next restart; no additional read barrier is needed. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2026-03-24 13:20:35 +02:00
Nikos Dragazis	c4c3a95863	api: Add REST endpoint for starting vnodes-to-tablets migration The endpoint is the following: POST /storage_service/vnode_tablet_migrations/keyspaces/{keyspace} Its purpose is to start the migration of a whole keyspace from vnodes to tablets. When called, Scylla will synchronously create a tablet map for each table in the specified keyspace. The tablet maps of all tables are identical and they mirror the vnode layout; they contain one tablet per vnode and each tablet uses the same replica hosts and token boundaries as the corresponding vnode. The only difference from vnodes lies in the sharding approach. Tablets are assigned to a single shard - using a round-robin strategy in this patch - whereas vnodes are distributed evenly across all shards. If the tablet count per shard is low and tablet sizes are uneven, or some shards have more tablets than others, performance may degrade during the migration process. For example, a cluster with i8g.48xlarge (192 vCPUs), 256 vnodes per node and RF=3 will have 256 * 3 / 192 vCPUs = 4 tablet replicas per shard during the migration. One additional tablet or a double-sized tablet would cause 25% overcommit. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2026-03-24 13:19:47 +02:00
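The per-shard arithmetic in the commit above can be checked with a short sketch. It makes three assumptions: one shard per vCPU, one tablet per vnode, and each node holding vnodes × RF tablet replicas on average. The function names are hypothetical, and the round-robin helper only models the assignment policy the commit names.

```python
from collections import Counter


def assign_shards_round_robin(tablet_replicas: int, shard_count: int) -> list:
    """Assign a node's tablet replicas to shards in round-robin order.

    Entry i of the result is the shard that owns tablet replica i.
    """
    return [i % shard_count for i in range(tablet_replicas)]


def replicas_per_shard(vnodes_per_node: int, rf: int, shard_count: int) -> float:
    """Average tablet replicas per shard when each vnode becomes one tablet."""
    return vnodes_per_node * rf / shard_count


# Numbers from the commit message: i8g.48xlarge (192 vCPUs), 256 vnodes, RF=3.
avg = replicas_per_shard(256, 3, 192)                    # 4.0 replicas per shard
load = Counter(assign_shards_round_robin(256 * 3, 192))  # replicas on each shard
overcommit = (avg + 1) / avg - 1                         # one extra tablet: 0.25
```

With only four replicas per shard, a single extra (or double-sized) tablet raises that shard's load by a quarter. That is the 25% overcommit the commit warns about.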
Andrei Chekun	f6fd3bbea0	test.py: reduce timeout for one test Reduce the timeout for one test to 60 minutes. The longest test we had so far was ~10-15 minutes. So reducing this timeout is pretty safe and should help with hanging tests. Closes scylladb/scylladb#29212	2026-03-24 12:50:10 +02:00
Benny Halevy	ca9ff134b8	sstables: log debug message in filesystem_storage::clone	2026-03-24 12:26:03 +02:00
Nikos Dragazis	b7f4ae8218	topology_state_machine: Add intended_storage_mode to system.topology Part of the vnodes-to-tablets migration is to reshard the SSTables of each node on vnode boundaries. Resharding is a heavy operation that runs on startup while the node is offline. Since nodes can restart for unexpected reasons, we need a flag to do it in a controllable way. We also need the ability to roll back the migration, which requires resharding in the opposite direction. This means a node must be aware of the intended migration direction. To address both requirements, this patch introduces a new column, intended_storage_mode, in system.topology. A non-null value indicates that a node should perform a migration and specifies the migration direction. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2026-03-24 11:06:38 +02:00
Nikos Dragazis	bc8109f1a4	distributed_loader: Wire vnode-based resharding into table populator Make the table populator migration-aware. If a table is migrating to tablets, switch from normal resharding to vnode-based resharding. Vnode-based resharding requires passing a vector of "owned ranges" upon which resharding will segregate the SSTables. Compute it from the tablet map. We could also compute them from the vnodes, since tablets are identical to vnodes during the migration, but in the future we may switch to a different model (multiple tablets per vnode). Let the distributed loader decide if a table is migrating or not and communicate that to the table populator. A table is migrating if the keyspace replication strategy uses vnodes but the table replication strategy uses tablets. Currently, tables cannot enter this "migrating" state; support for this will be introduced in the next patches. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2026-03-24 11:06:38 +02:00
Nikos Dragazis	63399951df	replica: Pick any compaction group for resharding In the previous patch, reshard compaction was extended with a special operation mode where SSTables from vnode-based tables are segregated on vnode boundaries and not with the static sharder. This will later be wired into vnodes-to-tablets migration. The problem is that resharding requires a compaction group. With a vnode-based table, there is only one compaction group per shard, and this is what the current code utilizes (`try_get_compaction_group_view_with_static_sharding()`). But the new operation mode will apply to migrating tables, which use a `tablet_storage_group_manager`, which creates one compaction group for each tablet. Some compaction group needs to be selected. Pick any compaction group that is available on the current shard. Reshard compaction is an operation that happens early in the startup process; compaction groups do not own any SSTables yet, so all compaction groups are equivalent. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2026-03-24 11:06:38 +02:00
Benny Halevy	d1c6141407	compaction: resharding_compaction: add vnodes_resharding option In this mode, the output sstables generated by resharding compaction are segregated by token range, based on the keyspace vnode-based owned token ranges vector. A basic unit test was also added to sstable_directory_test. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2026-03-24 11:06:38 +02:00
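The segregation performed by the `vnodes_resharding` mode can be modeled on individual tokens. In this minimal sketch, each vnode is assumed to own the range (previous end, end], and the first vnode also owns the wrap-around range past the last end. The real implementation splits SSTable output through compaction writers using the owned-ranges vector, and the helper name here is hypothetical.

```python
import bisect
from collections import defaultdict


def segregate_by_vnode(tokens, vnode_ends):
    """Group tokens into one bucket per vnode range.

    vnode_ends must be sorted; vnode i owns (vnode_ends[i-1], vnode_ends[i]],
    and vnode 0 additionally owns the wrap-around range after the last end.
    """
    buckets = defaultdict(list)
    for t in tokens:
        i = bisect.bisect_left(vnode_ends, t)  # first end >= t
        if i == len(vnode_ends):
            i = 0  # past the last boundary: wraps around to the first vnode
        buckets[i].append(t)
    return dict(buckets)
```

Each bucket corresponds to one output SSTable group, so no output file spans two vnodes.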
Nikos Dragazis	d153a95943	storage_service: Preserve ERM flavor of migrating tables When a table is migrating from vnodes to tablets, the cluster is in a mixed state where some nodes use vnode ERMs and others use tablet ERMs. The ERM flavor is a node-local property that expresses the node's storage organization. Preserve the flavor across token metadata changes. The flavor needs to be on par with storage, but the storage can change only on startup, as it requires resharding all SSTables to conform with the flavor. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2026-03-24 11:06:38 +02:00
Nikos Dragazis	4a3e26d5e3	tablet_allocator: Exclude migrating tables from load balancing The tablet load balancer operates on all tablet-based tables that appear in the tablet metadata. With the introduction of the vnodes-to-tablets migration procedure later in this series, migrating tables will also appear in the tablet metadata, but they need to be treated as vnode tables until migration is finished. This patch excludes such tables from load balancing. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2026-03-24 11:06:38 +02:00
Nikos Dragazis	3e2dc078c9	feature_service: Add vnodes_to_tablets_migrations feature Vnodes-to-tablets migrations require cluster-level support: the REST API and the group0 state need to be supported by all nodes. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2026-03-24 11:06:38 +02:00
Dawid Mędrek	148217bed6	test: cluster: Increase log level in test_write_cl_any_to_dead_node_generates_hints We increase the log level of `hints_manager` to TRACE in the test. If it fails, it may be incredibly difficult to debug it without any additional information.	2026-03-23 19:19:17 +01:00
Dawid Mędrek	2b472fe7fd	test: cluster: Await all mutations concurrently in test_write_cl_any_to_dead_node_generates_hints	2026-03-23 19:19:17 +01:00
Dawid Mędrek	ae12c712ce	test: cluster: Specify min_tablet_count in test_write_cl_any_to_dead_node_generates_hints The test relies on the assumption that mutations will be distributed more or less uniformly over the nodes. Although this should not happen in practice, it is theoretically possible that only one tablet is allocated for the table. To clearly indicate this precondition, we explicitly set the property `min_tablet_count` when creating the table. This way, we have a guarantee that the table has multiple tablets. The load balancer should now take care of distributing them over the nodes equally. Thanks to that, `servers[1]` will have some tablets, and so it'll be the target for some of the mutations we perform.	2026-03-23 19:19:14 +01:00
Dawid Mędrek	dd446aa442	test: cluster: Use new_test_table in test_write_cl_any_to_dead_node_generates_hints The context manager is the de facto standard in the test suite. It will also give us a prettier way to conditionally enable per-table tablet options in the following commit.	2026-03-23 19:07:01 +01:00
Dawid Mędrek	dea79b09a9	test: cluster: Introduce auxiliary function keyspace_has_tablets The function is adapted from its counterpart in the cqlpy test suite: cqlpy/util.py::keyspace_has_tablets. We will use it in a commit in this series to conditionally set tablet properties when creating a table. It might also be useful in general.	2026-03-23 19:07:01 +01:00
Dawid Mędrek	3d04fd1d13	test: cluster: Deflake test_write_cl_any_to_dead_node_generates_hints Before these changes, we would send mutations to the node and immediately query the metrics to see how many hints had been written. However, that could lead to random failures of the test: even if the mutations have finished executing, hints are stored asynchronously, so we don't have a guarantee they have already been processed. To prevent such failures, we rewrite the check: we will perform multiple checks against the metrics until we have confirmed that the hints have indeed been written or we hit the timeout. We're generous with the timeout: we give the test 60 seconds. That should be enough time to avoid flakiness even on super slow machines, and if the test does fail, we will know something is really wrong. Fixes SCYLLADB-1133	2026-03-23 19:06:57 +01:00
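The deflaked check polls the metrics repeatedly instead of reading them once. The helper below is a minimal asyncio sketch of that pattern. It is hypothetical, not the test suite's own utility, and it assumes the caller supplies an async function that returns the current hint count.

```python
import asyncio
import time
from typing import Awaitable, Callable


async def wait_until_hints_written(
    read_hint_count: Callable[[], Awaitable[int]],
    expected: int,
    timeout: float = 60.0,
    period: float = 0.5,
) -> int:
    """Poll an async metric reader until it reports at least `expected` hints.

    Raises TimeoutError if the deadline passes first.
    """
    deadline = time.monotonic() + timeout
    while True:
        count = await read_hint_count()
        if count >= expected:
            return count
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"only {count}/{expected} hints written after {timeout}s")
        await asyncio.sleep(period)
```

The generous 60-second default matches the timeout in the commit. The helper returns as soon as the hints are observed, so the extra time only costs anything when something is actually wrong.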
Piotr Dulikowski	63067f594d	strong_consistency: fake taking and dropping snapshots Snapshots are not implemented yet for strong consistency - attempting to take, transfer or drop a snapshot results in an exception. However, the logic of our state machine forces snapshot transfer even if there are no lagging replicas - every raft::server::configuration::snapshot_threshold log entries. We have actually encountered an issue in our benchmarks where snapshots were being taken even though the cluster was not under any disruption, and this is one of the possible causes. It turns out that we can safely allow for taking snapshots right now - we can just implement it as a no-op and return a random UUID. Conversely, dropping a snapshot can also be a no-op. This is safe as long as no attempt is made to transfer the taken/recovered snapshots, since snapshot transfer still throws an exception.	2026-03-23 17:03:36 +01:00
Piotr Dulikowski	dd1d3dd1ee	strong_consistency: adjust limits for snapshots Raft snapshots are not implemented yet for strong consistency. Adjust the current raft group config to make them much less likely to occur: - snapshot_threshold config option decides how many log entries need to be applied after the last snapshot before a new one is taken. Set it to the maximum value for size_t in order to effectively disable it. - snapshot_threshold_log_size defines a threshold for the log memory usage over which a snapshot is created. Increase it from the default 2MB to 10MB. - max_log_size defines the threshold for the log memory usage above which new requests are no longer admitted until the log is shrunk back by a snapshot. Set it to 20MB, as this option is recommended to be at least twice as much as snapshot_threshold_log_size. Refs: SCYLLADB-1115	2026-03-23 17:03:36 +01:00
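The three limits above and the recommended relation between them can be captured in a small model. This is a minimal Python sketch, not Scylla's C++ `raft::server::configuration`. It reads "MB" in the commit as MiB and assumes a 64-bit `size_t`; the class name is hypothetical.

```python
from dataclasses import dataclass

MiB = 1024 * 1024        # assumption: "MB" in the commit read as MiB
SIZE_T_MAX = 2**64 - 1   # maximum size_t on a 64-bit platform


@dataclass
class SnapshotLimits:
    """Hypothetical model of the raft limits adjusted in the commit."""
    snapshot_threshold: int = SIZE_T_MAX          # entry-count trigger, effectively off
    snapshot_threshold_log_size: int = 10 * MiB   # log memory that triggers a snapshot
    max_log_size: int = 20 * MiB                  # log memory at which admission stops

    def validate(self) -> None:
        # Recommended relation: stop admitting requests only at twice the
        # snapshot trigger, leaving room for a snapshot to shrink the log first.
        if self.max_log_size < 2 * self.snapshot_threshold_log_size:
            raise ValueError(
                "max_log_size should be at least 2 * snapshot_threshold_log_size")
```

Raising `snapshot_threshold_log_size` without also raising `max_log_size` would violate this relation, which is why the commit changes both.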
Pavel Emelyanov	57ef712243	test/backup: drop create_dataset helper It has no more callers after the previous patch. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-23 17:01:20 +03:00
Pavel Emelyanov	2353091cbd	test/backup: use new_test_keyspace in test_restore_primary_replica Replace create_dataset + manual DROP/CREATE KEYSPACE with two sequential new_test_keyspace context manager blocks, matching the pattern used by do_test_streaming_scopes. The first block covers backup, the second covers restore. Keyspace lifecycle is now automatic. The streaming directions validation loop is moved outside of the second context block, since it only parses logs and has no dependency on the keyspace being alive. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-23 16:59:47 +03:00
Botond Dénes	f5438e0587	test/cluster/test_incremental_repair.py: enable compaction DEBUG logs in do_tablet_incremental_repair_and_ops The test sporadically fails because scrub produces an unexpected number of SSTables. Compaction logs are needed to diagnose why, but were not captured since scrub runs at DEBUG level. Enable compaction=debug for the servers started by do_tablet_incremental_repair_and_ops so the next reproduction provides enough information to root-cause the issue. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-23 15:48:26 +02:00
Botond Dénes	f6ab576ed9	test/cluster/test_incremental_repair.py: fix typo preapre -> prepare Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-23 15:48:12 +02:00
Pavel Emelyanov	cb329b10bf	code: Add maintenance/maintenance group And move some activities from streaming group into it, namely - tablet_allocator background group - sstables_manager-s components reclaimer - tablet storage group manager merge completion fiber - prometheus All other activity that was in streaming group remains there, but can be moved to this group (or to new maintenance subgroup) later. All but prometheus are patched here; prometheus still uses the maintenance_sched_group variable in main.cc, so it transparently moves into the new group. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-03-23 16:00:03 +03:00
Pavel Emelyanov	de9bfe0f1d	backup: Add maintenance/backup group The snapshot_ctl::backup_task_impl runs in configured scheduling group. Now it's streaming one. This patch introduces the maintenance/backup group and re-configures backup task with it. The group gets its --backup_io_throughput_mb_per_sec option that controls bandwidth limit for this sub-group only. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-03-23 16:00:02 +03:00
Pavel Emelyanov	6f43e8562e	compaction: Add maintenance/maintenance_compaction group Compaction manager tells compaction_sched_group from maintenance_compaction_sched_group. The latter, however, is set to be "streaming" group. This patch adds real maintenance_compaction group under the maintenance supergroup and makes compaction manager use it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-03-23 16:00:02 +03:00
Pavel Emelyanov	13355d1845	main: Introduce maintenance supergroup And just move streaming group inside it. Next patches will populate this supergroup further. The new supergroup gets its --maintenance-io-throughput-mb-per-sec option that controls supergroup-wide IO bandwidth applied to it. If not configured, the supergroup gets the throughput from streaming to be backward compatible. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-03-23 16:00:02 +03:00
Pavel Emelyanov	7cb9fa0778	main: Move all maintenance sched group into streaming one The main.cc code uses two variables to reference streaming scheduling. This patch stops using the maintenance_sched_group one, because it is in fact the streaming group, and real "maintenance" will appear later in this set. One place is deliberately not patched -- prometheus code starts before dbcfg.streaming_scheduling_group appears, so it still uses the maintenance_sched_group variable. This fact will be used in one of the next patches. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-03-23 16:00:02 +03:00
Pavel Emelyanov	45ecf15fff	database: Use local variable for current_scheduling_group The classify_request() helper captures current scheduling group into local variable and compares it with groups from db_config to decide which "class" it belongs to. One if uses current_scheduling_group(), while it could use the local variable. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-03-23 16:00:02 +03:00
Pavel Emelyanov	15c41bfb6c	code: Live-update IO throughputs from main Currently we have two live-updateable IO-throughput options -- one for streaming and one for compaction. Both are observed and the changed value is applied to the corresponding scheduling_group by the relevant service -- respectively, stream_manager and compaction_manager. Both observe/react/apply places use pretty heavy boilerplate code for such a simple task. Next patches will make things worse by adding two more options to control IO throughput of some other groups. Given that, the proposal is to hold the updating code in main.cc with the help of a wrapper class. In there all the needed bits are at hand, and classes can get their IO updates applied easily. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-03-23 16:00:02 +03:00
Calle Wilund	b36dc80835	scylla_cluster: Remove left-over debug printout	2026-03-23 11:07:59 +01:00
Pavel Emelyanov	c114d1b82c	api: Inline describe_ring JSON handling There are two helpers for describe_ring endpoint. Both can be squashed together for code brevity. Also, while at it, the "keyspace" parameter is not properly validated by the endpoint. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-20 19:51:32 +03:00
Pavel Emelyanov	9a2e583f29	storage_service: Make describe_ring_for_table() take table_id All callers already have it. It makes no difference for the method itself with which table identifier to work, but will help to simplify the flow in API handler (next patch) Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-20 19:49:24 +03:00
Pavel Emelyanov	4bc8ec174c	repair: Remove db/config.hh from repair/*.cc files Now all the code uses repair_service::config and no longer needs global config description. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-03-20 19:36:50 +03:00
Pavel Emelyanov	35f625e5c7	repair: Move repair_multishard_reader options onto repair_service::config This actually uses two interconnected options: repair_multishard_reader_buffer_hint_size and repair_multishard_reader_enable_read_ahead. Both are propagated through repair_service::config and pass their values to repair_reader/make_reader at construction time. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-20 19:36:50 +03:00
Pavel Emelyanov	9bc0d27aae	repair: Move critical_disk_utilization_level onto repair_service::config Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-20 19:23:47 +03:00
Pavel Emelyanov	80aa0fcdc2	repair: Move repair_partition_count_estimation_ratio onto repair_service::config Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-20 19:23:47 +03:00
Pavel Emelyanov	585cb0c718	repair: Move repair_hints_batchlog_flush_cache_time_in_ms onto repair_service::config Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-20 19:23:47 +03:00
Pavel Emelyanov	d8f7f86e10	repair: Move enable_small_table_optimization_for_rbno onto repair_service::config Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-03-20 19:23:47 +03:00
Pavel Emelyanov	38a23ff927	repair: Introduce repair_service::config Most other services have their own configs, but repair still uses the global db::config. Add an empty config struct to repair_service to carry db::config options the repair service needs. Subsequent patches will populate the struct with options. The config is created in main.cc as sharded_parameter because all future options are live-updateable and should capture their source from db::config on the correct shard. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-03-20 19:23:47 +03:00
Artsiom Mishuta	0ede308a04	test/pylib: save logs on success only during teardown phase Previously, when --save-log-on-success was enabled, logs were saved for every test phase (setup, call, teardown) in 3 files. Restrict it to the teardown phase only, which contains all 3 in case of test success, to avoid redundant log entries.	2026-03-19 16:35:22 +01:00
Artsiom Mishuta	cbc07569c0	test: Lower default log level from DEBUG to INFO 1. test.py — Removed --log-level=DEBUG flag from pytest args 2. test/pytest.ini — Changed log_level to INFO (that was set DEBUG in test.py), changed log_file_level from DEBUG to INFO, added clarifying comments	2026-03-19 16:32:30 +01:00
Gleb Natapov	2d8b3e751b	view: drop unused v1 builder code	2026-03-18 17:45:40 +02:00
Gleb Natapov	77d3245e02	view: remove upgrade to raft code Since we no longer support upgrading from versions that do not support v2 of the view building code, we can remove the upgrade code and make sure we do not boot with the old builder version.	2026-03-18 17:45:40 +02:00
Amnon Heiman	03d7ab17c9	storage_proxy: migrate CAS contention histograms to estimated_histogram_with_max Replace CAS contention histograms in storage proxy stats with estimated_histogram_with_max<128> and switch metrics/API aggregation to the new histogram path. Introduce a dedicated cas_contention_histogram alias and use it for cas_read_contention and cas_write_contention. Update API histogram reduction to merge the new histogram type via estimated_histogram_with_max_merge. Convert API JSON serialization to explicit offsets/counts using get_buckets_offsets() and get_buckets_counts(). Export CAS contention metrics with to_metrics_histogram(...) instead of the legacy get_histogram(1, 8) path for consistent bucket handling.	2026-03-12 14:10:35 +01:00
Amnon Heiman	cedd049218	estimated_histogram.hh: Add bucket offset and count to approx_exponential_histogram Add utility accessors to approx_exponential_histogram to export bucket boundaries and bucket counts in a form suitable for display/tests when Min < Precision causes repeated integer limits. Add MAX compile-time constant alias for the template Max parameter. Add get_buckets_offsets() to return bucket lower limits with duplicate adjacent limits removed. Add get_buckets_counts() to return counts aligned with the deduplicated limits, merging counts from buckets that share the same lower limit. Keep existing histogram behavior unchanged. This new functionality is intended for API use and not for performance-critical paths. Signed-off-by: Amnon Heiman <amnon@scylladb.com>	2026-03-12 14:04:40 +01:00
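The deduplication described for `get_buckets_offsets()` and `get_buckets_counts()` can be illustrated on plain lists. This is a minimal Python sketch of the merge rule, not the C++ accessors. The example limits below are made up to show repeated integer lower limits (as when Min < Precision); they are not computed from a real `approx_exponential_histogram`, and the function name is hypothetical.

```python
def dedup_buckets(lower_limits, counts):
    """Merge buckets that share the same lower limit.

    Assumes lower_limits is non-decreasing, so equal limits are adjacent.
    Returns (offsets, merged_counts) with one entry per distinct limit.
    """
    offsets, merged = [], []
    for limit, count in zip(lower_limits, counts):
        if offsets and offsets[-1] == limit:
            merged[-1] += count  # same integer lower limit: fold into previous bucket
        else:
            offsets.append(limit)
            merged.append(count)
    return offsets, merged
```

For limits `[1, 1, 2, 2, 3, 4]` with counts `[5, 1, 2, 0, 7, 3]`, the result is `([1, 2, 3, 4], [6, 2, 7, 3])`. Offsets and counts stay aligned, which is what the API serialization needs.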
Nadav Har'El	b411d436de	config: move named_value<T> method bodies out-of-line The previous commit added extern template declarations to suppress named_value<T> instantiation in every translation unit, but those only suppress non-inline members. All method bodies defined inside the class body were inline and thus exempt from extern template, so they were still emitted as weak symbols in every TU that used them. Fix this by moving all named_value<T> method definitions out of the class body in config_file.hh and into config_file_impl.hh as out-of-line template definitions. Since config_file_impl.hh is included only by db/config.cc, utils/config_file.cc, sstables/compressor.cc, and ent/encryption/encryption_config.cc, the method bodies are now compiled in only those four TUs. Also add the two missing explicit instantiation pairs that caused linker errors: - named_value<vector<object_storage_endpoint_param>> in db/config.cc - named_value<encryption_config::string_string_map> in encryption_config.cc	2026-03-11 13:20:03 +02:00
Nadav Har'El	e0c13518ae	config: suppress named_value<T> instantiation in every source file config.hh is included by a large fraction of the codebase. It pulls in utils/config_file.hh, whose named_value<T> template has its method bodies defined in config_file_impl.hh. Those bodies depend on three of the heaviest Boost headers – boost/program_options.hpp, boost/lexical_cast.hpp, and boost/regex.hpp – as well as yaml-cpp. Because the method bodies are reachable from config.hh, every translation unit that includes config.hh was silently instantiating all of named_value<T>'s methods (for each distinct T) and compiling that Boost/yaml-cpp machinery from scratch. Fix this by adding extern template struct declarations for all 32 distinct named_value<T> specialisations used by db::config: - the 14 primitive/stdlib types go into utils/config_file.hh - the 18 db-specific types (enum_option<…>, seed_provider_type, etc.) go into db/config.hh Matching explicit template struct instantiation definitions are added in db/config.cc, which is already the only translation unit that includes config_file_impl.hh. As a result the Boost/yaml-cpp template machinery is compiled exactly once (in config.o) instead of being re-instantiated in every including TU. One subtlety: named_value<seed_provider_type> has an explicit member specialisation of add_command_line_option. Per [temp.expl.spec], such a specialisation must be declared before any extern template declaration of the enclosing class template, so a forward declaration of the specialisation is added to config.hh ahead of the extern template line. Also, for some of the types we explicitly instantiated in db/config.cc, the named_value<T> constructor calls config_type_for<T>(), for which we also need to provide explicit specializations - some of them we already had but some were missing. Signed-off-by: Nadav Har'El <nyh@scylladb.com>	2026-03-11 11:30:39 +02:00

2760 changed files with 47641 additions and 19647 deletions

.github/CODEOWNERS

 tests/counter_test* @nuivall
 # DOCS
-docs/* @annastuchlik @tzach
+/docs/ @annastuchlik @tzach
-docs/alternator @annastuchlik @tzach @nyh
+/docs/alternator/ @annastuchlik @tzach @nyh
 # GOSSIP
 gms/* @tgrabiec @asias @kbr-scylla
 # PYTEST-BASED CQL TESTS
 test/cqlpy/* @nyh
+# TEST FRAMEWORK
+test/pylib/* @xtrey
+test.py @xtrey
 # RAFT
 raft/* @kbr-scylla @gleb-cloudius @kostja
 test/raft/* @kbr-scylla @gleb-cloudius @kostja

									
.github/copilot-instructions.md

@@ -5,13 +5,14 @@ High-performance distributed NoSQL database. Core values: performance, correctne
 ## Build System
-### Modern Build (configure.py + ninja)
+### Using native OS environment
 ```bash
-# Configure (run once per mode, or when switching modes)
-./configure.py --mode=<mode>  # mode: dev, debug, release, sanitize
+# Configure (run once)
+./configure.py
 # Build everything
-ninja <mode>-build  # e.g., ninja dev-build
+ninja <mode>-build  # modes: dev, debug, release, sanitize
+                    # dev is recommended for development (fastest compilation)
 # Build Scylla binary only (sufficient for Python integration tests)
 ninja build/<mode>/scylla
@@ -20,6 +21,9 @@ ninja build/<mode>/scylla
 ninja build/<mode>/test/boost/<test_name>
 ```
+### Using frozen toolchain (Docker)
+Prefix any build command with `./tools/toolchain/dbuild`.
 ## Running Tests
 ### C++ Unit Tests
@@ -36,9 +40,9 @@ ninja build/<mode>/test/boost/<test_name>
 ```
 **Important:**
-- Use full path with `.cc` extension (e.g., `test/boost/test_name.cc`, not `boost/test_name`)
+- Use full path with `.cc` extension (e.g., `test/boost/memtable_test.cc`)
 - To run a single test case, append `::<test_case_name>` to the file path
-- If you encounter permission issues with cgroup metric gathering, add `--no-gather-metrics` flag
+- If you encounter permission issues with cgroup metrics, add `--no-gather-metrics` to the `./test.py` command
 **Rebuilding Tests:**
 - test.py does NOT automatically rebuild when test source files are modified
@@ -60,25 +64,21 @@ ninja build/<mode>/scylla
 # Run a single test case from a file
 ./test.py --mode=<mode> test/<suite>/<test_name>.py::<test_function_name>
-# Run all tests in a directory
-./test.py --mode=<mode> test/<suite>/
 # Examples
 ./test.py --mode=dev test/alternator/
-./test.py --mode=dev test/cluster/test_raft_voters.py::test_raft_limited_voters_retain_coordinator
 ./test.py --mode=dev test/cqlpy/test_json.py
+./test.py --mode=dev test/cluster/test_raft_voters.py::test_raft_limited_voters_retain_coordinator
 # Optional flags
-./test.py --mode=dev test/cluster/test_raft_no_quorum.py -v  # Verbose output
-./test.py --mode=dev test/cluster/test_raft_no_quorum.py --repeat 5  # Repeat test 5 times
+./test.py --mode=dev test/cluster/test_raft_no_quorum.py -v --repeat 5
 ```
 **Important:**
-- Use full path with `.py` extension (e.g., `test/cluster/test_raft_no_quorum.py`, not `cluster/test_raft_no_quorum`)
+- Use full path with `.py` extension
 - To run a single test case, append `::<test_function_name>` to the file path
 - Add `-v` for verbose output
 - Add `--repeat <num>` to repeat a test multiple times
-- After modifying C++ source files, only rebuild the Scylla binary for Python tests - building the entire repository is unnecessary
+- After modifying C++ source files, only rebuild the Scylla binary for Python tests
 ## Code Philosophy
 - Performance matters in hot paths (data read/write, inner loops)
@@ -92,10 +92,13 @@ ninja build/<mode>/scylla
 ## Test Philosophy
 - Performance matters. Tests should run as quickly as possible. Sleeps in the code are highly discouraged and should be avoided, to reduce run time and flakiness.
 - Stability matters. Tests should be stable. New tests should be executed 100 times at least to ensure they pass 100 out of 100 times. (use --repeat 100 --max-failures 1 when running it)
-- Unit tests should ideally test one thing and one thing only.
+- Unit tests should ideally test one thing only.
 - Tests for bug fixes should run before the fix - and show the failure and after the fix - and show they now pass.
 - Tests for bug fixes should have in their comments which bug fixes (GitHub or JIRA issue) they test.
 - Tests in debug are always slower, so if needed, reduce number of iterations, rows, data used, cycles, etc. in debug mode.
 - Tests should strive to be repeatable, and not use random input that will make their results unpredictable.
 - Tests should consume as little resources as possible. Prefer running tests on a single node if it is sufficient, for example.
+## New Files
+- Include `LicenseRef-ScyllaDB-Source-Available-1.1` in the SPDX header
+- Use the current year for new files; for existing code keep the year as is

									
.github/instructions/cpp.instructions.md

@@ -25,6 +25,8 @@ applyTo: "**/*.{cc,hh}"
 - Use `seastar::gate` for shutdown coordination
 - Use `seastar::semaphore` for resource limiting (not `std::mutex`)
 - Break long loops with `maybe_yield()` to avoid reactor stalls
+- Most Scylla code runs on a single shard where atomics are unnecessary
+- Use Seastar message passing for cross-shard communication
 ## Coroutines
 ```cpp
@@ -36,10 +38,16 @@ seastar::future<T> func() {
 ## Error Handling
 - Throw exceptions for errors (futures propagate them automatically)
+- In coroutines, use `co_await coroutine::return_exception_ptr()` or `co_return coroutine::exception()` to avoid the overhead of throwing
 - In data path: avoid exceptions, use `std::expected` (or `boost::outcome`) instead
 - Use standard exceptions (`std::runtime_error`, `std::invalid_argument`)
 - Database-specific: throw appropriate schema/query exceptions
+## Invariant Checking
+- Prefer `throwing_assert()` (`utils/assert.hh`), it logs and throws instead of aborting
+- Use `SCYLLA_ASSERT` where critical to system stability where no clean shutdown is possible, it aborts
+- Use `on_internal_error()` for should-never-happen conditions that should be logged with backtrace
 ## Performance
 - Pass large objects by `const&` or `&&` (move semantics)
 - Use `std::string_view` for non-owning string references
@@ -68,7 +76,7 @@ seastar::future<T> func() {
 - Use `#pragma once`
 - Include order: own header, C++ std, Seastar, Boost, project headers
 - Forward declare when possible
-- Never `using namespace` in headers (exception: `using namespace seastar` is globally available via `seastarx.hh`)
+- Never `using namespace` in headers. Exception: most headers include `seastarx.hh`, which provides `using namespace seastar` project-wide.
 ## Documentation
 - Public APIs require clear documentation
@@ -101,10 +109,8 @@ seastar::future<T> func() {
 - `malloc`/`free`
 - `printf` family (use logging or fmt)
 - Raw pointers for ownership
-- `using namespace` in headers
 - Blocking operations: `std::sleep`, `std::read`, `std::mutex` (use Seastar equivalents)
-- `std::atomic` (reserved for very special circumstances only)
-- Macros (use `inline`, `constexpr`, or templates instead)
+- New ad-hoc macros (prefer `inline`, `constexpr`, or templates; established project macros like `SCYLLA_ASSERT` are fine)
 ## Testing
 When modifying existing code, follow TDD: create/update test first, then implement.

									
.github/instructions/python.instructions.md

@@ -7,7 +7,7 @@ applyTo: "**/*.py"
 **Important:** Match existing code style. Some directories (like `test/cqlpy` and `test/alternator`) prefer simplicity over type hints and docstrings.
 ## Style
-- Follow PEP 8
+- Match style of the file and directory you are editing; fall back to PEP 8 if unclear
 - Use type hints for function signatures (unless directory style omits them)
 - Use f-strings for formatting
 - Line length: 160 characters max
@@ -25,7 +25,7 @@ from cassandra.cluster import Cluster
 from test.utils import setup_keyspace
 ```
-Never use `from module import *`
+Avoid wildcard imports (`from module import *`).
 ## Documentation
 All public functions/classes need docstrings (unless the current directory conventions omit them):

									
.github/scripts/check-license.py

@@ -4,7 +4,7 @@
 # Copyright (C) 2024-present ScyllaDB
 #
 #
-# SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
+# SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1
 #
 import argparse

									
.github/workflows/add-label-when-promoted.yaml

@@ -10,6 +10,9 @@ on:
     types: [labeled, unlabeled]
     branches: [master, next, enterprise]
+env:
+  FORCE_JAVASCRIPT_ACTIONS_TO_NODE24: true
 jobs:
   check-commit:
     runs-on: ubuntu-latest
@@ -30,7 +33,7 @@ jobs:
             echo "DEFAULT_BRANCH=master" >> $GITHUB_ENV
           fi
       - name: Checkout repository
-        uses: actions/checkout@v4
+        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
         with:
           repository: ${{ github.repository }}
           ref: ${{ env.DEFAULT_BRANCH }}

									
.github/workflows/backport-pr-fixes-validation.yaml

@@ -5,6 +5,9 @@ on:
     types: [opened, reopened, edited]
     branches: [branch-*]
+env:
+  FORCE_JAVASCRIPT_ACTIONS_TO_NODE24: true
 jobs:
   check-fixes-prefix:
     runs-on: ubuntu-latest
@@ -13,7 +16,7 @@ jobs:
       issues: write
     steps:
       - name: Check PR body for "Fixes" prefix patterns
-        uses: actions/github-script@v7
+        uses: actions/github-script@ed597411d8f924073f98dfc5c65a23a2325f34cd # v8.0.0
         with:
           script: |
             const body = context.payload.pull_request.body;

									
.github/workflows/build-scylla.yaml

@@ -12,6 +12,9 @@ on:
         description: 'the md5sum for scylla executable'
         value: ${{ jobs.build.outputs.md5sum }}
+env:
+  FORCE_JAVASCRIPT_ACTIONS_TO_NODE24: true
 jobs:
   read-toolchain:
     uses: ./.github/workflows/read-toolchain.yaml
@@ -24,7 +27,7 @@ jobs:
     outputs:
       md5sum: ${{ steps.checksum.outputs.md5sum }}
     steps:
-      - uses: actions/checkout@v4
+      - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
         with:
           submodules: recursive
       - name: Generate the building system

									
.github/workflows/call_validate_pr_author_email.yml

@@ -7,6 +7,11 @@ on:
       - synchronize
       - reopened
+permissions:
+  contents: read
+  pull-requests: write
+  statuses: write
 jobs:
   validate_pr_author_email:
     uses: scylladb/github-automation/.github/workflows/validate_pr_author_email.yml@main

									
.github/workflows/check-license-header.yaml

@@ -7,8 +7,9 @@ on:
 env:
   HEADER_CHECK_LINES: 10
-  LICENSE: "LicenseRef-ScyllaDB-Source-Available-1.0"
+  LICENSE: "LicenseRef-ScyllaDB-Source-Available-1.1"
   CHECKED_EXTENSIONS: ".cc .hh .py"
+  FORCE_JAVASCRIPT_ACTIONS_TO_NODE24: true
 jobs:
   check-license-headers:
@@ -19,7 +20,7 @@ jobs:
     steps:
       - name: Checkout code
-        uses: actions/checkout@v4
+        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
         with:
           fetch-depth: 0
@@ -40,7 +41,7 @@ jobs:
       - name: Comment on PR if check fails
         if: failure()
-        uses: actions/github-script@v7
+        uses: actions/github-script@ed597411d8f924073f98dfc5c65a23a2325f34cd # v8.0.0
         with:
           script: |
             const license = '${{ env.LICENSE }}';

									
.github/workflows/clang-nightly.yaml

@@ -9,6 +9,7 @@ env:
   # use the development branch explicitly
   CLANG_VERSION: 21
   BUILD_DIR: build
+  FORCE_JAVASCRIPT_ACTIONS_TO_NODE24: true
 permissions: {}
@@ -32,7 +33,7 @@ jobs:
     steps:
       - run: |
           sudo dnf -y install git
-      - uses: actions/checkout@v4
+      - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
         with:
           submodules: true
       - name: Install build dependencies

									
.github/workflows/clang-tidy.yaml

@@ -18,6 +18,7 @@ env:
   BUILD_TYPE: RelWithDebInfo
   BUILD_DIR: build
   CLANG_TIDY_CHECKS: '-*,bugprone-use-after-move'
+  FORCE_JAVASCRIPT_ACTIONS_TO_NODE24: true
 permissions: {}
@@ -42,7 +43,7 @@ jobs:
           IMAGE: ${{ needs.read-toolchain.image }}
         run: |
           echo ${{ needs.read-toolchain.image }}
-      - uses: actions/checkout@v4
+      - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
         with:
           submodules: true
       - run: |

									
.github/workflows/close_issue_for_scylla_associate.yml

@@ -7,13 +7,16 @@ on:
 permissions:
   issues: write
+env:
+  FORCE_JAVASCRIPT_ACTIONS_TO_NODE24: true
 jobs:
   comment-and-close:
     runs-on: ubuntu-latest
     steps:
       - name: Comment and close if author email is scylladb.com
-        uses: actions/github-script@v7
+        uses: actions/github-script@ed597411d8f924073f98dfc5c65a23a2325f34cd # v8.0.0
         with:
           github-token: ${{ secrets.GITHUB_TOKEN }}
           script: |

									
.github/workflows/codespell.yaml

@@ -4,13 +4,15 @@ on:
     branches:
       - master
 permissions: {}
+env:
+  FORCE_JAVASCRIPT_ACTIONS_TO_NODE24: true
 jobs:
   codespell:
     name: Check for spelling errors
     runs-on: ubuntu-latest
     steps:
-      - uses: actions/checkout@v4
-      - uses: codespell-project/actions-codespell@master
+      - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
+      - uses: codespell-project/actions-codespell@8f01853be192eb0f849a5c7d721450e7a467c579 # v2.2
         with:
           only_warn: 1
           ignore_words_list: "ans,datas,fo,ser,ue,crate,nd,reenable,strat,stap,te,raison,iif,tread"

									
.github/workflows/compare-build-systems.yaml

@@ -0,0 +1,38 @@
+name: Compare Build Systems
+on:
+  pull_request:
+    branches:
+      - master
+    paths:
+      - 'configure.py'
+      - '**/CMakeLists.txt'
+      - 'cmake/**'
+      - 'scripts/compare_build_systems.py'
+  workflow_dispatch:
+permissions:
+  contents: read
+# cancel the in-progress run upon a repush
+concurrency:
+  group: ${{ github.workflow }}-${{ github.ref }}
+  cancel-in-progress: true
+jobs:
+  read-toolchain:
+    uses: ./.github/workflows/read-toolchain.yaml
+  compare:
+    name: Compare configure.py vs CMake
+    needs:
+      - read-toolchain
+    runs-on: ubuntu-latest
+    container: ${{ needs.read-toolchain.outputs.image }}
+    steps:
+      - uses: actions/checkout@v4
+        with:
+          submodules: true
+      - name: Compare build systems
+        run: |
+          git config --global --add safe.directory $GITHUB_WORKSPACE
+          python3 scripts/compare_build_systems.py --ci

									
.github/workflows/conflict_reminder.yaml

@@ -12,13 +12,16 @@ on:
   schedule:
     - cron: '0 10 * * 1'  # Runs every Monday at 10:00am
+env:
+  FORCE_JAVASCRIPT_ACTIONS_TO_NODE24: true
 jobs:
   notify_conflict_prs:
     runs-on: ubuntu-latest
     steps:
       - name: Notify PR Authors of Conflicts
-        uses: actions/github-script@v7
+        uses: actions/github-script@ed597411d8f924073f98dfc5c65a23a2325f34cd # v8.0.0
         with:
           script: |
             console.log("Starting conflict reminder script...");

									
.github/workflows/differential-shellcheck.yaml

@@ -13,6 +13,9 @@ on:
 permissions:
   contents: read
+env:
+  FORCE_JAVASCRIPT_ACTIONS_TO_NODE24: true
 jobs:
   lint:
     runs-on: ubuntu-latest
@@ -21,12 +24,12 @@ jobs:
       security-events: write
     steps:
-      - uses: actions/checkout@v4
+      - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
         with:
           fetch-depth: 0
       - name: Differential ShellCheck
-        uses: redhat-plumbers-in-action/differential-shellcheck@v5
+        uses: redhat-plumbers-in-action/differential-shellcheck@d965e66ec0b3b2f821f75c8eff9b12442d9a7d1e # v5.5.6
         with:
           severity: warning
           token: ${{ secrets.GITHUB_TOKEN }}

									
.github/workflows/docs-pages.yaml

@@ -5,6 +5,7 @@ name: "Docs / Publish"
 env:
   FLAG: ${{ github.repository == 'scylladb/scylla-enterprise' && 'enterprise' || 'opensource' }}
   DEFAULT_BRANCH: ${{ github.repository == 'scylladb/scylla-enterprise' && 'enterprise' || 'master' }}
+  FORCE_JAVASCRIPT_ACTIONS_TO_NODE24: true
 on:
   push:
@@ -25,17 +26,17 @@ jobs:
     runs-on: ubuntu-latest
     steps:
       - name: Checkout
-        uses: actions/checkout@v4
+        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
         with:
           ref: ${{ env.DEFAULT_BRANCH }}
           persist-credentials: false
           fetch-depth: 0
       - name: Set up Python
-        uses: actions/setup-python@v5
+        uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405 # v6.2.0
         with:
           python-version: "3.12"
       - name: Install uv
-        uses: astral-sh/setup-uv@v6
+        uses: astral-sh/setup-uv@cec208311dfd045dd5311c1add060b2062131d57 # v8.0.0
       - name: Set up env
         run: make -C docs FLAG="${{ env.FLAG }}" setupenv
       - name: Build docs

									
.github/workflows/docs-pr.yaml

@@ -7,6 +7,7 @@ permissions:
 env:
   FLAG: ${{ github.repository == 'scylladb/scylla-enterprise' && 'enterprise' || 'opensource' }}
+  FORCE_JAVASCRIPT_ACTIONS_TO_NODE24: true
 on:
   pull_request:
@@ -22,16 +23,16 @@ jobs:
     runs-on: ubuntu-latest
     steps:
       - name: Checkout
-        uses: actions/checkout@v4
+        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
         with:
           persist-credentials: false
           fetch-depth: 0
       - name: Set up Python
-        uses: actions/setup-python@v5
+        uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405 # v6.2.0
         with:
           python-version: "3.12"
       - name: Install uv
-        uses: astral-sh/setup-uv@v6
+        uses: astral-sh/setup-uv@cec208311dfd045dd5311c1add060b2062131d57 # v8.0.0
       - name: Set up env
         run: make -C docs FLAG="${{ env.FLAG }}" setupenv
       - name: Build docs

									
.github/workflows/docs-validate-metrics.yml

@@ -3,6 +3,9 @@ name: Docs / Validate metrics
 permissions:
   contents: read
+env:
+  FORCE_JAVASCRIPT_ACTIONS_TO_NODE24: true
 on:
   pull_request:
     branches:
@@ -21,12 +24,12 @@ jobs:
     steps:
     - name: Checkout code
-      uses: actions/checkout@v4
+      uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
       with:
         submodules: true
     - name: Set up Python
-      uses: actions/setup-python@v6
+      uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405 # v6.2.0
       with:
         python-version: '3.10'

									
.github/workflows/iwyu.yaml

@@ -13,6 +13,7 @@ env:
   # supposed to be processed by idl-compiler.py, so we don't check them using the cleaner
   CLEANER_DIRS: test/unit exceptions alternator api auth cdc compaction db dht gms index lang message mutation mutation_writer node_ops raft redis replica service
   SEASTAR_BAD_INCLUDE_OUTPUT_PATH: build/seastar-bad-include.log
+  FORCE_JAVASCRIPT_ACTIONS_TO_NODE24: true
 permissions:
   contents: read
@@ -32,7 +33,7 @@ jobs:
     runs-on: ubuntu-latest
     container: ${{ needs.read-toolchain.outputs.image }}
     steps:
-      - uses: actions/checkout@v4
+      - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
         with:
           submodules: true
       - name: Generate compilation database
@@ -89,7 +90,7 @@ jobs:
             | tee "$SEASTAR_BAD_INCLUDE_OUTPUT_PATH"
       - run: |
           echo "::remove-matcher owner=seastar-bad-include::"
-      - uses: actions/upload-artifact@v4
+      - uses: actions/upload-artifact@bbbca2ddaa5d8feaa63e36b76fdaad77386f024f # v7.0.0
         with:
           name: Logs
           path: |

.github/workflows/make-pr-ready-for-review.yaml (3 lines changed)
					@@ -7,6 +7,7 @@ on:
					 env:
					   DEFAULT_BRANCH: 'master'
					+  FORCE_JAVASCRIPT_ACTIONS_TO_NODE24: true
					 jobs:
					   mark-ready:
					@@ -17,7 +18,7 @@ jobs:
					     steps:
					       - name: Checkout repository
					-        uses: actions/checkout@v4
					+        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
					         with:
					           repository: ${{ github.repository }}
					           ref: ${{ env.DEFAULT_BRANCH }}

.github/workflows/pr-require-backport-label.yaml (4 lines changed)
					@@ -5,6 +5,8 @@ on:
					     branches:
					       - master
					       - next
					+env:
					+  FORCE_JAVASCRIPT_ACTIONS_TO_NODE24: true
					 jobs:
					   label:
					     if: github.event.pull_request.draft == false
					@@ -15,7 +17,7 @@ jobs:
					     steps:
					       - name: Wait for label to be added
					         run: sleep 1m
					-      - uses: mheap/github-action-required-labels@v5
					+      - uses: mheap/github-action-required-labels@0ac283b4e65c1fb28ce6079dea5546ceca98ccbe # v5.5.2
					         with:
					           mode: minimum
					           count: 1

.github/workflows/read-toolchain.yaml (5 lines changed)
					@@ -7,6 +7,9 @@ on:
					         description: "the toolchain docker image"
					         value: ${{ jobs.read-toolchain.outputs.image }}
					+env:
					+  FORCE_JAVASCRIPT_ACTIONS_TO_NODE24: true
					 jobs:
					   read-toolchain:
					     runs-on: ubuntu-latest
					@@ -15,7 +18,7 @@ jobs:
					     outputs:
					       image: ${{ steps.read.outputs.image }}
					     steps:
					-      - uses: actions/checkout@v4
					+      - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
					         with:
					           sparse-checkout: tools/toolchain/image
					           sparse-checkout-cone-mode: false

.github/workflows/seastar.yaml (5 lines changed)
					@@ -13,6 +13,7 @@ concurrency:
					 env:
					   BUILD_DIR: build
					+  FORCE_JAVASCRIPT_ACTIONS_TO_NODE24: true
					 jobs:
					   read-toolchain:
					@@ -29,12 +30,12 @@ jobs:
					           - RelWithDebInfo
					           - Dev
					     steps:
					-      - uses: actions/checkout@v4
					+      - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
					         with:
					           submodules: true
					       - run: |
					           rm -rf seastar
					-      - uses: actions/checkout@v4
					+      - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
					         with:
					           repository: scylladb/seastar
					           submodules: true

.github/workflows/sync-labels.yaml (5 lines changed)
					@@ -7,6 +7,9 @@ on:
					   issues:
					     types: [labeled, unlabeled]
					+env:
					+  FORCE_JAVASCRIPT_ACTIONS_TO_NODE24: true
					 jobs:
					   label-sync:
					     if: ${{ github.repository == 'scylladb/scylladb' }}
					@@ -21,7 +24,7 @@ jobs:
					           GITHUB_CONTEXT: ${{ toJson(github) }}
					         run: echo "$GITHUB_CONTEXT"
					       - name: Checkout repository
					-        uses: actions/checkout@v4
					+        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
					         with:
					           sparse-checkout: |
					             .github/scripts/sync_labels.py

.github/workflows/trigger_ci.yaml (5 lines changed)
					@@ -6,6 +6,9 @@ on:
					   issue_comment:
					     types: [created]
					+env:
					+  FORCE_JAVASCRIPT_ACTIONS_TO_NODE24: true
					 jobs:
					   trigger-ci:
					     runs-on: ubuntu-latest
					@@ -15,7 +18,7 @@ jobs:
					           GITHUB_CONTEXT: ${{ toJson(github) }}
					         run: echo "$GITHUB_CONTEXT"
					       - name: Checkout PR code
					-        uses: actions/checkout@v3
					+        uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
					         with:
					           fetch-depth: 0  # Needed to access full history
					           ref: ${{ github.event.pull_request.head.ref }}

.github/workflows/urgent_issue_reminder.yml (5 lines changed)
					@@ -4,13 +4,16 @@ on:
					   schedule:
					     - cron: '10 8 * * *' # Runs daily at 8 AM
					+env:
					+  FORCE_JAVASCRIPT_ACTIONS_TO_NODE24: true
					 jobs:
					   reminder:
					     runs-on: ubuntu-latest
					     steps:
					     - name: Send reminders
					-      uses: actions/github-script@v7
					+      uses: actions/github-script@ed597411d8f924073f98dfc5c65a23a2325f34cd # v8.0.0
					       with:
					         script: |
					           const labelFilters = ['P0', 'P1', 'Field-Tier1','status/release blocker', 'status/regression'];
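The workflow hunks above repeat two mechanical changes: a top-level `FORCE_JAVASCRIPT_ACTIONS_TO_NODE24: true` entry under `env:`, and `uses:` references moved from a tag such as `actions/checkout@v4` to a full commit SHA, with the version kept as a trailing comment. Below is a minimal Python sketch of a lint check for the second convention. It assumes a reference counts as pinned only when the text after `@` is a 40-character lowercase hex SHA. `find_unpinned_uses` is a hypothetical helper, not a script in the ScyllaDB tree, and it scans lines with a regex rather than parsing YAML.

```python
import re

# A `uses:` step reference, with or without a leading list dash, e.g.
#   uses: actions/checkout@v4
#   - uses: actions/checkout@<40-hex-sha> # v6.0.2
# References without an `@ref` (such as local `./path` actions) do not match.
USES_RE = re.compile(r"^\s*(?:-\s+)?uses:\s*(?P<action>[^@\s]+)@(?P<ref>[^\s#]+)")
FULL_SHA_RE = re.compile(r"^[0-9a-f]{40}$")


def find_unpinned_uses(workflow_text: str) -> list[tuple[int, str]]:
    """Return (line number, action@ref) for each `uses:` not pinned to a full SHA."""
    unpinned = []
    for lineno, line in enumerate(workflow_text.splitlines(), start=1):
        match = USES_RE.match(line)
        if match is None:
            continue
        if not FULL_SHA_RE.match(match.group("ref")):
            unpinned.append((lineno, f"{match.group('action')}@{match.group('ref')}"))
    return unpinned


before = """\
    - name: Checkout code
      uses: actions/checkout@v4
"""
after = """\
    - name: Checkout code
      uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
"""
print(find_unpinned_uses(before))  # [(2, 'actions/checkout@v4')]
print(find_unpinned_uses(after))   # []
```

A tag like `@v4` can be moved to another commit by the action's maintainers, while a full SHA cannot, which is the usual motivation for this convention; the trailing `# v6.0.2` comment keeps the human-readable version visible to reviewers.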

AGENTS.md (new file, 16 lines added)
					@@ -0,0 +1,16 @@

					# ScyllaDB — AI Agent Instructions

					This file routes you to the relevant instruction files.

					Do NOT load all files at once — read only what applies to your current task.

					## Instruction Files

					- `.github/copilot-instructions.md` — build system, test runner, code philosophy, test philosophy

					- `.github/instructions/cpp.instructions.md` — C++ style, Seastar patterns, memory, error handling (for `*.cc`, `*.hh`)

					- `.github/instructions/python.instructions.md` — Python style, testing conventions (for `*.py`)

					## Which files to read

					- **Always read** `.github/copilot-instructions.md` for build/test commands and project values

					- **If editing C++ files** (`*.cc`, `*.hh`): also read `.github/instructions/cpp.instructions.md`

					- **If editing Python files** (`*.py`): also read `.github/instructions/python.instructions.md`
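AGENTS.md describes its routing as a small decision table keyed on the kind of file being edited. Below is a minimal Python sketch of that table. It assumes the three rules listed are the complete set and that the file suffix is the only signal. `instruction_files_for` is a hypothetical name used for illustration, not part of any agent tooling in the repository.

```python
from pathlib import PurePosixPath

# Read for every task, per "Which files to read".
ALWAYS = ".github/copilot-instructions.md"

# Extra instruction files selected by the suffix of an edited file.
BY_SUFFIX = {
    ".cc": ".github/instructions/cpp.instructions.md",
    ".hh": ".github/instructions/cpp.instructions.md",
    ".py": ".github/instructions/python.instructions.md",
}


def instruction_files_for(edited_paths: list[str]) -> list[str]:
    """Return the instruction files to read, in first-seen order, without duplicates."""
    files = [ALWAYS]
    for path in edited_paths:
        extra = BY_SUFFIX.get(PurePosixPath(path).suffix)
        if extra is not None and extra not in files:
            files.append(extra)
    return files


print(instruction_files_for(["foo.cc", "foo.hh"]))
# ['.github/copilot-instructions.md', '.github/instructions/cpp.instructions.md']
print(instruction_files_for(["bar.py", "README.md"]))
# ['.github/copilot-instructions.md', '.github/instructions/python.instructions.md']
```

Files with other suffixes (for example `README.md`) add nothing beyond the always-read file, which matches the "read only what applies" guidance.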

CMakeLists.txt (82 lines changed)
					@@ -2,6 +2,12 @@ cmake_minimum_required(VERSION 3.27)
					 project(scylla)
					+# Disable CMake's automatic -fcolor-diagnostics injection (CMake 3.24+ adds
					+# it for Clang+Ninja). configure.py does not add any color diagnostics flags,
					+# so we clear the internal CMake variable to prevent injection.
					+set(CMAKE_CXX_COMPILE_OPTIONS_COLOR_DIAGNOSTICS "")
					+set(CMAKE_C_COMPILE_OPTIONS_COLOR_DIAGNOSTICS "")
					 list(APPEND CMAKE_MODULE_PATH
					   ${CMAKE_CURRENT_SOURCE_DIR}/cmake
					   ${CMAKE_CURRENT_SOURCE_DIR}/seastar/cmake)

					@@ -51,6 +57,16 @@ set(CMAKE_CXX_EXTENSIONS ON CACHE INTERNAL "")
					 set(CMAKE_CXX_SCAN_FOR_MODULES OFF CACHE INTERNAL "")
					 set(CMAKE_VISIBILITY_INLINES_HIDDEN ON)
					+# Global defines matching configure.py
					+# Since gcc 13, libgcc doesn't need the exception workaround
					+add_compile_definitions(SEASTAR_NO_EXCEPTION_HACK)
					+# Hacks needed to expose internal APIs for xxhash dependencies
					+add_compile_definitions(XXH_PRIVATE_API)
					+# SEASTAR_TESTING_MAIN is added later (after add_subdirectory(seastar) and
					+# add_subdirectory(abseil)) to avoid leaking into the seastar subdirectory.
					+# If SEASTAR_TESTING_MAIN is defined globally before seastar, it causes a
					+# duplicate 'main' symbol in seastar_testing.
					 if(is_multi_config)
					     find_package(Seastar)
					     # this is atypical compared to standard ExternalProject usage:

					@@ -96,12 +112,33 @@ else()
					     set(Seastar_EXCLUDE_APPS_FROM_ALL ON CACHE BOOL "" FORCE)
					     set(Seastar_EXCLUDE_TESTS_FROM_ALL ON CACHE BOOL "" FORCE)
					     set(Seastar_IO_URING ON CACHE BOOL "" FORCE)
					-    set(Seastar_SCHEDULING_GROUPS_COUNT 21 CACHE STRING "" FORCE)
					+    set(Seastar_SCHEDULING_GROUPS_COUNT 24 CACHE STRING "" FORCE)
					     set(Seastar_UNUSED_RESULT_ERROR ON CACHE BOOL "" FORCE)
					+    # Match configure.py's build_seastar_shared_libs: Debug and Dev
					+    # build Seastar as a shared library, others build it static.
					+    if(CMAKE_BUILD_TYPE STREQUAL "Debug" OR CMAKE_BUILD_TYPE STREQUAL "Dev")
					+        set(BUILD_SHARED_LIBS ON CACHE BOOL "" FORCE)
					+    else()
					+        set(BUILD_SHARED_LIBS OFF CACHE BOOL "" FORCE)
					+    endif()
					     add_subdirectory(seastar)
					-    target_compile_definitions (seastar
					-      PRIVATE
					-        SEASTAR_NO_EXCEPTION_HACK)
					+    # Coverage mode sets cmake_build_type='Debug' for Seastar
					+    # (configure.py:515), so Seastar's pkg-config output includes sanitizer
					+    # link flags in seastar_libs_coverage (configure.py:2514,2649).
					+    # Seastar's own CMake only activates sanitizer targets for Debug/Sanitize
					+    # configs, so we inject link options on the seastar target for Coverage.
					+    # Using PUBLIC ensures they propagate to all targets linking Seastar
					+    # (but not standalone tools like patchelf), matching configure.py's
					+    # behavior.  Compile-time flags and defines are handled globally in
					+    # cmake/mode.Coverage.cmake.
					+    if(CMAKE_BUILD_TYPE STREQUAL "Coverage")
					+        target_link_options(seastar
					+            PUBLIC
					+                -fsanitize=address
					+                -fsanitize=undefined
					+                -fsanitize=vptr)
					+    endif()
					 endif()
					 set(ABSL_PROPAGATE_CXX_STD ON CACHE BOOL "" FORCE)

					@@ -111,8 +148,10 @@ if(Scylla_ENABLE_LTO)
					 endif()
					 find_package(Sanitizers QUIET)
					+# Match configure.py:2192 — abseil gets sanitizer flags with -fno-sanitize=vptr
					+# to exclude vptr checks which are incompatible with abseil's usage.
					 list(APPEND absl_cxx_flags
					-    $<$<CONFIG:Debug,Sanitize>:$<TARGET_PROPERTY:Sanitizers::address,INTERFACE_COMPILE_OPTIONS>;$<TARGET_PROPERTY:Sanitizers::undefined_behavior,INTERFACE_COMPILE_OPTIONS>>)
					+    $<$<CONFIG:Debug,Sanitize>:$<TARGET_PROPERTY:Sanitizers::address,INTERFACE_COMPILE_OPTIONS>;$<TARGET_PROPERTY:Sanitizers::undefined_behavior,INTERFACE_COMPILE_OPTIONS>;-fno-sanitize=vptr>)
					 if(CMAKE_CXX_COMPILER_ID STREQUAL "GNU")
					     list(APPEND ABSL_GCC_FLAGS ${absl_cxx_flags})
					 elseif(CMAKE_CXX_COMPILER_ID STREQUAL "Clang")

					@@ -137,9 +176,38 @@ add_library(absl::headers ALIAS absl-headers)
					 # unfortunately.
					 set_target_properties(absl_strerror PROPERTIES EXCLUDE_FROM_ALL TRUE)
					+# Now that seastar and abseil subdirectories are fully processed, add
					+# SEASTAR_TESTING_MAIN globally. This matches configure.py's global define
					+# without leaking into seastar (which would cause duplicate main symbols).
					+add_compile_definitions(SEASTAR_TESTING_MAIN)
					 # System libraries dependencies
					 find_package(Boost REQUIRED
					     COMPONENTS filesystem program_options system thread regex unit_test_framework)
					+# When using shared Boost libraries, define BOOST_ALL_DYN_LINK (matching configure.py)
					+if(NOT Boost_USE_STATIC_LIBS)
					+    add_compile_definitions(BOOST_ALL_DYN_LINK)
					+endif()
					+# CMake's Boost package config adds per-component defines like
					+# BOOST_UNIT_TEST_FRAMEWORK_DYN_LINK, BOOST_REGEX_DYN_LINK, etc. on the
					+# imported targets. configure.py only uses BOOST_ALL_DYN_LINK (which covers
					+# all components), so strip the per-component defines to align the two build
					+# systems.
					+foreach(_boost_target
					+    Boost::unit_test_framework
					+    Boost::regex
					+    Boost::filesystem
					+    Boost::program_options
					+    Boost::system
					+    Boost::thread)
					+  if(TARGET ${_boost_target})
					+    # Completely remove all INTERFACE_COMPILE_DEFINITIONS from the Boost target.
					+    # This prevents per-component *_DYN_LINK and *_NO_LIB defines from
					+    # propagating. BOOST_ALL_DYN_LINK (set globally) covers all components.
					+    set_property(TARGET ${_boost_target} PROPERTY INTERFACE_COMPILE_DEFINITIONS)
					+  endif()
					+endforeach()
					 target_link_libraries(Boost::regex
					   INTERFACE
					     ICU::i18n

					@@ -196,6 +264,10 @@ if (Scylla_USE_PRECOMPILED_HEADER)
					     message(STATUS "Using precompiled header for Scylla - remember to add `sloppiness = pch_defines,time_macros` to ccache.conf, if you're using ccache.")
					     target_precompile_headers(scylla-precompiled-header PRIVATE "stdafx.hh")
					     target_compile_definitions(scylla-precompiled-header PRIVATE SCYLLA_USE_PRECOMPILED_HEADER)
					+    # Match configure.py: -fpch-validate-input-files-content tells the compiler
					+    # to check content of stdafx.hh if timestamps don't match (important for
					+    # ccache/git workflows where timestamps may not be preserved).
					+    add_compile_options(-fpch-validate-input-files-content)
					   endif()
					 else()
					   set(Scylla_USE_PRECOMPILED_HEADER_USE OFF)
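The Seastar additions in the `@@ -96,12 +112,33 @@ else()` hunk choose the library type and extra link options from `CMAKE_BUILD_TYPE`, inside the branch that builds Seastar with `add_subdirectory(seastar)`. Below is a minimal Python sketch of that decision table. It mirrors the CMake conditions exactly as written: `Debug` and `Dev` force `BUILD_SHARED_LIBS` on, every other type forces it off, and `Coverage` additionally gets three `-fsanitize` options as PUBLIC link options. `seastar_link_settings` and `SeastarLinkSettings` are hypothetical names for illustration; the authoritative logic remains the CMake code and `configure.py`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SeastarLinkSettings:
    shared: bool                             # value forced into BUILD_SHARED_LIBS
    public_link_options: tuple[str, ...] = ()  # target_link_options(seastar PUBLIC ...)


def seastar_link_settings(build_type: str) -> SeastarLinkSettings:
    """Mirror the CMAKE_BUILD_TYPE checks from the CMakeLists.txt hunk (hypothetical helper)."""
    # if(CMAKE_BUILD_TYPE STREQUAL "Debug" OR CMAKE_BUILD_TYPE STREQUAL "Dev")
    # STREQUAL is case-sensitive, and so is this membership test.
    shared = build_type in ("Debug", "Dev")
    # if(CMAKE_BUILD_TYPE STREQUAL "Coverage")
    if build_type == "Coverage":
        options = ("-fsanitize=address", "-fsanitize=undefined", "-fsanitize=vptr")
    else:
        options = ()
    return SeastarLinkSettings(shared=shared, public_link_options=options)


for mode in ("Debug", "Dev", "RelWithDebInfo", "Coverage"):
    print(mode, seastar_link_settings(mode))
```

Note that, read literally, `Coverage` builds Seastar static here, because only `Debug` and `Dev` match the shared-library condition; the sanitizer link options are what the Coverage branch adds on top.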

LICENSE-ScyllaDB-Source-Available.md (46 lines changed)
					@@ -1,8 +1,8 @@
					 ## **SCYLLADB SOFTWARE LICENSE AGREEMENT**
					-| Version: | 1.0 |
					+| Version: | 1.1 |
					 | :---- | :---- |
					-| Last updated: | December 18, 2024 |
					+| Last updated: | April 12, 2026 |
					 **Your Acceptance**

					@@ -12,20 +12,48 @@ The terms "**You**" or "**Licensee**" refer to any individual accessing or using
					 **Grant of License**
					-* **Software Definitions:** Software means the ScyllaDB software provided by Licensor, including the source code, object code, and any accompanying documentation or tools, or any part thereof, as made available under this Agreement.
					-* **Grant of License:** Subject to the terms and conditions of this Agreement, Licensor grants You a limited, non-exclusive, revocable, non-sublicensable, non-transferable, royalty free license to Use the Software, in each case solely for the purposes of:
					+* **Definitions:**
					+  1. **Software:** Software means the ScyllaDB software provided by Licensor, including the source code, object code, and any accompanying documentation or tools, or any part thereof, as made available under this Agreement.
					+  2. **Commercial Customer**: means any legal entity (including its Affiliates) that has entered into a transaction with Licensor, or an authorized reseller/distributor, for the provision of any ScyllaDB products or services. This includes, without limitation:  (a) Scope of Service: Any paid subscription, enterprise license, "BYOA" or Database-as-a-Service (DBaaS) offering, technical support, professional services, consulting, or training. (b) Scale and Volume: Any deployment regardless of size, capacity, or performance metrics (c) Payment Method: Any compensation model, including but not limited to, fixed-fee, consumption-based (On-Demand), committed spend, third-party marketplace credits (e.g., AWS, GCP, Azure), or promotional credits and discounts.
					+* **Grant of License:** Subject to the terms and conditions of this Agreement, including the Eligibility and Exclusive Use Restrictions clause, Licensor grants You a limited, non-exclusive, revocable, non-sublicensable, non-transferable, royalty free license to Use the Software, in each case solely for the purposes of:
					   1) Copying, distributing, evaluating (including performing benchmarking or comparative tests or evaluations , subject to the limitations below) and improving the Software and ScyllaDB; and
					   2) create a modified version of the Software (each, a "**Licensed Work**"); provided however, that each such Licensed Work keeps all or substantially all of the functions and features of the Software, and/or using all or substantially all of the source code of the Software. You hereby agree that all the Licensed Work are, upon creation, considered Licensed Work of the Licensor, shall be the sole property of the Licensor and its assignees, and the Licensor and its assignees shall be the sole owner of all rights of any kind or nature, in connection with such Licensed Work. You hereby irrevocably and unconditionally assign to the Licensor all the Licensed Work and any part thereof.  This License applies separately for each version of the Licensed Work, which shall be considered "Software" for the purpose of this Agreement.
					-**License Limitations, Restrictions and Obligations:** The license grant above is subject to the following limitations, restrictions, and obligations. If Licensee’s Use of the Software does not comply with the above license grant or the terms of this section (including exceeding the Usage Limit set forth below), Licensee must: (i) refrain from any Use of the Software; and (ii) purchase a [commercial paid license](https://www.scylladb.com/scylladb-proprietary-software-license-agreement/) from the Licensor.
					+* **Eligibility and Exclusive Use Restrictions**
					+i. 	Restricted to "Never Customers" Only. The license granted under this Agreement is strictly limited to Never Customers. For purposes of this Agreement, a "Never Customer" is an entity (including its Affiliates) that does not have, and has not had within the previous twelve (12) months, a paid commercial subscription, professional services agreement, or any other commercial relationship with Licensor. Satisfaction of the Never Customer criteria is a strict condition precedent to the effectiveness of this License.
					+ii. 	Total Prohibition for Existing Commercial Customers. If You (or any of Your Affiliates) are an existing Commercial Customer of Licensor within the last twelve (12) months, no license is deemed to have been offered or extended to You, and any download or installation of the Software by You is unauthorized. This prohibition applies to all deployments, including but not limited to:
					+(a) existing commercial workloads;
					+(b) any new use cases, new applications, or new workloads
					+iii. **No "Hybrid" Usage**. Licensee is expressly prohibited from combining free tier usage under this Agreement with paid commercial units.
					+If You are a Commercial Customer, all use of the Software across Your entire organization (and any of your Affiliates) must be governed by a valid, paid commercial agreement. Use of the Software under this license by a Commercial Customer (which is not a "Never Customer") shall:
					+(a) Void this license *ab initio*;
					+(b) Be deemed a material breach of both this Agreement and any existing commercial terms; and
					+(c) Entitle Licensor to invoice Licensee for such unauthorized usage at Licensor's standard list prices, retroactive to the date of first use.
					+Notwithstanding anything to the contrary in the Eligibility or License Limitations sections above a Commercial Customer may use the Software exclusively for non-production purposes, including Continuous Integration (CI), automated testing, and quality assurance environments, provided that such use at all times remains compliant with the Usage Limit.
					+iv. **Verification**. Licensor reserves the right to audit Licensee's environment and corporate identity to ensure compliance with these eligibility criteria.
					+For the purposes of this Agreement an "**Affiliate**" means any entity that directly or indirectly controls, is controlled by, or is under common control with a party, where "control" means ownership of more than 50% of the voting stock or decision-making authority
					+**License Limitations, Restrictions and Obligations:** The license grant above is subject to the following limitations, restrictions, and obligations. If Licensee’s Use of the Software does not comply with the above license grant or the terms of this section (including exceeding the Usage Limit set forth below), Licensee must: (i) refrain from any Use of the Software; and (ii) unless Licensee is a Never Customer, purchase a [commercial paid license](https://www.scylladb.com/scylladb-proprietary-software-license-agreement/) from the Licensor.
					 * **Updates:** You shall be solely responsible for providing all equipment, systems, assets, access, and ancillary goods and services needed to access and Use the Software.  Licensor may modify or update the Software at any time, without notification, in its sole and absolute discretion.  After the effective date of each such update, Licensor shall bear no obligation to run, provide or support legacy versions of the Software.
					 * **"Usage Limit":** Licensee's total overall available storage across all deployments and clusters of the Software and the Licensed Work under this License shall not exceed 10TB and/or an upper limit of 50 VCPUs (hyper threads).
					 * **IP Markings:** Licensee must retain all copyright, trademark, and other proprietary notices contained in the Software. You will not modify, delete, alter, remove, or obscure any intellectual property, including without limitations licensing, copyright, trademark, or any other notices of Licensor in the Software.
					 * **License Reproduction:** You must conspicuously display this Agreement on each copy of the Software. If You receive the Software from a third party, this Agreement still applies to Your Use of the Software. You will be responsible for any breach of this Agreement by any such third-party.
					 * Distribution of any Licensed Works is permitted, provided that: (i) You must include in any Licensed Work prominent notices stating that You have modified the Software, (ii) You include a copy of this Agreement with the Licensed Work, and (iii) You clearly identify all modifications made in the Licensed Work and provides attribution to the Licensor as the original author(s) of the Software.
					-* **Commercial Use Restrictions:** Licensee may not offer the Software as a software-as-a-service (SaaS) or commercial database-as-as-service (dBaaS) offering.  Licensee may not use the Software to compete with Licensor's existing or future products or services. If your Use of the Software does not comply with the requirements currently in effect as described in this License, you must purchase a commercial license from the Licensor, its affiliated entities, or you must refrain from using the Software and all Licensed Work. Furthermore, if You make any written claim of patent infringement relating to the Software, Your patent license for the Software granted under this Agreement terminates immediately.
					+* **Commercial Use Restrictions:** Licensee may not offer the Software as a software-as-a-service (SaaS) or commercial database-as-as-service (dBaaS) offering.  Licensee may not use the Software to compete with Licensor's existing or future products or services. If your Use of the Software does not comply with the requirements currently in effect as described in this License, you must purchase a commercial license from the Licensor, its Affiliated entities, or you must refrain from using the Software and all Licensed Work. Furthermore, if You make any written claim of patent infringement relating to the Software, Your patent license for the Software granted under this Agreement terminates immediately.
					 * Notwithstanding anything to the contrary, under the License granted hereunder, You shall not and shall not permit others to: (i) transfer the Software or any portions thereof to any other party except as expressly permitted herein; (ii) attempt to circumvent or overcome any technological protection measures incorporated into the Software; (iii) incorporate the Software into the structure, machinery or controls of any aircraft, other aerial device, military vehicle, hovercraft, waterborne craft or any medical equipment of any kind; or (iv) use the Software or any part thereof in any unlawful, harmful or illegal manner, or in a manner which infringes third parties’ rights in any way, including intellectual property rights.
					 **Monitoring; Audit**

@@ -41,14 +69,14 @@ The terms "**You**" or "**Licensee**" refer to any individual accessing or using
 **Indemnity; Disclaimer; Limitation of Liability**
-* **Indemnity:** Licensee hereby agrees to indemnify, defend and hold harmless Licensor and its affiliates from any losses or damages incurred due to a third party claim arising out of: (i) Licensee’s breach of this Agreement; (ii) Licensee’s negligence, willful misconduct or violation of law, or (iii) Licensee’s products or services.
+* **Indemnity:** Licensee hereby agrees to indemnify, defend and hold harmless Licensor and its Affiliates from any losses or damages incurred due to a third party claim arising out of: (i) Licensee’s breach of this Agreement; (ii) Licensee’s negligence, willful misconduct or violation of law, or (iii) Licensee’s products or services.
 * DISCLAIMER OF WARRANTIES:  LICENSEE AGREES THAT LICENSOR HAS MADE NO EXPRESS WARRANTIES REGARDING THE SOFTWARE AND THAT THE SOFTWARE IS BEING PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND. LICENSOR DISCLAIMS ALL WARRANTIES WITH REGARD TO THE SOFTWARE, EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION, ANY IMPLIED WARRANTIES OF FITNESS FOR A PARTICULAR PURPOSE; TITLE; MERCHANTABILITY;  OR NON-INFRINGEMENT OF THIRD PARTY RIGHTS. LICENSOR DOES NOT WARRANT THAT THE SOFTWARE WILL OPERATE UNINTERRUPTED OR ERROR FREE, OR THAT ALL ERRORS WILL BE CORRECTED.  LICENSOR DOES NOT GUARANTEE ANY PARTICULAR RESULTS FROM THE USE OF THE SOFTWARE, AND DOES NOT WARRANT THAT THE SOFTWARE IS FIT FOR ANY PARTICULAR PURPOSE.
 * LIMITATION OF LIABILITY:  TO THE FULLEST EXTENT PERMISSIBLE UNDER APPLICABLE LAW, IN NO EVENT WILL LICENSOR AND/OR ITS AFFILIATES, EMPLOYEES, OFFICERS AND DIRECTORS BE LIABLE TO LICENSEE FOR (I) ANY LOSS OF USE OR DATA; INTERRUPTION OF BUSINESS; OR ANY INDIRECT; SPECIAL; INCIDENTAL; OR CONSEQUENTIAL DAMAGES OF ANY KIND (INCLUDING LOST PROFITS); AND (II) ANY DIRECT DAMAGES EXCEEDING THE TOTAL AMOUNT OF ONE THOUSAND US DOLLARS ($1,000).  THE FOREGOING PROVISIONS LIMITING THE LIABILITY OF LICENSOR SHALL APPLY REGARDLESS OF THE FORM OR CAUSE OF ACTION, WHETHER IN STRICT LIABILITY, CONTRACT OR TORT.
 **Proprietary Rights; No Other Rights**
 * **Ownership:** Licensor retains sole and exclusive ownership of all rights, interests and title in the Software and any scripts, processes, techniques, methodologies, inventions, know-how, concepts, formatting, arrangements, visual attributes, ideas, database rights, copyrights, patents, trade secrets, and other intellectual property related thereto, and all derivatives, enhancements, modifications and improvements thereof. Except for the limited license rights granted herein, Licensee has no rights in or to the Software and/ or Licensor’s trademarks, logo, or branding and You acknowledge that such Software, trademarks, logo, or branding is the sole property of Licensor.
-* **Feedback:** Licensee is not required to provide any suggestions, enhancement requests, recommendations or other feedback regarding the Software ("Feedback").  If, notwithstanding this policy, Licensee submits Feedback, Licensee understands and acknowledges that such Feedback is not submitted in confidence and Licensor assumes no obligation, expressed or implied, by considering it.  All right in any trademark or logo of Licensor or its affiliates and You shall make no claim of right to the Software or any part thereof to be supplied by Licensor hereunder and acknowledges that as between Licensor and You, such Software is the sole proprietary, title and interest in and to Licensor.such Feedback shall be assigned to, and shall become the sole and exclusive property of, Licensor upon its creation.
+* **Feedback:** Licensee is not required to provide any suggestions, enhancement requests, recommendations or other feedback regarding the Software ("Feedback").  If, notwithstanding this policy, Licensee submits Feedback, Licensee understands and acknowledges that such Feedback is not submitted in confidence and Licensor assumes no obligation, expressed or implied, by considering it.  All right in any trademark or logo of Licensor or its Affiliates and You shall make no claim of right to the Software or any part thereof to be supplied by Licensor hereunder and acknowledges that as between Licensor and You, such Software is the sole proprietary, title and interest in and to Licensor.such Feedback shall be assigned to, and shall become the sole and exclusive property of, Licensor upon its creation.
 * Except for the rights expressly granted to You under this Agreement, You are not granted any other licenses or rights in the Software or otherwise. This Agreement constitutes the entire agreement between You and the Licensor with respect to the subject matter hereof and supersedes all prior or contemporaneous communications, representations, or agreements, whether oral or written.
 * **Third-Party Software:** Customer acknowledges that the Software may contain open and closed source components (“OSS Components”) that are governed separately by certain licenses, in each case as further provided by Company upon request. Any applicable OSS Component license is solely between Licensee and the applicable licensor of the OSS Component and Licensee shall comply with the applicable OSS Component license.
 * If any provision of this Agreement is held to be invalid or unenforceable, such provision shall be struck and the remaining provisions shall remain in full force and effect.

@@ -56,7 +84,7 @@ The terms "**You**" or "**Licensee**" refer to any individual accessing or using
 **Miscellaneous**
 * **Miscellaneous:** This Agreement may be modified at any time by Licensor, and constitutes the entire agreement between the parties with respect to the subject matter hereof. Licensee may not assign or subcontract its rights or obligations under this Agreement.  This Agreement does not, and shall not be construed to create any relationship, partnership, joint venture, employer-employee, agency, or franchisor-franchisee relationship between the parties.
+* **Modifications**: Licensor reserves the right to modify this Agreement at any time. Changes will be effective upon posting to the Website or within the Software repository. Continued use of the Software after such changes constitutes acceptance.
 * **Governing Law & Jurisdiction:** This Agreement shall be governed and construed in accordance with the laws of Israel, without giving effect to their respective conflicts of laws provisions, and the competent courts situated in Tel Aviv, Israel, shall have sole and exclusive jurisdiction over the parties and any conflict and/or dispute arising out of, or in connection to, this Agreement
 \[*End of ScyllaDB Software License Agreement*\]


abseil

Submodule abseil updated: d7aaad83b4...255c84dadd

									
absl-flat_hash_map.cc

@@ -3,7 +3,7 @@
  */
 /*
- * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
+ * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1
  */
 #include "absl-flat_hash_map.hh"

									
absl-flat_hash_map.hh

@@ -3,7 +3,7 @@
  */
 /*
- * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
+ * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1
  */
 #pragma once

									
alternator/CMakeLists.txt

@@ -9,6 +9,8 @@ target_sources(alternator
     controller.cc
     server.cc
     executor.cc
+    executor_read.cc
+    executor_util.cc
     stats.cc
     serialization.cc
     expressions.cc

									
alternator/attribute_path.hh

@@ -0,0 +1,253 @@
+/*
+ * Copyright 2019-present ScyllaDB
+ */
+
+/*
+ * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1
+ */
+
+#pragma once
+
+#include <map>
+#include <memory>
+#include <optional>
+#include <string>
+#include <unordered_map>
+#include <variant>
+
+#include "utils/rjson.hh"
+#include "utils/overloaded_functor.hh"
+#include "alternator/error.hh"
+#include "alternator/expressions_types.hh"
+
+namespace alternator {
+
+// An attribute_path_map object is used to hold data for various attribute
+// paths (parsed::path) in a hierarchy of attribute paths. Each attribute path
+// has a root attribute, which is then modified by member and index operators -
+// for example in "a.b[2].c" we have "a" as the root, then ".b" member, then
+// "[2]" index, and finally ".c" member.
+// Data can be added to an attribute_path_map using attribute_path_map_add(),
+// which requires that attributes with data not be *overlapping* or *conflicting*:
+//
+// 1. Two attribute paths which are identical or an ancestor of one another
+//    are considered *overlapping* and not allowed. If a.b.c has data,
+//    we can't add more data in a.b.c or any of its descendants like a.b.c.d.
+//
+// 2. Two attribute paths which need the same parent to have both a member and
+//    an index are considered *conflicting* and not allowed. E.g., if a.b has
+//    data, you can't add a[1]. The meaning of adding both would be that the
+//    attribute a is both a map and an array, which isn't sensible.
+//
+// These two requirements are common to the two places where Alternator uses
+// this abstraction to describe how a hierarchical item is to be transformed:
+//
+// 1. In ProjectionExpression: for filtering from a full top-level attribute
+//    only the parts the user asked for in ProjectionExpression.
+//
+// 2. In UpdateExpression: for taking the previous value of a top-level
+//    attribute, and modifying it based on the instructions the user
+//    wrote in UpdateExpression.
+template<typename T>
+class attribute_path_map_node {
+public:
+    using data_t = T;
+    // We need the extra unique_ptr<> here because libstdc++ unordered_map
+    // doesn't work with incomplete types :-(
+    using members_t = std::unordered_map<std::string, std::unique_ptr<attribute_path_map_node<T>>>;
+    // The indexes list is sorted because DynamoDB requires handling writes
+    // beyond the end of a list in index order.
+    using indexes_t = std::map<unsigned, std::unique_ptr<attribute_path_map_node<T>>>;
+    // The prohibition on "overlap" and "conflict" explained above means
+    // that at most one of data, members or indexes is non-empty.
+    std::optional<std::variant<data_t, members_t, indexes_t>> _content;
+    bool is_empty() const { return !_content; }
+    bool has_value() const { return _content && std::holds_alternative<data_t>(*_content); }
+    bool has_members() const { return _content && std::holds_alternative<members_t>(*_content); }
+    bool has_indexes() const { return _content && std::holds_alternative<indexes_t>(*_content); }
+    // get_members() assumes that has_members() is true (likewise for
+    // get_indexes() and get_value()).
+    members_t& get_members() { return std::get<members_t>(*_content); }
+    const members_t& get_members() const { return std::get<members_t>(*_content); }
+    indexes_t& get_indexes() { return std::get<indexes_t>(*_content); }
+    const indexes_t& get_indexes() const { return std::get<indexes_t>(*_content); }
+    T& get_value() { return std::get<T>(*_content); }
+    const T& get_value() const { return std::get<T>(*_content); }
+};
+
+template<typename T>
+using attribute_path_map = std::unordered_map<std::string, attribute_path_map_node<T>>;
+
+using attrs_to_get_node = attribute_path_map_node<std::monostate>;
+
+// attrs_to_get lists which top-level attributes are needed, and possibly also
+// which part of the top-level attribute is really needed (when nested
+// attribute paths appeared in the query).
+// Most code actually uses optional<attrs_to_get>. There, a disengaged
+// optional means we should get all attributes, not specific ones.
+using attrs_to_get = attribute_path_map<std::monostate>;
+
+// Takes a given JSON value and drops its parts which weren't asked to be
+// kept. It modifies the given JSON value, or returns false to signify that
+// the entire object should be dropped.
+// Note that the JSON value is assumed to be encoded using the DynamoDB
+// conventions - i.e., it is really a single-member map whose key is the
+// type string (e.g., "M" or "L") and whose value is the real object.
+template<typename T>
+bool hierarchy_filter(rjson::value& val, const attribute_path_map_node<T>& h) {
+    if (!val.IsObject() || val.MemberCount() != 1) {
+        // This shouldn't happen. We shouldn't have stored malformed objects.
+        // But today Alternator does not validate the structure of nested
+        // documents before storing them, so this can happen on read.
+        throw api_error::internal(format("Malformed value object read: {}", val));
+    }
+    const char* type = val.MemberBegin()->name.GetString();
+    rjson::value& v = val.MemberBegin()->value;
+    if (h.has_members()) {
+        const auto& members = h.get_members();
+        if (type[0] != 'M' || !v.IsObject()) {
+            // If v is not an object (dictionary, map), none of the members
+            // can match.
+            return false;
+        }
+        rjson::value newv = rjson::empty_object();
+        for (auto it = v.MemberBegin(); it != v.MemberEnd(); ++it) {
+            std::string attr = rjson::to_string(it->name);
+            auto x = members.find(attr);
+            if (x != members.end()) {
+                if (x->second) {
+                    // Only a part of this attribute is to be filtered, do it.
+                    if (hierarchy_filter(it->value, *x->second)) {
+                        // Because newv started empty and the attrs are unique
+                        // (keys of v), we can use add() here.
+                        rjson::add_with_string_name(newv, attr, std::move(it->value));
+                    }
+                } else {
+                    // The entire attribute is to be kept
+                    rjson::add_with_string_name(newv, attr, std::move(it->value));
+                }
+            }
+        }
+        if (newv.MemberCount() == 0) {
+            return false;
+        }
+        v = newv;
+    } else if (h.has_indexes()) {
+        const auto& indexes = h.get_indexes();
+        if (type[0] != 'L' || !v.IsArray()) {
+            return false;
+        }
+        rjson::value newv = rjson::empty_array();
+        const auto& a = v.GetArray();
+        for (unsigned i = 0; i < v.Size(); i++) {
+            auto x = indexes.find(i);
+            if (x != indexes.end()) {
+                if (x->second) {
+                    if (hierarchy_filter(a[i], *x->second)) {
+                        rjson::push_back(newv, std::move(a[i]));
+                    }
+                } else {
+                    // The entire attribute is to be kept
+                    rjson::push_back(newv, std::move(a[i]));
+                }
+            }
+        }
+        if (newv.Size() == 0) {
+            return false;
+        }
+        v = newv;
+    }
+    return true;
+}
+
+// Add a path to an attribute_path_map. Throws a validation error if the path
+// "overlaps" with one already in the filter (one is a sub-path of the other)
+// or "conflicts" with it (both a member and an index are requested).
+template<typename T>
+void attribute_path_map_add(const char* source, attribute_path_map<T>& map, const parsed::path& p, T value = {}) {
+    using node = attribute_path_map_node<T>;
+    // The first step is to look for the top-level attribute (p.root()):
+    auto it = map.find(p.root());
+    if (it == map.end()) {
+        if (p.has_operators()) {
+            it = map.emplace(p.root(), node {std::nullopt}).first;
+        } else {
+            (void) map.emplace(p.root(), node {std::move(value)}).first;
+            // Value inserted for top-level node. We're done.
+            return;
+        }
+    } else if (!p.has_operators()) {
+        // If p is top-level and we already have it or a part of it
+        // in map, it's a forbidden overlapping path.
+        throw api_error::validation(fmt::format(
+            "Invalid {}: two document paths overlap at {}", source, p.root()));
+    } else if (it->second.has_value()) {
+        // If we're here, it != map.end() && p.has_operators() && it->second.has_value().
+        // This means the top-level attribute already has a value, and we're
+        // trying to add a non-top-level value. It's an overlap.
+        throw api_error::validation(fmt::format("Invalid {}: two document paths overlap at {}", source, p.root()));
+    }
+    node* h = &it->second;
+    // The second step is to walk h from the top-level node to the inner node
+    // where we're supposed to insert the value:
+    for (const auto& op : p.operators()) {
+        std::visit(overloaded_functor {
+            [&] (const std::string& member) {
+                if (h->is_empty()) {
+                    *h = node {typename node::members_t()};
+                } else if (h->has_indexes()) {
+                    throw api_error::validation(format("Invalid {}: two document paths conflict at {}", source, p));
+                } else if (h->has_value()) {
+                    throw api_error::validation(format("Invalid {}: two document paths overlap at {}", source, p));
+                }
+                typename node::members_t& members = h->get_members();
+                auto it = members.find(member);
+                if (it == members.end()) {
+                    it = members.insert({member, std::make_unique<node>()}).first;
+                }
+                h = it->second.get();
+            },
+            [&] (unsigned index) {
+                if (h->is_empty()) {
+                    *h = node {typename node::indexes_t()};
+                } else if (h->has_members()) {
+                    throw api_error::validation(format("Invalid {}: two document paths conflict at {}", source, p));
+                } else if (h->has_value()) {
+                    throw api_error::validation(format("Invalid {}: two document paths overlap at {}", source, p));
+                }
+                typename node::indexes_t& indexes = h->get_indexes();
+                auto it = indexes.find(index);
+                if (it == indexes.end()) {
+                    it = indexes.insert({index, std::make_unique<node>()}).first;
+                }
+                h = it->second.get();
+            }
+        }, op);
+    }
+    // Finally, insert the value in the node h.
+    if (h->is_empty()) {
+        *h = node {std::move(value)};
+    } else {
+        throw api_error::validation(format("Invalid {}: two document paths overlap at {}", source, p));
+    }
+}
+
+// A very simplified version of the above function for the special case of
+// adding only a top-level attribute. It's not only simpler, we also use a
+// different error message, referring to a "duplicate attribute" instead of
+// "overlapping paths". DynamoDB also has this distinction (errors in
+// AttributesToGet refer to duplicates, not overlaps, but errors in
+// ProjectionExpression refer to overlap - even if it's an exact duplicate).
+template<typename T>
+void attribute_path_map_add(const char* source, attribute_path_map<T>& map, const std::string& attr, T value = {}) {
+    using node = attribute_path_map_node<T>;
+    auto it = map.find(attr);
+    if (it == map.end()) {
+        map.emplace(attr, node {std::move(value)});
+    } else {
+        throw api_error::validation(fmt::format(
+            "Invalid {}: Duplicate attribute: {}", source, attr));
+    }
+}
+
+} // namespace alternator
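The overlap and conflict rules documented at the top of `attribute_path.hh`, and the tree walk performed by `attribute_path_map_add()`, can be illustrated with a small standalone sketch. It is not Alternator code. `path_sketch`, `node_sketch`, `path_map_sketch`, `path_error` and `add_path` are hypothetical names, the path type stands in for `parsed::path`, and errors are plain `std::invalid_argument` exceptions instead of `api_error::validation`. As in the header, each node is unused, a leaf holding data, a map of members, or a list of indexes. The first operator applied to an unused node decides which of the last two it becomes.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <variant>
#include <vector>

// Hypothetical stand-in for parsed::path: a root attribute name followed by
// member (std::string) and index (unsigned) operators, e.g. a.b[2].c.
struct path_sketch {
    std::string root;
    std::vector<std::variant<std::string, unsigned>> ops;
};

// A node is unused (monostate), holds data (leaf), or has members or indexes.
struct node_sketch {
    struct leaf {};
    using members_t = std::unordered_map<std::string, std::unique_ptr<node_sketch>>;
    using indexes_t = std::map<unsigned, std::unique_ptr<node_sketch>>;
    std::variant<std::monostate, leaf, members_t, indexes_t> content;
};

using path_map_sketch = std::unordered_map<std::string, node_sketch>;

[[noreturn]] inline void path_error(const char* kind, const std::string& root) {
    throw std::invalid_argument(std::string("two document paths ") + kind + " at " + root);
}

// Inserts p into map. Throws on overlap (p passes through a leaf, or ends at a
// node that is already in use) or conflict (a parent would need to be both a
// map and a list).
inline void add_path(path_map_sketch& map, const path_sketch& p) {
    node_sketch* h = &map[p.root];  // operator[] creates an unused root node if p.root is new
    for (const auto& op : p.ops) {
        const bool is_member = std::holds_alternative<std::string>(op);
        if (std::holds_alternative<std::monostate>(h->content)) {
            // First use of this node decides whether it is a map or a list.
            if (is_member) {
                h->content = node_sketch::members_t{};
            } else {
                h->content = node_sketch::indexes_t{};
            }
        } else if (std::holds_alternative<node_sketch::leaf>(h->content)) {
            path_error("overlap", p.root);
        } else if (is_member != std::holds_alternative<node_sketch::members_t>(h->content)) {
            path_error("conflict", p.root);
        }
        // Find or create the child slot for this member name or list index.
        std::unique_ptr<node_sketch>& child = is_member
            ? std::get<node_sketch::members_t>(h->content)[std::get<std::string>(op)]
            : std::get<node_sketch::indexes_t>(h->content)[std::get<unsigned>(op)];
        if (!child) {
            child = std::make_unique<node_sketch>();
        }
        h = child.get();
    }
    // The final node must be unused; otherwise p duplicates, or is an ancestor
    // of, a path that is already stored.
    if (!std::holds_alternative<std::monostate>(h->content)) {
        path_error("overlap", p.root);
    }
    h->content = node_sketch::leaf{};
}
```

For example, after adding `a.b[2].c`, adding `a.b[2].d` succeeds. `a.b[2].c.e` and `a.b` are rejected as overlaps, and `a[1]` is rejected as a conflict because `a` already has members.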

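The comment on `indexes_t` in `attribute_path.hh` explains that indexes are kept in a sorted `std::map` because DynamoDB requires writes beyond the end of a list to be handled in index order. The sketch below shows why the order matters. It assumes DynamoDB's `SET` semantics for list elements: an in-range index replaces that element, and an index at or past the end appends to the list. `apply_index_writes` is a hypothetical helper, not part of Alternator, and list elements are simplified to strings.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical helper: applies index-keyed writes to a list. std::map iterates
// in ascending key order, so writes past the end are appended in index order,
// regardless of the order in which they were inserted into `writes`.
inline std::vector<std::string> apply_index_writes(
        std::vector<std::string> list,
        const std::map<unsigned, std::string>& writes) {
    for (const auto& [index, value] : writes) {
        if (index < list.size()) {
            list[index] = value;    // in range: replace the existing element
        } else {
            list.push_back(value);  // at or past the end: append
        }
    }
    return list;
}
```

Suppose the list `["a", "b", "c"]` receives writes `7 -> "y"`, `5 -> "x"` and `1 -> "B"`, inserted in that order. The result is `["a", "B", "c", "x", "y"]`: index 1 is replaced, and indexes 5 and 7 are appended in ascending order. Iterating an unordered container instead could append `"y"` before `"x"`.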
									
alternator/auth.cc

@@ -3,7 +3,7 @@
  */
 /*
- * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
+ * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1
  */
 #include "alternator/error.hh"

									
alternator/auth.hh

@@ -3,7 +3,7 @@
  */
 /*
- * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
+ * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1
  */
 #pragma once

									
alternator/conditions.cc

@@ -3,7 +3,7 @@
  */
 /*
- * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
+ * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1
  */
 #include <string_view>

									
alternator/conditions.hh

@@ -3,7 +3,7 @@
  */
 /*
- * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
+ * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1
  */
 /*

									
alternator/consumed_capacity.cc

@@ -3,7 +3,7 @@
  */
 /*
- * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
+ * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1
  */
 #include "consumed_capacity.hh"

									
alternator/consumed_capacity.hh

@@ -3,7 +3,7 @@
  */
 /*
- * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
+ * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1
  */
 #pragma once

									
alternator/controller.cc

@@ -3,7 +3,7 @@
  */
 /*
- * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
+ * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1
  */
 #include <seastar/core/with_scheduling_group.hh>

@@ -18,6 +18,7 @@
 #include "service/memory_limiter.hh"
 #include "auth/service.hh"
 #include "service/qos/service_level_controller.hh"
+#include "vector_search/vector_store_client.hh"
 using namespace seastar;

@@ -31,10 +32,12 @@ controller::controller(
         sharded<service::storage_service>& ss,
         sharded<service::migration_manager>& mm,
         sharded<db::system_distributed_keyspace>& sys_dist_ks,
+        sharded<db::system_keyspace>& sys_ks,
         sharded<cdc::generation_service>& cdc_gen_svc,
         sharded<service::memory_limiter>& memory_limiter,
         sharded<auth::service>& auth_service,
         sharded<qos::service_level_controller>& sl_controller,
+        sharded<vector_search::vector_store_client>& vsc,
         const db::config& config,
         seastar::scheduling_group sg)
     : protocol_server(sg)

@@ -43,10 +46,12 @@ controller::controller(
     , _ss(ss)
     , _mm(mm)
     , _sys_dist_ks(sys_dist_ks)
+    , _sys_ks(sys_ks)
     , _cdc_gen_svc(cdc_gen_svc)
     , _memory_limiter(memory_limiter)
     , _auth_service(auth_service)
     , _sl_controller(sl_controller)
+    , _vsc(vsc)
     , _config(config)
 {
 }

@@ -91,8 +96,8 @@ future<> controller::start_server() {
         auto get_timeout_in_ms = [] (const db::config& cfg) -> utils::updateable_value<uint32_t> {
             return cfg.alternator_timeout_in_ms;
         };
-        _executor.start(std::ref(_gossiper), std::ref(_proxy), std::ref(_ss), std::ref(_mm), std::ref(_sys_dist_ks),
-                        sharded_parameter(get_cdc_metadata, std::ref(_cdc_gen_svc)), _ssg.value(),
+        _executor.start(std::ref(_gossiper), std::ref(_proxy), std::ref(_ss), std::ref(_mm), std::ref(_sys_dist_ks), std::ref(_sys_ks),
+                        sharded_parameter(get_cdc_metadata, std::ref(_cdc_gen_svc)), std::ref(_vsc), _ssg.value(),
                         sharded_parameter(get_timeout_in_ms, std::ref(_config))).get();
         _server.start(std::ref(_executor), std::ref(_proxy), std::ref(_gossiper), std::ref(_auth_service), std::ref(_sl_controller)).get();
         // Note: from this point on, if start_server() throws for any reason,

alternator/controller.hh

					@@ -3,7 +3,7 @@

					 */

					 */

					/*

					/*

					 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0

					 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1

					 */

					 */

					#pragma once

					#pragma once

					@@ -22,6 +22,7 @@ class memory_limiter;

					namespace db {

					namespace db {

					class system_distributed_keyspace;

					class system_distributed_keyspace;

					class system_keyspace;

					class config;

					class config;

					}

					}

					@@ -43,6 +44,10 @@ namespace qos {

					class service_level_controller;

					class service_level_controller;

					}

					}

					namespace vector_search {

					class vector_store_client;

					}

					namespace alternator {

					namespace alternator {

					// This is the official DynamoDB API version.

					// This is the official DynamoDB API version.

					@@ -61,10 +66,12 @@ class controller : public protocol_server {

					    sharded<service::storage_service>& _ss;

					    sharded<service::storage_service>& _ss;

					    sharded<service::migration_manager>& _mm;

					    sharded<service::migration_manager>& _mm;

					    sharded<db::system_distributed_keyspace>& _sys_dist_ks;

					    sharded<db::system_distributed_keyspace>& _sys_dist_ks;

					    sharded<db::system_keyspace>& _sys_ks;

					    sharded<cdc::generation_service>& _cdc_gen_svc;

					    sharded<cdc::generation_service>& _cdc_gen_svc;

					    sharded<service::memory_limiter>& _memory_limiter;

					    sharded<service::memory_limiter>& _memory_limiter;

					    sharded<auth::service>& _auth_service;

					    sharded<auth::service>& _auth_service;

					    sharded<qos::service_level_controller>& _sl_controller;

					    sharded<qos::service_level_controller>& _sl_controller;

					    sharded<vector_search::vector_store_client>& _vsc;

					    const db::config& _config;

					    const db::config& _config;

					    std::vector<socket_address> _listen_addresses;

					    std::vector<socket_address> _listen_addresses;

					@@ -79,10 +86,12 @@ public:

					        sharded<service::storage_service>& ss,

					        sharded<service::storage_service>& ss,

					        sharded<service::migration_manager>& mm,

					        sharded<service::migration_manager>& mm,

					        sharded<db::system_distributed_keyspace>& sys_dist_ks,

					        sharded<db::system_distributed_keyspace>& sys_dist_ks,

					        sharded<db::system_keyspace>& sys_ks,

					        sharded<cdc::generation_service>& cdc_gen_svc,

					        sharded<cdc::generation_service>& cdc_gen_svc,

					        sharded<service::memory_limiter>& memory_limiter,

					        sharded<service::memory_limiter>& memory_limiter,

					        sharded<auth::service>& auth_service,

					        sharded<auth::service>& auth_service,

					        sharded<qos::service_level_controller>& sl_controller,

					        sharded<qos::service_level_controller>& sl_controller,

					        sharded<vector_search::vector_store_client>& vsc,

					        const db::config& config,

					        const db::config& config,

					        seastar::scheduling_group sg);

					        seastar::scheduling_group sg);

alternator/error.hh

					@@ -3,7 +3,7 @@

					 */

					 */

					/*

					/*

					 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0

					 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1

					 */

					 */

					#pragma once

					#pragma once

alternator/executor.cc

File diff suppressed because it is too large

alternator/executor.hh

					@@ -3,13 +3,15 @@

					 */

					 */

					/*

					/*

					 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0

					 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1

					 */

					 */

					#pragma once

					#pragma once

					#include <seastar/core/future.hh>

					#include <seastar/core/future.hh>

					#include "audit/audit.hh"

					#include "seastarx.hh"

					#include "seastarx.hh"

					#include <seastar/core/future.hh>

					#include <seastar/core/sharded.hh>

					#include <seastar/core/sharded.hh>

					#include <seastar/util/noncopyable_function.hh>

					#include <seastar/util/noncopyable_function.hh>

					@@ -20,15 +22,23 @@

					#include "db/config.hh"

					#include "db/config.hh"

					#include "alternator/error.hh"

					#include "alternator/error.hh"

					#include "stats.hh"

					#include "alternator/attribute_path.hh"

					#include "alternator/stats.hh"

					#include "alternator/executor_util.hh"

					#include "utils/rjson.hh"

					#include "utils/rjson.hh"

					#include "utils/updateable_value.hh"

					#include "utils/updateable_value.hh"

					#include "utils/simple_value_with_expiry.hh"

					#include "tracing/trace_state.hh"

					#include "tracing/trace_state.hh"

					namespace db {

					namespace db {

					    class system_distributed_keyspace;

					    class system_distributed_keyspace;

					    class system_keyspace;

					}

					namespace audit {

					class audit_info_alternator;

					}

					}

					namespace query {

					namespace query {

					@@ -46,6 +56,10 @@ namespace service {

					    class storage_service;

					    class storage_service;

					}

					}

					namespace vector_search {

					    class vector_store_client;

					}

					namespace cdc {

					namespace cdc {

					    class metadata;

					    class metadata;

					}

					}

					@@ -58,82 +72,13 @@ class gossiper;

					class schema_builder;

					class schema_builder;

					namespace alternator {

					namespace alternator {

					enum class table_status;

					enum class table_status;

					class rmw_operation;

					class rmw_operation;

					class put_or_delete_item;

					class put_or_delete_item;

					schema_ptr get_table(service::storage_proxy& proxy, const rjson::value& request);

					bool is_alternator_keyspace(const sstring& ks_name);

					// Wraps the db::get_tags_of_table and throws if the table is missing the tags extension.

					const std::map<sstring, sstring>& get_tags_of_table_or_throw(schema_ptr schema);

					// An attribute_path_map object is used to hold data for various attributes

					// paths (parsed::path) in a hierarchy of attribute paths. Each attribute path

					// has a root attribute, which is then modified by member and index operators -

					// for example in "a.b[2].c" we have "a" as the root, then ".b" member, then

					// "[2]" index, and finally ".c" member.

					// Data can be added to an attribute_path_map using the add() function, but

					// requires that attributes with data not be *overlapping* or *conflicting*:

					//

					// 1. Two attribute paths which are identical or an ancestor of one another

					//    are considered *overlapping* and not allowed. If a.b.c has data,

					//    we can't add more data in a.b.c or any of its descendants like a.b.c.d.

					//

					// 2. Two attribute paths which need the same parent to have both a member and

					//    an index are considered *conflicting* and not allowed. E.g., if a.b has

					//    data, you can't add a[1]. The meaning of adding both would be that the

					//    attribute a is both a map and an array, which isn't sensible.

					//

					// These two requirements are common to the two places where Alternator uses

					// this abstraction to describe how a hierarchical item is to be transformed:

					//

					// 1. In ProjectionExpression: for filtering from a full top-level attribute

					//    only the parts for which user asked in ProjectionExpression.

					//

					// 2. In UpdateExpression: for taking the previous value of a top-level

					//    attribute, and modifying it based on the instructions the user

					//    wrote in UpdateExpression.

					template<typename T>

					class attribute_path_map_node {

					public:

					    using data_t = T;

					    // We need the extra unique_ptr<> here because libstdc++ unordered_map

					    // doesn't work with incomplete types :-(

					    using members_t =  std::unordered_map<std::string, std::unique_ptr<attribute_path_map_node<T>>>;

					    // The indexes list is sorted because DynamoDB requires handling writes

					    // beyond the end of a list in index order.

					    using indexes_t = std::map<unsigned, std::unique_ptr<attribute_path_map_node<T>>>;

					    // The prohibition on "overlap" and "conflict" explained above means

					    // that only one of data, members or indexes is non-empty.

					    std::optional<std::variant<data_t, members_t, indexes_t>> _content;

					    bool is_empty() const { return !_content; }

					    bool has_value() const { return _content && std::holds_alternative<data_t>(*_content); }

					    bool has_members() const { return _content && std::holds_alternative<members_t>(*_content); }

					    bool has_indexes() const { return _content && std::holds_alternative<indexes_t>(*_content); }

					    // get_members() assumes that has_members() is true

					    members_t& get_members() { return std::get<members_t>(*_content); }

					    const members_t& get_members() const { return std::get<members_t>(*_content); }

					    indexes_t& get_indexes() { return std::get<indexes_t>(*_content); }

					    const indexes_t& get_indexes() const { return std::get<indexes_t>(*_content); }

					    T& get_value() { return std::get<T>(*_content); }

					    const T& get_value() const { return std::get<T>(*_content); }

					};

					template<typename T>

					using attribute_path_map = std::unordered_map<std::string, attribute_path_map_node<T>>;

					using attrs_to_get_node = attribute_path_map_node<std::monostate>;

					// attrs_to_get lists which top-level attributes are needed, and possibly also

					// which part of the top-level attribute is really needed (when nested

					// attribute paths appeared in the query).

					// Most code actually uses optional<attrs_to_get>. There, a disengaged

					// optional means we should get all attributes, not specific ones.

					using attrs_to_get = attribute_path_map<std::monostate>;

					namespace parsed {

					namespace parsed {

					class expression_cache;

					class expression_cache;

					}

					}
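The *overlapping* and *conflicting* rules described in the `attribute_path_map` comment above can be illustrated with a small standalone tree. This is a minimal sketch, not Alternator's implementation: `path_step`, `node` and `add()` are hypothetical names, `int` stands in for the template's `data_t`, and errors are reported with `std::invalid_argument` purely for illustration.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <optional>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <variant>
#include <vector>

// Hypothetical path step: a member name (".b") or a list index ("[2]").
using path_step = std::variant<std::string, unsigned>;

struct node {
    // unique_ptr keeps the mapped type complete while node is still incomplete.
    using members_t = std::unordered_map<std::string, std::unique_ptr<node>>;
    using indexes_t = std::map<unsigned, std::unique_ptr<node>>;
    // Empty, a leaf value, members, or indexes: never more than one of them.
    std::optional<std::variant<int, members_t, indexes_t>> content;
};

// Stores `value` at root + steps, rejecting overlapping or conflicting paths.
void add(std::unordered_map<std::string, node>& map, const std::string& root,
         const std::vector<path_step>& steps, int value) {
    node* n = &map[root];
    for (const path_step& step : steps) {
        if (n->content && std::holds_alternative<int>(*n->content)) {
            throw std::invalid_argument("overlap: an ancestor already has data");
        }
        if (const std::string* member = std::get_if<std::string>(&step)) {
            if (!n->content) {
                n->content = node::members_t{};
            } else if (!std::holds_alternative<node::members_t>(*n->content)) {
                throw std::invalid_argument("conflict: parent already has indexes");
            }
            auto& child = std::get<node::members_t>(*n->content)[*member];
            if (!child) {
                child = std::make_unique<node>();
            }
            n = child.get();
        } else {
            unsigned index = std::get<unsigned>(step);
            if (!n->content) {
                n->content = node::indexes_t{};
            } else if (!std::holds_alternative<node::indexes_t>(*n->content)) {
                throw std::invalid_argument("conflict: parent already has members");
            }
            auto& child = std::get<node::indexes_t>(*n->content)[index];
            if (!child) {
                child = std::make_unique<node>();
            }
            n = child.get();
        }
    }
    if (n->content) {
        // This exact path already has data, or one of its descendants does.
        throw std::invalid_argument("overlap: path or a descendant already has data");
    }
    n->content = value;
}
```

For example, after storing `a.b[2].c`, adding `a.b[2]` fails as an overlap, adding `a[1]` fails as a conflict (it would make `a` both a map and a list), while a sibling such as `a.x` is accepted.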

					@@ -144,9 +89,12 @@ class executor : public peering_sharded_service<executor> {

					    service::storage_proxy& _proxy;

					    service::storage_proxy& _proxy;

					    service::migration_manager& _mm;

					    service::migration_manager& _mm;

					    db::system_distributed_keyspace& _sdks;

					    db::system_distributed_keyspace& _sdks;

					    db::system_keyspace& _system_keyspace;

					    cdc::metadata& _cdc_metadata;

					    cdc::metadata& _cdc_metadata;

					    vector_search::vector_store_client& _vsc;

					    utils::updateable_value<bool> _enforce_authorization;

					    utils::updateable_value<bool> _enforce_authorization;

					    utils::updateable_value<bool> _warn_authorization;

					    utils::updateable_value<bool> _warn_authorization;

					    seastar::sharded<audit::audit>& _audit;

					    // An smp_service_group to be used for limiting the concurrency when

					    // An smp_service_group to be used for limiting the concurrency when

					    // forwarding Alternator request between shards - if necessary for LWT.

					    // forwarding Alternator request between shards - if necessary for LWT.

					    smp_service_group _ssg;

					    smp_service_group _ssg;

					@@ -171,7 +119,6 @@ public:

					    // is written in chunks to the output_stream. This allows for efficient

					    // is written in chunks to the output_stream. This allows for efficient

					    // handling of large responses without needing to allocate a large buffer

					    // handling of large responses without needing to allocate a large buffer

					    // in memory.

					    // in memory.

					    using body_writer = noncopyable_function<future<>(output_stream<char>&&)>;

					    using request_return_type = std::variant<std::string, body_writer, api_error>;

					    using request_return_type = std::variant<std::string, body_writer, api_error>;

					    stats _stats;

					    stats _stats;

					    // The metric_groups object holds this stat object's metrics registered

					    // The metric_groups object holds this stat object's metrics registered

					@@ -186,53 +133,60 @@ public:

					             service::storage_service& ss,

					             service::storage_service& ss,

					             service::migration_manager& mm,

					             service::migration_manager& mm,

					             db::system_distributed_keyspace& sdks,

					             db::system_distributed_keyspace& sdks,

					             db::system_keyspace& system_keyspace,

					             cdc::metadata& cdc_metadata,

					             cdc::metadata& cdc_metadata,

					             vector_search::vector_store_client& vsc,

					             smp_service_group ssg,

					             smp_service_group ssg,

					             utils::updateable_value<uint32_t> default_timeout_in_ms);

					             utils::updateable_value<uint32_t> default_timeout_in_ms);

					    ~executor();

					    ~executor();

					    future<request_return_type> create_table(client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value request);

					    future<request_return_type> create_table(client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value request, std::unique_ptr<audit::audit_info_alternator>& audit_info);

					    future<request_return_type> describe_table(client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value request);

					    future<request_return_type> describe_table(client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value request, std::unique_ptr<audit::audit_info_alternator>& audit_info);

					    future<request_return_type> delete_table(client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value request);

					    future<request_return_type> delete_table(client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value request, std::unique_ptr<audit::audit_info_alternator>& audit_info);

					    future<request_return_type> update_table(client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value request);

					    future<request_return_type> update_table(client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value request, std::unique_ptr<audit::audit_info_alternator>& audit_info);

					    future<request_return_type> put_item(client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value request);

					    future<request_return_type> put_item(client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value request, std::unique_ptr<audit::audit_info_alternator>& audit_info);

					    future<request_return_type> get_item(client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value request);

					    future<request_return_type> get_item(client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value request, std::unique_ptr<audit::audit_info_alternator>& audit_info);

					    future<request_return_type> delete_item(client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value request);

					    future<request_return_type> delete_item(client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value request, std::unique_ptr<audit::audit_info_alternator>& audit_info);

					    future<request_return_type> update_item(client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value request);

					    future<request_return_type> update_item(client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value request, std::unique_ptr<audit::audit_info_alternator>& audit_info);

					    future<request_return_type> list_tables(client_state& client_state, service_permit permit, rjson::value request);

					    future<request_return_type> list_tables(client_state& client_state, service_permit permit, rjson::value request, std::unique_ptr<audit::audit_info_alternator>& audit_info);

					    future<request_return_type> scan(client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value request);

					    future<request_return_type> scan(client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value request, std::unique_ptr<audit::audit_info_alternator>& audit_info);

					    future<request_return_type> describe_endpoints(client_state& client_state, service_permit permit, rjson::value request, std::string host_header);

					    future<request_return_type> describe_endpoints(client_state& client_state, service_permit permit, rjson::value request, std::string host_header, std::unique_ptr<audit::audit_info_alternator>& audit_info);

					    future<request_return_type> batch_write_item(client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value request);

					    future<request_return_type> batch_write_item(client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value request, std::unique_ptr<audit::audit_info_alternator>& audit_info);

					    future<request_return_type> batch_get_item(client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value request);

					    future<request_return_type> batch_get_item(client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value request, std::unique_ptr<audit::audit_info_alternator>& audit_info);

					    future<request_return_type> query(client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value request);

					    future<request_return_type> query(client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value request, std::unique_ptr<audit::audit_info_alternator>& audit_info);

					    future<request_return_type> tag_resource(client_state& client_state, service_permit permit, rjson::value request);

					    future<request_return_type> tag_resource(client_state& client_state, service_permit permit, rjson::value request, std::unique_ptr<audit::audit_info_alternator>& audit_info);

					    future<request_return_type> untag_resource(client_state& client_state, service_permit permit, rjson::value request);

					    future<request_return_type> untag_resource(client_state& client_state, service_permit permit, rjson::value request, std::unique_ptr<audit::audit_info_alternator>& audit_info);

					    future<request_return_type> list_tags_of_resource(client_state& client_state, service_permit permit, rjson::value request);

					    future<request_return_type> list_tags_of_resource(client_state& client_state, service_permit permit, rjson::value request, std::unique_ptr<audit::audit_info_alternator>& audit_info);

					    future<request_return_type> update_time_to_live(client_state& client_state, service_permit permit, rjson::value request);

					    future<request_return_type> update_time_to_live(client_state& client_state, service_permit permit, rjson::value request, std::unique_ptr<audit::audit_info_alternator>& audit_info);

					    future<request_return_type> describe_time_to_live(client_state& client_state, service_permit permit, rjson::value request);

					    future<request_return_type> describe_time_to_live(client_state& client_state, service_permit permit, rjson::value request, std::unique_ptr<audit::audit_info_alternator>& audit_info);

					    future<request_return_type> list_streams(client_state& client_state, service_permit permit, rjson::value request);

					    future<request_return_type> list_streams(client_state& client_state, service_permit permit, rjson::value request, std::unique_ptr<audit::audit_info_alternator>& audit_info);

					    future<request_return_type> describe_stream(client_state& client_state, service_permit permit, rjson::value request);

					    future<request_return_type> describe_stream(client_state& client_state, service_permit permit, rjson::value request, std::unique_ptr<audit::audit_info_alternator>& audit_info);

					    future<request_return_type> get_shard_iterator(client_state& client_state, service_permit permit, rjson::value request);

					    future<request_return_type> get_shard_iterator(client_state& client_state, service_permit permit, rjson::value request, std::unique_ptr<audit::audit_info_alternator>& audit_info);

					    future<request_return_type> get_records(client_state& client_state, tracing::trace_state_ptr, service_permit permit, rjson::value request);

					    future<request_return_type> get_records(client_state& client_state, tracing::trace_state_ptr, service_permit permit, rjson::value request, std::unique_ptr<audit::audit_info_alternator>& audit_info);

					    future<request_return_type> describe_continuous_backups(client_state& client_state, service_permit permit, rjson::value request);

					    future<request_return_type> describe_continuous_backups(client_state& client_state, service_permit permit, rjson::value request, std::unique_ptr<audit::audit_info_alternator>& audit_info);

					    future<> start();

					    future<> start();

					    future<> stop();

					    future<> stop();

					    static sstring table_name(const schema&);

					    static db::timeout_clock::time_point default_timeout();

					    static db::timeout_clock::time_point default_timeout();

					private:

					private:

					    static thread_local utils::updateable_value<uint32_t> s_default_timeout_in_ms;

					    static thread_local utils::updateable_value<uint32_t> s_default_timeout_in_ms;

					public:

					    static schema_ptr find_table(service::storage_proxy&, std::string_view table_name);

					    static schema_ptr find_table(service::storage_proxy&, const rjson::value& request);

					private:

					    friend class rmw_operation;

					    friend class rmw_operation;

					    static void describe_key_schema(rjson::value& parent, const schema&, std::unordered_map<std::string,std::string> * = nullptr, const std::map<sstring, sstring> *tags = nullptr);

					    // Helper to set up auditing for an Alternator operation. Checks whether

					    // the operation should be audited (via will_log()) and if so, allocates

					    // and populates audit_info. No allocation occurs when auditing is disabled.

					    void maybe_audit(std::unique_ptr<audit::audit_info_alternator>& audit_info,

					                     audit::statement_category category,

					                     std::string_view ks_name,

					                     std::string_view table_name,

					                     std::string_view operation_name,

					                     const rjson::value& request,

					                     std::optional<db::consistency_level> cl = std::nullopt);

					    future<rjson::value> fill_table_description(schema_ptr schema, table_status tbl_status, service::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit);

					    future<rjson::value> fill_table_description(schema_ptr schema, table_status tbl_status, service::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit);

					    future<executor::request_return_type> create_table_on_shard0(service::client_state&& client_state, tracing::trace_state_ptr trace_state, rjson::value request, bool enforce_authorization, bool warn_authorization, const db::tablets_mode_t::mode tablets_mode);

					    future<executor::request_return_type> create_table_on_shard0(service::client_state&& client_state, tracing::trace_state_ptr trace_state, rjson::value request, bool enforce_authorization,

					            bool warn_authorization, const db::tablets_mode_t::mode tablets_mode, std::unique_ptr<audit::audit_info_alternator>& audit_info);

					    future<> do_batch_write(

					    future<> do_batch_write(

					        std::vector<std::pair<schema_ptr, put_or_delete_item>> mutation_builders,

					        std::vector<std::pair<schema_ptr, put_or_delete_item>> mutation_builders,

					@@ -245,60 +199,34 @@ private:

					        tracing::trace_state_ptr trace_state, service_permit permit);

					        tracing::trace_state_ptr trace_state, service_permit permit);

					public:

					public:

					    static void describe_key_schema(rjson::value& parent, const schema& schema, std::unordered_map<std::string,std::string>&, const std::map<sstring, sstring> *tags = nullptr);

					    static std::optional<rjson::value> describe_single_item(schema_ptr,

					        const query::partition_slice&,

					        const cql3::selection::selection&,

					        const query::result&,

					        const std::optional<attrs_to_get>&,

					        uint64_t* = nullptr);

					    // Converts a multi-row selection result to JSON compatible with DynamoDB.

					    // For each row, this method calls item_callback, which takes the size of

					    // the item as the parameter.

					    static future<std::vector<rjson::value>> describe_multi_item(schema_ptr schema,

					        const query::partition_slice&& slice,

					        shared_ptr<cql3::selection::selection> selection,

					        foreign_ptr<lw_shared_ptr<query::result>> query_result,

					        shared_ptr<const std::optional<attrs_to_get>> attrs_to_get,

					        noncopyable_function<void(uint64_t)> item_callback = {});

					    static void describe_single_item(const cql3::selection::selection&,

					        const std::vector<managed_bytes_opt>&,

					        const std::optional<attrs_to_get>&,

					        rjson::value&,

					        uint64_t* item_length_in_bytes = nullptr,

					        bool = false);

					    static bool add_stream_options(const rjson::value& stream_spec, schema_builder&, service::storage_proxy& sp);

					    static bool add_stream_options(const rjson::value& stream_spec, schema_builder&, service::storage_proxy& sp);

					    static void supplement_table_info(rjson::value& descr, const schema& schema, service::storage_proxy& sp);

					    static void supplement_table_info(rjson::value& descr, const schema& schema, service::storage_proxy& sp);

					    static void supplement_table_stream_info(rjson::value& descr, const schema& schema, const service::storage_proxy& sp);

					    static void supplement_table_stream_info(rjson::value& descr, const schema& schema, const service::storage_proxy& sp);

					};

					};
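The comment on `maybe_audit()` says that `audit_info` is allocated only when the operation will actually be logged. A minimal sketch of that pattern follows; `audit_sink`, `audit_record` and this `maybe_audit()` signature are hypothetical stand-ins for `audit::audit` and `audit::audit_info_alternator`, and the `will_log()` here checks only a single on/off switch.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <string_view>

// Hypothetical stand-ins for the audit types used by the executor.
enum class statement_category { query, dml, ddl };

struct audit_record {
    statement_category category;
    std::string keyspace;
    std::string table;
    std::string operation;
};

struct audit_sink {
    bool enabled = false;
    // This sketch only checks a global switch; a real filter could also
    // consider the category, keyspace and table.
    bool will_log(statement_category, std::string_view, std::string_view) const {
        return enabled;
    }
};

// Leaves `info` untouched (no heap allocation) unless the operation will be
// logged; otherwise allocates and fills it.
void maybe_audit(const audit_sink& sink, std::unique_ptr<audit_record>& info,
                 statement_category category, std::string_view ks,
                 std::string_view table, std::string_view operation) {
    if (!sink.will_log(category, ks, table)) {
        return;
    }
    info = std::make_unique<audit_record>(audit_record{
        category, std::string(ks), std::string(table), std::string(operation)});
}
```

Passing the `std::unique_ptr` by reference lets each handler hand the (possibly empty) audit record back to its caller without paying for an allocation on the common, auditing-disabled path.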

					// is_big() checks approximately if the given JSON value is "bigger" than

					// the given big_size number of bytes. The goal is to *quickly* detect

					// oversized JSON that, for example, is too large to be serialized to a

					// contiguous string - we don't need an accurate size for that. Moreover,

					// as soon as we detect that the JSON is indeed "big", we can return true

					// and don't need to continue calculating its exact size.

					// For simplicity, we use a recursive implementation. This is fine because

					// Alternator limits the depth of JSONs it reads from inputs, and doesn't

					// add more than a couple of levels in its own output construction.

					bool is_big(const rjson::value& val, int big_size = 100'000);

					// Check CQL's Role-Based Access Control (RBAC) permission (MODIFY,

					// SELECT, DROP, etc.) on the given table. When permission is denied an

					// appropriate user-readable api_error::access_denied is thrown.

					future<> verify_permission(bool enforce_authorization, bool warn_authorization, const service::client_state&, const schema_ptr&, auth::permission, alternator::stats& stats);

					/**

					 * Make return type for serializing the object "streamed",

					 * i.e. direct to HTTP output stream. Note: only useful for

					 * (very) large objects as there are overhead issues with this

					 * as well, but for massive lists of return objects this can

					 * help avoid large allocations/many re-allocs

					 */

					executor::body_writer make_streamed(rjson::value&&);

					// returns table creation time in seconds since epoch for `db_clock`

					double get_table_creation_time(const schema &schema);

					// result of parsing ARN (Amazon Resource Name)

					// ARN format is `arn:<partition>:<service>:<region>:<account-id>:<resource-type>/<resource-id>/<postfix>`

					// we ignore partition, service and account-id

					// resource-type must be string "table"

					// resource-id will be returned as table_name

					// region will be returned as keyspace_name

					// postfix is a string after resource-id and will be returned as is (whole), including separator.

					struct arn_parts {

					    std::string_view keyspace_name;

					    std::string_view table_name;

					    std::string_view postfix;

					};

					// arn - arn to parse

					// arn_field_name - identifier of the ARN, used only when reporting an error (in error messages), for example "Incorrect resource identifier `<arn_field_name>`"

					// type_name - used only when reporting an error (in error messages), for example "... is not a valid <type_name> ARN ..."

					// expected_postfix - optional filter of postfix value (part of ARN after resource-id, including separator, see comments for struct arn_parts).

					//    If empty, the postfix value must be empty as well.

					//    If not empty, the postfix value must start with expected_postfix, but may be longer.

					arn_parts parse_arn(std::string_view arn, std::string_view arn_field_name, std::string_view type_name, std::string_view expected_postfix);
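The ARN contract documented for `parse_arn()` (region returned as keyspace, resource-type must be `table`, resource-id returned as table name, postfix returned whole including its separator, and the `expected_postfix` rule) can be sketched as follows. This is an illustration under assumptions, not the real implementation: `parse_arn_sketch` is a hypothetical name, it reports errors with `std::invalid_argument` rather than Alternator's `api_error`, and it drops `arn_field_name`/`type_name`, which the comments say only affect error messages.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <string_view>

struct arn_parts {
    std::string_view keyspace_name;
    std::string_view table_name;
    std::string_view postfix;
};

// The returned views point into `arn`, so `arn` must outlive the result.
arn_parts parse_arn_sketch(std::string_view arn, std::string_view expected_postfix) {
    auto fail = [&] { throw std::invalid_argument("invalid ARN: " + std::string(arn)); };
    // Split off the first five ':'-separated fields:
    // "arn", partition, service, region, account-id.
    std::string_view fields[5];
    std::string_view rest = arn;
    for (auto& field : fields) {
        auto colon = rest.find(':');
        if (colon == std::string_view::npos) {
            fail();
        }
        field = rest.substr(0, colon);
        rest.remove_prefix(colon + 1);
    }
    if (fields[0] != "arn") {
        fail();
    }
    // `rest` is now "<resource-type>/<resource-id>[/<postfix>]".
    auto slash = rest.find('/');
    if (slash == std::string_view::npos || rest.substr(0, slash) != "table") {
        fail();
    }
    rest.remove_prefix(slash + 1);
    auto end_of_id = rest.find('/');
    std::string_view table = rest.substr(0, end_of_id);  // npos: the whole rest
    std::string_view postfix = end_of_id == std::string_view::npos
        ? std::string_view{}
        : rest.substr(end_of_id);  // keeps the leading '/' separator
    if (table.empty()) {
        fail();
    }
    bool postfix_ok = expected_postfix.empty()
        ? postfix.empty()
        : postfix.substr(0, expected_postfix.size()) == expected_postfix;
    if (!postfix_ok) {
        fail();
    }
    // Partition, service and account-id (fields 1, 2 and 4) are ignored.
    return arn_parts{fields[3], table, postfix};
}
```

For instance, `arn:aws:dynamodb:ks1:000000000000:table/tbl1/stream/2024` with `expected_postfix = "/stream/"` yields keyspace `ks1`, table `tbl1` and postfix `/stream/2024`.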

					// The format is ks1|ks2|ks3... and table1|table2|table3...

					sstring print_names_for_audit(const std::set<sstring>& names);

					}

					}
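The `print_names_for_audit()` comment gives the output format as `ks1|ks2|ks3...`. A minimal sketch of that joining step, assuming names are printed in `std::set` iteration (lexicographic) order and using `std::string` in place of Seastar's `sstring`; `print_names_sketch` is a hypothetical name.

```cpp
#include <cassert>
#include <set>
#include <string>

// Joins the names with '|' in std::set order, e.g. {"ks2", "ks1"} -> "ks1|ks2".
std::string print_names_sketch(const std::set<std::string>& names) {
    std::string out;
    bool first = true;
    for (const auto& name : names) {
        if (!first) {
            out += '|';
        }
        out += name;
        first = false;
    }
    return out;
}
```

Tracking the first element with a flag, rather than checking `out.empty()`, keeps the separators correct even if one of the names is an empty string.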

alternator/executor_read.cc

File diff suppressed because it is too large

alternator/executor_util.cc

					@@ -0,0 +1,559 @@

					/*

					 * Copyright 2019-present ScyllaDB

					 */

					/*

					 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1

					 */

					#include "alternator/executor_util.hh"

					#include "alternator/executor.hh"

					#include "alternator/error.hh"

					#include "auth/resource.hh"

					#include "auth/service.hh"

					#include "cdc/log.hh"

					#include "data_dictionary/data_dictionary.hh"

					#include "db/tags/utils.hh"

					#include "replica/database.hh"

					#include "cql3/selection/selection.hh"

					#include "cql3/result_set.hh"

					#include "serialization.hh"

					#include "service/storage_proxy.hh"

					#include "types/map.hh"

					#include <fmt/format.h>

					namespace alternator {

					extern logging::logger elogger; // from executor.cc

					std::optional<int> get_int_attribute(const rjson::value& value, std::string_view attribute_name) {

					    const rjson::value* attribute_value = rjson::find(value, attribute_name);

					    if (!attribute_value)

					        return {};

					    if (!attribute_value->IsInt()) {

					        throw api_error::validation(fmt::format("Expected integer value for attribute {}, got: {}",

					                attribute_name, value));

					    }

					    return attribute_value->GetInt();

					}

					std::string get_string_attribute(const rjson::value& value, std::string_view attribute_name, const char* default_return) {

					    const rjson::value* attribute_value = rjson::find(value, attribute_name);

					    if (!attribute_value)

					        return default_return;

					    if (!attribute_value->IsString()) {

					        throw api_error::validation(fmt::format("Expected string value for attribute {}, got: {}",

					                attribute_name, value));

					    }

					    return rjson::to_string(*attribute_value);

					}

					bool get_bool_attribute(const rjson::value& value, std::string_view attribute_name, bool default_return) {

					    const rjson::value* attribute_value = rjson::find(value, attribute_name);

					    if (!attribute_value) {

					        return default_return;

					    }

					    if (!attribute_value->IsBool()) {

					        throw api_error::validation(fmt::format("Expected boolean value for attribute {}, got: {}",

					                attribute_name, value));

					    }

					    return attribute_value->GetBool();

					}

					std::optional<std::string> find_table_name(const rjson::value& request) {

					    const rjson::value* table_name_value = rjson::find(request, "TableName");

					    if (!table_name_value) {

					        return std::nullopt;

					    }

					    if (!table_name_value->IsString()) {

					        throw api_error::validation("Non-string TableName field in request");

					    }

					    std::string table_name = rjson::to_string(*table_name_value);

					    return table_name;

					}

					std::string get_table_name(const rjson::value& request) {

					    auto name = find_table_name(request);

					    if (!name) {

					        throw api_error::validation("Missing TableName field in request");

					    }

					    return *name;

					}

					schema_ptr find_table(service::storage_proxy& proxy, const rjson::value& request) {

					    auto table_name = find_table_name(request);

					    if (!table_name) {

					        return nullptr;

					    }

					    return find_table(proxy, *table_name);

					}

					schema_ptr find_table(service::storage_proxy& proxy, std::string_view table_name) {

					    try {

					        return proxy.data_dictionary().find_schema(sstring(executor::KEYSPACE_NAME_PREFIX) + sstring(table_name), table_name);

					    } catch(data_dictionary::no_such_column_family&) {

					        // If the table does not exist and its name is also invalid, DynamoDB

					        // returns a validation error rather than a resource-not-found error.

					        validate_table_name(table_name);

					        throw api_error::resource_not_found(

					                fmt::format("Requested resource not found: Table: {} not found", table_name));

					    }

					}

					schema_ptr get_table(service::storage_proxy& proxy, const rjson::value& request) {

					    auto schema = find_table(proxy, request);

					    if (!schema) {

					        // If we get here, the TableName field was missing: an invalid name or

					        // a nonexistent table would already have thrown. This is a slow path, so

					        // just call get_table_name() to generate the appropriate exception.

					        get_table_name(request);

					    }

					    return schema;

					}

					map_type attrs_type() {

					    static thread_local auto t = map_type_impl::get_instance(utf8_type, bytes_type, true);

					    return t;

					}

					const std::map<sstring, sstring>& get_tags_of_table_or_throw(schema_ptr schema) {

					    auto tags_ptr = db::get_tags_of_table(schema);

					    if (tags_ptr) {

					        return *tags_ptr;

					    } else {

					        throw api_error::validation(format("Table {} does not have valid tagging information", schema->ks_name()));

					    }

					}

					bool is_alternator_keyspace(std::string_view ks_name) {

					    return ks_name.starts_with(executor::KEYSPACE_NAME_PREFIX);

					}

					// This tag is set on a GSI when the user did not specify a range key, causing

					// Alternator to add the base table's range key as a spurious range key. It is

					// used by describe_key_schema() to suppress reporting that key.

					extern const sstring SPURIOUS_RANGE_KEY_ADDED_TO_GSI_AND_USER_DIDNT_SPECIFY_RANGE_KEY_TAG_KEY;

					void describe_key_schema(rjson::value& parent, const schema& schema, std::unordered_map<std::string, std::string>* attribute_types, const std::map<sstring, sstring>* tags) {

					    rjson::value key_schema = rjson::empty_array();

					    const bool ignore_range_keys_as_spurious = tags != nullptr && tags->contains(SPURIOUS_RANGE_KEY_ADDED_TO_GSI_AND_USER_DIDNT_SPECIFY_RANGE_KEY_TAG_KEY);

					    for (const column_definition& cdef : schema.partition_key_columns()) {

					        rjson::value key = rjson::empty_object();

					        rjson::add(key, "AttributeName", rjson::from_string(cdef.name_as_text()));

					        rjson::add(key, "KeyType", "HASH");

					        rjson::push_back(key_schema, std::move(key));

					        if (attribute_types) {

					            (*attribute_types)[cdef.name_as_text()] = type_to_string(cdef.type);

					        }

					    }

					    if (!ignore_range_keys_as_spurious) {

					        // NOTE: the user-requested range key (there can be at most one) always

					        // comes first. Any clustering key columns following it were added

					        // internally rather than requested by the user, so we ignore them.

					        for (const column_definition& cdef : schema.clustering_key_columns()) {

					            rjson::value key = rjson::empty_object();

					            rjson::add(key, "AttributeName", rjson::from_string(cdef.name_as_text()));

					            rjson::add(key, "KeyType", "RANGE");

					            rjson::push_back(key_schema, std::move(key));

					            if (attribute_types) {

					                (*attribute_types)[cdef.name_as_text()] = type_to_string(cdef.type);

					            }

					            break;

					        }

					    }

					    rjson::add(parent, "KeySchema", std::move(key_schema));

					}

					// Check if the given string has valid characters for a table name, i.e. only

					// a-z, A-Z, 0-9, _ (underscore), - (dash), . (dot). Note that this function

					// does not check the length of the name - instead, use validate_table_name()

					// to validate both the characters and the length.

					static bool valid_table_name_chars(std::string_view name) {

					    for (auto c : name) {

					        if ((c < 'a' || c > 'z') &&

					            (c < 'A' || c > 'Z') &&

					            (c < '0' || c > '9') &&

					            c != '_' &&

					            c != '-' &&

					            c != '.') {

					            return false;

					        }

					    }

					    return true;

					}

					std::string view_name(std::string_view table_name, std::string_view index_name, const std::string& delim, bool validate_len) {

					    if (index_name.length() < 3) {

					        throw api_error::validation("IndexName must be at least 3 characters long");

					    }

					    if (!valid_table_name_chars(index_name)) {

					        throw api_error::validation(

					                fmt::format("IndexName '{}' must satisfy regular expression pattern: [a-zA-Z0-9_.-]+", index_name));

					    }

					    std::string ret = std::string(table_name) + delim + std::string(index_name);

					    if (ret.length() > max_auxiliary_table_name_length && validate_len) {

					        throw api_error::validation(

					                fmt::format("The total length of TableName ('{}') and IndexName ('{}') cannot exceed {} characters",

					                        table_name, index_name, max_auxiliary_table_name_length - delim.size()));

					    }

					    return ret;

					}

					std::string gsi_name(std::string_view table_name, std::string_view index_name, bool validate_len) {

					    return view_name(table_name, index_name, ":", validate_len);

					}

					std::string lsi_name(std::string_view table_name, std::string_view index_name, bool validate_len) {

					    return view_name(table_name, index_name, "!:", validate_len);

					}

					void check_key(const rjson::value& key, const schema_ptr& schema) {

					    if (key.MemberCount() != (schema->clustering_key_size() == 0 ? 1 : 2)) {

					        throw api_error::validation("Given key attribute not in schema");

					    }

					}

					void verify_all_are_used(const rjson::value* field,

					        const std::unordered_set<std::string>& used, const char* field_name, const char* operation) {

					    if (!field) {

					        return;

					    }

					    for (auto it = field->MemberBegin(); it != field->MemberEnd(); ++it) {

					        if (!used.contains(rjson::to_string(it->name))) {

					            throw api_error::validation(

					                format("{} has spurious '{}', not used in {}",

					                    field_name, rjson::to_string_view(it->name), operation));

					        }

					    }

					}

					// This function increments the authorization_failures counter, and may also

					// log a warn-level message and/or throw an access_denied exception, depending

					// on what enforce_authorization and warn_authorization are set to.

					// Note that if enforce_authorization is false, this function will return

					// without throwing. So a caller that doesn't want to continue after an

					// authorization_error must explicitly return after calling this function.

					static void authorization_error(stats& stats, bool enforce_authorization, bool warn_authorization, std::string msg) {

					    stats.authorization_failures++;

					    if (enforce_authorization) {

					        if (warn_authorization) {

					            elogger.warn("alternator_warn_authorization=true: {}", msg);

					        }

					        throw api_error::access_denied(std::move(msg));

					    } else {

					        if (warn_authorization) {

					            elogger.warn("If you set alternator_enforce_authorization=true the following will be enforced: {}", msg);

					        }

					    }

					}

					future<> verify_permission(

					    bool enforce_authorization,

					    bool warn_authorization,

					    const service::client_state& client_state,

					    const schema_ptr& schema,

					    auth::permission permission_to_check,

					    stats& stats) {

					    if (!enforce_authorization && !warn_authorization) {

					        co_return;

					    }

					    // Unfortunately, the fix for issue #23218 did not modify the function

					    // that we use here, check_has_permission(). So if we want to allow

					    // writes to internal tables (from try_get_internal_table()) only to a

					    // superuser, we need to check for that explicitly here.

					    if (permission_to_check == auth::permission::MODIFY && is_internal_keyspace(schema->ks_name())) {

					        if (!client_state.user() ||

					            !client_state.user()->name ||

					            !co_await client_state.get_auth_service()->underlying_role_manager().is_superuser(*client_state.user()->name)) {

					                sstring username = "<anonymous>";

					                if (client_state.user() && client_state.user()->name) {

					                    username = client_state.user()->name.value();

					                }

					                authorization_error(stats, enforce_authorization, warn_authorization, fmt::format(

					                    "Write access denied on internal table {}.{} to role {} because it is not a superuser",

					                    schema->ks_name(), schema->cf_name(), username));

					                co_return;

					        }

					    }

					    auto resource = auth::make_data_resource(schema->ks_name(), schema->cf_name());

					    if (!client_state.user() || !client_state.user()->name ||

					        !co_await client_state.check_has_permission(auth::command_desc(permission_to_check, resource))) {

					        sstring username = "<anonymous>";

					        if (client_state.user() && client_state.user()->name) {

					            username = client_state.user()->name.value();

					        }

					        // Using exceptions for errors makes this function faster in the

					        // success path (when the operation is allowed).

					        authorization_error(stats, enforce_authorization, warn_authorization, fmt::format(

					            "{} access on table {}.{} is denied to role {}, client address {}",

					            auth::permissions::to_string(permission_to_check),

					            schema->ks_name(), schema->cf_name(), username, client_state.get_client_address()));

					    }

					}

					// Similar to verify_permission() above, but just for CREATE operations.

					// Those do not operate on any specific table, so require permissions on

					// ALL KEYSPACES instead of any specific table.

					future<> verify_create_permission(bool enforce_authorization, bool warn_authorization, const service::client_state& client_state, stats& stats) {

					    if (!enforce_authorization && !warn_authorization) {

					        co_return;

					    }

					    auto resource = auth::resource(auth::resource_kind::data);

					    if (!co_await client_state.check_has_permission(auth::command_desc(auth::permission::CREATE, resource))) {

					        sstring username = "<anonymous>";

					        if (client_state.user() && client_state.user()->name) {

					            username = client_state.user()->name.value();

					        }

					        authorization_error(stats, enforce_authorization, warn_authorization, fmt::format(

					            "CREATE access on ALL KEYSPACES is denied to role {}", username));

					    }

					}

					schema_ptr try_get_internal_table(const data_dictionary::database& db, std::string_view table_name) {

					    size_t it = table_name.find(executor::INTERNAL_TABLE_PREFIX);

					    if (it != 0) {

					        return schema_ptr{};

					    }

					    table_name.remove_prefix(executor::INTERNAL_TABLE_PREFIX.size());

					    size_t delim = table_name.find_first_of('.');

					    if (delim == std::string_view::npos) {

					        return schema_ptr{};

					    }

					    std::string_view ks_name = table_name.substr(0, delim);

					    table_name.remove_prefix(ks_name.size() + 1);

					    // Only internal keyspaces can be accessed to avoid leakage

					    auto ks = db.try_find_keyspace(ks_name);

					    if (!ks || !ks->is_internal()) {

					        return schema_ptr{};

					    }

					    try {

					        return db.find_schema(ks_name, table_name);

					    } catch (data_dictionary::no_such_column_family&) {

					        // If the table does not exist and its name is also invalid, DynamoDB

					        // returns a validation error rather than a resource-not-found error.

					        validate_table_name(table_name);

					        throw api_error::resource_not_found(

					            fmt::format("Requested resource not found: Internal table: {}.{} not found", ks_name, table_name));

					    }

					}

					schema_ptr get_table_from_batch_request(const service::storage_proxy& proxy, const rjson::value::ConstMemberIterator& batch_request) {

					    sstring table_name = rjson::to_sstring(batch_request->name); // JSON keys are always strings

					    try {

					        return proxy.data_dictionary().find_schema(sstring(executor::KEYSPACE_NAME_PREFIX) + table_name, table_name);

					    } catch(data_dictionary::no_such_column_family&) {

					        // If the table does not exist and its name is also invalid, DynamoDB

					        // returns a validation error rather than a resource-not-found error.

					        validate_table_name(table_name);

					        throw api_error::resource_not_found(format("Requested resource not found: Table: {} not found", table_name));

					    }

					}

					lw_shared_ptr<stats> get_stats_from_schema(service::storage_proxy& sp, const schema& schema) {

					    try {

					        replica::table& table = sp.local_db().find_column_family(schema.id());

					        if (!table.get_stats().alternator_stats) {

					            table.get_stats().alternator_stats = seastar::make_shared<table_stats>(schema.ks_name(), schema.cf_name());

					        }

					        return table.get_stats().alternator_stats->_stats;

					    } catch (std::runtime_error&) {

					        // If we get here, the table we are working on was deleted before the

					        // operation completed. Returning a temporary stats object is fine: once

					        // the table is deleted, its metrics are deleted too.

					        return make_lw_shared<stats>();

					    }

					}

					void describe_single_item(const cql3::selection::selection& selection,

					    const std::vector<managed_bytes_opt>& result_row,

					    const std::optional<attrs_to_get>& attrs_to_get,

					    rjson::value& item,

					    uint64_t* item_length_in_bytes,

					    bool include_all_embedded_attributes)

					{

					    const auto& columns = selection.get_columns();

					    auto column_it = columns.begin();

					    for (const managed_bytes_opt& cell : result_row) {

					        if (!cell) {

					            ++column_it;

					            continue;

					        }

					        std::string column_name = (*column_it)->name_as_text();

					        if (column_name != executor::ATTRS_COLUMN_NAME) {

					            if (item_length_in_bytes) {

					                (*item_length_in_bytes) += column_name.length() + cell->size();

					            }

					            if (!attrs_to_get || attrs_to_get->contains(column_name)) {

					                // item is expected to start empty, and column names are unique,

					                // so add() makes sense.

					                rjson::add_with_string_name(item, column_name, rjson::empty_object());

					                rjson::value& field = item[column_name.c_str()];

					                cell->with_linearized([&] (bytes_view linearized_cell) {

					                    rjson::add_with_string_name(field, type_to_string((*column_it)->type), json_key_column_value(linearized_cell, **column_it));

					                });

					            }

					        } else {

					            auto deserialized = attrs_type()->deserialize(*cell);

					            auto keys_and_values = value_cast<map_type_impl::native_type>(deserialized);

					            for (auto entry : keys_and_values) {

					                std::string attr_name = value_cast<sstring>(entry.first);

					                if (item_length_in_bytes) {

					                    (*item_length_in_bytes) += attr_name.length();

					                }

					                if (include_all_embedded_attributes || !attrs_to_get || attrs_to_get->contains(attr_name)) {

					                    bytes value = value_cast<bytes>(entry.second);

					                    if (item_length_in_bytes && value.length()) {

					                        // ScyllaDB's serialized value is one byte longer than the length DynamoDB counts, so subtract that byte

					                        (*item_length_in_bytes) += value.length() - 1;

					                    }

					                    rjson::value v = deserialize_item(value);

					                    if (attrs_to_get) {

					                        auto it = attrs_to_get->find(attr_name);

					                        if (it != attrs_to_get->end()) {

					                            // attrs_to_get may have asked for only part of

					                            // this attribute. hierarchy_filter() modifies v,

					                            // and returns false when nothing is to be kept.

					                            if (!hierarchy_filter(v, it->second)) {

					                                continue;

					                            }

					                        }

					                    }

					                    // item is expected to start empty, and attribute

					                    // names are unique, so add() makes sense.

					                    rjson::add_with_string_name(item, attr_name, std::move(v));

					                } else if (item_length_in_bytes) {

					                    (*item_length_in_bytes) += value_cast<bytes>(entry.second).length() - 1;

					                }

					            }

					        }

					        ++column_it;

					    }

					}

					std::optional<rjson::value> describe_single_item(schema_ptr schema,

					        const query::partition_slice& slice,

					        const cql3::selection::selection& selection,

					        const query::result& query_result,

					        const std::optional<attrs_to_get>& attrs_to_get,

					        uint64_t* item_length_in_bytes) {

					    rjson::value item = rjson::empty_object();

					    cql3::selection::result_set_builder builder(selection, gc_clock::now());

					    query::result_view::consume(query_result, slice, cql3::selection::result_set_builder::visitor(builder, *schema, selection));

					    auto result_set = builder.build();

					    if (result_set->empty()) {

					        if (item_length_in_bytes) {

					            // An empty result is counted as having the minimal length of 1 byte.

					            (*item_length_in_bytes) += 1;

					        }

					        // If there is no matching item, we're supposed to return an empty

					        // object without an Item member - not one with an empty Item member

					        return {};

					    }

					    if (result_set->size() > 1) {

					        // If the result set contains multiple rows, the code should have

					        // called describe_multi_item(), not this function.

					        throw std::logic_error("describe_single_item() asked to describe multiple items");

					    }

					    describe_single_item(selection, *result_set->rows().begin(), attrs_to_get, item, item_length_in_bytes);

					    return item;

					}

					static void check_big_array(const rjson::value& val, int& size_left);

					static void check_big_object(const rjson::value& val, int& size_left);

					// For simplicity, we use a recursive implementation. This is fine because

					// Alternator limits the depth of JSONs it reads from inputs, and doesn't

					// add more than a couple of levels in its own output construction.

					bool is_big(const rjson::value& val, int big_size) {

					    if (val.IsString()) {

					        return ssize_t(val.GetStringLength()) > big_size;

					    } else if (val.IsObject()) {

					        check_big_object(val, big_size);

					        return big_size < 0;

					    } else if (val.IsArray()) {

					        check_big_array(val, big_size);

					        return big_size < 0;

					    }

					    return false;

					}

					static void check_big_array(const rjson::value& val, int& size_left) {

					    // Assume a fixed size of 10 bytes for each number, boolean, etc., or

					    // beginning of a sub-object. This doesn't have to be accurate.

					    size_left -= 10 * val.Size();

					    for (const auto& v : val.GetArray()) {

					        if (size_left < 0) {

					            return;

					        }

					        // Note that we avoid recursive calls for the leaves (anything except

					        // array or object) because usually those greatly outnumber the trunk.

					        if (v.IsString()) {

					            size_left -= v.GetStringLength();

					        } else if (v.IsObject()) {

					            check_big_object(v, size_left);

					        } else if (v.IsArray()) {

					            check_big_array(v, size_left);

					        }

					    }

					}

					static void check_big_object(const rjson::value& val, int& size_left) {

					    size_left -= 10 * val.MemberCount();

					    for (const auto& m : val.GetObject()) {

					        if (size_left < 0) {

					            return;

					        }

					        size_left -= m.name.GetStringLength();

					        if (m.value.IsString()) {

					            size_left -= m.value.GetStringLength();

					        } else if (m.value.IsObject()) {

					            check_big_object(m.value, size_left);

					        } else if (m.value.IsArray()) {

					            check_big_array(m.value, size_left);

					        }

					    }

					}

					void validate_table_name(std::string_view name, const char* source) {

					    if (name.length() < 3 || name.length() > max_table_name_length) {

					        throw api_error::validation(

					                format("{} must be at least 3 characters long and at most {} characters long", source, max_table_name_length));

					    }

					    if (!valid_table_name_chars(name)) {

					        throw api_error::validation(

					                format("{} must satisfy regular expression pattern: [a-zA-Z0-9_.-]+", source));

					    }

					}

					void validate_cdc_log_name_length(std::string_view table_name) {

					    if (cdc::log_name(table_name).length() > max_auxiliary_table_name_length) {

					        // CDC will add cdc_log_suffix ("_scylla_cdc_log") to the table name

					        // to create its log table, and this will exceed the maximum allowed

					        // length. To provide a more helpful error message, we assume that

					        // cdc::log_name() always adds a suffix of the same length.

					        int suffix_len = cdc::log_name(table_name).length() - table_name.length();

					        throw api_error::validation(fmt::format("Streams or vector search cannot be enabled on a table whose name is longer than {} characters: {}",

					            max_auxiliary_table_name_length - suffix_len, table_name));

					    }

					}

					body_writer make_streamed(rjson::value&& value) {

					    return [value = std::move(value)](output_stream<char>&& _out) mutable -> future<> {

					        auto out = std::move(_out);

					        std::exception_ptr ex;

					        try {

					            co_await rjson::print(value, out);

					        } catch (...) {

					            ex = std::current_exception();

					        }

					        co_await out.close();

					        co_await rjson::destroy_gently(std::move(value));

					        if (ex) {

					            co_await coroutine::return_exception_ptr(std::move(ex));

					        }

					    };

					}

					} // namespace alternator
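The naming scheme implemented by view_name(), gsi_name() and lsi_name() above can be exercised outside ScyllaDB with a simplified C++17 sketch. Assumptions: the `sketch_*` functions are hypothetical stand-ins, `std::invalid_argument` replaces `api_error::validation`, the error messages are shortened, and length validation is always on (the real functions take a `validate_len` flag). The two limits are copied from executor_util.hh.

```cpp
#include <cassert>    // for the usage checks
#include <cstddef>
#include <stdexcept>
#include <string>
#include <string_view>

// Limits copied from executor_util.hh.
constexpr int max_table_name_length = 192;
constexpr int max_auxiliary_table_name_length = 222;

// Same rule as valid_table_name_chars(): only [a-zA-Z0-9_.-] is allowed.
bool sketch_valid_chars(std::string_view name) {
    for (char c : name) {
        bool ok = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
                  (c >= '0' && c <= '9') || c == '_' || c == '-' || c == '.';
        if (!ok) {
            return false;
        }
    }
    return true;
}

// Simplified view_name(): validates the index name, joins the parts with
// the delimiter, and rejects combined names longer than the auxiliary limit.
std::string sketch_view_name(std::string_view table, std::string_view index,
                             std::string_view delim) {
    if (index.size() < 3) {
        throw std::invalid_argument("IndexName must be at least 3 characters long");
    }
    if (!sketch_valid_chars(index)) {
        throw std::invalid_argument("IndexName must match [a-zA-Z0-9_.-]+");
    }
    std::string ret = std::string(table) + std::string(delim) + std::string(index);
    if (ret.size() > static_cast<std::size_t>(max_auxiliary_table_name_length)) {
        throw std::invalid_argument("TableName + IndexName too long");
    }
    return ret;
}

// GSIs use ":" as the delimiter, LSIs use "!:".
std::string sketch_gsi_name(std::string_view table, std::string_view index) {
    return sketch_view_name(table, index, ":");
}
std::string sketch_lsi_name(std::string_view table, std::string_view index) {
    return sketch_view_name(table, index, "!:");
}

// Longest GSI index name that still fits for a maximal-length table name.
static_assert(max_auxiliary_table_name_length - max_table_name_length - 1 == 29,
              "GSI index name budget for a 192-character table name");
```

One consequence of the two limits: when length validation is requested, a table whose name has the maximum 192 characters leaves room for a GSI index name of at most 222 - 192 - 1 = 29 characters, or 28 for an LSI, whose delimiter is two characters long.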

									
alternator/executor_util.hh (247 lines changed)

					@@ -0,0 +1,247 @@

					/*

					 * Copyright 2019-present ScyllaDB

					 */

					/*

					 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1

					 */

					// This header file, and the implementation file executor_util.cc, contain

					// various utility functions that are reused in many different operations

					// (API requests) across Alternator's code - in files such as executor.cc,

					// executor_read.cc, streams.cc, ttl.cc, and more. These utility functions

					// include things like extracting and validating pieces from a JSON request,

					// checking permissions, constructing auxiliary table names, and more.

					#pragma once

					#include <map>

					#include <optional>

					#include <string>

					#include <string_view>

					#include <unordered_map>

					#include <unordered_set>

					#include <seastar/core/future.hh>

					#include <seastar/util/noncopyable_function.hh>

					#include "utils/rjson.hh"

					#include "schema/schema_fwd.hh"

					#include "types/types.hh"

					#include "auth/permission.hh"

					#include "alternator/stats.hh"

					#include "alternator/attribute_path.hh"

					#include "utils/managed_bytes.hh"

					namespace query { class partition_slice; class result; }

					namespace cql3::selection { class selection; }

					namespace data_dictionary { class database; }

					namespace service { class storage_proxy; class client_state; }

					namespace alternator {

					/// The body_writer is used for streaming responses - where the response body

					/// is written in chunks to the output_stream. This allows for efficient

					/// handling of large responses without needing to allocate a large buffer in

					/// memory. It is one of the variants of executor::request_return_type.

					using body_writer = noncopyable_function<future<>(output_stream<char>&&)>;

					/// Get the value of an integer attribute, or an empty optional if it is

					/// missing. If the attribute exists, but is not an integer, a descriptive

					/// api_error is thrown.

					std::optional<int> get_int_attribute(const rjson::value& value, std::string_view attribute_name);

					/// Get the value of a string attribute, or a default value if it is missing.

					/// If the attribute exists, but is not a string, a descriptive api_error is

					/// thrown.

					std::string get_string_attribute(const rjson::value& value, std::string_view attribute_name, const char* default_return);

					/// Get the value of a boolean attribute, or a default value if it is missing.

					/// If the attribute exists, but is not a bool, a descriptive api_error is

					/// thrown.

					bool get_bool_attribute(const rjson::value& value, std::string_view attribute_name, bool default_return);

					/// Extract table name from a request.

					/// Most requests expect the table's name to be listed in a "TableName" field.

					/// get_table_name() returns the name, or throws an api_error if the table

					/// name is missing or is not a string.

					std::string get_table_name(const rjson::value& request);

					/// find_table_name() is like get_table_name() except that it returns an

					/// optional table name - it returns an empty optional when the TableName

					/// is missing from the request, instead of throwing as get_table_name()

					/// does. However, find_table_name() still throws if a TableName exists but

					/// is not a string.

					std::optional<std::string> find_table_name(const rjson::value& request);

					/// Extract table schema from a request.

					/// Many requests expect the table's name to be listed in a "TableName" field

					/// and need to look it up as an existing table. The get_table() function

					/// does this, with the appropriate validation and api_error in case the table

					/// name is missing, invalid or the table doesn't exist. If everything is

					/// successful, it returns the table's schema.

					schema_ptr get_table(service::storage_proxy& proxy, const rjson::value& request);

					/// This find_table() variant is like get_table() except that it returns a

					/// nullptr instead of throwing if the request does not mention a TableName.

					/// In other error cases (e.g., a table is named but doesn't exist), this

					/// function throws too.

					schema_ptr find_table(service::storage_proxy& proxy, const rjson::value& request);

					/// This find_table() variant is like the previous one except that it takes

					/// the table name directly instead of a request object. It is used in cases

					/// where we already have the table name extracted from the request.

					schema_ptr find_table(service::storage_proxy& proxy, std::string_view table_name);

					// We would have liked to support table names up to 255 bytes, like DynamoDB.

					// But Scylla creates a directory whose name is the table's name plus 33

					// bytes (dash and UUID), and since directory names are limited to 255 bytes,

					// we need to limit table names to 222 bytes, instead of 255. See issue #4480.

					// We actually have two limits here,

					// * max_table_name_length is the limit that Alternator will impose on names

					//   of new Alternator tables.

					// * max_auxiliary_table_name_length is the potentially higher absolute limit

					//   that Scylla imposes on the names of auxiliary tables that Alternator

					//   wants to create internally - i.e. materialized views or CDC log tables.

					// The second limit might mean that it is not possible to add a GSI to an

					// existing table, because the name of the new auxiliary table may go over

					// the limit. The second limit is also one of the reasons why the first limit

					// is set lower than 222 - to have room to enable streams which add the extra

					// suffix "_scylla_cdc_log" to the table name.

					inline constexpr int max_table_name_length = 192;

					inline constexpr int max_auxiliary_table_name_length = 222;

					/// validate_table_name() validates the TableName parameter in a request - it

					/// should be called in CreateTable, and in other requests only when noticing

					/// that the named table doesn't exist. 

					/// The DynamoDB developer guide, https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.NamingRulesDataTypes.html#HowItWorks.NamingRules

					/// specifies that table "names must be between 3 and 255 characters long and

					/// can contain only the following characters: a-z, A-Z, 0-9, _ (underscore),

					/// - (dash), . (dot)". However, Alternator only allows max_table_name_length

					/// characters (see above) - not 255.

					/// validate_table_name() throws the appropriate api_error if this validation

					/// fails.

					void validate_table_name(std::string_view name, const char* source = "TableName");

					/// Validate that a CDC log table could be created for the base table with a

					/// given table_name, and if not, throw a user-visible api_error::validation.

					/// It is not possible to create a CDC log table if the table name is so long

					/// that adding the 15-character suffix "_scylla_cdc_log" (cdc_log_suffix)

					/// makes it go over max_auxiliary_table_name_length.

					/// Note that if max_table_name_length is set to less than 207 (which is

					/// max_auxiliary_table_name_length-15), then this function will never

					/// fail. However, it's still important to call it in UpdateTable, in case

					/// we have pre-existing tables with names longer than this to avoid #24598.

					void validate_cdc_log_name_length(std::string_view table_name);

					/// Checks if a keyspace, given by its name, is an Alternator keyspace.

					/// This just checks if the name begins with executor::KEYSPACE_NAME_PREFIX,

					/// a prefix that all keyspaces created by Alternator's CreateTable use.

					bool is_alternator_keyspace(std::string_view ks_name);

					/// Wraps db::get_tags_of_table() and throws api_error::validation if the

					/// table is missing the tags extension.

					const std::map<sstring, sstring>& get_tags_of_table_or_throw(schema_ptr schema);

					/// Returns a type object representing the type of the ":attrs" column used

					/// by Alternator to store all non-key attributes. This type is a map from

					/// string (attribute name) to bytes (serialized attribute value).

					map_type attrs_type();

					// In DynamoDB index names are local to a table, while in Scylla, materialized

					// view names are global (in a keyspace). So we need to compose a unique name

					// for the view taking into account both the table's name and the index name.

					// We concatenate the table and index name separated by a delim character

					// (a character not allowed by DynamoDB in ordinary table names, default: ":").

					// The downside of this approach is that it limits the sum of the lengths,

					// instead of each component individually as DynamoDB does.

					// The view_name() function assumes the table_name has already been validated

					// but validates the legality of index_name and the combination of both.

					std::string view_name(std::string_view table_name, std::string_view index_name,

					        const std::string& delim = ":", bool validate_len = true);

					std::string gsi_name(std::string_view table_name, std::string_view index_name,

					        bool validate_len = true);

					std::string lsi_name(std::string_view table_name, std::string_view index_name,

					        bool validate_len = true);

					/// After calling pk_from_json() and ck_from_json() to extract the pk and ck

					/// components of a key, and if that succeeded, call check_key() to further

					/// check that the key doesn't have any spurious components.

					void check_key(const rjson::value& key, const schema_ptr& schema);

					/// Fail with api_error::validation if the expression has unused attribute

					/// names or values. This is how DynamoDB behaves, so we do too.

					void verify_all_are_used(const rjson::value* field,

					        const std::unordered_set<std::string>& used,

					        const char* field_name,

					        const char* operation);

					/// Check CQL's Role-Based Access Control (RBAC) permission (MODIFY,

					/// SELECT, DROP, etc.) on the given table. When permission is denied an

					/// appropriate user-readable api_error::access_denied is thrown.

					future<> verify_permission(bool enforce_authorization, bool warn_authorization, const service::client_state&, const schema_ptr&, auth::permission, stats& stats);

					/// Similar to verify_permission() above, but just for CREATE operations.

					/// Those do not operate on any specific table, so require permissions on

					/// ALL KEYSPACES instead of any specific table.

					future<> verify_create_permission(bool enforce_authorization, bool warn_authorization, const service::client_state&, stats& stats);

					// Sets a KeySchema JSON array inside the given parent object describing the

					// key attributes of the given schema as HASH or RANGE keys. Additionally,

					// adds mappings from key attribute names to their DynamoDB type string into

					// attribute_types.

					void describe_key_schema(rjson::value& parent, const schema&, std::unordered_map<std::string, std::string>* attribute_types = nullptr, const std::map<sstring, sstring>* tags = nullptr);

					/// is_big() checks approximately if the given JSON value is "bigger" than

					/// the given big_size number of bytes. The goal is to *quickly* detect

					/// oversized JSON that, for example, is too large to be serialized to a

					/// contiguous string - we don't need an accurate size for that. Moreover,

					/// as soon as we detect that the JSON is indeed "big", we can return true

					/// and don't need to continue calculating its exact size.

					bool is_big(const rjson::value& val, int big_size = 100'000);

					/// try_get_internal_table() handles the special case that the given table_name

					/// begins with INTERNAL_TABLE_PREFIX (".scylla.alternator."). In that case,

					/// this function assumes that the rest of the name refers to an internal

					/// Scylla table (e.g., system table) and returns the schema of that table -

					/// or throws an exception if it doesn't exist. Otherwise, if table_name does not

					/// start with INTERNAL_TABLE_PREFIX, this function returns an empty schema_ptr

					/// and the caller should look for a normal Alternator table with that name.

					schema_ptr try_get_internal_table(const data_dictionary::database& db, std::string_view table_name);

					/// get_table_from_batch_request() is used by batch write/read operations to

					/// look up the schema for a table named in a batch request, by the JSON member

					/// name (which is the table name in a BatchWriteItem or BatchGetItem request).

					schema_ptr get_table_from_batch_request(const service::storage_proxy& proxy, const rjson::value::ConstMemberIterator& batch_request);

					/// Returns (or lazily creates) the per-table stats object for the given schema.

					/// If the table has been deleted, returns a temporary stats object.

					lw_shared_ptr<stats> get_stats_from_schema(service::storage_proxy& sp, const schema& schema);

					/// Writes one item's attributes into `item` from the given selection result

					/// row. If include_all_embedded_attributes is true, all attributes from the

					/// ATTRS_COLUMN map column are included regardless of attrs_to_get.

					void describe_single_item(const cql3::selection::selection&,

					    const std::vector<managed_bytes_opt>&,

					    const std::optional<attrs_to_get>&,

					    rjson::value&,

					    uint64_t* item_length_in_bytes = nullptr,

					    bool include_all_embedded_attributes = false);

					/// Converts a single result row to a JSON item, or returns an empty optional

					/// if the result is empty.

					std::optional<rjson::value> describe_single_item(schema_ptr,

					    const query::partition_slice&,

					    const cql3::selection::selection&,

					    const query::result&,

					    const std::optional<attrs_to_get>&,

					    uint64_t* item_length_in_bytes = nullptr);

					/// Make a body_writer (function that can write output incrementally to the

					/// HTTP stream) from the given JSON object.

					/// Note: only useful for (very) large objects as there are overhead issues

					/// with this as well, but for massive lists of return objects this can

					/// help avoid large allocations/many re-allocs.

					body_writer make_streamed(rjson::value&&);

					} // namespace alternator
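The naming limits documented in this header can be illustrated with a small standalone sketch. It is not the Alternator implementation: `validation_error` is a hypothetical stand-in for `api_error::validation`, the `*_sketch` functions are hypothetical, and the exact checks performed by the real `validate_table_name()` and `view_name()` (for example, index-name character validation and the precise limit applied to the combined view name) are assumptions here.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <string_view>

// Values mirror the constants declared above.
constexpr std::size_t max_table_name_length = 192;
constexpr std::size_t max_auxiliary_table_name_length = 222;
constexpr std::string_view cdc_log_suffix = "_scylla_cdc_log";  // 15 characters

// Hypothetical stand-in for api_error::validation.
struct validation_error : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Characters allowed by DynamoDB: a-z, A-Z, 0-9, '_', '-', '.'.
bool valid_table_name_char(char c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
           (c >= '0' && c <= '9') || c == '_' || c == '-' || c == '.';
}

// Hypothetical simplified validate_table_name(): 3..192 allowed characters.
void validate_table_name_sketch(std::string_view name) {
    if (name.size() < 3 || name.size() > max_table_name_length) {
        throw validation_error("TableName must be between 3 and " +
                               std::to_string(max_table_name_length) + " characters long");
    }
    for (char c : name) {
        if (!valid_table_name_char(c)) {
            throw validation_error("TableName contains an invalid character");
        }
    }
}

// The CDC log table is named <table_name> + "_scylla_cdc_log", so the base
// name may be at most 222 - 15 = 207 characters long.
void validate_cdc_log_name_length_sketch(std::string_view table_name) {
    if (table_name.size() + cdc_log_suffix.size() > max_auxiliary_table_name_length) {
        throw validation_error("table name too long to enable Streams");
    }
}

// Joins table and index name with a delimiter; the limit (assumed here to be
// max_auxiliary_table_name_length) applies to the combined length.
std::string view_name_sketch(std::string_view table, std::string_view index,
                             std::string_view delim = ":") {
    std::string name;
    name.append(table).append(delim).append(index);
    if (name.size() > max_auxiliary_table_name_length) {
        throw validation_error("combined table and index name too long");
    }
    return name;
}
```

With these numbers a 207-character base name still leaves room for `_scylla_cdc_log` while a 208-character one does not, so with `max_table_name_length` at 192 the CDC check can only fail for pre-existing tables with longer names.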

									
										8

alternator/expressions.cc
									
					@@ -3,7 +3,7 @@

					 */

					 */

					/*

					/*

					 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0

					 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1

					 */

					 */

					#include "expressions.hh"

					#include "expressions.hh"

					@@ -744,7 +744,7 @@ void validate_attr_name_length(std::string_view supplementary_context, size_t at

					    constexpr const size_t DYNAMODB_NONKEY_ATTR_NAME_SIZE_MAX = 65535;

					    constexpr const size_t DYNAMODB_NONKEY_ATTR_NAME_SIZE_MAX = 65535;

					    const size_t max_length = is_key ? DYNAMODB_KEY_ATTR_NAME_SIZE_MAX : DYNAMODB_NONKEY_ATTR_NAME_SIZE_MAX;

					    const size_t max_length = is_key ? DYNAMODB_KEY_ATTR_NAME_SIZE_MAX : DYNAMODB_NONKEY_ATTR_NAME_SIZE_MAX;

					    if (attr_name_length > max_length) {

					    if (attr_name_length > max_length || attr_name_length == 0) {

					        std::string error_msg;

					        std::string error_msg;

					        if (!error_msg_prefix.empty()) {

					        if (!error_msg_prefix.empty()) {

					            error_msg += error_msg_prefix;

					            error_msg += error_msg_prefix;

					@@ -754,7 +754,11 @@ void validate_attr_name_length(std::string_view supplementary_context, size_t at

					            error_msg += supplementary_context;

					            error_msg += supplementary_context;

					            error_msg += " - ";

					            error_msg += " - ";

					        }

					        }

					        if (attr_name_length == 0) {

					            error_msg += "Empty attribute name";

					        } else {

					            error_msg += fmt::format("Attribute name is too large, must be less than {} bytes", std::to_string(max_length + 1));

					            error_msg += fmt::format("Attribute name is too large, must be less than {} bytes", std::to_string(max_length + 1));

					        }

					        throw api_error::validation(error_msg);

					        throw api_error::validation(error_msg);

					    }

					    }

					}

					}
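The change to `validate_attr_name_length()` above rejects empty attribute names in addition to oversized ones. A simplified, standalone sketch of the resulting logic follows; `validation_error` is a hypothetical stand-in for `api_error::validation`, the message-prefix handling is reduced to a single `context` string, and the 255-byte key-attribute limit is an assumption (only the 65535-byte non-key limit appears in the diff).

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <string_view>

// Assumed key limit; the non-key limit matches DYNAMODB_NONKEY_ATTR_NAME_SIZE_MAX.
constexpr std::size_t key_attr_name_size_max = 255;
constexpr std::size_t nonkey_attr_name_size_max = 65535;

// Hypothetical stand-in for api_error::validation.
struct validation_error : std::runtime_error {
    using std::runtime_error::runtime_error;
};

void validate_attr_name_length_sketch(std::string_view context, std::size_t len, bool is_key) {
    const std::size_t max_length = is_key ? key_attr_name_size_max : nonkey_attr_name_size_max;
    // Both an empty name and an oversized name are now validation errors.
    if (len > max_length || len == 0) {
        std::string msg;
        if (!context.empty()) {
            msg += context;
            msg += " - ";
        }
        if (len == 0) {
            msg += "Empty attribute name";
        } else {
            msg += "Attribute name is too large, must be less than " +
                   std::to_string(max_length + 1) + " bytes";
        }
        throw validation_error(msg);
    }
}
```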

2

alternator/expressions.g

  */
 /*
- * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
+ * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1
  */
 /*

									
										2

alternator/expressions.hh
									
					@@ -3,7 +3,7 @@

					 */

					 */

					/*

					/*

					 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0

					 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1

					 */

					 */

					#pragma once

					#pragma once

									
										2

alternator/expressions_types.hh
									
					@@ -3,7 +3,7 @@

					 */

					 */

					/*

					/*

					 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0

					 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1

					 */

					 */

					#pragma once

					#pragma once

									
										2

alternator/extract_from_attrs.hh
									
					@@ -3,7 +3,7 @@

					 */

					 */

					/*

					/*

					 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0

					 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1

					 */

					 */

					#pragma once

					#pragma once

									
										6

alternator/http_compression.cc
									
					@@ -3,7 +3,7 @@

					 */

					 */

					/*

					/*

					 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0

					 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1

					 */

					 */

					#include "alternator/http_compression.hh"

					#include "alternator/http_compression.hh"

					@@ -264,7 +264,7 @@ private:

					    }

					    }

					};

					};

					executor::body_writer compress(response_compressor::compression_type ct, const db::config& cfg, executor::body_writer&& bw) {

					body_writer compress(response_compressor::compression_type ct, const db::config& cfg, body_writer&& bw) {

					    return [bw = std::move(bw), ct, level = cfg.alternator_response_gzip_compression_level()](output_stream<char>&& out) mutable -> future<> {

					    return [bw = std::move(bw), ct, level = cfg.alternator_response_gzip_compression_level()](output_stream<char>&& out) mutable -> future<> {

					        output_stream_options opts;

					        output_stream_options opts;

					        opts.trim_to_size = true;

					        opts.trim_to_size = true;

					@@ -287,7 +287,7 @@ executor::body_writer compress(response_compressor::compression_type ct, const d

					    };

					    };

					}

					}

					future<std::unique_ptr<http::reply>> response_compressor::generate_reply(std::unique_ptr<http::reply> rep, sstring accept_encoding, const char* content_type, executor::body_writer&& body_writer) {

					future<std::unique_ptr<http::reply>> response_compressor::generate_reply(std::unique_ptr<http::reply> rep, sstring accept_encoding, const char* content_type, body_writer&& body_writer) {

					    response_compressor::compression_type ct = find_compression(accept_encoding, std::numeric_limits<size_t>::max());

					    response_compressor::compression_type ct = find_compression(accept_encoding, std::numeric_limits<size_t>::max());

					    if (ct != response_compressor::compression_type::none) {

					    if (ct != response_compressor::compression_type::none) {

					        rep->add_header("Content-Encoding", get_encoding_name(ct));

					        rep->add_header("Content-Encoding", get_encoding_name(ct));

									
										4

alternator/http_compression.hh
									
					@@ -3,7 +3,7 @@

					 */

					 */

					/*

					/*

					 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0

					 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1

					 */

					 */

					#pragma once

					#pragma once

					@@ -85,7 +85,7 @@ public:

					    future<std::unique_ptr<http::reply>> generate_reply(std::unique_ptr<http::reply> rep,

					    future<std::unique_ptr<http::reply>> generate_reply(std::unique_ptr<http::reply> rep,

					         sstring accept_encoding, const char* content_type, std::string&& response_body);

					         sstring accept_encoding, const char* content_type, std::string&& response_body);

					    future<std::unique_ptr<http::reply>> generate_reply(std::unique_ptr<http::reply> rep,

					    future<std::unique_ptr<http::reply>> generate_reply(std::unique_ptr<http::reply> rep,

					         sstring accept_encoding, const char* content_type, executor::body_writer&& body_writer);

					         sstring accept_encoding, const char* content_type, body_writer&& body_writer);

					};

					};

					}

					}

									
										2

alternator/parsed_expression_cache.cc
									
					@@ -3,7 +3,7 @@

					 */

					 */

					/*

					/*

					 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0

					 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1

					 */

					 */

					#include "expressions.hh"

					#include "expressions.hh"

									
										2

alternator/rmw_operation.hh
									
					@@ -3,7 +3,7 @@

					 */

					 */

					/*

					/*

					 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0

					 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1

					 */

					 */

					#pragma once

					#pragma once

									
										4

alternator/serialization.cc
									
					@@ -3,7 +3,7 @@

					 */

					 */

					/*

					/*

					 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0

					 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1

					 */

					 */

					#include "utils/base64.hh"

					#include "utils/base64.hh"

					@@ -14,12 +14,12 @@

					#include "types/concrete_types.hh"

					#include "types/concrete_types.hh"

					#include "types/json_utils.hh"

					#include "types/json_utils.hh"

					#include "mutation/position_in_partition.hh"

					#include "mutation/position_in_partition.hh"

					#include "alternator/executor_util.hh"

					static logging::logger slogger("alternator-serialization");

					static logging::logger slogger("alternator-serialization");

					namespace alternator {

					namespace alternator {

					bool is_alternator_keyspace(const sstring& ks_name);

					type_info type_info_from_string(std::string_view type) {

					type_info type_info_from_string(std::string_view type) {

					    static thread_local const std::unordered_map<std::string_view, type_info> type_infos = {

					    static thread_local const std::unordered_map<std::string_view, type_info> type_infos = {

									
										2

alternator/serialization.hh
									
					@@ -3,7 +3,7 @@

					 */

					 */

					/*

					/*

					 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0

					 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1

					 */

					 */

					#pragma once

					#pragma once

									
										136

alternator/server.cc
									
					@@ -3,10 +3,12 @@

					 */

					 */

					/*

					/*

					 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0

					 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1

					 */

					 */

					#include "alternator/server.hh"

					#include "alternator/server.hh"

					#include "audit/audit.hh"

					#include "alternator/executor_util.hh"

					#include "gms/application_state.hh"

					#include "gms/application_state.hh"

					#include "utils/log.hh"

					#include "utils/log.hh"

					#include <fmt/ranges.h>

					#include <fmt/ranges.h>

					@@ -142,7 +144,7 @@ public:

					                    return _response_compressor.generate_reply(std::move(rep), std::move(accept_encoding),

					                    return _response_compressor.generate_reply(std::move(rep), std::move(accept_encoding),

					                                                               REPLY_CONTENT_TYPE, std::move(str));

					                                                               REPLY_CONTENT_TYPE, std::move(str));

					                },

					                },

					                [&] (executor::body_writer&& body_writer) {

					                [&] (body_writer&& body_writer) {

					                    return _response_compressor.generate_reply(std::move(rep), std::move(accept_encoding),

					                    return _response_compressor.generate_reply(std::move(rep), std::move(accept_encoding),

					                                                               REPLY_CONTENT_TYPE, std::move(body_writer));

					                                                               REPLY_CONTENT_TYPE, std::move(body_writer));

					                },

					                },

					@@ -699,6 +701,17 @@ future<executor::request_return_type> server::handle_api_request(std::unique_ptr

					        // for such a size.

					        // for such a size.

					        co_return api_error::payload_too_large(fmt::format("Request content length limit of {} bytes exceeded", request_content_length_limit));

					        co_return api_error::payload_too_large(fmt::format("Request content length limit of {} bytes exceeded", request_content_length_limit));

					    }

					    }

					    // Check the concurrency limit early, before acquiring memory and

					    // reading the request body, to avoid piling up memory from excess

					    // requests that will be rejected anyway. This mirrors the CQL

					    // transport which also checks concurrency before memory acquisition

					    // (transport/server.cc).

					    if (_pending_requests.get_count() >= _max_concurrent_requests) {

					        _executor._stats.requests_shed++;

					        co_return api_error::request_limit_exceeded(format("too many in-flight requests (configured via max_concurrent_requests_per_shard): {}", _pending_requests.get_count()));

					    }

					    _pending_requests.enter();

					    auto leave = defer([this] () noexcept { _pending_requests.leave(); });

					    // JSON parsing can allocate up to roughly 2x the size of the raw

					    // JSON parsing can allocate up to roughly 2x the size of the raw

					    // document, + a couple of bytes for maintenance.

					    // document, + a couple of bytes for maintenance.

					    // If the Content-Length of the request is not available, we assume

					    // If the Content-Length of the request is not available, we assume

					@@ -760,12 +773,6 @@ future<executor::request_return_type> server::handle_api_request(std::unique_ptr

					        _executor._stats.unsupported_operations++;

					        _executor._stats.unsupported_operations++;

					        co_return api_error::unknown_operation(fmt::format("Unsupported operation {}", op));

					        co_return api_error::unknown_operation(fmt::format("Unsupported operation {}", op));

					    }

					    }

					    if (_pending_requests.get_count() >= _max_concurrent_requests) {

					        _executor._stats.requests_shed++;

					        co_return api_error::request_limit_exceeded(format("too many in-flight requests (configured via max_concurrent_requests_per_shard): {}", _pending_requests.get_count()));

					    }

					    _pending_requests.enter();

					    auto leave = defer([this] () noexcept { _pending_requests.leave(); });

					    executor::client_state client_state(service::client_state::external_tag(),

					    executor::client_state client_state(service::client_state::external_tag(),

					        _auth_service, &_sl_controller, _timeout_config.current_values(), req->get_client_address());

					        _auth_service, &_sl_controller, _timeout_config.current_values(), req->get_client_address());

					    if (!username.empty()) {

					    if (!username.empty()) {
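The `handle_api_request()` change moves the concurrency check ahead of memory acquisition and body reading, so that requests which will be shed never reserve memory. The ordering can be sketched without Seastar as follows; `pending_counter`, `pending_guard`, `stats`, and `handle_request_sketch()` are hypothetical stand-ins for `_pending_requests`, the `defer()` guard, the executor stats, and the handler, and the 2x memory estimate only mirrors the comment in the diff.

```cpp
#include <cstddef>
#include <string>

// Hypothetical counter standing in for _pending_requests (enter/leave/get_count).
class pending_counter {
public:
    std::size_t get_count() const { return _count; }
    void enter() { ++_count; }
    void leave() noexcept { --_count; }
private:
    std::size_t _count = 0;
};

// RAII guard that calls leave() on scope exit, like defer(...) in the diff.
class pending_guard {
public:
    explicit pending_guard(pending_counter& c) : _c(c) { _c.enter(); }
    ~pending_guard() { _c.leave(); }
    pending_guard(const pending_guard&) = delete;
    pending_guard& operator=(const pending_guard&) = delete;
private:
    pending_counter& _c;
};

struct stats {
    std::size_t requests_shed = 0;
    std::size_t bytes_reserved = 0;
};

// Sheds the request *before* any memory is reserved for its body.
std::string handle_request_sketch(pending_counter& pending, std::size_t max_concurrent,
                                  stats& st, std::size_t content_length) {
    if (pending.get_count() >= max_concurrent) {
        st.requests_shed++;
        return "RequestLimitExceeded";
    }
    pending_guard guard(pending);
    // Only admitted requests reserve memory (roughly 2x the body for JSON parsing).
    st.bytes_reserved += 2 * content_length;
    return "ok";
}
```

In the old ordering, a shed request would already have reserved its memory units before being rejected; checking first keeps rejected requests cheap.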

					@@ -784,8 +791,21 @@ future<executor::request_return_type> server::handle_api_request(std::unique_ptr

					        if (!json_request.IsObject()) {

					        if (!json_request.IsObject()) {

					            co_return api_error::validation("Request content must be an object");

					            co_return api_error::validation("Request content must be an object");

					        }

					        }

					                co_return co_await callback(_executor, client_state, trace_state,

					        std::unique_ptr<audit::audit_info_alternator> audit_info;

					                    make_service_permit(std::move(units)), std::move(json_request), std::move(req));

					        std::exception_ptr ex = {};

					        executor::request_return_type ret;

					        try {

					            ret = co_await callback(_executor, client_state, trace_state, make_service_permit(std::move(units)), std::move(json_request), std::move(req), audit_info);

					        } catch (...) {

					            ex = std::current_exception();

					        }

					        if (audit_info) {

					            co_await audit::inspect(*audit_info, client_state, ex != nullptr);

					        }

					        if (ex) {

					            co_return coroutine::exception(std::move(ex));

					        }

					        co_return ret;

					    };

					    };

					    co_return co_await _sl_controller.with_user_service_level(user, std::ref(f));

					    co_return co_await _sl_controller.with_user_service_level(user, std::ref(f));

					}

					}
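The new callback path in `handle_api_request()` records the outcome for auditing before propagating any exception, so failed requests are audited as well. The sketch below shows that capture, inspect, rethrow pattern with plain C++ exceptions instead of Seastar coroutines (the diff uses `co_await audit::inspect()` and `coroutine::exception()`); `audit_info`, `audit_log`, `inspect()`, and `run_audited()` are hypothetical.

```cpp
#include <exception>
#include <functional>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-ins for audit::audit_info_alternator and audit::inspect().
struct audit_info { std::string operation; };
struct audit_log { std::vector<std::string> entries; };

void inspect(audit_log& log, const audit_info& info, bool error) {
    log.entries.push_back(info.operation + (error ? ":error" : ":ok"));
}

// The handler may fill in audit_info while processing the request.
using handler = std::function<std::string(std::unique_ptr<audit_info>&)>;

std::string run_audited(audit_log& log, const handler& h) {
    std::unique_ptr<audit_info> info;
    std::exception_ptr ex;
    std::string ret;
    try {
        ret = h(info);
    } catch (...) {
        ex = std::current_exception();  // remember the failure, audit first
    }
    if (info) {
        inspect(log, *info, ex != nullptr);
    }
    if (ex) {
        std::rethrow_exception(ex);  // propagate the original exception
    }
    return ret;
}
```

If the handler never fills in `audit_info`, nothing is audited, which matches the `if (audit_info)` guard in the diff.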

					@@ -829,77 +849,77 @@ server::server(executor& exec, service::storage_proxy& proxy, gms::gossiper& gos

					        , _pending_requests("alternator::server::pending_requests")

					        , _pending_requests("alternator::server::pending_requests")

					        , _timeout_config(_proxy.data_dictionary().get_config())

					        , _timeout_config(_proxy.data_dictionary().get_config())

					      , _callbacks{

					      , _callbacks{

					        {"CreateTable", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {

					        {"CreateTable", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req, std::unique_ptr<audit::audit_info_alternator>& audit_info) {

					            return e.create_table(client_state, std::move(trace_state), std::move(permit), std::move(json_request));

					            return e.create_table(client_state, std::move(trace_state), std::move(permit), std::move(json_request), audit_info);

					        }},

					        }},

					        {"DescribeTable", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {

					        {"DescribeTable", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req, std::unique_ptr<audit::audit_info_alternator>& audit_info) {

					            return e.describe_table(client_state, std::move(trace_state), std::move(permit), std::move(json_request));

					            return e.describe_table(client_state, std::move(trace_state), std::move(permit), std::move(json_request), audit_info);

					        }},

					        }},

					        {"DeleteTable", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {

					        {"DeleteTable", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req, std::unique_ptr<audit::audit_info_alternator>& audit_info) {

					            return e.delete_table(client_state, std::move(trace_state), std::move(permit), std::move(json_request));

					            return e.delete_table(client_state, std::move(trace_state), std::move(permit), std::move(json_request), audit_info);

					        }},

					        }},

					        {"UpdateTable", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {

					        {"UpdateTable", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req, std::unique_ptr<audit::audit_info_alternator>& audit_info) {

					            return e.update_table(client_state, std::move(trace_state), std::move(permit), std::move(json_request));

					            return e.update_table(client_state, std::move(trace_state), std::move(permit), std::move(json_request), audit_info);

					        }},

					        }},

					        {"PutItem", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {

					        {"PutItem", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req, std::unique_ptr<audit::audit_info_alternator>& audit_info) {

					            return e.put_item(client_state, std::move(trace_state), std::move(permit), std::move(json_request));

					            return e.put_item(client_state, std::move(trace_state), std::move(permit), std::move(json_request), audit_info);

					        }},

					        }},

					        {"UpdateItem", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {

					        {"UpdateItem", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req, std::unique_ptr<audit::audit_info_alternator>& audit_info) {

-            return e.update_item(client_state, std::move(trace_state), std::move(permit), std::move(json_request));
+            return e.update_item(client_state, std::move(trace_state), std::move(permit), std::move(json_request), audit_info);
         }},
-        {"GetItem", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {
-            return e.get_item(client_state, std::move(trace_state), std::move(permit), std::move(json_request));
+        {"GetItem", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req, std::unique_ptr<audit::audit_info_alternator>& audit_info) {
+            return e.get_item(client_state, std::move(trace_state), std::move(permit), std::move(json_request), audit_info);
         }},
-        {"DeleteItem", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {
-            return e.delete_item(client_state, std::move(trace_state), std::move(permit), std::move(json_request));
+        {"DeleteItem", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req, std::unique_ptr<audit::audit_info_alternator>& audit_info) {
+            return e.delete_item(client_state, std::move(trace_state), std::move(permit), std::move(json_request), audit_info);
         }},
-        {"ListTables", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {
-            return e.list_tables(client_state, std::move(permit), std::move(json_request));
+        {"ListTables", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req, std::unique_ptr<audit::audit_info_alternator>& audit_info) {
+            return e.list_tables(client_state, std::move(permit), std::move(json_request), audit_info);
         }},
-        {"Scan", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {
-            return e.scan(client_state, std::move(trace_state), std::move(permit), std::move(json_request));
+        {"Scan", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req, std::unique_ptr<audit::audit_info_alternator>& audit_info) {
+            return e.scan(client_state, std::move(trace_state), std::move(permit), std::move(json_request), audit_info);
         }},
-        {"DescribeEndpoints", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {
-            return e.describe_endpoints(client_state, std::move(permit), std::move(json_request), req->get_header("Host"));
+        {"DescribeEndpoints", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req, std::unique_ptr<audit::audit_info_alternator>& audit_info) {
+            return e.describe_endpoints(client_state, std::move(permit), std::move(json_request), req->get_header("Host"), audit_info);
         }},
-        {"BatchWriteItem", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {
-            return e.batch_write_item(client_state, std::move(trace_state), std::move(permit), std::move(json_request));
+        {"BatchWriteItem", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req, std::unique_ptr<audit::audit_info_alternator>& audit_info) {
+            return e.batch_write_item(client_state, std::move(trace_state), std::move(permit), std::move(json_request), audit_info);
         }},
-        {"BatchGetItem", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {
-            return e.batch_get_item(client_state, std::move(trace_state), std::move(permit), std::move(json_request));
+        {"BatchGetItem", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req, std::unique_ptr<audit::audit_info_alternator>& audit_info) {
+            return e.batch_get_item(client_state, std::move(trace_state), std::move(permit), std::move(json_request), audit_info);
         }},
-        {"Query", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {
-            return e.query(client_state, std::move(trace_state), std::move(permit), std::move(json_request));
+        {"Query", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req, std::unique_ptr<audit::audit_info_alternator>& audit_info) {
+            return e.query(client_state, std::move(trace_state), std::move(permit), std::move(json_request), audit_info);
         }},
-        {"TagResource", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {
-            return e.tag_resource(client_state, std::move(permit), std::move(json_request));
+        {"TagResource", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req, std::unique_ptr<audit::audit_info_alternator>& audit_info) {
+            return e.tag_resource(client_state, std::move(permit), std::move(json_request), audit_info);
         }},
-        {"UntagResource", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {
-            return e.untag_resource(client_state, std::move(permit), std::move(json_request));
+        {"UntagResource", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req, std::unique_ptr<audit::audit_info_alternator>& audit_info) {
+            return e.untag_resource(client_state, std::move(permit), std::move(json_request), audit_info);
         }},
-        {"ListTagsOfResource", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {
-            return e.list_tags_of_resource(client_state, std::move(permit), std::move(json_request));
+        {"ListTagsOfResource", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req, std::unique_ptr<audit::audit_info_alternator>& audit_info) {
+            return e.list_tags_of_resource(client_state, std::move(permit), std::move(json_request), audit_info);
         }},
-        {"UpdateTimeToLive", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {
-            return e.update_time_to_live(client_state, std::move(permit), std::move(json_request));
+        {"UpdateTimeToLive", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req, std::unique_ptr<audit::audit_info_alternator>& audit_info) {
+            return e.update_time_to_live(client_state, std::move(permit), std::move(json_request), audit_info);
         }},
-        {"DescribeTimeToLive", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {
-            return e.describe_time_to_live(client_state, std::move(permit), std::move(json_request));
+        {"DescribeTimeToLive", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req, std::unique_ptr<audit::audit_info_alternator>& audit_info) {
+            return e.describe_time_to_live(client_state, std::move(permit), std::move(json_request), audit_info);
         }},
-        {"ListStreams", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {
-            return e.list_streams(client_state, std::move(permit), std::move(json_request));
+        {"ListStreams", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req, std::unique_ptr<audit::audit_info_alternator>& audit_info) {
+            return e.list_streams(client_state, std::move(permit), std::move(json_request), audit_info);
         }},
-        {"DescribeStream", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {
-            return e.describe_stream(client_state, std::move(permit), std::move(json_request));
+        {"DescribeStream", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req, std::unique_ptr<audit::audit_info_alternator>& audit_info) {
+            return e.describe_stream(client_state, std::move(permit), std::move(json_request), audit_info);
         }},
-        {"GetShardIterator", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {
-            return e.get_shard_iterator(client_state, std::move(permit), std::move(json_request));
+        {"GetShardIterator", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req, std::unique_ptr<audit::audit_info_alternator>& audit_info) {
+            return e.get_shard_iterator(client_state, std::move(permit), std::move(json_request), audit_info);
         }},
-        {"GetRecords", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {
-            return e.get_records(client_state, std::move(trace_state), std::move(permit), std::move(json_request));
+        {"GetRecords", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req, std::unique_ptr<audit::audit_info_alternator>& audit_info) {
+            return e.get_records(client_state, std::move(trace_state), std::move(permit), std::move(json_request), audit_info);
         }},
-        {"DescribeContinuousBackups", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req) {
-            return e.describe_continuous_backups(client_state, std::move(permit), std::move(json_request));
+        {"DescribeContinuousBackups", [] (executor& e, executor::client_state& client_state, tracing::trace_state_ptr trace_state, service_permit permit, rjson::value json_request, std::unique_ptr<request> req, std::unique_ptr<audit::audit_info_alternator>& audit_info) {
+            return e.describe_continuous_backups(client_state, std::move(permit), std::move(json_request), audit_info);
         }},
     } {
 }
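Every handler in the callback table above now receives a `std::unique_ptr<audit::audit_info_alternator>&` and forwards it to the executor, where (as `ttl.cc` in this diff shows) `maybe_audit()` records a statement category, keyspace, table and operation name. The following is a minimal sketch of that out-parameter pattern in isolation, assuming the record is allocated only when auditing is enabled and that the dispatcher inspects it after the handler returns. `audit_record`, `dispatcher`, `audit_enabled`, `audit_log` and this simplified `maybe_audit()` signature are hypothetical stand-ins, not ScyllaDB's API; futures and JSON are replaced by plain strings.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <string>
#include <string_view>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for audit::audit_info_alternator.
struct audit_record {
    std::string category;   // e.g. "DDL" or "QUERY"
    std::string table;
    std::string operation;  // e.g. "DescribeTimeToLive"
};

// Handlers get the request plus an out-parameter they may fill.
using callback = std::function<std::string(const std::string& request,
                                           std::unique_ptr<audit_record>& audit_info)>;

// Hypothetical, simplified maybe_audit(): allocate a record only when
// auditing is on, so the non-audited path performs no allocation.
void maybe_audit(std::unique_ptr<audit_record>& audit_info, bool audit_enabled,
                 std::string category, std::string table, std::string operation) {
    if (audit_enabled) {
        audit_info = std::make_unique<audit_record>(
            audit_record{std::move(category), std::move(table), std::move(operation)});
    }
}

class dispatcher {
public:
    bool audit_enabled = true;
    std::vector<audit_record> audit_log;  // hypothetical audit sink

    dispatcher() {
        // For brevity the request body doubles as the table name.
        _callbacks = {
            {"DescribeTimeToLive", [this](const std::string& request, std::unique_ptr<audit_record>& audit_info) {
                maybe_audit(audit_info, audit_enabled, "QUERY", request, "DescribeTimeToLive");
                return std::string("{}");
            }},
        };
    }
    // Handlers capture `this`, so copies would point at the original object.
    dispatcher(const dispatcher&) = delete;
    dispatcher& operator=(const dispatcher&) = delete;

    std::string handle(std::string_view op, const std::string& request) {
        auto it = _callbacks.find(op);
        if (it == _callbacks.end()) {
            return "UnknownOperation";
        }
        std::unique_ptr<audit_record> audit_info;  // stays null unless the handler audits
        std::string response = it->second(request, audit_info);
        if (audit_info) {
            audit_log.push_back(*audit_info);
        }
        return response;
    }

private:
    std::unordered_map<std::string_view, callback> _callbacks;
};
```

Passing the pointer by reference lets each handler decide what to record (it knows the resolved table and category), while the caller alone decides where the record goes.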

									
alternator/server.hh (4 changes)
@@ -3,7 +3,7 @@
  */
 /*
- * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
+ * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1
  */
 #pragma once
@@ -34,7 +34,7 @@ class server : public peering_sharded_service<server> {
     // DynamoDB also has the same limit set to 16 MB.
     static constexpr size_t request_content_length_limit = 16*MB;
     using alternator_callback = std::function<future<executor::request_return_type>(executor&, executor::client_state&,
-            tracing::trace_state_ptr, service_permit, rjson::value, std::unique_ptr<http::request>)>;
+            tracing::trace_state_ptr, service_permit, rjson::value, std::unique_ptr<http::request>, std::unique_ptr<audit::audit_info_alternator>&)>;
     using alternator_callbacks_map = std::unordered_map<std::string_view, alternator_callback>;
     httpd::http_server _http_server;

									
alternator/stats.cc (2 changes)
@@ -3,7 +3,7 @@
  */
 /*
- * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
+ * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1
  */
 #include "stats.hh"

									
alternator/stats.hh (2 changes)
@@ -3,7 +3,7 @@
  */
 /*
- * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
+ * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1
  */
 #pragma once

alternator/streams.cc (745 changes)
File diff suppressed because it is too large.

									
alternator/streams.hh (new file, 62 changes)
@@ -0,0 +1,62 @@
+/*
+ * Copyright 2026-present ScyllaDB
+ */
+/*
+ * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1
+ */
+#pragma once
+#include "utils/chunked_vector.hh"
+#include "cdc/generation.hh"
+#include <generator>
+namespace cdc {
+    class stream_id;
+}
+namespace alternator {
+    class stream_id_range {
+        // helper class for manipulating (possibly wrapped around) range of stream_ids
+        // it holds one or two ranges [lo1, end1) and [lo2, end2)
+        // if the range doesn't wrap around, then lo2 == end2 == items.end()
+        // if the range wraps around, then
+        // `lo1 == items.begin() and end2 == items.end()` must be true
+        // the object doesn't own `items`, but it does manipulate it - it will
+        // reorder elements (so both ranges were next to each other) and sort them by unsigned comparison
+        // usage - create an object with needed ranges. before iteration call `prepare_for_iterating` method -
+        // it will reorder elements of `items` array to what is needed and then call begin / end pair.
+        // note - `items` array will be modified - elements will be reordered, but no elements will be added or removed.
+        // `items` array must stay intact as long as iteration is in progress.
+        utils::chunked_vector<cdc::stream_id>::iterator _lo1 = {}, _end1 = {}, _lo2 = {}, _end2 = {};
+        const cdc::stream_id* _skip_to = nullptr;
+        bool _prepared = false;
+    public:
+        stream_id_range(
+                utils::chunked_vector<cdc::stream_id> &items,
+                utils::chunked_vector<cdc::stream_id>::iterator lo1,
+                utils::chunked_vector<cdc::stream_id>::iterator end1);
+        stream_id_range(
+                utils::chunked_vector<cdc::stream_id> &items,
+                utils::chunked_vector<cdc::stream_id>::iterator lo1,
+                utils::chunked_vector<cdc::stream_id>::iterator end1,
+                utils::chunked_vector<cdc::stream_id>::iterator lo2,
+                utils::chunked_vector<cdc::stream_id>::iterator end2);
+        void set_starting_position(const cdc::stream_id &update_to);
+        // Must be called after construction and after set_starting_position()
+        // (if used), but before begin()/end() iteration.
+        void prepare_for_iterating();
+        utils::chunked_vector<cdc::stream_id>::iterator begin() const { return _lo1; }
+        utils::chunked_vector<cdc::stream_id>::iterator end() const { return _end1; }
+    };
+    stream_id_range find_children_range_from_parent_token(
+        const utils::chunked_vector<cdc::stream_id>& parent_streams,
+        utils::chunked_vector<cdc::stream_id>& current_streams,
+        cdc::stream_id parent,
+        bool uses_tablets
+    );
+}
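The header comment in `stream_id_range` describes a wrap-around range held as two pieces that `prepare_for_iterating()` makes contiguous and sorts, after which `begin()`/`end()` return `_lo1`/`_end1`. The following is a minimal sketch of that idea, not the implementation in `streams.cc` (whose diff is suppressed). It assumes stream ids can be modelled as `std::uint64_t` (so the default `<` is an unsigned comparison), uses `std::vector` instead of `utils::chunked_vector`, omits `set_starting_position()`, and uses `std::rotate` as one possible way to make the two pieces adjacent. The class name `wrapped_id_range` is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical simplification of alternator::stream_id_range.
class wrapped_id_range {
    using iterator = std::vector<std::uint64_t>::iterator;
    std::vector<std::uint64_t>& _items;  // not owned, but reordered in place
    iterator _lo1, _end1, _lo2, _end2;

public:
    // Non-wrapping range [lo1, end1); the second piece is empty (lo2 == end2 == items.end()).
    wrapped_id_range(std::vector<std::uint64_t>& items, iterator lo1, iterator end1)
        : _items(items), _lo1(lo1), _end1(end1), _lo2(items.end()), _end2(items.end()) {}

    // Wrapping range [lo1, end1) + [lo2, end2): must start and end the vector.
    wrapped_id_range(std::vector<std::uint64_t>& items, iterator lo1, iterator end1,
                     iterator lo2, iterator end2)
        : _items(items), _lo1(lo1), _end1(end1), _lo2(lo2), _end2(end2) {
        assert(lo1 == items.begin() && end2 == items.end());
    }

    // Make both pieces contiguous, then sort them. Elements are only
    // reordered, never added or removed.
    void prepare_for_iterating() {
        if (_lo2 != _end2) {
            const auto n1 = _end1 - _lo1;
            const auto n2 = _end2 - _lo2;
            // Move [lo2, end) to the front; the old [begin, end1) now follows it.
            std::rotate(_items.begin(), _lo2, _items.end());
            _lo1 = _items.begin();
            _end1 = _lo1 + (n1 + n2);
            _lo2 = _end2 = _items.end();
        }
        std::sort(_lo1, _end1);
    }

    iterator begin() const { return _lo1; }
    iterator end() const { return _end1; }
};
```

For `{10, 11, 12, 13, 14, 15}` with pieces `[10]` and `[13, 14, 15]`, iteration after `prepare_for_iterating()` yields `10 13 14 15`, while `11` and `12` stay in the vector behind the range. As in the original comment, the vector must not be modified while iterating.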

									
alternator/ttl.cc (15 changes)
@@ -3,7 +3,7 @@
  */
 /*
- * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
+ * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1
  */
 #include <chrono>
@@ -44,6 +44,7 @@
 #include "cql3/query_options.hh"
 #include "cql3/column_identifier.hh"
 #include "alternator/executor.hh"
+#include "alternator/executor_util.hh"
 #include "alternator/controller.hh"
 #include "alternator/serialization.hh"
 #include "alternator/ttl_tag.hh"
@@ -58,13 +59,17 @@ static logging::logger tlogger("alternator_ttl");
 namespace alternator {
-future<executor::request_return_type> executor::update_time_to_live(client_state& client_state, service_permit permit, rjson::value request) {
+future<executor::request_return_type> executor::update_time_to_live(client_state& client_state, service_permit permit, rjson::value request, std::unique_ptr<audit::audit_info_alternator>& audit_info) {
     _stats.api_operations.update_time_to_live++;
     if (!_proxy.features().alternator_ttl) {
         co_return api_error::unknown_operation("UpdateTimeToLive not yet supported. Upgrade all nodes to a version that supports it.");
     }
     schema_ptr schema = get_table(_proxy, request);
+    maybe_audit(audit_info, audit::statement_category::DDL,
+                schema->ks_name(), schema->cf_name(), "UpdateTimeToLive", request);
     rjson::value* spec = rjson::find(request, "TimeToLiveSpecification");
     if (!spec || !spec->IsObject()) {
         co_return api_error::validation("UpdateTimeToLive missing mandatory TimeToLiveSpecification");
@@ -114,9 +119,13 @@ future<executor::request_return_type> executor::update_time_to_live(client_state
     co_return rjson::print(std::move(response));
 }
-future<executor::request_return_type> executor::describe_time_to_live(client_state& client_state, service_permit permit, rjson::value request) {
+future<executor::request_return_type> executor::describe_time_to_live(client_state& client_state, service_permit permit, rjson::value request, std::unique_ptr<audit::audit_info_alternator>& audit_info) {
     _stats.api_operations.describe_time_to_live++;
     schema_ptr schema = get_table(_proxy, request);
+    maybe_audit(audit_info, audit::statement_category::QUERY,
+                schema->ks_name(), schema->cf_name(), "DescribeTimeToLive", request);
     std::map<sstring, sstring> tags_map = get_tags_of_table_or_throw(schema);
     rjson::value desc = rjson::empty_object();
     auto i = tags_map.find(TTL_TAG_KEY);

									
alternator/ttl.hh (2 changes)
@@ -3,7 +3,7 @@
  */
 /*
- * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
+ * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1
  */
 #pragma once

									
alternator/ttl_tag.hh (2 changes)
@@ -3,7 +3,7 @@
  */
 /*
- * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
+ * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1
  */
 #pragma once

									
api/api-doc/storage_service.json (122 changes)
@@ -743,7 +743,7 @@
                "parameters":[
                   {
                      "name":"tag",
-                     "description":"the tag given to the snapshot",
+                     "description":"The snapshot tag to delete. If omitted, all snapshots are removed.",
                      "required":false,
                      "allowMultiple":false,
                      "type":"string",
@@ -751,7 +751,7 @@
                   },
                   {
                      "name":"kn",
-                     "description":"Comma-separated keyspaces name that their snapshot will be deleted",
+                     "description":"Comma-separated list of keyspace names to delete snapshots from. If omitted, snapshots are deleted from all keyspaces.",
                      "required":false,
                      "allowMultiple":false,
                      "type":"string",
@@ -759,7 +759,7 @@
                   },
                   {
                      "name":"cf",
-                     "description":"an optional table name that its snapshot will be deleted",
+                     "description":"A table name used to filter which table's snapshots are deleted. If omitted or empty, snapshots for all tables are eligible. When provided together with 'kn', the table is looked up in each listed keyspace independently. For secondary indexes, the logical index name (e.g. 'myindex') can be used and is resolved automatically.",
                      "required":false,
                      "allowMultiple":false,
                      "type":"string",
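The revised descriptions of `tag`, `kn` and `cf` above define how each parameter narrows a snapshot deletion. The following sketch models only those documented rules as a predicate over (tag, keyspace, table) triples. `snapshot_filter`, `parse_keyspace_list()` and `matches()` are hypothetical names, skipping empty entries in the `kn` list is an assumption, and the secondary-index name resolution mentioned for `cf` is not modelled.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical model of the `tag`, `kn` and `cf` filters; empty means "not given".
struct snapshot_filter {
    std::string tag;                     // empty: snapshots with any tag
    std::vector<std::string> keyspaces;  // empty: all keyspaces
    std::string table;                   // empty: all tables
};

// Split the comma-separated `kn` value; empty entries are skipped (an assumption).
std::vector<std::string> parse_keyspace_list(std::string_view kn) {
    std::vector<std::string> out;
    while (!kn.empty()) {
        const auto comma = kn.find(',');
        const auto token = kn.substr(0, comma);
        if (!token.empty()) {
            out.emplace_back(token);
        }
        if (comma == std::string_view::npos) {
            break;
        }
        kn.remove_prefix(comma + 1);
    }
    return out;
}

// True if a snapshot identified by (tag, keyspace, table) would be selected.
// The table filter is applied within each listed keyspace independently.
bool matches(const snapshot_filter& f, std::string_view tag,
             std::string_view keyspace, std::string_view table) {
    if (!f.tag.empty() && f.tag != tag) {
        return false;
    }
    if (!f.keyspaces.empty() &&
        std::find(f.keyspaces.begin(), f.keyspaces.end(), keyspace) == f.keyspaces.end()) {
        return false;
    }
    return f.table.empty() || f.table == table;
}
```

With `kn=ks1,ks2` and `cf=users`, the predicate selects `ks1.users` and `ks2.users` only; with no parameters at all, every snapshot matches, consistent with "If omitted, all snapshots are removed."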

@@ -3166,6 +3166,83 @@
          ]
       },
+      {
+         "path":"/storage_service/vnode_tablet_migrations/keyspaces/{keyspace}",
+         "operations":[{
+             "method":"POST",
+             "summary":"Start vnodes-to-tablets migration for all tables in a keyspace",
+             "type":"void",
+             "nickname":"create_vnode_tablet_migration",
+             "produces":["application/json"],
+             "parameters":[
+                 {
+                     "name":"keyspace",
+                     "description":"Keyspace name",
+                     "required":true,
+                     "allowMultiple":false,
+                     "type":"string",
+                     "paramType":"path"
+                 }
+             ]
+         },
+         {
+             "method":"GET",
+             "summary":"Get a keyspace's vnodes-to-tablets migration status",
+             "type":"vnode_tablet_migration_status",
+             "nickname":"get_vnode_tablet_migration",
+             "produces":["application/json"],
+             "parameters":[
+                 {
+                     "name":"keyspace",
+                     "description":"Keyspace name",
+                     "required":true,
+                     "allowMultiple":false,
+                     "type":"string",
+                     "paramType":"path"
+                 }
+             ]
+         }]
+      },
+      {
+         "path":"/storage_service/vnode_tablet_migrations/node/storage_mode",
+         "operations":[{
+             "method":"PUT",
+             "summary":"Set the intended storage mode for this node during vnodes-to-tablets migration",
+             "type":"void",
+             "nickname":"set_vnode_tablet_migration_node_storage_mode",
+             "produces":["application/json"],
+             "parameters":[
+                 {
+                     "name":"intended_mode",
+                     "description":"Intended storage mode (tablets or vnodes)",
+                     "required":true,
+                     "allowMultiple":false,
+                     "type":"string",
+                     "paramType":"query"
+                 }
+             ]
+         }]
+      },
+      {
+         "path":"/storage_service/vnode_tablet_migrations/keyspaces/{keyspace}/finalization",
+         "operations":[{
+             "method":"POST",
+             "summary":"Finalize vnodes-to-tablets migration for all tables in a keyspace",
+             "type":"void",
+             "nickname":"finalize_vnode_tablet_migration",
+             "produces":["application/json"],
+             "parameters":[
+                 {
+                     "name":"keyspace",
+                     "description":"Keyspace name",
+                     "required":true,
+                     "allowMultiple":false,
+                     "type":"string",
+                     "paramType":"path"
+                 }
+             ]
+         }]
+      },
       {
          "path":"/storage_service/quiesce_topology",
          "operations":[

@@ -3783,6 +3860,45 @@
                "description":"The resulting compression ratio (estimated on a random sample of files)"
             }
          }
+      },
+      "vnode_tablet_migration_node_status":{
+         "id":"vnode_tablet_migration_node_status",
+         "description":"Node storage mode info during vnodes-to-tablets migration",
+         "properties":{
+            "host_id":{
+               "type":"string",
+               "description":"The host ID"
+            },
+            "current_mode":{
+               "type":"string",
+               "description":"The current storage mode: `vnodes` or `tablets`"
+            },
+            "intended_mode":{
+               "type":"string",
+               "description":"The intended storage mode: `vnodes` or `tablets`"
+            }
+         }
+      },
+      "vnode_tablet_migration_status":{
+         "id":"vnode_tablet_migration_status",
+         "description":"Vnodes-to-tablets migration status for a keyspace",
+         "properties":{
+            "keyspace":{
+               "type":"string",
+               "description":"The keyspace name"
+            },
+            "status":{
+               "type":"string",
+               "description":"The migration status: `vnodes` (not started), `migrating_to_tablets` (in progress), or `tablets` (complete)"
+            },
+            "nodes":{
+               "type":"array",
+               "items":{
+                  "$ref":"vnode_tablet_migration_node_status"
+               },
+               "description":"Per-node storage mode information. Empty if the keyspace is not being migrated."
+            }
+         }
       }
    }
 }
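The new `vnode_tablet_migrations` endpoints above cover starting a keyspace migration, setting a node's intended storage mode, reading the keyspace status, and finalizing. The following sketch only assembles the documented method/path/query combinations and parses the documented `status` values. `http_call`, `storage_mode`, `mode_name()`, `migration_calls()`, `migration_status` and `parse_status()` are hypothetical helpers, and the order in which the calls are listed (start, set node mode, poll, finalize) is an assumption; this diff does not state the required sequence or whether nodes must be restarted between steps.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical request descriptor: HTTP method plus target (path and query string).
struct http_call {
    std::string method;
    std::string target;
};

// Values accepted by the `intended_mode` query parameter ("tablets or vnodes").
enum class storage_mode { vnodes, tablets };

std::string mode_name(storage_mode m) {
    return m == storage_mode::tablets ? "tablets" : "vnodes";
}

// The documented calls for one keyspace, in one plausible order (an assumption).
std::vector<http_call> migration_calls(const std::string& keyspace, storage_mode mode) {
    const std::string base = "/storage_service/vnode_tablet_migrations";
    return {
        {"POST", base + "/keyspaces/" + keyspace},                               // start migration
        {"PUT", base + "/node/storage_mode?intended_mode=" + mode_name(mode)},   // sent to each node
        {"GET", base + "/keyspaces/" + keyspace},                                // read status
        {"POST", base + "/keyspaces/" + keyspace + "/finalization"},             // finalize
    };
}

// The `status` values documented for vnode_tablet_migration_status.
enum class migration_status { vnodes, migrating_to_tablets, tablets };

std::optional<migration_status> parse_status(std::string_view s) {
    if (s == "vnodes") return migration_status::vnodes;
    if (s == "migrating_to_tablets") return migration_status::migrating_to_tablets;
    if (s == "tablets") return migration_status::tablets;
    return std::nullopt;  // not one of the documented values
}
```

Because the `PUT` summary refers to "this node", that call would presumably be issued against each node's own REST API, while the keyspace-level calls take the keyspace as a path parameter.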

									
api/api.cc (2 changes)
@@ -3,7 +3,7 @@
  */
 /*
- * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
+ * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1
  */
 #include "api.hh"

									
api/api.hh (2 changes)
@@ -3,7 +3,7 @@
  */
 /*
- * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
+ * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1
  */
 #pragma once

									
api/api_init.hh (2 changes)
@@ -3,7 +3,7 @@
  */
 /*
- * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
+ * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1
  */
 #pragma once

									
api/authorization_cache.cc (2 changes)
@@ -3,7 +3,7 @@
  */
 /*
- * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
+ * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1
  */
 #include "api/api-doc/authorization_cache.json.hh"

									
api/authorization_cache.hh (2 changes)
@@ -3,7 +3,7 @@
  */
 /*
- * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
+ * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1
  */
 #pragma once

									
2 api/cache_service.cc
@@ -3,7 +3,7 @@
  */
 /*
- * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
+ * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1
  */
 #include "cache_service.hh"

									
2 api/cache_service.hh
@@ -3,7 +3,7 @@
  */
 /*
- * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
+ * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1
  */
 #pragma once

									
2 api/client_routes.cc
@@ -4,7 +4,7 @@
  */
 /*
- * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
+ * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1
  */
  #include <seastar/http/short_streams.hh>

									
2 api/client_routes.hh
@@ -4,7 +4,7 @@
  */
 /*
- * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
+ * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1
  */
 #pragma once

									
2 api/collectd.cc
@@ -3,7 +3,7 @@
  */
 /*
- * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
+ * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1
  */
 #include "collectd.hh"

									
2 api/collectd.hh
@@ -3,7 +3,7 @@
  */
 /*
- * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
+ * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1
  */
 #pragma once

									
2 api/column_family.cc
@@ -3,7 +3,7 @@
  */
 /*
- * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
+ * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1
  */
 #include <fmt/ranges.h>

									
2 api/column_family.hh
@@ -3,7 +3,7 @@
  */
 /*
- * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
+ * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1
  */
 #pragma once

									
2 api/commitlog.cc
@@ -3,7 +3,7 @@
  */
 /*
- * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
+ * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1
  */
 #include "commitlog.hh"

									
2 api/commitlog.hh
@@ -3,7 +3,7 @@
  */
 /*
- * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
+ * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1
  */
 #pragma once

									
2 api/compaction_manager.cc
@@ -3,7 +3,7 @@
  */
 /*
- * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
+ * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1
  */
 #include <seastar/core/coroutine.hh>

									
2 api/compaction_manager.hh
@@ -3,7 +3,7 @@
  */
 /*
- * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
+ * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1
  */
 #pragma once

									
19 api/config.cc
@@ -3,7 +3,7 @@
  */
 /*
- * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
+ * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1
  */
 #include "api/api.hh"
@@ -82,15 +82,16 @@ void set_config(std::shared_ptr < api_registry_builder20 > rb, http_context& ctx
         });
     });
-    cs::find_config_id.set(r, [&cfg] (const_req r) {
-        auto id = r.get_path_param("id");
-        for (auto&& cfg_ref : cfg.values()) {
-            auto&& cfg = cfg_ref.get();
-            if (id == cfg.name()) {
-                return cfg.value_as_json();
-            }
-        }
+    cs::find_config_id.set(r, [&cfg] (std::unique_ptr<http::request> req) -> future<json::json_return_type> {
+        auto id = req->get_path_param("id");
+        auto value = co_await cfg.value_as_json_string_for_name(id);
+        if (!value) {
             throw bad_param_exception(sstring("No such config entry: ") + id);
+        }
+        //value is already a json string
+        json::json_return_type ret{json::json_void()};
+        ret._res = std::move(*value);
+        co_return ret;
     });
     sp::get_rpc_timeout.set(r, [&cfg](const_req req)  {
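In the `api/config.cc` hunk, the `find_config_id` handler stops scanning `cfg.values()` for an entry whose `name()` matches and instead awaits `cfg.value_as_json_string_for_name(id)`, throwing `bad_param_exception` when nothing comes back and otherwise returning the string as an already-serialized JSON body. The diff does not show that function's implementation, so the following is only a simplified, synchronous sketch of the lookup-or-throw contract as it appears from the call site. The `config_registry` map, the local `value_as_json_string_for_name` helper, and the use of `std::invalid_argument` in place of `bad_param_exception` are hypothetical stand-ins, not Scylla's API.

```cpp
#include <map>
#include <optional>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for the config registry: maps an entry name to its
// value already serialized as a JSON string (e.g. "256" or "\"Test Cluster\"").
using config_registry = std::map<std::string, std::string>;

// Hypothetical synchronous analogue of value_as_json_string_for_name():
// returns the JSON-encoded value, or std::nullopt if no entry has this name.
std::optional<std::string> value_as_json_string_for_name(const config_registry& cfg,
                                                         const std::string& name) {
    auto it = cfg.find(name);
    if (it == cfg.end()) {
        return std::nullopt;
    }
    return it->second;
}

// Mirrors the handler's contract: an unknown name is reported as an error,
// a known name yields its pre-serialized JSON string unchanged.
std::string find_config_id(const config_registry& cfg, const std::string& id) {
    auto value = value_as_json_string_for_name(cfg, id);
    if (!value) {
        // The real handler throws bad_param_exception; std::invalid_argument stands in here.
        throw std::invalid_argument("No such config entry: " + id);
    }
    return std::move(*value);
}
```

Returning `std::optional` lets the caller tell "no such entry" apart from any legitimate value without a sentinel string, which matches the `if (!value)` check the new handler performs before dereferencing.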

									
2 api/config.hh
@@ -3,7 +3,7 @@
  */
 /*
- * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
+ * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1
  */
 #pragma once

									
2 api/cql_server_test.cc
@@ -3,7 +3,7 @@
  */
 /*
- * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
+ * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1
  */
 #include "build_mode.hh"

									
2 api/cql_server_test.hh
@@ -3,7 +3,7 @@
  */
 /*
- * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
+ * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1
  */
 #ifndef SCYLLA_BUILD_MODE_RELEASE

									
2 api/endpoint_snitch.cc
@@ -3,7 +3,7 @@
  */
 /*
- * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
+ * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1
  */
 #include "locator/snitch_base.hh"

									
2 api/endpoint_snitch.hh
@@ -3,7 +3,7 @@
  */
 /*
- * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
+ * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1
  */
 #pragma once

									
4 api/error_injection.cc
@@ -3,7 +3,7 @@
  */
 /*
- * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
+ * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1
  */
 #include "api/api-doc/error_injection.json.hh"
@@ -23,7 +23,7 @@ void set_error_injection(http_context& ctx, routes& r) {
     hf::enable_injection.set(r, [](std::unique_ptr<request> req) -> future<json::json_return_type> {
         sstring injection = req->get_path_param("injection");
-        bool one_shot = req->get_query_param("one_shot") == "True";
+        bool one_shot = strcasecmp(req->get_query_param("one_shot").c_str(), "true") == 0;
         auto params = co_await util::read_entire_stream_contiguous(*req->content_stream);
         const size_t max_params_size = 1024 * 1024;
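The `api/error_injection.cc` hunk changes how the `one_shot` query parameter is interpreted: the old code enabled one-shot mode only for the exact string `True`, while the new code compares against `true` case-insensitively with `strcasecmp`, so `true` and `TRUE` now enable it as well. The sketch below isolates the two comparisons on a plain `std::string`; the function names are hypothetical and the request object is omitted.

```cpp
#include <strings.h>  // strcasecmp (POSIX)
#include <string>

// Comparison used before the change: only the exact spelling "True" enables one-shot mode.
bool parse_one_shot_old(const std::string& value) {
    return value == "True";
}

// Comparison used after the change: "true" in any letter case enables one-shot mode;
// anything else, including an empty string, leaves it disabled.
bool parse_one_shot_new(const std::string& value) {
    return strcasecmp(value.c_str(), "true") == 0;
}
```

`strcasecmp` is declared in POSIX `<strings.h>` rather than in standard C++, so this sketch assumes a POSIX platform.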

									
2 api/error_injection.hh
@@ -3,7 +3,7 @@
  */
 /*
- * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
+ * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1
  */
 #pragma once

									
2 api/failure_detector.cc
@@ -3,7 +3,7 @@
  */
 /*
- * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
+ * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1
  */
 #include "failure_detector.hh"

									
2 api/failure_detector.hh
@@ -3,7 +3,7 @@
  */
 /*
- * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
+ * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1
  */
 #pragma once

									
2 api/gossiper.cc
@@ -3,7 +3,7 @@
  */
 /*
- * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
+ * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1
  */
 #include <seastar/core/coroutine.hh>

									
2 api/gossiper.hh
@@ -3,7 +3,7 @@
  */
 /*
- * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
+ * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.1
  */
 #pragma once

Compare commits

719 Commits debug_form ... master

8 .github/CODEOWNERS vendored
33 .github/copilot-instructions.md vendored
14 .github/instructions/cpp.instructions.md vendored
4 .github/instructions/python.instructions.md vendored
2 .github/scripts/check-license.py vendored
5 .github/workflows/add-label-when-promoted.yaml vendored
5 .github/workflows/backport-pr-fixes-validation.yaml vendored
5 .github/workflows/build-scylla.yaml vendored
5 .github/workflows/call_validate_pr_author_email.yml vendored
7 .github/workflows/check-license-header.yaml vendored
3 .github/workflows/clang-nightly.yaml vendored
3 .github/workflows/clang-tidy.yaml vendored
5 .github/workflows/close_issue_for_scylla_associate.yml vendored
6 .github/workflows/codespell.yaml vendored
38 .github/workflows/compare-build-systems.yaml vendored Normal file
5 .github/workflows/conflict_reminder.yaml vendored
7 .github/workflows/differential-shellcheck.yaml vendored
7 .github/workflows/docs-pages.yaml vendored
7 .github/workflows/docs-pr.yaml vendored
7 .github/workflows/docs-validate-metrics.yml vendored
5 .github/workflows/iwyu.yaml vendored
3 .github/workflows/make-pr-ready-for-review.yaml vendored
4 .github/workflows/pr-require-backport-label.yaml vendored
5 .github/workflows/read-toolchain.yaml vendored
5 .github/workflows/seastar.yaml vendored
5 .github/workflows/sync-labels.yaml vendored
5 .github/workflows/trigger_ci.yaml vendored
5 .github/workflows/urgent_issue_reminder.yml vendored
16 AGENTS.md Normal file
82 CMakeLists.txt
46 LICENSE-ScyllaDB-Source-Available.md
2 abseil
2 absl-flat_hash_map.cc
2 absl-flat_hash_map.hh
2 alternator/CMakeLists.txt
253 alternator/attribute_path.hh Normal file
2 alternator/auth.cc
2 alternator/auth.hh
2 alternator/conditions.cc
2 alternator/conditions.hh
2 alternator/consumed_capacity.cc
2 alternator/consumed_capacity.hh
11 alternator/controller.cc
11 alternator/controller.hh
2 alternator/error.hh
3058 alternator/executor.cc
238 alternator/executor.hh
1957 alternator/executor_read.cc Normal file
559 alternator/executor_util.cc Normal file
247 alternator/executor_util.hh Normal file
8 alternator/expressions.cc
2 alternator/expressions.g
2 alternator/expressions.hh
2 alternator/expressions_types.hh
2 alternator/extract_from_attrs.hh
6 alternator/http_compression.cc
4 alternator/http_compression.hh
2 alternator/parsed_expression_cache.cc
2 alternator/rmw_operation.hh
4 alternator/serialization.cc
2 alternator/serialization.hh
136 alternator/server.cc
4 alternator/server.hh
2 alternator/stats.cc
2 alternator/stats.hh
745 alternator/streams.cc
62 alternator/streams.hh Normal file
15 alternator/ttl.cc
2 alternator/ttl.hh
2 alternator/ttl_tag.hh
122 api/api-doc/storage_service.json
2 api/api.cc
2 api/api.hh
2 api/api_init.hh
2 api/authorization_cache.cc
2 api/authorization_cache.hh
2 api/cache_service.cc
2 api/cache_service.hh


2 api/client_routes.cc