scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-07 23:43:31 +00:00

Author	SHA1	Message	Date
Aleksandra Martyniuk	7cdf7d62a2	gms: add keyspace_multi_rf_change feature	2026-04-17 09:58:05 +02:00
Avi Kivity	04b54f363b	Merge 'Enable vnodes-to-tablets migrations with arbitrary tokens' from Nikos Dragazis This PR removes the power-of-two token constraint from vnodes-to-tablets migrations, allowing clusters with randomly generated tokens to migrate without manual token reassignment. Previously, migrations required vnode tokens to be a power of two and aligned. In practice, these conditions are not met with Scylla's default random token assignment, so the constraint is a blocker for real-world use. With the introduction of arbitrary tablet boundaries in PR #28459, the tablet layer can now support arbitrary tablet boundaries. This PR builds on that capability to allow arbitrary vnode tokens during migration. When the highest vnode token does not coincide with the end of the token ring, the vnode wraps around, but tablets do not support that. This is handled by splitting it into two tablets: one covering the tail end of the ring and one covering the beginning. Testing has been updated accordingly: existing cluster tests now use randomly generated tokens instead of precomputed power-of-two values, and a new Boost test validates the wrap-around tablet boundary logic. Fixes SCYLLADB-724. New feature, no backport is needed. Closes scylladb/scylladb#29319 * github.com:scylladb/scylladb: test: Use arbitrary tokens in vnodes->tablets migration tests test: boost: Add test for wrap-around vnodes storage_service: Support vnodes->tablets migrations w/ arbitrary tokens storage_service: Hoist migration precondition	2026-04-17 00:46:35 +03:00
Avi Kivity	999e108139	Merge 'test: lib: fix broken retry in start_docker_service' from Dario Mirovic The retry loop in `start_docker_service` passes the parse callbacks via `std::move` into `create_handler` on each iteration. After the first iteration, the moved-from `std::function` objects are empty. All subsequent retries skip output parsing entirely and immediately treat the service as successfully started. This defeats the entire purpose of the retry mechanism. Fix by passing the callbacks by copy instead of move, so the original callbacks remain valid across retries. Fixes SCYLLADB-1542 This is a CI stability issue and should be backported. Closes scylladb/scylladb#29504 * github.com:scylladb/scylladb: test/lib: fix typos in proc_utils, gcs_fixture, and dockerized_service test: gcs_fixture: rename container from "local-kms" to "fake-gcs-server" test: fix proc_utils.cc formatting from previous commit test: lib: use unique container name per retry attempt test: lib: fix broken retry in start_docker_service	2026-04-16 21:48:25 +03:00
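The retry fix described above can be illustrated with a minimal sketch. The names `handler`, `create_handler`, and `start_with_retries` are hypothetical stand-ins rather than the actual code in `test/lib/proc_utils.cc`. The sketch assumes only that the factory takes its callback by value, so passing a copy on each iteration keeps the caller's callback usable for later attempts.

```cpp
#include <functional>
#include <string>
#include <utility>

// Hypothetical stand-in for an object that owns a parse callback.
struct handler {
    std::function<bool(const std::string&)> parse;
};

// Hypothetical factory: takes the callback by value and stores it.
handler create_handler(std::function<bool(const std::string&)> parse) {
    return handler{std::move(parse)};
}

// Tries up to max_attempts times. Returns the 1-based number of the attempt
// whose output was accepted by the callback, or 0 if none was accepted.
int start_with_retries(const std::function<bool(const std::string&)>& parse,
                       int max_attempts) {
    for (int attempt = 1; attempt <= max_attempts; ++attempt) {
        // Pass a copy, not std::move(parse): the original callback must
        // remain callable on the next iteration.
        handler h = create_handler(parse);
        const std::string output = "attempt " + std::to_string(attempt);
        if (h.parse && h.parse(output)) {
            return attempt;
        }
    }
    return 0;
}
```

With a callback that accepts only the third attempt's output, the loop invokes the callback three times and reports attempt 3, which is the behavior the retry mechanism depends on.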
Radosław Cybulski	c5ed6b22ae	alternator: add CHILD_SHARDS filtering Add a `CHILD_SHARDS` filter to the `DescribeStream` command. When it is used, the user needs to pass a parent stream shard ID in the JSON `ShardFilter.ShardId` field. DescribeStream will then return only the stream shards that are direct descendants of the passed parent stream shard. Each stream shard covers a consecutive part of the token space. A stream shard Q is considered a child of stream shard W when at least one token belongs to the token spaces of both shards. The filtering algorithm itself is somewhat complicated - more details in comments in streams.cc. CHILD_SHARDS is an Amazon functionality and is required by KCL. Add unit tests. Fixes: #25160 Closes scylladb/scylladb#28189	2026-04-16 18:27:55 +03:00
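The parent/child definition in the message above (two shards are related when their token spaces share at least one token) can be sketched as a plain interval-overlap check. This is only an illustration of that definition, not the algorithm in streams.cc, which the commit describes as more involved. The `token_range` type with inclusive integer bounds and the function names are assumptions made for the example.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical model of a stream shard: an inclusive range of tokens.
struct token_range {
    std::int64_t first;
    std::int64_t last;
};

// Two inclusive ranges share at least one token when each one starts
// no later than the other one ends.
bool overlaps(const token_range& a, const token_range& b) {
    return a.first <= b.last && b.first <= a.last;
}

// Returns the candidates that share at least one token with the parent.
std::vector<token_range> child_shards(const token_range& parent,
                                      const std::vector<token_range>& candidates) {
    std::vector<token_range> result;
    for (const token_range& c : candidates) {
        if (overlaps(parent, c)) {
            result.push_back(c);
        }
    }
    return result;
}
```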
Andrei Chekun	ba04e1e2c3	codeowners: add owner for the test framework Add @xtrey as a codeowner of the test framework Closes scylladb/scylladb#29518	2026-04-16 17:57:21 +03:00
Piotr Szymaniak	d0c3f78d76	test/alternator: extend local TTL streams timeout Increase the non-AWS wait in the TTL streams test to reduce vnode CI flakes caused by delayed expiration visibility. Fixes SCYLLADB-1556 Closes scylladb/scylladb#29516	2026-04-16 15:53:35 +03:00
copilot-swe-agent[bot]	ec7450bff8	topology_coordinator, tablets: Log active tablet transitions when going idle This will make debugging of stalled tablet transitions easier. We saw several issues where the topology state machine was blocked by active tablet migrations, which was not obvious at first glance at the logs. Now it will be easy to tell if tablet transitions are blocking progress and which transitions are stuck. Closes scylladb/scylladb#28616	2026-04-16 14:34:37 +03:00
Benny Halevy	05a00fe140	compaction_manager: fix use-after-free in postponed_compactions_reevaluation() drain() signals the postponed_reevaluation condition variable to terminate the postponed_compactions_reevaluation() coroutine but does not await its completion. When enable() is called afterwards, it overwrites _waiting_reevalution with a new coroutine, orphaning the old one. During shutdown, really_do_stop() only awaits the latest coroutine via _waiting_reevalution, leaving the orphaned coroutine still alive. After sharded::stop() destroys the compaction_manager, the orphaned coroutine resumes and reads freed memory (is_disabled() accesses _state). Fix by introducing stop_postponed_compactions(), awaiting the reevaluation coroutine in both drain() and stop() after signaling it, if postponed_compactions_reevaluation() is running. It uses an std::optional<future<>> for _waiting_reevalution and std::exchange to leave _waiting_reevalution disengaged when postponed_compactions_reevaluation() is not running. This prevents a race between drain() and stop(). While at it, fix typo in _waiting_reevalution -> _waiting_reevaluation. Fixes: SCYLLADB-1463 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes scylladb/scylladb#29443	2026-04-16 14:33:31 +03:00
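The `std::optional` plus `std::exchange` pattern mentioned above can be sketched with standard-library futures. This is an analogy under stated assumptions: it uses `std::future<void>` and `std::async` in place of Seastar futures and coroutines, and the class and member names are illustrative rather than taken from compaction_manager. The point shown is that whoever stops the task first takes ownership of the pending future and awaits it, leaving the member disengaged so a second stop has nothing to wait for.

```cpp
#include <future>
#include <optional>
#include <utility>

class reevaluator {
    // Engaged only while the background task is running.
    std::optional<std::future<void>> _waiting_reevaluation;

    // Takes the pending future (if any), leaves the member disengaged,
    // and waits for the task to finish. Returns true if there was a task.
    bool stop_postponed() {
        auto pending = std::exchange(_waiting_reevaluation, std::nullopt);
        if (!pending) {
            return false;
        }
        pending->get();
        return true;
    }

public:
    // Starts the background task unless one is already running, so a live
    // task is never overwritten and orphaned.
    void enable() {
        if (!_waiting_reevaluation) {
            _waiting_reevaluation = std::async(std::launch::async, [] {});
        }
    }

    bool drain() { return stop_postponed(); }
    bool stop() { return stop_postponed(); }
    bool running() const { return _waiting_reevaluation.has_value(); }
};
```

Calling `drain()` and then `stop()` awaits the task exactly once: the second call finds the optional disengaged and returns immediately.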
Emil Maskovsky	91df3795fc	encryption: cover system.raft table in system_info_encryption Extend system_info_encryption to encrypt system.raft SSTables. system.raft contains the Raft log, which may hold sensitive user data (e.g. batched mutations), so it warrants the same treatment as system.batchlog and system.paxos. During upgrade, existing unencrypted system.raft SSTables remain readable. Existing data is rewritten encrypted via compaction, or immediately via nodetool upgradesstables -a. Update the operator-facing system_info_encryption description to mention system.raft and add a focused test that verifies the schema extension is present on system.raft. Fixes: CUSTOMER-268 Backport: 2026.1 - closes an encryption-at-rest coverage gap: system.raft may persist sensitive user-originated data unencrypted; backport to the current LTS. Closes scylladb/scylladb#29242	2026-04-16 13:22:10 +02:00
Botond Dénes	d006c4c476	Merge 'Untie (partially) cql3/statements from db::config' from Pavel Emelyanov There's a bunch of db::config options that are used by cql3/statements/ code. For that they use data_dictionary/database as a proxy to get db::config reference. This PR moves most of these accessed options onto cql_config Options migrated to cql_config: 1. select_internal_page_size 2. strict_allow_filtering 3. enable_parallelized_aggregation 4. batch_size_warn_threshold_in_kb 5. batch_size_fail_threshold_in_kb 6. 7 keyspace replication restriction options 7. 2 TWCS restriction options 8. restrict_future_timestamp 9. strict_is_not_null_in_views (with view_restrictions struct) 10. enable_create_table_with_compact_storage Some options need special treatment and are still abused via database, namely: 1. enable_logstor 2. cluster_name 3. partitioner 4. endpoint_snitch Fixing components inter-dependencies, not backporting Closes scylladb/scylladb#29424 * github.com:scylladb/scylladb: cql3: Move enable_create_table_with_compact_storage to cql_config cql3: Move strict_is_not_null_in_views to cql_config cql3: Move restrict_future_timestamp to cql_config cql3: Move TWCS restriction options to cql_config cql3: Move keyspace restriction options to cql_config cql3: Move batch_size_fail_threshold_in_kb to cql_config cql3: Move batch_size_warn_threshold_in_kb to cql_config cql3: Move enable_parallelized_aggregation to cql_config cql3: Move strict_allow_filtering to cql_config cql3: Move select_internal_page_size to cql_config test: Fix cql_test_env to use updateable cql_config from db::config cql3: Add cql_config parameter to parsed_statement::prepare()	2026-04-16 14:04:43 +03:00
Botond Dénes	88a8324e68	Merge 'db: store large data records in SSTable metadata and serve via virtual tables' from Benny Halevy `system.large_partitions`, `system.large_rows`, and `system.large_cells` store records keyed by SSTable name. When SSTables are migrated between shards or nodes (resharding, streaming, decommission), the records are lost because the destination never writes entries for the migrated SSTables. This patch series moves the source of truth for large data records into the SSTable's scylla metadata component (new `LargeDataRecords` tag 13) and reimplements the three `system.large_` tables as virtual tables that query live SSTables on demand. A cluster feature flag (`LARGE_DATA_VIRTUAL_TABLES`) gates the transition for safe rolling upgrades. When the cluster feature is enabled, each node drops the old system large_ tables and starts serving the corresponding tables using virtual tables that represent the large data records now stored on the sstables. Note that the virtual tables will be empty after upgrade until the sstables that contained large data are rewritten, therefore it is recommended to run upgrade sstables compaction or major compaction to repopulate the sstables scylla-metadata with large data records. 1. keys: move key_to_str() to keys/keys.hh - make the helper reusable across large_data_handler, virtual tables, and scylla-sstable 2. sstables: add LargeDataRecords metadata type (tag 13) - new struct with binary-serialized key fields, scylla-sstable JSON support, format documentation 3. large_data_handler: rename partition_above_threshold to above_threshold_result - generalize the struct for reuse 4. large_data_handler: return above_threshold_result from maybe_record_large_cells - separate booleans for cell size vs collection elements thresholds 5. sstables: populate LargeDataRecords from writer - bounded min-heaps (one per large_data_type), configurable top-N via `compaction_large_data_records_per_sstable` 6. test: add LargeDataRecords round-trip unit tests - verify write/read, top-N bounding, below-threshold behavior 7. db: call initialize_virtual_tables from shard 0 only - preparatory refactoring to enable cross-shard coordination 8. db: implement large_data virtual tables with feature flag gating - three virtual table classes, feature flag activation, legacy SSTable fallback, dual-threshold dedup, cross-shard collection Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-1276 * Although this fixes a bug where large data entries are effectively lost when sstables are renamed or migrated, the changes are intrusive and do not warrant a backport Closes scylladb/scylladb#29257 * github.com:scylladb/scylladb: db: implement large_data virtual tables with feature flag gating db: call initialize_virtual_tables from shard 0 only test: add LargeDataRecords round-trip unit tests sstables: populate LargeDataRecords from writer large_data_handler: return above_threshold_result from maybe_record_large_cells large_data_handler: rename partition_above_threshold to above_threshold_result sstables: add LargeDataRecords metadata type (tag 13) sstables: add fmt::formatter for large_data_type keys: move key_to_str() to keys/keys.hh	2026-04-16 14:03:31 +03:00
Roy Dahan	d2d7604188	ci: pin GitHub Actions to commit SHAs and migrate to Node.js 24 Pin all external GitHub Actions to full commit SHAs and upgrade to their latest major versions to reduce supply chain attack surface: - actions/checkout: v3/v4/v5 -> v6.0.2 - actions/github-script: v7 -> v8.0.0 - actions/setup-python: v5 -> v6.2.0 - actions/upload-artifact: v4 -> v7.0.0 - astral-sh/setup-uv: v6 -> v8.0.0 - mheap/github-action-required-labels: v5.5.2 (pinned) - redhat-plumbers-in-action/differential-shellcheck: v5.5.6 (pinned) - codespell-project/actions-codespell: v2.2 (pinned, was @master) Set FORCE_JAVASCRIPT_ACTIONS_TO_NODE24=true in all 21 workflows that use JavaScript-based actions to opt into the Node.js 24 runtime now. This resolves the deprecation warning: "Node.js 20 actions are deprecated. Please check if updated versions of these actions are available that support Node.js 24. Actions will be forced to run with Node.js 24 by default starting June 2nd, 2026. Node.js 20 will be removed from the runner on September 16th, 2026." See: https://github.blog/changelog/2025-09-19-deprecation-of-node-20-on-github-actions-runners/ scylladb/github-automation references are intentionally left at @main as they are org-internal reusable workflows. Fixes: SCYLLADB-1410 Backport: Backport is required for live branches that run GH actions: 2026.1, 2025.4, 2025.1 and 2024.1 Closes scylladb/scylladb#29421	2026-04-16 13:03:33 +03:00
Pavel Emelyanov	207d3b4a68	test_backup: Remove create_schema() helper Test Remove the create_schema() helper function and inline its logic directly into the four call sites. This simplifies the code by eliminating a trivial wrapper. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Closes scylladb/scylladb#29406	2026-04-16 12:57:26 +03:00
Botond Dénes	830d28a889	Merge 'Use standard helpers to create ks:cf and populate it in test_backup.py' from Pavel Emelyanov The PR removed the create_and_ks() helper from backup test and patches all callers to create keyspace, table and populate them with standard explicit facilities. While patching it turned out that one test doesn't need to populate the table, so it even becomes tiny bit shorter and faster Enhancing test, not backporting Closes scylladb/scylladb#29417 * github.com:scylladb/scylladb: test_backup: Remove create_ks_and_cf helper Test test_backup: Replace create_ks_and_cf with async patterns Test test_backup: Add if-True blocks for indentation Test	2026-04-16 12:54:21 +03:00
Nikos Dragazis	7abcf94823	test: Use arbitrary tokens in vnodes->tablets migration tests The migration tests used to start nodes with pre-computed power-of-two tokens. This was required because the migration itself only supported power-of-two aligned tokens. Now that arbitrary tokens are supported, switch the tests to use Scylla's default random token assignment. Switching to arbitrary tokens makes the tests non-deterministic, but the migration aspects that are affected by the token distribution (resharding, wrap-around vnode split) are out of scope for these tests and covered by dedicated tests. Add a `get_all_vnode_tokens()` helper that queries system.topology at runtime to discover the actual token layout, and derive expected tablet counts from that. Also account for the possible extra wrap-around tablet when the last vnode token does not coincide with MAX_TOKEN. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2026-04-16 12:47:27 +03:00
Nikos Dragazis	26f0c038af	test: boost: Add test for wrap-around vnodes Add a Boost test to verify that `prepare_for_tablets_migration()` produces the correct tablet boundaries when a wrap-around vnode exists. Tablets cannot wrap around the token ring as vnodes do; the last token of the last tablet must always be MAX_TOKEN. When the last vnode token does not coincide with MAX_TOKEN, the wrap-around vnode must be split into two tablets. The test is parameterized over both cases: unaligned (split expected) and aligned (no split expected). Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2026-04-16 12:47:16 +03:00
Botond Dénes	c355df4461	Merge 'test: Lower default log level from DEBUG to INFO' from Artsiom Mishuta 1. test.py - Removed the --log-level=DEBUG flag from pytest args 2. test/pytest.ini - Changed log_level to INFO (it was set to DEBUG in test.py), changed log_file_level from DEBUG to INFO, added clarifying comments + minor fix [test/pylib: save logs on success only during teardown phase](`0ede308a04`) Previously, when --save-log-on-success was enabled, logs were saved for every test phase (setup, call, teardown) in 3 files. Restrict it to only the teardown phase, which contains all 3 in case of test success, to avoid redundant log entries. Closes scylladb/scylladb#29086 * github.com:scylladb/scylladb: test/pylib: save logs on success only during teardown phase test: Lower default log level from DEBUG to INFO	2026-04-16 12:46:11 +03:00
Nikos Dragazis	098732ff76	storage_service: Support vnodes->tablets migrations w/ arbitrary tokens The vnodes-to-tablets migration creates tablet maps that mirror the vnode layout: one tablet per vnode, preserving token boundaries and replica placement. However, due to tablet restrictions, the migration requires vnode tokens to be a power of two and uniformly distributed across the token ring. In practice, this restriction is too limiting. Real clusters use randomly generated tokens and a node's token assignment is immutable. To solve this problem, prior work (`01fb97ee78`) has been done to relax the tablet constraints by allowing arbitrary tablet boundaries, removing the requirement for power-of-two sizing and uniform distribution. This patch leverages the relaxed tablet constraints to enable tablet map creation from arbitrary vnode tokens: * Removes all token-related constraints. * Handles wrap-around vnodes. If a vnode wraps (i.e., the highest vnode token is not `dht::token::last()`), it is split into two tablets: - (last_vnode_token, dht::token::last()] - [dht::token::first(), first_vnode_token] The migration ops guide has been updated to remove the power-of-two constraint. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2026-04-16 12:39:23 +03:00
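The wrap-around handling described above can be sketched as follows. This is a simplified model, not the code in storage_service: tokens are plain `int64_t` values, `last_token` stands in for `dht::token::last()` and is assumed here to be the maximum `int64_t`, and each tablet is identified only by its last token. One tablet is produced per vnode token, plus one extra tablet covering the tail of the ring when the highest vnode token is not the last token.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

// Assumption for this sketch: the ring ends at the maximum int64_t value.
constexpr std::int64_t last_token = std::numeric_limits<std::int64_t>::max();

// Returns the last token of every tablet, in ring order. Tablet i covers
// (previous last token, its own last token]; the first tablet starts at the
// beginning of the ring.
std::vector<std::int64_t> tablet_last_tokens(std::vector<std::int64_t> vnode_tokens) {
    assert(!vnode_tokens.empty());
    std::sort(vnode_tokens.begin(), vnode_tokens.end());
    if (vnode_tokens.back() != last_token) {
        // The wrap-around vnode is split: its head is the first tablet,
        // and its tail (highest vnode token, last_token] becomes this
        // additional tablet.
        vnode_tokens.push_back(last_token);
    }
    return vnode_tokens;
}
```

For vnode tokens `{300, -500, 10}` the sketch yields four tablets, ending at -500, 10, 300, and `last_token`. When the highest vnode token already equals `last_token`, the tablet count equals the vnode count.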
Nikos Dragazis	8ea8c05120	storage_service: Hoist migration precondition `prepare_for_tablets_migration()` is idempotent; it filters out tables that already have tablet maps and returns early if no tablet maps need to be created. However, this precondition is currently misplaced. Move it higher to skip extra work. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2026-04-16 12:19:34 +03:00
Botond Dénes	9bfcc25cf7	Merge 'streaming: stream_blob: hold table for streaming' from Michael Litvak When initializing streaming sources in tablet_stream_files_handler we use a reference to the table. We should hold the table while doing so, because otherwise the table may be dropped and destroyed when we yield. Use the table.stream_in_progress() phaser to hold the table while we access it. For sstable file streaming we can release the table after the snapshot is initialized, and the table may be dropped safely because the files are held by the snapshot and we don't access the table anymore. There was a single access to the table for logging but it is replaced by a pre-calculated variable. For logstor segment streaming, currently it doesn't support discarding the segments while they are streamed - when the table is dropped it discard the segments by overwriting and freeing them, so they shouldn't be accessed after that. Therefore, in that case continue to hold the table until streaming is completed. Fixes [SCYLLADB-1533](https://scylladb.atlassian.net/browse/SCYLLADB-1533) It's a pre-existing use-after-free issue in sstable file streaming so should be backported to all releases. It's also made worse with the recent changes of logstor, and affects also non-logstor tables, so the logstor fixes should be in the same release (2026.2). [SCYLLADB-1533]: https://scylladb.atlassian.net/browse/SCYLLADB-1533?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ Closes scylladb/scylladb#29488 * github.com:scylladb/scylladb: test: test drop table during streaming streaming: stream_blob: hold table for streaming	2026-04-16 12:12:42 +03:00
Dario Mirovic	50e498ac0d	test/lib: fix typos in proc_utils, gcs_fixture, and dockerized_service Fix assorted typos in comments, strings, and identifiers: - path_preprend -> path_prepend (proc_utils.hh, proc_utils.cc) - laúnch -> launch (proc_utils.cc) - hand/fail -> hang/fail (dockerized_service.py) - inconvinient -> inconvenient (dockerized_service.py) - priviledges -> privileges (gcs_fixture.hh) - remove double semicolon (gcs_fixture.cc) Refs SCYLLADB-1542	2026-04-16 10:58:55 +02:00
Dario Mirovic	11b5997eaf	test: gcs_fixture: rename container from "local-kms" to "fake-gcs-server" The GCS fixture's fake-gcs-server container was named "local-kms", copy-pasted from the AWS KMS fixture. It happened when both were refactored to use the shared start_docker_service helper (`bc544eb08e`). Rename to "fake-gcs-server" to match the Python-side naming and avoid confusion in logs. Refs SCYLLADB-1542	2026-04-16 10:58:52 +02:00
Dario Mirovic	dc7f848bf8	test: fix proc_utils.cc formatting from previous commit Fix indentation of lines moved inside the for-loop in start_docker_service (lines 208-225). Refs SCYLLADB-1542	2026-04-16 10:55:48 +02:00
Dario Mirovic	be4d32c474	test: lib: use unique container name per retry attempt The container name is generated once before the retry loop, so all retry attempts reuse the same name. Move the name generation inside the loop so each attempt gets a fresh name via the incrementing counter, consistent with the comment "publish port ephemeral, allows parallel instances". Formatting changes (indentation) of lines 208-225 in test/lib/proc_utils.cc will be fixed in the next commit. Refs SCYLLADB-1542	2026-04-16 10:55:04 +02:00
Botond Dénes	33682fd14e	Merge 'sstables/storage_manager: fix race between object storage config update and keyspace creation' from Dimitrios Symonidis Previously, config_updater used a serialized_action to trigger update_config() when object_storage_endpoints changed. Because serialized_action::trigger() always schedules the action as a new reactor task (via semaphore::wait().then()), there was a window between the config value becoming visible to the REST API and update_config() actually running. This allowed a concurrent CREATE KEYSPACE to see the new endpoint via is_known_endpoint() before storage_manager had registered it in _object_storage_endpoints. Now config observers run synchronously in a reactor turn and must not suspend. Split the previous monolithic async update_config() coroutine into two phases: - Sync (in the observer, never suspends): storage_manager::_object_storage_endpoints is updated in place; for already-instantiated clients, update_config_sync swaps the new config atomically - Async (per-client gate): background fibers finish the work that can't run in the observer — S3 refreshes credentials under _creds_sem; GCS drains and closes the replaced client. Config reloads triggered by SIGHUP are applied on shard 0 and then broadcast to all other shards. An rwlock has been also introduced to make sure that the configuration has been propagated to all cores. This guarantees that a client requesting a config via the REST API will see a consistent snapshot Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-757 Fixes: [28141](https://github.com/scylladb/scylladb/issues/28141) Closes scylladb/scylladb#28950 * github.com:scylladb/scylladb: test/object_store: verify object storage client creation and live reconfiguration sstables/utils/s3: split config update into sync and async parts test_config: improve logging for wait_for_config API db: introduce read-write lock to synchronize config updates with REST API	2026-04-16 10:20:43 +03:00
Michael Litvak	43c76aaf2b	logstor: split log record to header and data Split the `log_record` to `log_record_header` type that has the record metadata fields and the mutation as a separate field which is the actual record data: struct log_record { log_record_header header; canonical_mutation mut; }; Both the header and mutation have variable serialized size. When a record is serialized in a write_buffer, we first put a small `record_header` that has the header size and data size, then the serialized header and data follow. The `log_location` of a record points to the beginning of the `record_header`, and the size includes the `record_header`. This allows us to read a record header without reading the data when it's not needed and avoid deserializing it: * on recovery, when scanning all segments, we read only the record headers. * on compaction, we read the record header first to determine if the record is alive, if yes then we read the data. Closes scylladb/scylladb#29457	2026-04-16 10:00:35 +03:00
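The on-buffer layout described above (a small fixed-size `record_header` carrying the two sizes, followed by the serialized header and then the data) can be sketched like this. The field widths, the use of `std::string` for the serialized parts, and the function names are assumptions for illustration; the real logstor types and serialization are not shown in the commit message.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Assumed fixed-size prefix: sizes of the two variable-length parts.
struct record_header {
    std::uint32_t header_size;
    std::uint32_t data_size;
};

// Appends one record and returns its location (offset of the record_header).
std::size_t append_record(std::vector<std::uint8_t>& buf,
                          const std::string& header, const std::string& data) {
    const std::size_t location = buf.size();
    const record_header rh{static_cast<std::uint32_t>(header.size()),
                           static_cast<std::uint32_t>(data.size())};
    buf.resize(location + sizeof(rh) + header.size() + data.size());
    std::uint8_t* p = buf.data() + location;
    std::memcpy(p, &rh, sizeof(rh));
    std::memcpy(p + sizeof(rh), header.data(), header.size());
    std::memcpy(p + sizeof(rh) + header.size(), data.data(), data.size());
    return location;
}

record_header read_record_header(const std::vector<std::uint8_t>& buf, std::size_t location) {
    record_header rh;
    std::memcpy(&rh, buf.data() + location, sizeof(rh));
    return rh;
}

// Reads only the serialized header; the data bytes are never touched.
std::string read_header_only(const std::vector<std::uint8_t>& buf, std::size_t location) {
    const record_header rh = read_record_header(buf, location);
    const char* p = reinterpret_cast<const char*>(buf.data() + location + sizeof(rh));
    return std::string(p, rh.header_size);
}

// Total record size, including the record_header prefix.
std::size_t record_size(const std::vector<std::uint8_t>& buf, std::size_t location) {
    const record_header rh = read_record_header(buf, location);
    return sizeof(rh) + rh.header_size + rh.data_size;
}
```

A scan can hop from record to record using `record_size` and call `read_header_only` at each location, which mirrors the recovery case where only headers are needed.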
Botond Dénes	8e7ba7efe2	Merge 'commitlog: fix segment replay order by using ordered map per shard' from Sergey Zolotukhin The commitlog replayer groups segments by shard using a std::unordered_multimap, then iterates per-shard segments via equal_range(). However, equal_range() does not guarantee iteration order for elements with the same key, so segments could be replayed out of order within a shard. Correct segment ordering is required for: - Fragmented entry reconstruction, which accumulates fragments across segments and depends on ascending order for efficient processing. - Commitlog-based storage used by the strongly consistent tables feature, which relies on replayed raft items being stored in order. Fix by changing the data structure from std::unordered_multimap<unsigned, commitlog::descriptor> to std::unordered_map<unsigned, utils::chunked_vector<commitlog::descriptor>> Since the descriptors are inserted from a std::set ordered by ID, the vector preserves insertion (and thus ID) order. The per-shard iteration now simply iterates the vector, guaranteeing correct replay order. Fixes: SCYLLADB-1411 Backport: It looks like this issue doesn't cause any trouble, and is required only by the strong consistent tables, so no backporting required. Closes scylladb/scylladb#29372 * github.com:scylladb/scylladb: commitlog: add test to verify segment replay order commitlog: fix replay order by using ordered map per shard	2026-04-16 09:55:27 +03:00
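The data-structure change above can be sketched with standard containers. The sketch uses `std::vector` where the commit uses `utils::chunked_vector`, and a simplified `descriptor` with just a shard and an ID; both are assumptions for illustration. Because the input set is ordered by ID, appending to a per-shard vector preserves ascending ID order within each shard.

```cpp
#include <cstdint>
#include <set>
#include <unordered_map>
#include <vector>

// Simplified stand-in for commitlog::descriptor.
struct descriptor {
    unsigned shard;
    std::uint64_t id;
};

// Orders descriptors by ID, as the source set in the commit is described.
struct by_id {
    bool operator()(const descriptor& a, const descriptor& b) const {
        return a.id < b.id;
    }
};

// Groups segment IDs by shard. Iterating the set visits descriptors in
// ascending ID order, and push_back keeps that order inside each vector.
std::unordered_map<unsigned, std::vector<std::uint64_t>>
group_by_shard(const std::set<descriptor, by_id>& segments) {
    std::unordered_map<unsigned, std::vector<std::uint64_t>> per_shard;
    for (const descriptor& d : segments) {
        per_shard[d.shard].push_back(d.id);
    }
    return per_shard;
}
```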
Pavel Emelyanov	335261f351	cql3: Move enable_create_table_with_compact_storage to cql_config Move enable_create_table_with_compact_storage option from db::config to cql_config. This improves separation of concerns by consolidating CQL-specific table creation policies in the cql_config structure. Update the CREATE TABLE statement prepare() function to use the new location for the configuration check. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-16 08:52:20 +03:00
Pavel Emelyanov	f20ede79f9	cql3: Move strict_is_not_null_in_views to cql_config Move strict_is_not_null_in_views option from db::config to cql_config via new view_restrictions sub-struct. This improves separation of concerns by keeping view-specific validation policies with other CQL configuration. Update prepare_view() to take view_restrictions reference instead of reaching into db::config, and update all callsites to pass the sub-struct. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-16 08:52:19 +03:00
Pavel Emelyanov	027c91f45e	cql3: Move restrict_future_timestamp to cql_config Move restrict_future_timestamp option from db::config to cql_config. This improves separation of concerns as timestamp validation is part of CQL query execution behavior. Update validate_timestamp() function signature to take cql_config reference instead of db::config, and update all callsites in modification_statement and batch_statement to pass cql_config. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-16 08:51:53 +03:00
Pavel Emelyanov	7264581881	cql3: Move TWCS restriction options to cql_config Move twcs_max_window_count and restrict_twcs_without_default_ttl options from db::config to cql_config via new twcs_restrictions sub-struct. This improves separation of concerns by keeping TWCS-specific validation policies with other CQL configuration. Update check_restricted_table_properties() to remove unused db parameter and take twcs_restrictions reference instead. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-16 08:51:52 +03:00
Pavel Emelyanov	8b853505cd	cql3: Move keyspace restriction options to cql_config Introduce replication_restrictions, a sub-struct of cql_config, to hold the seven keyspace-level policy options that govern how CREATE/ALTER KEYSPACE statements are validated: - restrict_replication_simplestrategy - replication_strategy_warn_list / replication_strategy_fail_list - minimum/maximum_replication_factor_warn/fail_threshold Pass replication_restrictions into check_against_restricted_replication_strategies() instead of having it reach into db::config directly (via both qp.db().get_config() and qp.proxy().data_dictionary().get_config()). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-16 08:51:24 +03:00
Benny Halevy	ce00d61917	db: implement large_data virtual tables with feature flag gating Replace the physical system.large_partitions, system.large_rows, and system.large_cells CQL tables with virtual tables that read from LargeDataRecords stored in SSTable scylla metadata (tag 13). The transition is gated by a new LARGE_DATA_VIRTUAL_TABLES cluster feature flag: - Before the feature is enabled: the old physical tables remain in all_tables(), CQL writes are active, no virtual tables are registered. This ensures safe rollback during rolling upgrades. - After the feature is enabled: old physical tables are dropped from disk via legacy_drop_table_on_all_shards(), virtual tables are registered on all shards, and CQL writes are skipped via skip_cql_writes() in cql_table_large_data_handler. Key implementation details: - Three virtual table classes (large_partitions_virtual_table, large_rows_virtual_table, large_cells_virtual_table) extend streaming_virtual_table with cross-shard record collection. - generate_legacy_id() gains a version parameter; virtual tables use version 1 to get different UUIDs than the old physical tables. - compaction_time is derived from SSTable generation UUID at display time via UUID_gen::unix_timestamp(). - Legacy SSTables without LargeDataRecords emit synthetic summary rows based on above_threshold > 0 in LargeDataStats. - The activation logic uses two paths: when the feature is already enabled (test env, restart), it runs as a coroutine; when not yet enabled, it registers a when_enabled callback that runs inside seastar::async from feature_service::enable(). - sstable_3_x_test updated to use a simplified large_data_test_handler and validate LargeDataRecords in SSTable metadata directly.	2026-04-16 08:49:02 +03:00
Benny Halevy	cb6004b625	db: call initialize_virtual_tables from shard 0 only Move the smp::invoke_on_all dispatch from the callers into initialize_virtual_tables() itself, so the function is called once from shard 0 and internally distributes the per-shard virtual table setup to all shards. This simplifies the callers and allows a single place to add cross-shard coordination logic (e.g. feature-gated table registration) in future commits.	2026-04-16 08:49:02 +03:00
Benny Halevy	90d4ff34fb	test: add LargeDataRecords round-trip unit tests Add three new test cases to sstable_3_x_test.cc that verify the LargeDataRecords metadata written by the SSTable writer can be read back after open_data(): - test_large_data_records_round_trip: verifies partition_size, row_size, and cell_size records are written with correct field semantics when thresholds are exceeded - test_large_data_records_top_n_bounded: verifies the bounded min-heap keeps only the top-N largest entries per type - test_large_data_records_none_when_below_threshold: verifies no records are written when data is below all thresholds Also wire large_data_records_per_sstable from db_config into the test env's sstables_manager::config so that config changes propagate through the updateable_value chain to configure_writer().	2026-04-16 08:49:02 +03:00
Benny Halevy	1f7faeef57	sstables: populate LargeDataRecords from writer During compaction (SSTable writing), maintain bounded min-heaps (one per large_data_type) that collect the top-N above-threshold records. On stream end, drain all five heaps into a single LargeDataRecords array and write it into the SSTable's scylla metadata component. Five separate heaps are used: - partition_size, row_size, cell_size: ordered by value (size bytes) - rows_in_partition, elements_in_collection: ordered by elements_count A new config option 'compaction_large_data_records_per_sstable' (default 10) controls the maximum number of records kept per type.	2026-04-16 08:49:02 +03:00
Benny Halevy	8f4976f65d	large_data_handler: return above_threshold_result from maybe_record_large_cells Change maybe_record_large_cells to return above_threshold_result with separate booleans for cell size (.size) and collection elements (.elements) thresholds. This allows the writer to track above_threshold counts for cell_size and elements_in_collection independently.	2026-04-16 08:49:02 +03:00
Benny Halevy	c1b797f288	large_data_handler: rename partition_above_threshold to above_threshold_result Rename partition_above_threshold to above_threshold_result and its 'rows' field to 'elements', making it a generic struct that can be reused for other large data types (e.g., cells with collection elements). Use designated initializers for clarity.	2026-04-16 08:49:02 +03:00
Benny Halevy	d92cd42fe6	sstables: add LargeDataRecords metadata type (tag 13) Add a new scylla metadata component LargeDataRecords (tag 13) that stores per-SSTable top-N large data records. Each record carries: - large_data_type (partition_size, row_size, cell_size, etc.) - binary serialized partition key and clustering key - column name (for cell records) - value (size in bytes) - element count (rows or collection elements, type-dependent) - range tombstones and dead rows (partition records only) The struct uses disk_string<uint32_t> for key/name fields and is serialized via the existing describe_type framework into the SSTable Scylla metadata component. Add JSON support in scylla-sstable and format documentation.	2026-04-16 08:49:01 +03:00
Benny Halevy	85e2c6f2a7	sstables: add fmt::formatter for large_data_type Add a fmt::formatter specialization for sstables::large_data_type and use it in scylla-sstable.cc instead of the local to_string() overload, which is removed.	2026-04-16 08:42:54 +03:00
Benny Halevy	d4283d0ffc	keys: move key_to_str() to keys/keys.hh Move the key_to_str() template function from a file-local static in db/large_data_handler.cc to keys/keys.hh so it can be reused by: - large_data_handler.cc for log messages - virtual tables (db/virtual_tables.cc) for converting binary keys to human-readable CQL display - scylla-sstable for JSON output of LargeDataRecords No functional change.	2026-04-16 08:42:54 +03:00
Pavel Emelyanov	1af26a1dd6	cql3: Move batch_size_fail_threshold_in_kb to cql_config The batch_size_fail_threshold_in_kb option controls the batch size at which an oversized batch error is returned to the client. It belongs in cql_config rather than db::config as it directly governs CQL batch statement behavior. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-16 07:57:27 +03:00
Pavel Emelyanov	4d255cf533	cql3: Move batch_size_warn_threshold_in_kb to cql_config The batch_size_warn_threshold_in_kb option controls the batch size at which a client warning is emitted during batch execution. It belongs in cql_config rather than db::config as it directly governs CQL batch statement behavior. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-16 07:57:27 +03:00
Pavel Emelyanov	a3f097f100	cql3: Move enable_parallelized_aggregation to cql_config The enable_parallelized_aggregation option controls whether aggregation queries are fanned out across shards for parallel execution. It belongs in cql_config rather than db::config as it directly governs CQL query behavior at prepare time. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-16 07:57:27 +03:00
Pavel Emelyanov	4314fc0642	cql3: Move strict_allow_filtering to cql_config The strict_allow_filtering option controls whether queries that require ALLOW FILTERING are silently accepted, warned about, or rejected. It belongs in cql_config rather than db::config as it directly governs CQL query behavior at prepare time. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-16 07:57:26 +03:00
Pavel Emelyanov	3411ed8bcc	cql3: Move select_internal_page_size to cql_config The select_internal_page_size option controls CQL query execution behavior (internal paging for aggregate/filtered SELECTs) and belongs in cql_config rather than being read directly from db::config at execution time. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-16 07:57:26 +03:00
Pavel Emelyanov	728eb20b42	test: Fix cql_test_env to use updateable cql_config from db::config The test environment was creating cql_config with hardcoded default values that were never updated when system.config was modified via CQL. This broke tests that dynamically change configuration values (e.g., TWCS tests). Fix by creating cql_config from db::config using sharded_parameter, which ensures updateable_value fields track the actual db::config sources and reflect changes made during test execution. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-04-16 07:57:26 +03:00
Pavel Emelyanov	60a834d9fa	cql3: Add cql_config parameter to parsed_statement::prepare() Pass cql_config to prepare() so that statement preparation can use CQL-specific configuration rather than reaching into db::config directly. Callers that use default_cql_config: - db/view/view.cc: builds a SELECT statement internally to compute view restrictions, not in response to a user query - cql3/statements/create_view_statement.cc: same -- parses the view's WHERE clause as a synthetic SELECT to extract restrictions - tools/schema_loader.cc: offline schema loading tool, no runtime config available - tools/scylla-sstable.cc: offline sstable inspection tool, no runtime config available Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-16 07:57:25 +03:00
Nadav Har'El	f0e9177130	Merge 'audit/alternator: Make Alternator requests audited' from Piotr Szymaniak Each Alternator API call is audited, provided auditing is enabled. Both successful and failed requests are audited, with a few exceptions. The chosen audit types for the operations: CreateTable: DDL; DescribeTable: QUERY; DeleteTable: DDL; UpdateTable: DDL; PutItem: DML; UpdateItem: DML; GetItem: QUERY; DeleteItem: DML; ListTables: QUERY; Scan: QUERY; DescribeEndpoints: QUERY; BatchWriteItem: DML; BatchGetItem: QUERY; Query: QUERY; TagResource: DDL; UntagResource: DDL; ListTagsOfResource: QUERY; UpdateTimeToLive: DDL; DescribeTimeToLive: QUERY; ListStreams: QUERY; DescribeStream: QUERY; GetShardIterator: QUERY; GetRecords: QUERY; DescribeContinuousBackups: QUERY. FIXME: The tests currently cover the new functionality only partially. Fixes: scylladb/scylla-enterprise#3796 Fixes: SCYLLADB-467 No need to backport, new functionality. Closes scylladb/scylladb#27953 * github.com:scylladb/scylladb: audit/alternator: support audit_tables=alternator.<table> shorthand audit/alternator: Add negative audit tests audit/alternator: Add testing of auditing audit/alternator: Audit requests audit/alternator: Refactor in preparation for auditing Alternator	2026-04-15 22:17:57 +03:00
Nikos Dragazis	d38f44208a	test/cqlpy: Harden mutation_fragments tests against background flushes Several tests in test_select_from_mutation_fragments.py assume that all mutations end up in a single SSTable. This assumption can be violated by background memtable flushes triggered by commitlog disk pressure. Since the Scylla node is taken from a pool, it may carry unflushed data from prior tests that prevents closed segments from being recycled, thereby increasing the commitlog disk usage. A main source of such pressure is keyspace-level flushes from earlier tests in this module, which rotate commitlog segments without flushing system tables (e.g., `system.compaction_history`), leaving closed segments dirty. Additionally, prior tests in the same module may have left unflushed data on the shared test table (`test_table` fixture), keeping commitlog segments dirty on its behalf as well. When commitlog disk usage exceeds its threshold, the system flushes the test table to reclaim those segments, potentially splitting a running test's mutations across multiple SSTables. This was observed in CI, where test_paging failed because its data was split across two SSTables, resulting in more mutation fragments than the hardcoded expected count. This patch fixes the affected tests in two ways: 1. Where possible, tests are reworked to not assume a single SSTable: - test_paging - test_slicing_rows - test_many_partition_scan 2. Where rework is impractical, major compaction is added after writes and before validation to ensure that only one SSTable will exist: - test_smoke - test_count - test_metadata_and_value - test_slicing_range_tombstone_changes Fixes SCYLLADB-1375. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com> Closes scylladb/scylladb#29389	2026-04-15 21:46:00 +03:00
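
Commit 1f7faeef57 above describes bounded min-heaps that keep only the top-N above-threshold records per large data type and are drained when the stream ends. The following is a minimal sketch of that general technique, not the code from the commit: the `record` struct, the `top_n_collector` class, and its `maybe_add()` and `drain()` methods are hypothetical names introduced for illustration, and the real LargeDataRecords entries carry more fields than shown here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified record: a key and the measured value (e.g. size in bytes).
struct record {
    std::string key;
    uint64_t value;
};

// Hypothetical collector that retains at most `capacity` records with the
// largest `value`. A min-heap keeps the smallest retained record on top, so
// deciding whether a new record displaces an old one is a single comparison.
class top_n_collector {
    struct by_value_greater {
        bool operator()(const record& a, const record& b) const {
            return a.value > b.value;
        }
    };

    std::size_t _capacity;
    // With a "greater" comparator, std::priority_queue behaves as a min-heap.
    std::priority_queue<record, std::vector<record>, by_value_greater> _heap;

public:
    explicit top_n_collector(std::size_t capacity) : _capacity(capacity) {}

    // Offer a record; it is kept only if it belongs to the current top-N.
    void maybe_add(record r) {
        if (_capacity == 0) {
            return;
        }
        if (_heap.size() < _capacity) {
            _heap.push(std::move(r));
        } else if (r.value > _heap.top().value) {
            _heap.pop();              // evict the smallest retained record
            _heap.push(std::move(r));
        }
        assert(_heap.size() <= _capacity);
    }

    // Empty the heap and return the retained records, largest first.
    std::vector<record> drain() {
        std::vector<record> out;
        out.reserve(_heap.size());
        while (!_heap.empty()) {
            out.push_back(_heap.top());
            _heap.pop();
        }
        std::reverse(out.begin(), out.end());
        return out;
    }
};
```

Under these assumptions, a writer would hold one such collector per record type, call `maybe_add()` for each above-threshold item, and concatenate the `drain()` results at end of stream. Memory stays bounded by the capacity no matter how many items are offered.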


53287 Commits