scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-14 11:52:00 +00:00

Author	SHA1	Message	Date
Botond Dénes	d65c1523c2	sstables: migrate all malformed_sstable_exception throw sites to throw_malformed_sstable_exception() Replace all direct 'throw malformed_sstable_exception(...)' call sites with the new throw_malformed_sstable_exception() helper, which respects the --abort-on-malformed-sstable-error flag.	2026-05-11 11:58:14 +03:00
Botond Dénes	c3daa6379c	sstables: refactor parse_path() to return std::expected<> instead of throwing make_entry_descriptor() and the two overloads of parse_path() used to signal parse failures by throwing malformed_sstable_exception, which made parse_path() expensive to use as a probe (e.g. to classify directory entries). Change make_entry_descriptor() and both parse_path() overloads to return std::expected<T, sstring>, where the sstring carries the error message on failure, eliminating the exception overhead at probe call sites. Call sites that previously caught malformed_sstable_exception to treat the path as a non-SSTable file (utils/directories.cc, db/snapshot/backup_task.cc, tools/scylla-sstable.cc) now check the expected result directly. Call sites where a parse failure is a genuine error (sstable_directory.cc, sstables.cc, tools/schema_loader.cc, tools/scylla-sstable.cc) re-throw explicitly as malformed_sstable_exception using the error string, preserving the existing error propagation behaviour.	2026-05-11 11:58:14 +03:00
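The probe-versus-strict split described in the parse_path() commit above can be illustrated in a few lines. This is a Python sketch of the pattern only, not ScyllaDB's C++ code: the file-name pattern is a simplified assumption, and `EntryDescriptor`, `ParseError`, `MalformedSSTableError`, `is_sstable_file` and `require_sstable_path` are hypothetical names chosen for illustration.

```python
import re
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class EntryDescriptor:
    version: str
    generation: str
    format: str
    component: str


@dataclass(frozen=True)
class ParseError:
    # Carries the error message, like the sstring in std::expected<T, sstring>.
    message: str


class MalformedSSTableError(Exception):
    """Hypothetical stand-in for malformed_sstable_exception."""


# Simplified, assumed name pattern: <version>-<generation>-<format>-<component>
_NAME_RE = re.compile(
    r"^(?P<version>[a-z]{2})-(?P<generation>[0-9A-Za-z_]+)"
    r"-(?P<format>[a-z]+)-(?P<component>[A-Za-z]+\.[a-z0-9]+)$"
)


def parse_path(name: str) -> Union[EntryDescriptor, ParseError]:
    """Return a descriptor on success or a ParseError value; never raises."""
    m = _NAME_RE.match(name)
    if m is None:
        return ParseError(f"invalid sstable file name: {name}")
    return EntryDescriptor(**m.groupdict())


def is_sstable_file(name: str) -> bool:
    """Probe call site: classify a directory entry without exception overhead."""
    return isinstance(parse_path(name), EntryDescriptor)


def require_sstable_path(name: str) -> EntryDescriptor:
    """Strict call site: a parse failure is a genuine error, so re-raise it."""
    result = parse_path(name)
    if isinstance(result, ParseError):
        raise MalformedSSTableError(result.message)
    return result
```

Probe callers only inspect the returned value, while strict callers convert the error string back into an exception, which mirrors the two groups of call sites listed in the commit message.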
Avi Kivity	5a887362e3	Merge 'Remove legacy tables creation code' from Gleb Natapov Drop the creation code for the `service_levels` and `cdc_generation_descriptions_v2` tables since it is no longer needed. Old clusters will still have these tables because they were created earlier. The series also contains a small improvement around group0 creation. No backport needed since this removes functionality. Closes scylladb/scylladb#29482 * github.com:scylladb/scylladb: db/system_distributed_keyspace: remove system_distributed_everywhere since it is unused db/system_distributed_keyspace: drop CDC_TOPOLOGY_DESCRIPTION and CDC_GENERATIONS_V2 db/system_distributed_keyspace: remove unused code db/system_distributed_keyspace: drop old cdc_generation_descriptions_v2 table db/system_distributed_keyspace: drop old service_levels table fix indent after the previous patch group0: call setup_group0 only when needed	2026-05-10 14:46:21 +03:00
Raphael S. Carvalho	474e962e01	compaction: Restrict tombstone GC sstable set to repaired sstables for tombstone_gc=repair mode When tombstone_gc=repair, the repaired compaction view's sstable_set_for_tombstone_gc() previously returned all sstables across all three views (unrepaired, repairing, repaired). This is correct but unnecessarily expensive: the unrepaired and repairing sets are never the source of a GC-blocking shadow when tombstone_gc=repair, for base tables. The key ordering guarantee that makes this safe is: - topology_coordinator sends send_tablet_repair RPC and waits for it to complete. Inside that RPC, mark_sstable_as_repaired() runs on all replicas, moving D from repairing → repaired (repaired_at stamped on disk). - Only after the RPC returns does the coordinator commit repair_time + sstables_repaired_at to Raft. - gc_before = repair_time - propagation_delay only advances once that Raft commit applies. Therefore, when a tombstone T in the repaired set first becomes GC-eligible (its deletion_time < gc_before), any data D it shadows is already in the repaired set on every replica. This holds because: - The memtable is flushed before the repairing snapshot is taken (take_storage_snapshot calls sg->flush()), capturing all data present at repair time. - Hints and batchlog are flushed before the snapshot, ensuring remotely-hinted writes arrive before the snapshot boundary. - Legitimate unrepaired data has timestamps close to 'now', always newer than any GC-eligible tombstone (USING TIMESTAMP to write backdated data is user error / UB). Excluding the repairing and unrepaired sets from the GC shadow check cannot cause any tombstone to be wrongly collected. The memtable check is also skipped for the same reason: memtable data is either newer than the GC-eligible tombstone, or was flushed into the repairing/repaired set before gc_before advanced. Safety restriction — materialized views: The optimization IS applied to materialized view tables. 
Two possible paths could inject D_view into the MV's unrepaired set after MV repair: view hints and staging via the view-update-generator. Both are safe: (1) View hints: flush_hints() creates a sync point covering BOTH _hints_manager (base mutations) AND _hints_for_views_manager (view mutations). It waits until ALL pending view hints — including D_view entries queued in _hints_for_views_manager while the target MV replica was down — have been replayed to the target node before take_storage_snapshot() is called. D_view therefore lands in the MV's repairing sstable and is promoted to repaired. When a repaired compaction then checks for shadows it finds D_view in the repaired set, keeping T_mv non-purgeable. (2) View-update-generator staging path: Base table repair can write a missing D_base to a replica via a staging sstable. The view-update-generator processes the staging sstable ASYNCHRONOUSLY: it may fire arbitrarily later, even after MV repair has committed repair_time and T_mv has been GC'd from the repaired set. However, the staging processor calls stream_view_replica_updates() which performs a READ-BEFORE-WRITE via as_mutation_source_excluding_staging(): it reads the CURRENT base table state before building the view update. If T_base was written to the base table (as it always is before the base replica can be repaired and the MV tombstone can become GC-eligible), the view_update_builder sees T_base as the existing partition tombstone. D_base's row marker (ts_d < ts_t) is expired by T_base, so the view update is a no-op: D_view is never dispatched to the MV replica. No resurrection can occur regardless of how long staging is delayed. A potential sub-edge-case is T_base being purged BEFORE staging fires (leaving D_base as the sole survivor, so stream_view_replica_updates would dispatch D_view). 
This is blocked by an additional invariant: for tablet-based tables, the repair writer stamps repaired_at on staging sstables (repair_writer_impl::create_writer sets mark_as_repaired = true and perform_component_rewrite writes repaired_at = sstables_repaired_at + 1 on every staging sstable). After base repair commits sstables_repaired_at to Raft, the staging sstable satisfies is_repaired(sstables_repaired_at, staging_sst) and therefore appears in make_repaired_sstable_set(). Any subsequent base repair that advances sstables_repaired_at further still includes the staging sstable (its repaired_at ≤ new sstables_repaired_at). D_base in the staging sstable thus shadows T_base in every repaired compaction's shadow check, keeping T_base non-purgeable as long as D_base remains in staging. A base table hint also cannot bypass this. A base hint is replayed as a base mutation. The resulting view update is generated synchronously on the base replica and sent to the MV replica via _hints_for_views_manager (path 1 above), not via staging. USING TIMESTAMP with timestamps predating (gc_before + propagation_delay) is explicitly UB and excluded from the safety argument. For tombstone_gc modes other than repair (timeout, immediate, disabled) the invariant does not hold for base tables either, so the full storage-group set is returned. Implementation: - Add compaction_group::is_repaired_view(v): pointer comparison against _repaired_view. - Add compaction_group::make_repaired_sstable_set(): iterates _main_sstables and inserts only sstables classified as repaired (repair::is_repaired(sstables_repaired_at, sst)). - Add storage_group::make_repaired_sstable_set(): collects repaired sstables across all compaction groups in the storage group. - Add table::make_repaired_sstable_set_for_tombstone_gc(): collects repaired sstables from all compaction groups across all storage groups (needed for multi-tablet tables). 
- Add compaction_group_view::skip_memtable_for_tombstone_gc(): returns true iff the repaired-only optimization is active; used by get_max_purgeable_timestamp() in compaction.cc to bypass the memtable shadow check. - is_tombstone_gc_repaired_only() private helper gates both methods: requires is_repaired_view(this) && tombstone_gc_mode == repair. No is_view() exclusion. - Add error injection "view_update_generator_pause_before_processing" in process_staging_sstables() to support testing the staging-delay scenario. - New test test_tombstone_gc_mv_optimization_safe_via_hints: stops servers[2], writes D_base + T_base (view hints queued for servers[2]'s MV replica), restarts, runs MV tablet repair (flush_hints delivers D_view + T_mv before snapshot), triggers repaired compaction, and asserts the MV row is NOT visible — T_mv preserved because D_view landed in the repaired set via the hints-before-snapshot path. - New test test_tombstone_gc_mv_safe_staging_processor_delay: runs base repair before writing T_base so D_base is staged on servers[0] via row-sync; blocks the view-update-generator with an error injection; writes T_base + T_mv; runs MV repair (fast path, T_mv GC-eligible); triggers repaired compaction (T_mv purged — no D_view in repaired set); asserts no resurrection; releases injection; waits for staging to complete; asserts no resurrection after a second flush+compaction. Demonstrates that the read-before-write in stream_view_replica_updates() makes the optimization safe even when staging fires after T_mv has been GC'd. The expected gain is reduced bloom filter and memtable key-lookup I/O during repaired compactions: the unrepaired set is typically the largest (it holds all recent writes), yet for tombstone_gc=repair it never influences GC decisions. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-20 16:59:09 -03:00
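The eligibility rule and the set selection described in the commit above can be summarised in a short sketch. It is written in Python for brevity and is not the ScyllaDB implementation: timestamps are plain integers, the `SSTable` record is hypothetical, and the `is_repaired` predicate (`0 < repaired_at <= sstables_repaired_at`) is an assumption inferred from the commit text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SSTable:
    name: str
    repaired_at: int  # assumption: 0 means the sstable was never repaired


def gc_before(repair_time: int, propagation_delay: int) -> int:
    # From the commit message: gc_before = repair_time - propagation_delay
    return repair_time - propagation_delay


def is_gc_eligible(deletion_time: int, repair_time: int, propagation_delay: int) -> bool:
    # A tombstone becomes GC-eligible once deletion_time < gc_before.
    return deletion_time < gc_before(repair_time, propagation_delay)


def is_repaired(sstables_repaired_at: int, sst: SSTable) -> bool:
    # Assumed predicate, inferred from "repaired_at <= sstables_repaired_at".
    return 0 < sst.repaired_at <= sstables_repaired_at


def sstable_set_for_tombstone_gc(all_sstables, sstables_repaired_at,
                                 tombstone_gc_mode, is_repaired_view):
    # The repaired-only set is used only for the repaired view with
    # tombstone_gc=repair; every other case falls back to the full set.
    if is_repaired_view and tombstone_gc_mode == "repair":
        return [s for s in all_sstables if is_repaired(sstables_repaired_at, s)]
    return list(all_sstables)
```

With `tombstone_gc_mode` set to `timeout`, `immediate` or `disabled`, the function returns every sstable, matching the statement that the full storage-group set is kept for those modes.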
Gleb Natapov	133768a1f0	db/system_distributed_keyspace: remove system_distributed_everywhere since it is unused	2026-04-20 12:52:25 +03:00
Botond Dénes	57f8be49e9	Merge 'Move ignore_component_digest_mismatch flag on sstables_manager' from Pavel Emelyanov The PR serves two purposes. First, it makes the flag usage be consistent across multiple ways to load sstables components. For example, the sstable::load_metadata() doesn't set it (like .load() does) thus potentially refusing to load "corrupted" components, as the flag assumes. Second, it removes the fanout of db.get_config().ignore_component_digest_mismatch() over the code. This thing is called pretty much everywhere to initialize the sstable_open_config, while the option in question is "scylla state" parameter, not "sstable opening" one. Code cleanup, not backporting Closes scylladb/scylladb#29513 * github.com:scylladb/scylladb: sstables: Remove ignore_component_digest_mismatch from sstable_open_config sstables: Move ignore_component_digest_mismatch initialization to constructor sstables: Add ignore_component_digest_mismatch to sstables_manager config	2026-04-17 12:54:17 +03:00
Botond Dénes	d006c4c476	Merge 'Untie (partially) cql3/statements from db::config' from Pavel Emelyanov There's a bunch of db::config options that are used by cql3/statements/ code. For that they use data_dictionary/database as a proxy to get db::config reference. This PR moves most of these accessed options onto cql_config Options migrated to cql_config: 1. select_internal_page_size 2. strict_allow_filtering 3. enable_parallelized_aggregation 4. batch_size_warn_threshold_in_kb 5. batch_size_fail_threshold_in_kb 6. 7 keyspace replication restriction options 7. 2 TWCS restriction options 8. restrict_future_timestamp 9. strict_is_not_null_in_views (with view_restrictions struct) 10. enable_create_table_with_compact_storage Some options need special treatment and are still abused via database, namely: 1. enable_logstor 2. cluster_name 3. partitioner 4. endpoint_snitch Fixing components inter-dependencies, not backporting Closes scylladb/scylladb#29424 * github.com:scylladb/scylladb: cql3: Move enable_create_table_with_compact_storage to cql_config cql3: Move strict_is_not_null_in_views to cql_config cql3: Move restrict_future_timestamp to cql_config cql3: Move TWCS restriction options to cql_config cql3: Move keyspace restriction options to cql_config cql3: Move batch_size_fail_threshold_in_kb to cql_config cql3: Move batch_size_warn_threshold_in_kb to cql_config cql3: Move enable_parallelized_aggregation to cql_config cql3: Move strict_allow_filtering to cql_config cql3: Move select_internal_page_size to cql_config test: Fix cql_test_env to use updateable cql_config from db::config cql3: Add cql_config parameter to parsed_statement::prepare()	2026-04-16 14:04:43 +03:00
Pavel Emelyanov	4d352c7cf5	sstables: Remove ignore_component_digest_mismatch from sstable_open_config The ignore_component_digest_mismatch flag is now initialized at sstable construction time from sstables_manager::config (which is populated from db::config at boot time). Remove the flag from sstable_open_config struct and all call sites that were setting it explicitly. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-16 13:49:14 +03:00
Pavel Emelyanov	8abfd9af00	sstables: Add ignore_component_digest_mismatch to sstables_manager config Copy the ignore_component_digest_mismatch flag from db::config to sstables_manager::config during database initialization. This makes the flag available early in the boot process, before SSTables are loaded, enabling later commits to move the flag initialization from load-time to construction-time. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-16 13:48:49 +03:00
Pavel Emelyanov	f20ede79f9	cql3: Move strict_is_not_null_in_views to cql_config Move strict_is_not_null_in_views option from db::config to cql_config via new view_restrictions sub-struct. This improves separation of concerns by keeping view-specific validation policies with other CQL configuration. Update prepare_view() to take view_restrictions reference instead of reaching into db::config, and update all callsites to pass the sub-struct. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-16 08:52:19 +03:00
Benny Halevy	d92cd42fe6	sstables: add LargeDataRecords metadata type (tag 13) Add a new scylla metadata component LargeDataRecords (tag 13) that stores per-SSTable top-N large data records. Each record carries: - large_data_type (partition_size, row_size, cell_size, etc.) - binary serialized partition key and clustering key - column name (for cell records) - value (size in bytes) - element count (rows or collection elements, type-dependent) - range tombstones and dead rows (partition records only) The struct uses disk_string<uint32_t> for key/name fields and is serialized via the existing describe_type framework into the SSTable Scylla metadata component. Add JSON support in scylla-sstable and format documentation.	2026-04-16 08:49:01 +03:00
Benny Halevy	85e2c6f2a7	sstables: add fmt::formatter for large_data_type Add a fmt::formatter specialization for sstables::large_data_type and use it in scylla-sstable.cc instead of the local to_string() overload, which is removed.	2026-04-16 08:42:54 +03:00
Pavel Emelyanov	60a834d9fa	cql3: Add cql_config parameter to parsed_statement::prepare() Pass cql_config to prepare() so that statement preparation can use CQL-specific configuration rather than reaching into db::config directly. Callers that use default_cql_config: - db/view/view.cc: builds a SELECT statement internally to compute view restrictions, not in response to a user query - cql3/statements/create_view_statement.cc: same -- parses the view's WHERE clause as a synthetic SELECT to extract restrictions - tools/schema_loader.cc: offline schema loading tool, no runtime config available - tools/scylla-sstable.cc: offline sstable inspection tool, no runtime config available Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>	2026-04-16 07:57:25 +03:00
Avi Kivity	0ae22a09d4	LICENSE: Update to version 1.1 Updated terms of non-commercial use (must be a never-customer).	2026-04-12 19:46:33 +03:00
Israel Fruchter	79c736455e	cqlsh: update to v6.0.34-scylla Update cqlsh to version v6.0.34-scylla. Notable fix: - Fix vector type formatting error (scylladb/scylla-cqlsh#165) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Closes scylladb/scylladb#29401	2026-04-12 14:54:50 +03:00
Nikos Dragazis	8837dac2f9	scylla-nodetool: Add migrate-to-tablets subcommand The vnodes-to-tablets migration is a manual procedure, so orchestration must be done via nodetool. This patch adds the following new commands: * nodetool migrate-to-tablets start {ks} * nodetool migrate-to-tablets upgrade * nodetool migrate-to-tablets downgrade * nodetool migrate-to-tablets status {ks} * nodetool migrate-to-tablets finalize {ks} The commands are just wrappers over the REST API. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2026-03-25 19:11:29 +02:00
Piotr Dulikowski	d8b283e1fb	Merge 'Add CQL forwarding for strongly consistent tables' from Wojciech Mitros In this series we add support for forwarding strongly consistent CQL requests to suitable replicas, so that clients can issue reads/writes to any node and have the request executed on an appropriate tablet replica (and, for writes, on the Raft leader). We return the same CQL response as what the user would get while sending the request to the correct replica and we perform the same logging/stats updates on the request coordinator as if the coordinator was the appropriate replica. The core mechanism of forwarding a strongly consistent request is sending an RPC containing the user's cql request frame to the appropriate replica and returning back a ready, serialized `cql_transport::response`. We do this in the CQL server - it is most prepared for handling these types and forwarding a request containing a CQL frame allows us to reuse near-top-level methods for CQL request handling in the new RPC handler (such as the general `process`) For sending the RPC, the CQL server needs to obtain the information about who should it forward the request to. This requires knowledge about the tablet raft group members and leader. We obtain this information during the execution of a `cql3/strong_consistency` statement, and we return this information back to the CQL server using the generalized `bounce_to_shard` `response_message`, where we now store the information about either a shard, or a specific replica to which we should forward to. Similarly to `bounce_to_shard`, we need to handle this `result_message` in a loop - a replica may move during statement execution, or the Raft leader can change. 
We also use it for forwarding strongly consistent writes when we're not a member of the affected tablet raft group - in that case we need to forward the statement twice - once to any replica of the affected tablet, then that replica can find the leader and return this information to the coordinator, which allows the second request to be directed to the leader. This feature also allows passing through exception messages which happened on the target replica while executing the statement. For that, many methods of the `cql_transport::cql_server::connection` for creating error responses needed to be moved to `cql_transport::cql_server`. And for final exception handling on the coordinator, we added additional error info to the RPC response, so that the handling can be performed without having the `result_message::exception` or `exception_ptr` itself. Fixes [SCYLLADB-71](https://scylladb.atlassian.net/browse/SCYLLADB-71) [SCYLLADB-71]: https://scylladb.atlassian.net/browse/SCYLLADB-71?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ Closes scylladb/scylladb#27517 * github.com:scylladb/scylladb: test: add tests for CQL forwarding transport: enable CQL forwarding for strong consistency statements transport: add remote statement preparation for CQL forwarding transport: handle redirect responses in CQL forwarding transport: add exception handling for forwarded CQL requests transport: add basic CQL request forwarding idl: add a representation of client_state for forwarding cql_server: handle query, execute, batch in one case transport: inline process_on_shard in cql_server::process transport: extract process() to cql_server transport: add messaging_service to cql_server transport: add response reconstruction helpers for forwarding transport: generalize the bounce result message for bouncing to other nodes strong consistency: redirect requests to live replicas from the same rack transport: pass foreign_ptr into 
sleep_until_timeout_passes and move it to cql_server transport: extract the error handling from process_request_one transport: move error response helpers from connection to cql_server	2026-03-13 15:03:10 +01:00
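The bounce-handling loop described in the merge above can be sketched as follows. This is an illustrative Python model, not the CQL server code: `Bounce`, `Response`, `execute_on` and the `max_hops` bound are hypothetical, and the real mechanism is bounded by the request timeout rather than by a hop count.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass(frozen=True)
class Response:
    body: str


@dataclass(frozen=True)
class Bounce:
    target: str  # node (or replica) that should execute the request next


def process_with_forwarding(request: str, local_node: str,
                            execute_on: Callable[[str, str], Union[Response, Bounce]],
                            max_hops: int = 8) -> Response:
    """Execute locally first, then follow bounce results until a response arrives.

    A loop is needed because a replica may move or the Raft leader may change
    while the statement executes, producing another bounce.
    """
    target = local_node
    for _ in range(max_hops):
        result = execute_on(target, request)
        if isinstance(result, Response):
            return result
        target = result.target  # retry on the node named by the bounce
    raise RuntimeError(f"request still bouncing after {max_hops} hops")
```

A coordinator that is not a member of the tablet Raft group would see two bounces for a write in this model: one to any replica, then one to the leader that replica reports.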
Botond Dénes	fc8cebd671	Merge 'Verify components digests during component load and scrub in validate mode' from Taras Veretilnyk This PR adds integrity verification for SSTable component files during loading. When component digests are present in Scylla metadata, the loader now validates each component's CRC32 digest against the stored expected value, catching silent corruption of component files. The digests of the Index, Rows and Partitions components are also validated during scrub in validate mode. Added corruption tests that write an SSTable, flip a bit in a specific component file, then verify that reloading the SSTable detects the corruption and throws the expected exception. Depends on https://github.com/scylladb/scylladb/pull/28338 Backport is not required, this is a new feature Fixes https://github.com/scylladb/scylladb/issues/20103 Closes scylladb/scylladb#28761 * github.com:scylladb/scylladb: test/cqlpy: test --ignore-component-digest-mismatch flag in scylla sstable upgrade docs: document --ignore-component-digest-mismatch flag for scylla sstable upgrade sstables: propagate ignore_component_digest_mismatch config to all load sites sstables: add option to ignore component digest mismatches sstable_compaction_test: Add scrub validate test for corrupted index sstables: add tests for component digest validation on corrupted SSTables sstables: validate index components digests during SSTable scrub in validate mode sstables: verify component digests on SSTable load sstables: add digest_file_random_access_reader for CRC32 digest computation	2026-03-13 09:55:55 +02:00
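The load-time check described in the merge above amounts to recomputing a CRC32 over a component and comparing it with the stored value. The following Python sketch shows that idea with the standard `zlib.crc32`. Whether ScyllaDB's digest is bit-compatible with zlib's CRC32 is not stated in the log and is assumed here, and `DigestMismatch`, `crc32_digest` and `verify_component` are hypothetical names.

```python
import io
import logging
import zlib

log = logging.getLogger("component_digest")


class DigestMismatch(Exception):
    """Hypothetical stand-in for the exception thrown on a corrupted component."""


def crc32_digest(stream, chunk_size: int = 64 * 1024) -> int:
    """Compute a running CRC32 over a binary stream without loading it whole."""
    crc = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        crc = zlib.crc32(chunk, crc)
    return crc & 0xFFFFFFFF


def verify_component(name: str, stream, expected: int,
                     ignore_mismatch: bool = False) -> bool:
    """Return True if the digest matches.

    On mismatch, raise DigestMismatch, or only log a warning and return False
    when ignore_mismatch is set (modelled on --ignore-component-digest-mismatch).
    """
    actual = crc32_digest(stream)
    if actual == expected:
        return True
    msg = f"{name}: digest mismatch, expected {expected:#010x}, got {actual:#010x}"
    if ignore_mismatch:
        log.warning(msg)
        return False
    raise DigestMismatch(msg)
```

Flipping a single bit in the input always changes a CRC32, which is what the corruption tests mentioned in the merge rely on.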
Avi Kivity	b228eb26e6	Merge 'dbuild: Use slirp4netns network in dbuild nested containers' from Calle Wilund Fixes #25084 Add slirp4netns and use it for nested containers. This will allow nested container port aliasing, helping CI stability. Note: this contains an updated Dockerfile for the dbuild image, but because of the chicken-and-egg problem, the dbuild script for now force-installs slirp4netns before anything else. Updates the mock server handling (boost as well as pytest) to use ephemeral ports and query them from the container, ensuring we don't get port collisions. Includes a timeout increase and a tweak to our scylla_cluster handling, ensuring we don't deadlock when the pipe size is less than required for our sys notify messages. Closes scylladb/scylladb#28727 * github.com:scylladb/scylladb: gcs_fixture: Change to use docker helper aws_kms_fixture: Modify to use docker helper test/lib/proc_util: Add docker helper pytest: use ephemeral port publish for docker mock servers dbuild: Use container network in dbuild nested containers scylla_cluster: Read notify sock in background to prevent deadlock	2026-03-12 23:49:25 +02:00
Wojciech Mitros	e44820ba1f	transport: generalize the bounce result message for bouncing to other nodes In the following patches, we'll start allowing forwarding requests to strongly consistent tables so that they'll get executed on the suitable tablet Raft group members. For that we'll reuse the approach that we already have for bouncing requests to other shards - we'll try to execute a request locally, and the result of that will be a bounce message with another replica as the target. In this patch we generalize the former bounce_to_shard result message so that it can specify the target of the bounce as either another shard or a specific replica. We also rename it to result_message::bounce so that it stops implying that only another shard may be its target. Aside from the host_id and the shard, the new message also includes the timeout, because the service handling the forwarding won't have access to it, and it's needed for specifying how long we should wait for the forwarded requests. It also records whether this is a write request, so that the correct timeout response can be returned if the deadline is exceeded. We will return other hosts in the new bounce message when executing requests to strongly consistent tables when we can't handle the request because we aren't a suitable replica. We can't handle this message yet, so we don't return it anywhere and we still assume that every bounce message is a bounce to the same host.	2026-03-12 17:48:57 +01:00
Aleksandra Martyniuk	2e68f48068	nodetool: cluster repair: do not fail if a table was dropped nodetool cluster repair without additional params repairs all tablet keyspaces in a cluster. Currently, if a table is dropped while the command is running, all tables are repaired but the command finishes with a failure. Modify nodetool cluster repair. If a table wasn't specified (i.e. all tables are repaired), the command finishes successfully even if a table was dropped. If a table was specified and it does not exist (e.g. because it was dropped before the repair was requested), then the behavior remains unchanged. Fixes: SCYLLADB-568. Closes scylladb/scylladb#28739	2026-03-11 16:35:04 +02:00
Calle Wilund	e3e940bc47	dbuild: Use container network in dbuild nested containers Remove the host network setting, ensuring we use private networks (slirp4netns). This will allow nested container port aliasing, helping CI stability (can use ephemeral ports and container introspection). This also makes the nested podman setup non-conditional, since we only run podman containers inside dbuild, and need the setup regardless if host container is docker or not.	2026-03-11 12:05:51 +01:00
Taras Veretilnyk	c123f637ea	sstables: add option to ignore component digest mismatches Add `ignore_component_digest_mismatch` option to `sstable_open_config` that logs a warning instead of throwing `malformed_sstable_exception` on component digest mismatch. This is useful for recovering sstables with corrupted non-vital components or working around bugs in digest calculation. Expose the option in scylla-sstable via the `--ignore-component-digest-mismatch` flag for the upgrade operation.	2026-03-10 19:24:05 +01:00
Botond Dénes	81e214237f	Merge 'Add digests for all sstable components in scylla metadata' from Taras Veretilnyk This pull request adds support for calculating and storing CRC32 digests for all SSTable components. This change replaces plain file_writer with crc32_digest_file_writer for all SSTable components that should be checksummed. The resulting component digests are stored in the sstable structure and later persisted to disk as part of the Scylla metadata component during writer::consume_end_of_stream. Several test cases were introduced to verify the expected behaviour. Additionally, this PR adds a new rewrite component mechanism for safe sstable component rewriting. Previously, rewriting an sstable component (e.g., via rewrite_statistics) created a temporary file that was renamed to the final name after sealing. This allowed crash recovery by simply removing the temporary file on startup. However, with component digests stored in scylla_metadata (#20100), replacing a component like Statistics requires atomically updating both the component and scylla_metadata with the new digest - impossible with POSIX rename. The new mechanism creates a clone sstable with a fresh generation: - Hard-links all components from the source except the component being rewritten and scylla_metadata - Copies original sstable components pointer and recognized components from the source - Invokes a modifier callback to adjust the new sstable before rewriting - Writes the modified component along with updated scylla_metadata containing the new digest - Seals the new sstable with a temporary TOC - Replaces the old sstable atomically, the same way as it is done in compaction This is built on the rewrite_sstables compaction framework to support batch operations (e.g., following incremental repair). In case of any failure during the whole process, the sstable will be automatically deleted on node startup because the temporary TOC persists. 
Backport is not required, it is a new feature Fixes https://github.com/scylladb/scylladb/issues/20100, https://github.com/scylladb/scylladb/issues/27453 Closes scylladb/scylladb#28338 * github.com:scylladb/scylladb: docs: document components_digests subcomponent and trailing digest in Scylla.db sstable_compaction_test: Add tests for perform_component_rewrite sstable_test: add verification testcases of SSTable components digests persistance sstables: store digest of all sstable components in scylla metadata sstables: replace rewrite_statistics with new rewrite component mechanism sstables: add new rewrite component mechanism for safe sstable component rewriting compaction: add compaction_group_view method to specify sstable version sstables: add null_data_sink and serialized_checksum for checksum-only calculation sstables: extract default write open flags into a constant sstables: Add write_simple_with_digest for component checksumming sstables: Extract file writer closing logic into separate methods sstables: Implement CRC32 digest-only writer	2026-03-10 16:02:53 +02:00
Taras Veretilnyk	54af4a26ca	sstables: store digest of all sstable components in scylla metadata This change replaces plain file_writer with crc32_digest_file_writer for all SSTable components that should be checksummed. The resulting component digests are stored in scylla metadata component. This also extends new rewrite component mechanism, to rewrite metadata with updated digest together with the component.	2026-03-06 21:58:10 +01:00
Calle Wilund	ab3d3d8638	build: add slirp4netns to dependencies Needed for port forwarded podman-in-podman containers [avi: - move from Dockerfile to install-dependencies.sh so non-container builds also get it - regenerate frozen toolchain with optimized clang from https://devpkg.scylladb.com/clang/clang-21.1.8-Fedora-43-aarch64.tar.gz https://devpkg.scylladb.com/clang/clang-21.1.8-Fedora-43-x86_64.tar.gz ] Closes scylladb/scylladb#28870	2026-03-05 17:44:17 +02:00
Botond Dénes	3c34598d88	treewide: move away from tombstone_gc_state(nullptr) ctor It is ambiguous; use the no-gc or gc-all factories instead, as appropriate. A special note for mutation::compacted(): according to the comment above it, it doesn't drop expired tombstones, but as currently implemented, it actually does. Change the tombstone gc param for the underlying call to compact_for_compaction() to uphold the comment. This is used in tests mostly, so no fallout expected. Tests are handled in the next commit, to reduce noise. Two tests in mutation_test.cc have to be updated: * test_compactor_range_tombstone_spanning_many_pages has to be updated in this commit, as it uses mutation_partition::compact_for_query() as well as compact_for_query(). The test passes default constructed tombstone_gc() to the latter while the former now uses no-gc, creating a mismatch in tombstone gc behaviour and resulting in test failure. Update the test to also pass no-gc to compact_for_query(). * test_query_digest similarly uses mutation_partition::query_mutation() and another compaction method, having to match the no-gc now used in query_mutation().	2026-03-03 14:09:28 +02:00
Botond Dénes	f3ee6a0bd1	compaction: use tombstone_gc_state with value semantics Instead of passing around references to it, pass around values. This object is now designed to be used as a value-type, after recent refactoring.	2026-03-03 14:09:27 +02:00
Botond Dénes	ab532882db	tools/scylla-sstable: introduce scylla sstable split Split input sstable(s) into multiple output sstables based on the provided token boundaries. The input sstable(s) are divided according to the specified split tokens, creating one output sstable per token range. Fixes: SCYLLADB-10 Closes scylladb/scylladb#28741	2026-03-02 15:19:17 +01:00
Botond Dénes	bf3edaf220	tools/scylla-sstable: filter_operation(): use deferred_close() to close reader Manual closing is bypassed with exceptions, promoting an exception to a crash due to unclosed reader. Closes scylladb/scylladb#28797	2026-03-02 14:16:08 +01:00
Marcin Maliszkiewicz	6bf706ef1b	Merge 'scylla-sstable: query: handle nested UDTs' from Botond Dénes The query (and in certain modes the write) operations use the virtual table facility inside `cql_test_env`. The schema of the sstable is created as a table in `cql_test_env`. This involves registering all UDTs with the keyspace, so they are available for lookups. This was done with a flat loop over all column types, but this is not enough. UDTs might be nested in other types, like collections. One has to do a traversal of the type tree and register every UDT on the way. This PR changes the flat loop to a recursive traversal of the type tree. The query operation now works with UDTs, no matter how deeply nested they are. Backport: Implements missing functionality of a tool, no backport. Closes scylladb/scylladb#28798 * github.com:scylladb/scylladb: tools/scylla-sstable: create_table_in_cql_env(): register UDTs recursively tools/scylla-sstable: generalize dump_if_user_type tools/scylla-sstable: move dump_if_user_type() definition	2026-03-02 14:14:43 +01:00
Marcin Maliszkiewicz	a83ee6cf66	Merge 'db/batchlog_manager: re-add v1 support for mixed clusters' from Botond Dénes `3f7ee3ce5d` introduced system.batchlog_v2, with a schema designed to speed up batchlog replays and make post-replay cleanups much more effective. It did not introduce a cluster feature for the new table, because it is a node-local table, so the cluster can switch to the new table gradually, one node at a time. However, https://github.com/scylladb/scylladb/issues/27886 showed that the switching causes timeouts during upgrades, in mixed clusters. Furthermore, switching to the new table unconditionally on upgraded nodes means that on rollback, the batches saved into the v2 table are lost. This PR re-introduces v1 (`system.batchlog`) support and guards the use of the v2 table with a cluster feature, so mixed clusters keep using v1 and thus remain rollback-compatible. The re-introduced v1 support doesn't support post-replay cleanups for simplicity. The cleanup in v1 was never particularly effective anyway and we ended up disabling it for heavy batchlog users, so I don't think the lack of support for cleanup is a problem. 
Fixes: https://github.com/scylladb/scylladb/issues/27886 Needs backport to 2026.1, to fix upgrades for clusters using batches Closes scylladb/scylladb#28736 * github.com:scylladb/scylladb: test/boost/batchlog_manager_test: add tests for v1 batchlog test/boost/batchlog_manager_test: make prepare_batches() work with both v1 and v2 test/boost/batchlog_manager_test: fix indentation test/boost/batchlog_manager_test: extract prepare_batches() method test/lib/cql_assertions: is_rows(): add dump parameter tools/scylla-sstable: extract query result printers tools/scylla-sstable: add std::ostream& arg to query result printers repair/row_level: repair_flush_hints_batchlog_handler(): add all_replayed to finish log db/batchlog_manager: re-add v1 support db/batchlog_manager: return all_replayed from process_batch() db/batchlog_manager: process_bath() fix indentation db/batchlog_manager: make batch() a standalone function db/batchlog_manager: make structs stats public db/batchlog_manager: allocate limiter on the stack db/batchlog_manager: add feature_service dependency gms/feature_service: add batchlog_v2 feature	2026-03-02 12:09:10 +01:00
Taras Veretilnyk	4aa0a3acf9	compaction: add compaction_group_view method to specify sstable version Add make_sstable() overload that accepts sstable_version_types parameter to compaction_group_view interface and all implementations. This will be useful in rewrite component mechanism, as we need to preserve sstable version when creating the new one for the replacement.	2026-02-26 22:38:55 +01:00
Avi Kivity	5baf16005f	build: install antlr3 from maven + source, not rpm packages Fedora removed the C++ backend from antlr3 [1], citing incompatible license. The license in question (the Unicode license) is fine for us. To be able to continue using antlr3, build it ourselves. The main executable can be used as is from Maven, since we don't need any patches for the parser. The runtime needs to be patched, so we download the source and patch it. Regenerated frozen toolchain with optimized clang from https://devpkg.scylladb.com/clang/clang-21.1.8-Fedora-43-aarch64.tar.gz https://devpkg.scylladb.com/clang/clang-21.1.8-Fedora-43-x86_64.tar.gz Fixes https://scylladb.atlassian.net/browse/SCYLLADB-773 Closes scylladb/scylladb#28765	2026-02-25 11:03:19 +02:00
Botond Dénes	8dbcd8a0b3	tools/scylla-sstable: create_table_in_cql_env(): register UDTs recursively It is not enough to go over all column types and register the UDTs. UDTs might be nested in other types, like collections. One has to do a traversal of the type tree and register every UDT on the way. That is what this patch does. This function is used by the query and write operations, which should now both work with nested UDTs. Add a test which fails before and passes after this patch.	2026-02-25 08:51:25 +02:00
Botond Dénes	cf39a5e610	tools/scylla-sstable: generalize dump_if_user_type Rename to invoke_on_user_type() and make the action taken on user types a function parameter. Enables reuse of the traverse logic by other code.	2026-02-25 08:51:25 +02:00
Botond Dénes	80049c88e9	tools/scylla-sstable: move dump_if_user_type() definition So it can be used by create_table_in_cql_env() code.	2026-02-25 08:51:25 +02:00
Calle Wilund	bac81df20f	scylla-nodetool: Add "cluster snapshot" command Similar to "normal" snapshot, but will use the cluster-wide, topology-coordinated snapshot API and path.	2026-02-23 11:37:16 +01:00
Avi Kivity	92bc5568c5	tools: toolchain: build sanitizers for future toolchain The future toolchain did not build the sanitizers, so debug executables did not link. Fix by not disabling the sanitizers. Closes scylladb/scylladb#28733	2026-02-20 15:44:24 +02:00
Botond Dénes	48e9b3d668	tools/scylla-sstable: extract query result printers To cql3/query_result_printer.hh. Allowing for other users, outside of tools.	2026-02-20 07:03:46 +02:00
Botond Dénes	978627c4e1	tools/scylla-sstable: add std::ostream& arg to query result printers Make them more general-purpose, in preparation to extracting them to their own header.	2026-02-20 07:03:46 +02:00
Avi Kivity	66bef0ed36	lua, tools: adjust for lua 5.5 lua_newstate seed parameter Lua 5.5 adds a seed parameter to lua_newstate(), provide it with a strong random seed. Closes scylladb/scylladb#28734	2026-02-20 06:52:37 +02:00
Avi Kivity	c5a1f44731	tools: toolchain: switch from ccache to sccache sccache combines the functions of ccache and distcc, and promises to support C++20 modules in the future. Switch to sccache in anticipation of modules support. The documentation is adjusted since cache will be persistent for sccache without further work. Closes scylladb/scylladb#28524	2026-02-18 12:23:12 +02:00
Ernest Zaslavsky	196f7cad93	nodetool: fix handling of "--primary-replica-only" argument The "--primary-replica-only" ("-pro") flag was previously ignored by the `restore` operation. This patch ensures the argument is parsed and applied correctly. Closes scylladb/scylladb#28490	2026-02-18 12:21:27 +02:00
Pawel Pery	81d11a23ce	Revert "Merge 'vector_search: add validator tests' from Pawel Pery" This reverts commit `bcd1758911`, reversing changes made to `b2c2a99741`. There is a design decision to not introduce an additional test orchestration tool for scylladb.git (see comments for #27499). One commit has already been reverted in `55c7bc7`. The last CI runs made the validator test flaky, so it is time to remove all remaining validator tests. It needs a backport to 2026.1 to remove remaining validator tests from there. Fixes: VECTOR-497 Closes scylladb/scylladb#28568	2026-02-08 16:29:58 +02:00
Avi Kivity	acc54cf304	tools: toolchain: adapt future toolchain to loss of toxiproxy in Fedora Next Fedora will likely not have toxiproxy packaged [1]. Adapt by installing it directly. To avoid changing the current toolchain, add a ./install-dependencies --future option. This will allow us to easily go back to the packages if the Fedora bug is fixed. [1] https://bugzilla.redhat.com/show_bug.cgi?id=2426954 Closes scylladb/scylladb#28444	2026-02-02 17:02:19 +02:00
Avi Kivity	347c69b7e2	build: add clang-tools-extra (for clang-include-cleaner) to frozen toolchain clang-include-cleaner is used in the iwyu.yaml github workflow (include-what-you-use). Add it to the frozen toolchain so it can be made part of the regular build process. The corresponding install command is removed from iwyu.yaml. Regenerated frozen toolchain with optimized clang from https://devpkg.scylladb.com/clang/clang-21.1.8-Fedora-43-aarch64.tar.gz https://devpkg.scylladb.com/clang/clang-21.1.8-Fedora-43-x86_64.tar.gz Closes scylladb/scylladb#28413	2026-01-29 08:44:49 +02:00
Botond Dénes	f375288b58	tools/scylla-sstable: introduce filter command Filter the content of sstable(s), including or excluding the specified partitions. Partitions can be provided on the command line via `--partition`, or in a file via `--partitions-file`. Produces one output sstable per input sstable -- if the filter selects at least one partition in the respective input sstable. Output sstables are placed in the path provided via `--output-dir`. Use `--merge` to filter all input sstables combined, producing one output sstable.	2026-01-22 17:20:07 +02:00
Botond Dénes	21900c55eb	tools/scylla-sstable: remove --unsafe-accept-nonempty-output-dir This flag was added to operations which have an --output-dir command-line argument. These operations write sstables and need a directory where to write them. Back in the numeric-generation world this posed a problem: if the directory contained any sstable, generation clash was almost guaranteed, because each scylla-sstable command invocation would start output generations from 1. To avoid this, empty output directory was a requirement, with the --unsafe-accept-nonempty-output-dir allowing for a force-override. Now in the timeuuid generation days, all this is not necessary anymore: generations are unique, so it is not a problem if the output directory already contains sstables: the probability of generation clash is almost 0. Even if it happens, the tool will just simply fail to write the new sstable with the clashing generation. Remove this historic relic of a flag and the related logic, it is just a pointless nuisance nowadays.	2026-01-22 13:55:59 +02:00
Botond Dénes	a1ed73820f	tools/scylla-sstable: make partition_set ordered Next patch will want partitions to be ordered. Remove unused partition_map type.	2026-01-22 13:55:59 +02:00

1306 Commits