scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-14 03:42:14 +00:00

Author	SHA1	Message	Date
Botond Dénes	e95eb21a16	Merge 'Tablet-aware restore' from Pavel Emelyanov The mechanics of the restore are as follows - A /storage_service/tablets/restore API is called with (keyspace, table, endpoint, bucket, manifests) parameters - First, it populates the system_distributed.snapshot_sstables table with the data read from the manifests - Then it emplaces a bunch of tablet transitions (of a new "restore" kind), one for each tablet - The topology coordinator handles the "restore" transition by calling a new RESTORE_TABLET RPC against all the current tablet replicas - Each replica handles the RPC verb by - Reading the snapshot_sstables table - Filtering the read sstable infos against current node and tablet being handled - Downloading and attaching the filtered sstables This PR includes system_distributed.snapshot_sstables table from @robertbindar and preparation work from @kreuzerkrieg that extracts raw sstables downloading and attaching from existing generic sstables loading code. This is the first step towards SCYLLADB-197 and lacks many things. 
In particular - the API only works for a single-DC cluster - the caller needs to "lock" tablet boundaries with min/max tablet count - not abortable - no progress tracking - sub-optimal (re-kicking API on restore will re-download everything again) - not re-attachable (if API node dies, restoration proceeds, but the caller cannot "wait" for it to complete via other node) - nodes download sstables in maintenance/streaming scheduling group (should be moved to maintenance/backup) Other follow-up items: - have an actual swagger object specification for `backup_location` Closes #28436 Closes #28657 Closes #28773 Closes scylladb/scylladb#28763 * github.com:scylladb/scylladb: docs: Update topology_over_raft.md with `restore` transition kind test: Add test for backup vs migration race test: Restore resilience test sstables_loader: Fail tablet-restore task if not all sstables were downloaded sstables_loader: mark sstables as downloaded after attaching sstables_loader: return shared_sstable from attach_sstable db: add update_sstable_download_status method db: add downloaded column to snapshot_sstables db: extract snapshot_sstables TTL into class constant test: Add a test for tablet-aware restore tablets: Implement tablet-aware cluster-wide restore messaging: Add RESTORE_TABLET RPC verb sstables_loader: Add method to download and attach sstables for a tablet tablets: Add restore_config to tablet_transition_info sstables_loader: Add restore_tablets task skeleton test: Add rest_client helper to kick newly introduced API endpoint api: Add /storage_service/tablets/restore endpoint skeleton sstables_loader: Add keyspace and table arguments to manfiest loading helper sstables_loader_helpers: just reformat the code sstables_loader_helpers: generalize argument and variable names sstables_loader_helpers: generalize get_sstables_for_tablet sstables_loader_helpers: add token getters for tablet filtering sstables_loader_helpers: remove underscores from struct members sstables_loader: move 
download_sstable and get_sstables_for_tablet sstables_loader: extract single-tablet SST filtering sstables_loader: make download_sstable static sstables_loader: fix formating of the new `download_sstable` function sstables_loader: extract single SST download into a function sstables_loader: add shard_id to minimal_sst_info sstables_loader: add function for parsing backup manifests split utility functions for creating test data from database_test export make_storage_options_config from lib/test_services rjson: Add helpers for conversions to dht::token and sstable_id Add system_distributed_keyspace.snapshot_sstables add get_system_distributed_keyspace to cql_test_env code: Add system_distributed_keyspace dependency to sstables_loader storage_service: Export export handle_raft_rpc() helper storage_service: Export do_tablet_operation() storage_service: Split transit_tablet() into two tablets: Add braces around tablet_transition_kind::repair switch	2026-05-12 16:24:13 +03:00
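The replica-side filtering step of the restore described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the `SstableInfo` record and the `sstables_for_tablet` function are hypothetical names, the fields mirror columns of `system_distributed.snapshot_sstables`, and it assumes that "filtering against current node" means matching datacenter and rack and that an sstable belongs to a tablet when its token range lies entirely inside the tablet's inclusive token range.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SstableInfo:
    # Hypothetical record; fields mirror columns of snapshot_sstables.
    datacenter: str
    rack: str
    first_token: int
    last_token: int
    toc_name: str


def sstables_for_tablet(infos, datacenter, rack, tablet_first, tablet_last):
    """Keep sstables from the given dc/rack whose token range is fully
    contained in the inclusive tablet range [tablet_first, tablet_last].

    The containment rule and the dc/rack match are assumptions."""
    return [
        s for s in infos
        if s.datacenter == datacenter
        and s.rack == rack
        and tablet_first <= s.first_token
        and s.last_token <= tablet_last
    ]
```

A replica handling a tablet covering tokens 0..99 in `dc1`/`r1` would call `sstables_for_tablet(infos, "dc1", "r1", 0, 99)` and then download and attach only the returned entries.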
Pavel Emelyanov	896de77b99	docs: Update topology_over_raft.md with `restore` transition kind Add some text about how the new transition works. It doesn't include full feature description, just concentrates on the new transition and the way it interacts with the rest of topology coordinator machinery. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-05-12 10:40:24 +03:00
Robert Bindar	2f19d84ad7	Add system_distributed_keyspace.snapshot_sstables This patch adds the snapshot_sstables table with the following schema: ```cql CREATE TABLE system_distributed.snapshot_sstables ( snapshot_name text, "keyspace" text, "table" text, datacenter text, rack text, id uuid, first_token bigint, last_token bigint, toc_name text, prefix text, PRIMARY KEY ((snapshot_name, "keyspace", "table", datacenter, rack), first_token, id)); ``` The table will be populated by the coordinator node during the restore phase (and later on during the backup phase to accommodate live-restore). The content of this table is meant to be consumed by the restore worker nodes, which will use this data to filter sstables and perform file-based downloads. Fixes SCYLLADB-263 Signed-off-by: Robert Bindar <robert.bindar@scylladb.com>	2026-05-12 10:40:21 +03:00
Michał Jadwiszczak	396d4b17a0	docs: document tombstone avoidance in view_building_tasks	2026-04-22 09:10:14 +02:00
Tomasz Grabiec	cddde464ca	Merge 'service: Support adding/removing a datacenter with tablets by changing RF' from Aleksandra Martyniuk With this change, you can add or remove one or more DCs in a single ALTER KEYSPACE statement. It requires the keyspace to use rack list replication factor. In the existing approach, all tablet replicas are rebuilt at once during an RF change. This is no longer the case. In global_topology_request::keyspace_rf_change the request is added to ongoing_rf_changes, a new column in the system.topology table. The target RF is kept in next_replication, a new column in system_schema.keyspaces. In make_rf_change_plan, the load balancer schedules the necessary migrations, considering the load of nodes and other pending tablet transitions. Requests from ongoing_rf_changes are processed concurrently, independently from one another. In each request racks are processed concurrently. No tablet replica will be removed until all required replicas are added. While adding replicas to each rack we always start with base tables and won't proceed with views until they are done (while removing - the other way around). The intermediary steps aren't reflected in schema. When the RF change is finished: - in system_schema.keyspaces: - next_replication is cleared; - new keyspace properties are saved; - request is removed from ongoing_rf_changes; - the request is marked as done in system.topology_requests. Until the request is done, DESCRIBE KEYSPACE shows the replication_v2. If a request hasn't started to remove replicas, it can be aborted using task manager. system.topology_requests::error is set (but the request isn't marked as done) and next_replication = replication_v2. The load balancer interprets this and starts the rollback of the request. After the rollback is done, we set the relevant system.topology_requests entry as done (failed), clear the request id from system.topology::ongoing_rf_changes, and remove next_replication. Fixes: SCYLLADB-567. 
No backport needed; new feature. Closes scylladb/scylladb#24421 * github.com:scylladb/scylladb: service: fix indentation docs: update documentation test: test multi RF changes service: tasks: allow aborting ongoing RF changes cql3: allow changing RF by more than one when adding or removing a DC service: handle multi_rf_change service: implement make_rf_change_plan service: add keyspace_rf_change_plan to migration_plan service: extend tablet_migration_info to handle rebuilds service: split update_node_load_on_migration service: rearrange keyspace_rf_change handler db: add columns to system_schema.keyspaces db: service: add ongoing_rf_changes to system.topology gms: add keyspace_multi_rf_change feature	2026-04-22 01:46:11 +02:00
Radosław Cybulski	74b523ea20	treewide: fix spelling errors. Fix various spelling errors. Closes scylladb/scylladb#29574	2026-04-21 18:20:26 +03:00
Aleksandra Martyniuk	72bb3113ac	db: add columns to system_schema.keyspaces Add a new next_replication column to system_schema.keyspaces table. While there is an ongoing RF change: - next_replication keeps the target RF values; - existing replication_v2 column keeps initial RF values - the ones we started the RF change with. DESCRIBE KEYSPACE statement shows replication_v2. When there is no ongoing RF change for this keyspace, its next_replication is empty. In this commit no data is kept in the new column.	2026-04-17 09:58:07 +02:00
Botond Dénes	88a8324e68	Merge 'db: store large data records in SSTable metadata and serve via virtual tables' from Benny Halevy `system.large_partitions`, `system.large_rows`, and `system.large_cells` store records keyed by SSTable name. When SSTables are migrated between shards or nodes (resharding, streaming, decommission), the records are lost because the destination never writes entries for the migrated SSTables. This patch series moves the source of truth for large data records into the SSTable's scylla metadata component (new `LargeDataRecords` tag 13) and reimplements the three `system.large_` tables as virtual tables that query live SSTables on demand. A cluster feature flag (`LARGE_DATA_VIRTUAL_TABLES`) gates the transition for safe rolling upgrades. When the cluster feature is enabled, each node drops the old system large_ tables and starts serving the corresponding tables using virtual tables that represent the large data records now stored on the sstables. Note that the virtual tables will be empty after upgrade until the sstables that contained large data are rewritten, therefore it is recommended to run upgrade sstables compaction or major compaction to repopulate the sstables scylla-metadata with large data records. 1. keys: move key_to_str() to keys/keys.hh — make the helper reusable across large_data_handler, virtual tables, and scylla-sstable 2. sstables: add LargeDataRecords metadata type (tag 13) — new struct with binary-serialized key fields, scylla-sstable JSON support, format documentation 3. large_data_handler: rename partition_above_threshold to above_threshold_result — generalize the struct for reuse 4. large_data_handler: return above_threshold_result from maybe_record_large_cells — separate booleans for cell size vs collection elements thresholds 5. sstables: populate LargeDataRecords from writer — bounded min-heaps (one per large_data_type), configurable top-N via `compaction_large_data_records_per_sstable` 6. 
test: add LargeDataRecords round-trip unit tests — verify write/read, top-N bounding, below-threshold behavior 7. db: call initialize_virtual_tables from shard 0 only — preparatory refactoring to enable cross-shard coordination 8. db: implement large_data virtual tables with feature flag gating — three virtual table classes, feature flag activation, legacy SSTable fallback, dual-threshold dedup, cross-shard collection Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-1276 * Although this fixes a bug where large data entries are effectively lost when sstables are renamed or migrated, the changes are intrusive and do not warrant a backport Closes scylladb/scylladb#29257 * github.com:scylladb/scylladb: db: implement large_data virtual tables with feature flag gating db: call initialize_virtual_tables from shard 0 only test: add LargeDataRecords round-trip unit tests sstables: populate LargeDataRecords from writer large_data_handler: return above_threshold_result from maybe_record_large_cells large_data_handler: rename partition_above_threshold to above_threshold_result sstables: add LargeDataRecords metadata type (tag 13) sstables: add fmt::formatter for large_data_type keys: move key_to_str() to keys/keys.hh	2026-04-16 14:03:31 +03:00
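The "bounded min-heap, top-N" technique mentioned for the SSTable writer can be illustrated with a small sketch. This is not the Scylla implementation: the `TopN` class and its method names are hypothetical, and the threshold check is only an assumed stand-in for the large-data thresholds. The idea is that the heap root is always the smallest retained record, so a new record only needs to be compared against the root.

```python
import heapq


class TopN:
    """Keeps the N largest (size, key) records seen so far."""

    def __init__(self, n, threshold=0):
        self.n = n
        self.threshold = threshold  # records at or below this size are ignored
        self._heap = []  # min-heap: root is the smallest retained record

    def offer(self, size, key):
        if self.n <= 0 or size <= self.threshold:
            return
        item = (size, key)
        if len(self._heap) < self.n:
            heapq.heappush(self._heap, item)
        elif item > self._heap[0]:
            # Evict the current smallest and insert the new record.
            heapq.heapreplace(self._heap, item)

    def largest_first(self):
        return sorted(self._heap, reverse=True)
```

One such structure per large data type would keep memory bounded at N records regardless of how many oversized partitions, rows, or cells the writer encounters.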
Michael Litvak	43c76aaf2b	logstor: split log record to header and data Split the `log_record` to `log_record_header` type that has the record metadata fields and the mutation as a separate field which is the actual record data: struct log_record { log_record_header header; canonical_mutation mut; }; Both the header and mutation have variable serialized size. When a record is serialized in a write_buffer, we first put a small `record_header` that has the header size and data size, then the serialized header and data follow. The `log_location` of a record points to the beginning of the `record_header`, and the size includes the `record_header`. This allows us to read a record header without reading the data when it's not needed and avoid deserializing it: * on recovery, when scanning all segments, we read only the record headers. * on compaction, we read the record header first to determine if the record is alive, if yes then we read the data. Closes scylladb/scylladb#29457	2026-04-16 10:00:35 +03:00
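The record layout described in this commit (a small fixed `record_header` carrying the header size and data size, followed by the serialized header and then the data) can be sketched as below. The field widths (two little-endian 32-bit integers) and the function names are assumptions made for illustration; the commit message does not specify them.

```python
import struct

# Assumed layout of the small fixed-size record_header: (header_size, data_size).
RECORD_HEADER = struct.Struct("<II")


def write_record(buf: bytearray, header: bytes, data: bytes):
    """Append a record and return its (offset, size) location.

    As in the commit message, the location points at the record_header
    and the size includes it."""
    offset = len(buf)
    buf += RECORD_HEADER.pack(len(header), len(data))
    buf += header
    buf += data
    return offset, RECORD_HEADER.size + len(header) + len(data)


def read_header(buf, offset):
    """Read only the serialized header, without touching the data bytes."""
    header_size, data_size = RECORD_HEADER.unpack_from(buf, offset)
    start = offset + RECORD_HEADER.size
    return bytes(buf[start:start + header_size]), data_size


def read_data(buf, offset):
    """Read the data part, skipping over the serialized header."""
    header_size, data_size = RECORD_HEADER.unpack_from(buf, offset)
    start = offset + RECORD_HEADER.size + header_size
    return bytes(buf[start:start + data_size])
```

A recovery scan would call only `read_header` for each record, while compaction would call `read_header` first and `read_data` only for records found to be alive.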
Benny Halevy	d92cd42fe6	sstables: add LargeDataRecords metadata type (tag 13) Add a new scylla metadata component LargeDataRecords (tag 13) that stores per-SSTable top-N large data records. Each record carries: - large_data_type (partition_size, row_size, cell_size, etc.) - binary serialized partition key and clustering key - column name (for cell records) - value (size in bytes) - element count (rows or collection elements, type-dependent) - range tombstones and dead rows (partition records only) The struct uses disk_string<uint32_t> for key/name fields and is serialized via the existing describe_type framework into the SSTable Scylla metadata component. Add JSON support in scylla-sstable and format documentation.	2026-04-16 08:49:01 +03:00
Avi Kivity	59ec93b86b	Merge 'Allow arbitrary tablet boundaries and count' from Tomasz Grabiec There are several reasons we want to do that. One is that it will give us more flexibility in distributing the load. We can subdivide tablets at any token, and achieve more evenly-sized tablets. In particular, we can isolate large partitions into separate tablets. We can also split and merge incrementally individual tablets. Currently, we do it for the whole table or nothing, which makes splits and merges take longer and cause wide swings of the count. This is not implemented in this PR yet, we still split/merge the whole table. Another reason is vnode to tablets migration. We now could construct a tablet map which matches exactly the vnode boundaries, so migration can happen transparently from CQL-coordinator point of view. Tablet count is still a power-of-two by default for newly created tables. It may be different if tablet map is created by non-standard means, or if per-table tablet option "pow2_count" is set to "false". 
build/release/scylla perf-tablets: Memory footprint for 131k tablets increased from 56 MiB to 58.1 MiB (+3.5%) Before: ``` Generating tablet metadata Total tablet count: 131072 Size of tablet_metadata in memory: 57456 KiB Copied in 0.014346 [ms] Cleared in 0.002698 [ms] Saved in 1234.685303 [ms] Read in 445.577881 [ms] Read mutations in 299.596313 [ms] 128 mutations Read required hosts in 247.482742 [ms] Size of canonical mutations: 33.945053 [MiB] Disk space used by system.tablets: 1.456761 [MiB] Tablet metadata reload: full 407.69ms partial 2.65ms ``` After: ``` Generating tablet metadata Total tablet count: 131072 Size of tablet_metadata in memory: 59504 KiB Copied in 0.032475 [ms] Cleared in 0.002965 [ms] Saved in 1093.877441 [ms] Read in 387.027100 [ms] Read mutations in 255.752121 [ms] 128 mutations Read required hosts in 211.202805 [ms] Size of canonical mutations: 33.954453 [MiB] Disk space used by system.tablets: 1.450162 [MiB] Tablet metadata reload: full 354.50ms partial 2.19ms ``` Closes scylladb/scylladb#28459 * github.com:scylladb/scylladb: test: boost: tablets: Add test for merge with arbitrary tablet count tablets, database: Advertise 'arbitrary' layout in snapshot manifest tablets: Introduce pow2_count per-table tablet option tablets: Prepare for non-power-of-two tablet count tablets: Implement merged tablet_map constructor on top of for_each_sibling_tablets() tablets: Prepare resize_decision to hold data in decisions tablets: table: Make storage_group handle arbitrary merge boundaries tablets: Make stats update post-merge work with arbitrary merge boundaries locator: tablets: Support arbitrary tablet boundaries locator: tablets: Introduce tablet_map::get_split_token() dht: Introduce get_uniform_tokens()	2026-04-15 18:57:22 +03:00
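To illustrate what a tablet map with a non-power-of-two count might look like, here is a rough sketch that splits a signed 64-bit token space into `count` nearly equal tablets and looks up the tablet owning a token. It is not the implementation of `dht::get_uniform_tokens()`, whose signature is not shown in the commit message; the function names, the inclusive last-token convention, and the exact token bounds are assumptions.

```python
import bisect

# Assumed token space: all signed 64-bit integers.
MIN_TOKEN = -(2 ** 63)
MAX_TOKEN = 2 ** 63 - 1


def uniform_last_tokens(count):
    """Return the inclusive last token of each of `count` tablets."""
    if count < 1:
        raise ValueError("tablet count must be positive")
    span = MAX_TOKEN - MIN_TOKEN + 1
    return [MIN_TOKEN + (span * (i + 1)) // count - 1 for i in range(count)]


def tablet_for_token(last_tokens, token):
    """Index of the first tablet whose last token is >= token."""
    return bisect.bisect_left(last_tokens, token)
```

Because lookup is a binary search over boundaries rather than a bit shift, the same `tablet_for_token` works for any sorted boundary list, including one that matches vnode boundaries exactly.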
Nadav Har'El	1eb8d170dd	Merge 'vector_index: allow recreating vector indexes on the same column' from Dawid Pawlik This series allows creating multiple vector indexes on the same column so users can rebuild an index without losing query availability. The intended flow is: 1. Create a new vector index on a column that already has one. 2. Keep serving ANN queries from the old index while the new one is being built. 3. Verify the new index is ready. 4. Automatically switch to the remaining index. 5. Drop the old index. To make that deterministic, `index_version` is changed from the base table schema version to a real creation timeuuid. When multiple vector indexes exist on the same column, ANN query planning now picks the index according to the routing implemented in Vector Store (newest serving index). This keeps queries on the old index until the new one is up and ready. This patch also removes the create-time restriction that rejected a second vector index on the same column. Name collisions are still rejected as before. Test coverage is updated accordingly: - Scylla now verifies that two vector indexes can coexist on the same column. - Cassandra/SAI behavior is still covered and is still expected to reject duplicate indexes on the same column. Fixes: VECTOR-610 Closes scylladb/scylladb#29407 * github.com:scylladb/scylladb: docs: document vector index metadata and duplicate handling test/cqlpy: cover vector index duplicate creation rules vector_index: allow multiple named indexes on one column vector_index: store `index_version` as creation timeuuid	2026-04-15 14:40:15 +03:00
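The "newest serving index" rule can be sketched as follows, assuming that `index_version` is a version-1 (time-based) UUID and that each index exposes some readiness flag. The `VectorIndex` record, the `serving` field, and both functions are hypothetical; the actual routing lives in Vector Store and is not shown in the commit message.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class VectorIndex:
    name: str
    index_version: uuid.UUID  # creation timeuuid (version 1)
    serving: bool             # assumed readiness flag


def timeuuid_from_ticks(ticks):
    """Hypothetical helper: build a version-1 UUID whose 60-bit timestamp is `ticks`."""
    time_low = ticks & 0xFFFFFFFF
    time_mid = (ticks >> 32) & 0xFFFF
    time_hi_version = ((ticks >> 48) & 0x0FFF) | 0x1000
    return uuid.UUID(fields=(time_low, time_mid, time_hi_version, 0x80, 0x00, 0))


def pick_index(indexes):
    """Return the serving index with the latest creation time, or None."""
    serving = [i for i in indexes if i.serving]
    if not serving:
        return None
    return max(serving, key=lambda i: i.index_version.time)
```

While the rebuilt index is not yet serving, `pick_index` keeps returning the old one; once the new index reports ready, it wins because its timeuuid is newer.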
Botond Dénes	a9c86fc2e4	docs: document schema subcomponent in sstable-scylla-format.md Commit `234f905` (sstables: scylla_metadata: add schema member) added a new Schema subcomponent (tag 11) to scylla_metadata. Document it in the sstable Scylla format reference: - Add schema to the subcomponent grammar enumeration - Add a summary entry describing the subcomponent (tag 11) and its purpose - Add a detailed ## schema subcomponent section with the binary grammar, covering table_id, table_schema_version, keyspace_name, table_name and the column_description array (column_kind, column_name, column_type) Fixes https://github.com/scylladb/scylladb/issues/27960 Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Closes scylladb/scylladb#28983	2026-04-15 14:40:15 +03:00
Pavel Emelyanov	a428472e50	db: Remove redundant enable_logstor config option The enable_logstor configuration option is redundant with the 'logstor' experimental feature flag. Consolidate to a single gate: use the experimental feature to control both whether logstor is available for table creation and whether it is initialized at database startup. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#29427	2026-04-15 14:40:15 +03:00
Tomasz Grabiec	7af9f5366d	tablets, database: Advertise 'arbitrary' layout in snapshot manifest Currently, the manifest advertises "powof2", which is wrong for arbitrary count and boundaries. Introduce a new kind of layout called "arbitrary", and produce it if the tablet map doesn't conform to "powof2" layout. We should also produce tablet boundaries in this case, but that's worked on in a different PR: https://github.com/scylladb/scylladb/pull/28525	2026-04-15 10:40:56 +02:00
Dawid Pawlik	f40ab83d02	docs: document vector index metadata and duplicate handling Document the new vector index behavior in the user-facing and developer docs. Describe `index_version` as a creation timeuuid stored in `system_schema.indexes`, clarify that recreating an index changes it while ALTER TABLE does not, and document that Scylla allows multiple named vector indexes on the same column while still rejecting unnamed duplicates.	2026-04-14 12:21:38 +02:00
Avi Kivity	22949bae52	Merge 'logstor: implement tablet split/merge and migration' from Michael Litvak implement tablet split, tablet merge and tablet migration for tables that use the experimental logstor storage engine. * tablet merge simply merges the histograms of segments of one compaction group with another. * for tablet split we take the segments from the source compaction group, read them and write all live records to separate segments according to the split classifier, and move separated segments to the target compaction groups. * for tablet migration we use stream_blob, similarly to file streaming of sstables. we add a new op type for streaming a logstor segment. on the source we take a snapshot of the segments with an input stream that reads the segment, and on the target we create a sink that allocates a new segment on the target shard and writes to it. * we also do some improvements for recovery and loading of segments. we add a segment header that contains useful information for non-mixed segments, such as the table and token range. Refs SCYLLADB-770 no backport - still a new and experimental feature Closes scylladb/scylladb#29207 * github.com:scylladb/scylladb: test: logstor: additional logstor tests docs/dev: add logstor on-disk format section logstor: add version and crc to buffer header test: logstor: tablet split/merge and migration logstor: enable tablet balancing logstor: streaming of logstor segments using stream_blob logstor: add take_logstor_snapshot logstor: segment input/output stream logstor: implement compaction_group::cleanup logstor: tablet split logstor: tablet merge logstor: add compaction reenabler logstor: add segment header logstor: serialize writes to active segment replica: extend compaction_group functions for logstor replica: add compaction_group_for_logstor_segment logstor: code cleanup	2026-04-12 16:11:12 +03:00
Piotr Dulikowski	3bd770d4d9	Merge 'counters: reuse counter IDs by rack' from Michael Litvak For counter updates, use a counter ID that is constructed from the node's rack instead of the node's host ID. A rack can have at most two active tablet replicas at a time: a single normal tablet replica, and during tablet migration there are two active replicas, the normal and pending replica. Therefore we can have two unique counter IDs per rack that are reused by all replicas in the rack. We construct the counter ID from the rack UUID, which is constructed from the name "dc:rack". The pending replica uses a deterministic variation of the rack's counter ID by negating it. This improves the performance and size of counter cells by having fewer unique counter IDs and fewer counter shards in a counter cell. Previously the number of counter shards was the number of different host_id's that updated the counter, which is typically the number of nodes in the cluster and can continue growing indefinitely when nodes are replaced. With the rack-based counter ID, the number of counter shards will be at most twice the number of different racks (including removed racks, which should not be significant). Fixes SCYLLADB-356 backport not needed - an enhancement Closes scylladb/scylladb#28901 * github.com:scylladb/scylladb: docs/dev: add counters doc counters: reuse counter IDs by rack	2026-04-10 12:24:18 +02:00
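A rough sketch of the rack-based counter ID idea follows. The commit message only says that the rack UUID is built from the name "dc:rack" and that the pending replica's ID is obtained by negating it; the specific name-based UUID scheme (`uuid5` with `NAMESPACE_OID`) and the reading of "negating" as a 128-bit bitwise complement are assumptions made for this illustration, and both function names are hypothetical.

```python
import uuid

_MASK_128 = (1 << 128) - 1


def rack_counter_id(dc, rack):
    # Name-based UUID over "dc:rack"; namespace and hash choice are assumptions.
    return uuid.uuid5(uuid.NAMESPACE_OID, f"{dc}:{rack}")


def pending_counter_id(counter_id):
    # "Negation" modelled as a bitwise complement of the 128-bit value (assumption).
    return uuid.UUID(int=~counter_id.int & _MASK_128)
```

Every replica in the same rack derives the same pair of IDs, so a counter cell accumulates at most two shards per rack instead of one per host that ever updated it.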
Botond Dénes	5886d1841a	Merge 'cmake: align CMake build system with configure.py and add comparison script' from Ernest Zaslavsky Every time someone modifies the build system — adding a source file, changing a compilation flag, or wiring a new test — the change tends to land in only one of our two build systems (configure.py or CMake). Over time this causes three classes of problems: 1. CMake stops compiling entirely. Missing defines, wrong sanitizer flags, or misplaced subdirectory ordering cause hard build failures that are only discovered when someone tries to use CMake (e.g. for IDE integration). 2. Missing build targets. Tests or binaries present in configure.py are never added to CMake, so `cmake --build` silently skips them. This PR fixes several such cases (e.g. `symmetric_key_test`, `auth_cache_test`, `sstable_tablet_streaming`). 3. Missing compilation units in targets. A `.cc` file is added to a test binary in one system but not the other, causing link errors or silently omitted test coverage. To fix the existing drift and prevent future divergence, this series: Adds a build-system comparison script (`scripts/compare_build_systems.py`) that configures both systems into a temporary directory, parses their generated `build.ninja` files, and compares per-file compilation flags, link target sets, and per-target libraries. configure.py is treated as the baseline; CMake must match it. The script supports a `--ci` mode suitable for gating PRs that touch build files. Fixes all current mismatches found by the script: - Mode flag alignment in `mode.common.cmake` and `mode.Coverage.cmake` (sanitizer flags, `-fno-lto`, stack-usage warnings, coverage defines). - Global define alignment (`SEASTAR_NO_EXCEPTION_HACK`, `XXH_PRIVATE_API`, `BOOST_ALL_DYN_LINK`, `SEASTAR_TESTING_MAIN` placement). - Seastar build configuration (shared vs static per mode, coverage sanitizer link options). - Abseil sanitizer flags (`-fno-sanitize=vptr`). 
- Missing test targets in `test/boost/CMakeLists.txt`. - Redundant per-test flags now covered by global settings. - Lua library resolution via a custom `cmake/FindLua.cmake` using pkg-config, matching configure.py's approach. Adds documentation (`docs/dev/compare-build-systems.md`) describing how to run the script and interpret its output. No backport needed — this is build infrastructure improvement only. Closes scylladb/scylladb#29273 * github.com:scylladb/scylladb: scripts: remove lua library rename workaround from comparison script cmake: add custom FindLua using pkg-config to match configure.py test/cmake: add missing tests to boost test suite test/cmake: remove per-test LTO disable cmake: add BOOST_ALL_DYN_LINK and strip per-component defines cmake: move SEASTAR_TESTING_MAIN after seastar and abseil subdirs cmake: add -fno-sanitize=vptr for abseil sanitizer flags cmake: align Seastar build configuration with configure.py cmake: align global compile defines and options with configure.py cmake: fix Coverage mode in mode.Coverage.cmake cmake: align mode.common.cmake flags with configure.py configure.py: add sstable_tablet_streaming to combined_tests docs: add compare-build-systems.md scripts: add compare_build_systems.py to compare ninja build files	2026-04-09 15:46:09 +03:00
Michael Litvak	3964040008	docs/dev: add counters doc Add a documentation of the counters feature implementation in docs/dev/counters.md. The documentation is taken from the wiki and updated according to the current state of the code - legacy details are removed, and a section about the counter id is added.	2026-04-09 13:08:02 +02:00
Geoff Montee	7d7ec7025e	docs: Document system keyspaces for developers / internal usage Fixes #29043 with the following docs changes: - docs/dev/system-keyspaces.md: Added a new file that documents all keyspaces created internally Closes scylladb/scylladb#29044	2026-04-09 11:49:58 +03:00
Anna Stuchlik	dd34d2afb4	doc: remove references to old versions from Docker Hub docs This commit removes references to ScyllaDB versions ("Since x.y") from the ScyllaDB documentation on Docker Hub, as they are redundant and confusing (some versions are super ancient). Fixes SCYLLADB-1212 Closes scylladb/scylladb#29204	2026-04-09 11:43:40 +03:00
Nadav Har'El	22e7ef46a7	Merge 'vector_search: fix SELECT on local vector index' from Karol Nowacki Queries against local vector indexes were failing with the error: ```ANN ordering by vector requires the column to be indexed using 'vector_index'``` This was a regression introduced by `15788c3734`, which incorrectly assumed the first column in the targets list is always the vector column. For local vector indexes, the first column is the partition key, causing the failure. Previously, serialization logic for the target index option was shared between vector and secondary indexes. This is no longer viable due to the introduction of local vector indexes and vector indexes with filtering columns, which have different target format. This commit introduces a dedicated JSON-based serialization format for vector index targets, identifying the target column (tc), filtering columns (fc), and partition key columns (pk). This ensures unambiguous serialization and deserialization for all vector index types. This change is backward compatible for regular vector indexes. However, it breaks compatibility for local vector indexes and vector indexes with filtering columns created in version 2026.1.0. To mitigate this, usage of these specific index types will be blocked in the 2026.1.0 release by failing ANN queries against them in vector-store service. Fixes: SCYLLADB-895 Backport to 2026.1 is required as this issue occurs also on this branch. Closes scylladb/scylladb#28862 * github.com:scylladb/scylladb: index: fix DESC INDEX for vector index vector_search: test: refactor boilerplate setup vector_search: fix SELECT on local vector index index: test: vector index target option serialization test index: test: secondary index target option serialization test	2026-04-07 17:43:35 +03:00
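The dedicated JSON format for vector index targets could look roughly like the sketch below. Only the key names `tc`, `fc`, and `pk` come from the commit message; the value shapes (a string for the target column, lists for the others), the omission of empty lists, and the function names are assumptions, not the format Scylla actually writes.

```python
import json


def serialize_targets(target_column, filtering_columns=(), partition_key_columns=()):
    """Encode vector index targets under the keys tc / fc / pk."""
    doc = {"tc": target_column}
    if filtering_columns:
        doc["fc"] = list(filtering_columns)
    if partition_key_columns:
        doc["pk"] = list(partition_key_columns)
    return json.dumps(doc, separators=(",", ":"))


def deserialize_targets(text):
    """Decode into (target_column, filtering_columns, partition_key_columns)."""
    doc = json.loads(text)
    return doc["tc"], tuple(doc.get("fc", ())), tuple(doc.get("pk", ()))
```

With explicit keys, a local vector index no longer relies on column position: the partition key columns sit under `pk`, so the vector column is identified unambiguously by `tc`.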
Avi Kivity	00409b61f1	Merge 'Add Vnodes to Tablets Migration Procedure' from Nikos Dragazis This PR introduces the vnodes-to-tablets migration procedure, which enables converting an existing vnode-based keyspace to tablets. The migration is implemented as a manual, operator-driven process executed in several stages. The core idea is to first create tablet maps with the same token boundaries and replica hosts as the vnodes, and then incrementally convert the storage of each node to the tablets layout. At a high level, the procedure is the following: 1. Create tablet maps for all tables in the keyspace. 2. Sequentially upgrade all nodes from vnodes to tablets: 1. Mark a node for upgrade in the topology state. 2. Restart the node. During startup, while the node is offline, it reshards the SSTables on vnode boundaries and switches to a tablet ERM. 3. Wait for the node to return online before proceeding to the next node. 4. Finalize the migration: 1. Update the keyspace schema to mark it as tablet-based. 2. Clear the group0 state related to the migration. From the client's perspective, the migration is online; the cluster can still serve requests on that keyspace, although performance may be temporarily degraded. During the migration, some nodes use vnode ERMs while others use tablet ERMs. Cluster-level algorithms such as load balancing will treat the keyspace's tables as vnode-based. Once migration is finalized, the keyspace is permanently switched to tablets and cannot be reverted back to vnodes. However, a rollback procedure is available before finalization. The patch series consists of: * Load balancer adjustments to ignore tablets belonging to a migrating keyspace. * A new vnode-based resharding mode, where SSTables are segregated on vnode boundaries rather than with the static sharder. * A new per-node `intended_storage_mode` column in `system.topology`. Represents migration intent (whether migration should occur on restart) and direction. 
* Four new REST endpoints for driving the migration (start, node upgrade/downgrade, finalize, status), along with `nodetool` wrappers. The finalization is implemented as a global topology request. * Wiring of the migration process into the startup logic: the `distributed_loader` determines a migrating table's ERM flavor from the `intended_storage_mode` and the ERM flavor determines the `table_populator`'s resharding mode. Token metadata changes have been adjusted to preserve the ERM flavor. * Cluster tests for the migration process. Fixes SCYLLADB-722. Fixes SCYLLADB-723. Fixes SCYLLADB-725. Fixes SCYLLADB-779. Fixes SCYLLADB-948. New feature, no backport is needed. Closes scylladb/scylladb#29065 * github.com:scylladb/scylladb: docs: Add ops guide for vnodes-to-tablets migration test: cluster: Add test for migration of multiple keyspaces test: cluster: Add test for error conditions test: cluster: Add vnodes->tablets migration test (rollback) test: cluster: Add vnodes->tablets migration test (1 table, 3 nodes) test: cluster: Add vnodes->tablets migration test (1 table, 1 node) scylla-nodetool: Add migrate-to-tablets subcommand api: Add REST endpoint for vnode-to-tablet migration status api: Add REST endpoint for migration finalization topology_coordinator: Add `finalize_migration` request database: Construct migrating tables with tablet ERMs api: Add REST endpoint for upgrading nodes to tablets api: Add REST endpoint for starting vnodes-to-tablets migration topology_state_machine: Add intended_storage_mode to system.topology distributed_loader: Wire vnode-based resharding into table populator replica: Pick any compaction group for resharding compaction: resharding_compaction: add vnodes_resharding option storage_service: Preserve ERM flavor of migrating tables tablet_allocator: Exclude migrating tables from load balancing feature_service: Add vnodes_to_tablets_migrations feature	2026-04-07 14:32:22 +03:00
Piotr Smaron	7d449a307c	docs: remove old audit design doc As discussed with @ScyllaPiotr in https://github.com/scylladb/scylladb/pull/29232, the doc about to be removed is just: > Looking at history, I think this audit.md is a design doc: scylladb/scylla-enterprise@87a5c19, for which the feature has been implemented differently, eventually, and was created around the time when design docs, apparently, were stored within the repository itself. So for me it's some trash (sorry for strong language) that can be safely removed. Closes scylladb/scylladb#29316	2026-04-07 14:11:53 +03:00
Michael Litvak	5b3e2a4ca2	docs/dev: add logstor on-disk format section	2026-03-31 18:45:08 +02:00
Karol Nowacki	6bc88e817f	vector_search: fix SELECT on local vector index Queries against local vector indexes were failing with the error: "ANN ordering by vector requires the column to be indexed using 'vector_index'" This was a regression introduced by `15788c3734`, which incorrectly assumed the first column in the targets list is always the vector column. For local vector indexes, the first column is the partition key, causing the failure. Previously, serialization logic for the target index option was shared between vector and secondary indexes. This is no longer viable due to the introduction of local vector indexes and vector indexes with filtering columns, which have different target formats. This commit introduces a dedicated JSON-based serialization format for vector index targets, identifying the target column (tc), filtering columns (fc), and partition key columns (pk). This ensures unambiguous serialization and deserialization for all vector index types. This change is backward compatible for regular vector indexes. However, it breaks compatibility for local vector indexes and vector indexes with filtering columns created in version 2026.1.0. To mitigate this, usage of these specific index types will be blocked in the 2026.1.0 release by failing ANN queries against them in the vector-store service. Fixes: SCYLLADB-895	2026-03-30 16:46:48 +02:00
Karol Nowacki	4dc28dfa52	index: test: secondary index target option serialization test Target option serialization must remain stable for backward compatibility. The index is restored from this property on startup, so unintentional changes to the serialization schema can break indexes after upgrade.	2026-03-30 16:46:47 +02:00
Ernest Zaslavsky	33bca2428a	docs: add compare-build-systems.md Document the purpose, usage, and examples for scripts/compare_build_systems.py which compares the configure.py and CMake build systems by parsing their ninja build files.	2026-03-29 16:17:44 +03:00
Nikos Dragazis	b7f4ae8218	topology_state_machine: Add intended_storage_mode to system.topology Part of the vnodes-to-tablets migration is to reshard the SSTables of each node on vnode boundaries. Resharding is a heavy operation that runs on startup while the node is offline. Since nodes can restart for unexpected reasons, we need a flag to do it in a controllable way. We also need the ability to roll back the migration, which requires resharding in the opposite direction. This means a node must be aware of the intended migration direction. To address both requirements, this patch introduces a new column, intended_storage_mode, in system.topology. A non-null value indicates that a node should perform a migration and specifies the migration direction. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>	2026-03-24 11:06:38 +02:00
Michael Litvak	ad87eda835	docs/dev: add logstor documentation	2026-03-18 19:24:28 +01:00
Botond Dénes	81e214237f	Merge 'Add digests for all sstable components in scylla metadata' from Taras Veretilnyk This pull request adds support for calculating and storing CRC32 digests for all SSTable components. This change replaces the plain file_writer with crc32_digest_file_writer for all SSTable components that should be checksummed. The resulting component digests are stored in the sstable structure and later persisted to disk as part of the Scylla metadata component during writer::consume_end_of_stream. Several test cases were introduced to verify expected behaviour. Additionally, this PR adds a new rewrite component mechanism for safe sstable component rewriting. Previously, rewriting an sstable component (e.g., via rewrite_statistics) created a temporary file that was renamed to the final name after sealing. This allowed crash recovery by simply removing the temporary file on startup. However, with component digests stored in scylla_metadata (#20100), replacing a component like Statistics requires atomically updating both the component and scylla_metadata with the new digest, which is impossible with POSIX rename. The new mechanism creates a clone sstable with a fresh generation: - Hard-links all components from the source except the component being rewritten and scylla_metadata - Copies original sstable components pointer and recognized components from the source - Invokes a modifier callback to adjust the new sstable before rewriting - Writes the modified component along with updated scylla_metadata containing the new digest - Seals the new sstable with a temporary TOC - Replaces the old sstable atomically, the same way as it is done in compaction This is built on the rewrite_sstables compaction framework to support batch operations (e.g., following incremental repair). If any step of the process fails, the sstable is automatically deleted on node startup because the temporary TOC persists. 
Backport is not required, it is a new feature Fixes https://github.com/scylladb/scylladb/issues/20100, https://github.com/scylladb/scylladb/issues/27453 Closes scylladb/scylladb#28338 * github.com:scylladb/scylladb: docs: document components_digests subcomponent and trailing digest in Scylla.db sstable_compaction_test: Add tests for perform_component_rewrite sstable_test: add verification testcases of SSTable components digests persistance sstables: store digest of all sstable components in scylla metadata sstables: replace rewrite_statistics with new rewrite component mechanism sstables: add new rewrite component mechanism for safe sstable component rewriting compaction: add compaction_group_view method to specify sstable version sstables: add null_data_sink and serialized_checksum for checksum-only calculation sstables: extract default write open flags into a constant sstables: Add write_simple_with_digest for component checksumming sstables: Extract file writer closing logic into separate methods sstables: Implement CRC32 digest-only writer	2026-03-10 16:02:53 +02:00
Taras Veretilnyk	739dd59ebc	docs: document components_digests subcomponent and trailing digest in Scylla.db Document the new `components_digests` subcomponent (tag 12) added to the Scylla.db metadata component, which stores CRC32 digests of all checksummed SSTable component files. Also document the trailing CRC32 digest that stores digest of the scylla metadata itself.	2026-03-06 21:58:15 +01:00
Avi Kivity	85bd6d0114	Merge 'Add multiple-shard persistent metadata storage for strongly consistent tables' from Wojciech Mitros In this series we introduce new system tables and use them for storing the raft metadata for strongly consistent tables. In contrast to the previously used raft group0 tables, the new tables can store data on any shard. The tables also allow specifying the shard where each partition should reside, which enables the tablets of strongly consistent tables to have their raft group metadata co-located on the same shard as the tablet replica. The new tables have almost the same schemas as the raft group0 tables. However, they have an additional column in their partition keys. The additional column is the shard that specifies where the data should be located. While a tablet and its corresponding raft group server resides on some shard, it now writes and reads all requests to the metadata tables using its shard in addition to the group_id. The extra partition key column is used by the new partitioner and sharder which allow this special shard routing. The partitioner encodes the shard in the token and the sharder decodes the shard from the token. This approach for routing avoids any additional lookups (for the tablet mapping) during operations on the new tables and it also doesn't require keeping any state. It also doesn't interact negatively with resharding - as long as tablets (and their corresponding raft metadata) occupy some shard, we do not allow starting the node with a shard count lower than the id of this shard. When increasing the shard count, the routing does not change, similarly to how tablet allocation doesn't change. To use the new tables, a new implementation of `raft::persistence` is added. Currently, it's almost an exact copy of the `raft_sys_table_storage` which just uses the new tables, but in the future we can modify it with changes specific to metadata (or mutation) storage for strongly consistent tables. 
The new storage is used in the `groups_manager`, which combined with the removal of some `this_shard_id() == 0` checks, allows strongly consistent tables to be used on all shards. This approach for making sure that the reads/writes to the new tables end up on the correct shards won in the balance of complexity/usability/performance against a few other approaches we've considered. They include: 1. Making the Raft server read/write directly to the database, skipping the sharder, on its shard, while using the default partitioner/sharder. This approach could let us avoid changing the schema and there should be no problems for reads and writes performed by the Raft server. However, in this approach we would input data in tables conflicting with the placement determined by the sharder. As a result, any read going through the sharder could miss the rows it was supposed to read. Even when reading all shards to find a specific value, there is a risk of polluting the cache - the rows loaded on incorrect shards may persist in the cache for an unknown amount of time. The cache may also mistakenly remember that a row is missing, even though it's actually present, just on an incorrect shard. Some of the issues with this approach could be worked around using another sharder which always returns this_shard_id() when asked about a shard. It's not clear how such a sharder would implement a method like `token_for_next_shard`, and how much simpler it would be compared to the current "identity" sharder. 2. Using a sharder depending on the current allocation of tablets on the node. This approach relies on the knowledge of group_id -> shard mapping at any point in time in the cluster. For this approach we'd also need to either add a custom partitioner which encodes the group_id in the token, or we'd need to track the token(group_id) -> shard mapping. This approach has the benefit over the one used in the series of keeping the partition key as just group_id. 
However, it requires more logic, and the access to the live state of the node in the sharder, and it's not static - the same token may be sharded differently depending on the state of the node - it shouldn't occur in practice, but if we changed the state of the node before adjusting the table data, we would be unable to access/fix the stale data without artificially also changing the state of the node. 3. Using metadata tables co-located to the strongly consistent tables. This approach could simplify the metadata migrations in the future, however it would require additional schema management of all co-located metadata tables, and it's not even obvious what could be used as the partition key in these tables - some metadata is per-raft-group, so we couldn't reuse the partition key of the strongly consistent table for it. And finding and remembering a partition key that is routed to a specific shard is not a simple task. Finally, splits and merges will most likely need special handling for metadata anyway, so we wouldn't even make use of co-located table's splits and merges. Fixes [SCYLLADB-361](https://scylladb.atlassian.net/browse/SCYLLADB-361) [SCYLLADB-361]: https://scylladb.atlassian.net/browse/SCYLLADB-361?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ Closes scylladb/scylladb#28509 * github.com:scylladb/scylladb: docs: add strong consistency doc test/cluster: add tests for strongly-consistent tables' metadata persistence raft: enable multi-shard raft groups for strongly consistent tablets test/raft: add unit tests for raft_groups_storage raft: add raft_groups_storage persistence class db: add system tables for strongly consistent tables' raft groups dht: add fixed_shard_partitioner and fixed_shard_sharder raft: add group_id -> shard mapping to raft_group_registry schema: add with_sharder overload accepting static_sharder reference	2026-03-04 08:55:43 +02:00
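The shard-routing idea in the entry above (the partitioner encodes the shard in the token and the sharder decodes it again, with no lookup and no state) can be illustrated with a minimal sketch. The bit layout, the constant `SHARD_BITS`, and the function names `encode_token` and `shard_of` are assumptions made for illustration only; they are not taken from the `fixed_shard_partitioner` or `fixed_shard_sharder` code.

```python
# Hypothetical layout: the top SHARD_BITS bits of a 64-bit token hold the shard,
# the remaining bits hold a hash of the rest of the partition key (e.g. group_id).
TOKEN_BITS = 64
SHARD_BITS = 16  # assumed width, chosen only for this sketch
LOW_BITS = TOKEN_BITS - SHARD_BITS


def encode_token(shard: int, key_hash: int) -> int:
    """Partitioner side: pack the shard into the high bits of the token."""
    if not 0 <= shard < (1 << SHARD_BITS):
        raise ValueError("shard does not fit in the assumed shard field")
    return (shard << LOW_BITS) | (key_hash & ((1 << LOW_BITS) - 1))


def shard_of(token: int) -> int:
    """Sharder side: recover the shard from the token alone, without any lookup."""
    return token >> LOW_BITS


token = encode_token(shard=5, key_hash=0xDEADBEEF)
print(shard_of(token))
```

Because the shard is recovered from the token alone, the routing in this sketch does not change when the shard count grows, which matches the property the commit message describes.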
Geoff Montee	0eb5603ebd	Docs: describe the system tables Fixes issue #12818 with the following docs changes: docs/dev/system_keyspace.md: Added missing system tables, added table of contents (TOC), added categories Closes scylladb/scylladb#27789	2026-03-04 08:55:43 +02:00
Dimitrios Symonidis	80b74d7df2	tablet options: Add max_tablet_count tablet option to enforce tablet count upper bounds Introduced a new max_tablet_count tablet option that caps the maximum number of tablets a table can have. This feature is designed primarily for backup and restore workflows. During backup, when load balancing is disabled for snapshot consistency, the current tablet count is recorded in the backup manifest. During restore, max_tablet_count is set to this recorded value, ensuring the restored table's tablet count never exceeds the original snapshot's tablet distribution. This guarantee enables efficient file-based SSTable streaming during restore, as each SSTable remains fully contained within a single tablet boundary. Closes scylladb/scylladb#28450	2026-03-03 11:19:24 +03:00
Wojciech Mitros	d1ff8f1db3	docs: add strong consistency doc Add a new docs/dev document for the strongly consistent tables feature. For now, it only contains information about the Raft metadata persistence, but it should be updated as more of the strong-consistency components are added.	2026-02-25 12:34:58 +01:00
Botond Dénes	7e90ed657c	Merge 'Fix `client_options` docs' from Karol Baryła https://github.com/scylladb/scylladb/pull/25746 added a new column to `system.clients`: `client_options frozen<map<text, text>>`. This column stores all options sent by the client in the `STARTUP` message. This PR also added `CLIENT_OPTIONS` to the list of values sent in `SUPPORTED` message, and documented that drivers can send their configuration (as JSON) in `STARTUP` under this key. Documentation for the new column was not added to the description of `system.clients` table, and documentation about the new `STARTUP` key was added in `protocol-extensions.md`, but in the section about shard awareness extension. This PR adds missing `system.clients` column description, moves the documentation of `CLIENT_OPTIONS` into its own section, and expands it a bit. Backport: none, because this fixes internal documentation. Closes scylladb/scylladb#28126 * github.com:scylladb/scylladb: protocol-extensions.md: Fix client_options docs system_keyspace.md: Add client_options column system_keyspace.md: Fix order in system.clients	2026-02-20 14:23:34 +02:00
Marcin Maliszkiewicz	9d9184e5b7	auth: use unified cache for permissions	2026-02-17 17:56:27 +01:00
Jakub Smolar	e978cc2a80	scylla_gdb: use persistent GDB - decrease test execution time This commit replaces the previous approach of running pytest inside GDB’s Python interpreter. Instead, tests are executed by driving a persistent GDB process externally using pexpect. - pexpect: Python library for controlling interactive programs (used here to send commands to GDB and capture its output) - persistent GDB: keep one GDB session alive across multiple tests instead of starting a new process for each test Tests can now be executed via `./test.py gdb` or with `pytest test/scylla_gdb`. This improves performance and makes failures easier to debug since pytest no longer runs hidden inside GDB subprocesses. Closes scylladb/scylladb#24804	2026-01-29 10:01:39 +02:00
Łukasz Paszkowski	2d3a40e023	permit_reader: Add a new state: preemptive_aborted A permit enters the preemptive_aborted state when it: - times out; - is rejected from execution because its execution is unlikely to finish on time. Being in this state means the permit was removed from the wait list, its internal timer was canceled, and the semaphore's `total_reads_shed_due_to_overload` statistic was incremented.	2026-01-28 14:20:01 +01:00
Pavel Emelyanov	cb6ee05391	Merge 'Extend snapshot manifest.json with tablet-aware metadata' from Benny Halevy This series extends the json manifest file we create when taking snapshots. It adds the following metadata: - manifest version and scope - snapshot name - created_at and expires_at timestamps (#24061) - node metadata (host_id, dc, rack) - keyspace and table metadata - tablet_count (#26352) - per-sstable metadata (#26352) Fixes [SCYLLADB-189](https://scylladb.atlassian.net/browse/SCYLLADB-189) Fixes [SCYLLADB-195](https://scylladb.atlassian.net/browse/SCYLLADB-195) Fixes [SCYLLADB-196](https://scylladb.atlassian.net/browse/SCYLLADB-196) * Enhancement, no backport needed [SCYLLADB-189]: https://scylladb.atlassian.net/browse/SCYLLADB-189?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ [SCYLLADB-195]: https://scylladb.atlassian.net/browse/SCYLLADB-195?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ [SCYLLADB-196]: https://scylladb.atlassian.net/browse/SCYLLADB-196?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ Closes scylladb/scylladb#27945 * github.com:scylladb/scylladb: snapshot: keep per-sstable metadata in manifest.json snapshot: add table info and tablet_count to manifest.json snapshot: add basic support for snapshot ttl in manifest.json table: snapshot_on_all_shards: take snapshot_options db: snapshot_ctl: move skip_flush to struct snapshot_options snapshot: add snapshot name in manifest.json test: lib: cql_test_env: apply db::config::tablets_mode_for_new_keyspaces snapshot: add node info to manifest.json snapshot: add manifest info to manifest.json test: database_test: snapshot_works: add validate_manifest	2026-01-22 15:19:11 +03:00
Patryk Jędrzejczak	67045b5f17	Merge 'raft_topology, tablets: Drain tablets in parallel with other topology operations' from Tomasz Grabiec Allows other topology operations to execute while tablets are being drained on decommission. In particular, bootstrap on scale-out. This is important for elasticity. Allows multiple decommission/removenode to happen in parallel, which is important for efficiency. Flow of decommission/removenode request: 1) pending and paused, has tablet replicas on target node. Tablet scheduler will start draining tablets. 2) No tablets on target node, request is pending but not paused 3) Request is scheduled, node is in transition 4) Request is done Nodes are considered draining as soon as there is a leave or remove request on them. If there are tablet replicas present on the target node, the request is in a paused state and will not be picked by topology coordinator. The paused state is computed from topology state automatically on reload. When request is not paused, its execution starts in write_both_read_old state. The old tablet_draining state is not entered (it's deprecated now). Tablet load balancing will yield the state machine as soon as some request is no longer paused and ready to be scheduled, based on standard preemption mechanics. 
Fixes #21452 Closes scylladb/scylladb#24129 * https://github.com/scylladb/scylladb: docs: Document parallel decommission and removenode and relevant task API test: Add tests for parallel decommission/removenode test: util: Introduce ensure_group0_leader_on() test: tablets: Check that there are no migrations scheduled on draining nodes test: lib: topology_builder: Introduce add_draining_request() topology_coordinator, tablets: Fail draining operations when tablet migration fails due to critical disk utilization tablets: topology_coordinator: Refactor to propagate reason for migration rollback tablet_allocator: Skip co-location on draining nodes node_ops: task_manager_module: Populate entity field also for active requests tasks: node_ops: Put node id in the entity field tasks, node_ops: Unify setting of task_stats in get_status() and get_stats() topology: Protect against empty cancelation reason tasks, topology: Make pending node operations abortable doc: topology-over-raft.md: Fix diagram for replacing, tablet_draining is not engaged raft_topology, tablets: Drain tablets in parallel with other topology operations virtual_tables: Show draining and excluded fields in system.cluster_status and system.load_by_node locator: topology: Add "draining" flag to a node topology_coordinator: Extract generate_cancel_request_update() storage_service: Drop dependency in topology_state_machine.hh in the header locator: Extract common code in assert_rf_rack_valid_keyspace() topology_coordinator, storage_service: Validate node removal/decommission at request submission time	2026-01-22 13:06:53 +01:00
Benny Halevy	d6557764b9	snapshot: keep per-sstable metadata in manifest.json Adds a "sstables" array member to manifest.json. For each sstable, keep the following metadata: id - a uuid for the sstable (the sstable identifier if the use-sstable-identifier option was used, otherwise the sstable uuid generation); toc_name - the name of the TOC.txt file; data_size and index_size - in bytes; first_token and last_token - the tokens of the sstable's first and last keys. Fixes: SCYLLADB-196 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2026-01-22 09:42:52 +02:00
Benny Halevy	dc9093303d	snapshot: add table info and tablet_count to manifest.json Add a table member to manifest.json with the keyspace_name, table_name, table_id, tablets_type, and, for tablets-enabled tables, get tablet_count on each shard and write the minimum to manifest.json. For vnodes-based tables, tablet_count=0. For now, `tablets_type` may be either `none` for vnodes tables, or `powof2` for tablets tables. In the future, when we support arbitrary tablet boundaries, this will be reflected here, and it is likely we would back up the whole tablets map separately to get all tablet boundaries. Fixes SCYLLADB-195 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2026-01-22 09:36:52 +02:00
Benny Halevy	91df129e21	snapshot: add basic support for snapshot ttl in manifest.json Store the snapshot `created_at` time and an optional `expires_at` time. Fixes SCYLLADB-189 Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2026-01-22 09:12:56 +02:00
Benny Halevy	d9fc3b1c11	snapshot: add snapshot name in manifest.json Store the snapshot tag in the manifest file. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2026-01-22 09:12:56 +02:00
Benny Halevy	0d82e56078	snapshot: add node info to manifest.json Add metadata about the node: host_id, datacenter, and rack. This enables dc- or rack-aware restore. Today this information is "encoded" into the snapshot hierarchy prefixes. If all manifest files were stored in a flat directory, we'd need to encode that metadata in the object name, so it is better for the manifest contents to be self-descriptive. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2026-01-22 09:12:56 +02:00
Benny Halevy	24040efc54	snapshot: add manifest info to manifest.json Add metadata about the manifest itself: A version and the manifest scope (currently "node", but in the future, may also be "shard", or "tablet") Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2026-01-22 09:12:56 +02:00
Benny Halevy	9e0f5410ae	test: database_test: snapshot_works: add validate_manifest Validate the manifest.json format by loading it using rjson::parse and then validate its contents to ensure it lists exactly the SSTables present in the snapshot directory. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2026-01-22 09:12:56 +02:00
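
The manifest check described in the last entry (load manifest.json, then confirm it lists exactly the SSTables present in the snapshot directory) can be sketched roughly as follows. This is an illustration under stated assumptions, not the `validate_manifest` test code: the top-level `sstables` key and the `toc_name` field come from the commit messages above, while the function name, the flat directory layout, and matching on-disk files by the `TOC.txt` suffix are assumptions.

```python
import json
from pathlib import Path


def check_manifest(snapshot_dir: Path) -> list[str]:
    """Return a list of mismatches; an empty list means manifest and directory agree."""
    manifest = json.loads((snapshot_dir / "manifest.json").read_text())

    # TOC names the manifest claims are part of the snapshot.
    listed = {entry["toc_name"] for entry in manifest.get("sstables", [])}

    # TOC files actually present (assumed: one flat directory, names end in TOC.txt).
    on_disk = {p.name for p in snapshot_dir.iterdir() if p.name.endswith("TOC.txt")}

    problems = [f"listed but missing on disk: {name}" for name in sorted(listed - on_disk)]
    problems += [f"on disk but not listed: {name}" for name in sorted(on_disk - listed)]
    return problems
```

A caller would pass the snapshot directory and treat any returned entries as a validation failure.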

371 Commits