scylladb

Mirror of https://github.com/scylladb/scylladb.git, synced 2026-05-13 11:22:01 +00:00

Author	SHA1	Message	Date
Yaniv Michael Kaul	5d6f160129	test: update get_scylla_2025_1_executable() to use 2025.1.12 Update the hardcoded 2025.1.0 binary URL to the latest 2025.1.12 release for upgrade tests. The 2025.1.12 binary now supports and enforces the rf_rack_valid_keyspaces option which the test harness enables by default. Since test_sstable_compression_dictionaries_upgrade creates a 2-node cluster in a single rack with RF=2, it violates the constraint. Disable the option explicitly for this test. Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com> Closes scylladb/scylladb#29714	2026-05-12 23:20:55 +02:00
Wojciech Mitros	f3cf20803b	test: run test_mv_admission_control_exception on one shard In the test we perform 2 consecutive writes, where the first write is supposed to increase the view update backlog above the MV admission control threshold and the second one is expected to be rejected because of that. On each node/shard we have 2 types of view update backlogs: 1. for deciding whether we should admit writes 2. for propagating the backlog information to other nodes/shards. For the second write to be rejected, it must be performed on a node and shard which updated its backlog of type 1. The view update backlog of type 2 is immediately increased on the base table replica. For this backlog to be registered as a backlog of type 1, it needs to be carried either by gossip (which happens once every second) or by being attached to a replica write response. We don't want to increase the runtime of tests unnecessarily, so we don't wait and instead rely on the second mechanism. The response to the first base table write (the one causing the increase in the backlog) carries the increased backlog to the coordinator of this write. So for the second write to observe the increased backlog, it needs to be coordinated on the same node+shard as the first write. We make sure that both writes are coordinated on the same node+shard by using prepared statements combined with setting the host in `run_async`. Both writes target the same partition, and with prepared statements we route them directly to the correct shard. That was the idea, at least. In practice, for the driver to learn the correct shard, it first needs to learn the token->shard mapping from the server. For vnodes it can predict the shard by calculating the token of the affected partition, but for tablets it has no opportunity to learn the tablet->shard mapping, so the first write may be routed to any shard. Additionally, we aren't guaranteed that the driver has established connections to all shards on all nodes at the point of any write. 
So if a connection finishes being established between the two writes, the 2 writes may also end up coordinated on different shards, so the backlog growth is missed and the second write is not rejected. This patch fixes that by running the test with one shard on each node. This way, as long as we perform both writes on the same node, they'll also be coordinated on the same shard. This also makes the prepared statement and BoundStatement unnecessary: we can use SimpleStatement with FallthroughRetryPolicy directly. Fixes: SCYLLADB-1901 Closes scylladb/scylladb#29862	2026-05-12 17:34:19 +02:00
Piotr Dulikowski	129f193116	Merge 'strong_consistency: implement basic coordinator metrics' from Michał Jadwiszczak Add per-shard metrics for strong consistency coordinator operations (latency, timeouts, bounces, status unknown) under the `"strong_consistency_coordinator"` category. These are analogous to the eventual consistency metrics in `storage_proxy_stats`, enabling direct performance comparison between the two consistency modes. The metrics are simplified compared to `storage_proxy_stats`: there is no breakdown by table, tablet, scheduling group, or DC, only per shard. Fixes SCYLLADB-1343 Strong consistency is still in the experimental phase, so there is no need to backport. Closes scylladb/scylladb#29318 * github.com:scylladb/scylladb: test/strong_consistency: verify metrics strong_consistency: wire up metrics to operations strong_consistency: add stats struct and metrics registration	2026-05-12 16:15:51 +02:00
Botond Dénes	e95eb21a16	Merge 'Tablet-aware restore' from Pavel Emelyanov The restore mechanics work like this: - A /storage_service/tablets/restore API is called with (keyspace, table, endpoint, bucket, manifests) parameters - First, it populates the system_distributed.snapshot_sstables table with the data read from the manifests - Then it emplaces a bunch of tablet transitions (of a new "restore" kind), one for each tablet - The topology coordinator handles the "restore" transition by calling a new RESTORE_TABLET RPC against all the current tablet replicas - Each replica handles the RPC verb by - Reading the snapshot_sstables table - Filtering the read sstable infos against the current node and the tablet being handled - Downloading and attaching the filtered sstables This PR includes the system_distributed.snapshot_sstables table from @robertbindar and preparation work from @kreuzerkrieg that extracts raw sstable downloading and attaching from the existing generic sstable loading code. This is the first step towards SCYLLADB-197 and lacks many things. 
In particular - the API only works for a single-DC cluster - the caller needs to "lock" tablet boundaries with min/max tablet count - not abortable - no progress tracking - sub-optimal (re-kicking the API on restore will re-download everything) - not re-attachable (if the API node dies, restoration proceeds, but the caller cannot "wait" for it to complete via another node) - nodes download sstables in the maintenance/streaming scheduling group (should be moved to maintenance/backup) Other follow-up items: - have an actual swagger object specification for `backup_location` Closes #28436 Closes #28657 Closes #28773 Closes scylladb/scylladb#28763 * github.com:scylladb/scylladb: docs: Update topology_over_raft.md with `restore` transition kind test: Add test for backup vs migration race test: Restore resilience test sstables_loader: Fail tablet-restore task if not all sstables were downloaded sstables_loader: mark sstables as downloaded after attaching sstables_loader: return shared_sstable from attach_sstable db: add update_sstable_download_status method db: add downloaded column to snapshot_sstables db: extract snapshot_sstables TTL into class constant test: Add a test for tablet-aware restore tablets: Implement tablet-aware cluster-wide restore messaging: Add RESTORE_TABLET RPC verb sstables_loader: Add method to download and attach sstables for a tablet tablets: Add restore_config to tablet_transition_info sstables_loader: Add restore_tablets task skeleton test: Add rest_client helper to kick newly introduced API endpoint api: Add /storage_service/tablets/restore endpoint skeleton sstables_loader: Add keyspace and table arguments to manfiest loading helper sstables_loader_helpers: just reformat the code sstables_loader_helpers: generalize argument and variable names sstables_loader_helpers: generalize get_sstables_for_tablet sstables_loader_helpers: add token getters for tablet filtering sstables_loader_helpers: remove underscores from struct members sstables_loader: move
download_sstable and get_sstables_for_tablet sstables_loader: extract single-tablet SST filtering sstables_loader: make download_sstable static sstables_loader: fix formating of the new `download_sstable` function sstables_loader: extract single SST download into a function sstables_loader: add shard_id to minimal_sst_info sstables_loader: add function for parsing backup manifests split utility functions for creating test data from database_test export make_storage_options_config from lib/test_services rjson: Add helpers for conversions to dht::token and sstable_id Add system_distributed_keyspace.snapshot_sstables add get_system_distributed_keyspace to cql_test_env code: Add system_distributed_keyspace dependency to sstables_loader storage_service: Export export handle_raft_rpc() helper storage_service: Export do_tablet_operation() storage_service: Split transit_tablet() into two tablets: Add braces around tablet_transition_kind::repair switch	2026-05-12 16:24:13 +03:00
Yaniv Michael Kaul	c359a09189	test: add UDF/UDA keyspace isolation and UDT tests Port 3 tests from scylla-dtest user_functions_test.py: - test_udf_with_udt: UDF taking frozen UDT arg, verifies DROP TYPE blocked - test_udf_with_udt_keyspace_isolation: cross-keyspace UDT references rejected - test_aggregate_with_udt_keyspace_isolation: cross-keyspace UDT in UDA rejected All tests use Lua (Scylla's supported UDF language). Reproduces CASSANDRA-9409. Closes scylladb/scylladb#1928 Closes scylladb/scylladb#29843	2026-05-12 14:57:14 +03:00
Yaniv Michael Kaul	f55a55fbf3	docker: fix coredump collection when host uses pipe-based core_pattern The container image inherits kernel.core_pattern from the host. When the host pipes core dumps to a handler (e.g. Ubuntu's apport), that handler does not exist or work correctly inside the container, so core dumps are silently lost. Override any pipe-based core_pattern with a file-based pattern that writes directly to /var/lib/scylla/coredump/. The override is attempted both from the entrypoint (scyllasetup.coredumpSetup) and from scylla-server.sh when running as root; it succeeds only when the container has write access to /proc/sys/kernel/core_pattern and is silently skipped otherwise. Fixes: SCYLLADB-1366 Closes scylladb/scylladb#29337	2026-05-12 14:16:22 +03:00
Piotr Smaron	1018710e38	test/cqlpy: un-xfail oversized indexed value build test Issue #8627 is fixed, so test_too_large_indexed_value_build now passes and should run normally instead of XPASSing under strict xfail. Fixes: SCYLLADB-1938 Closes scylladb/scylladb#29853	2026-05-12 11:40:53 +02:00
Avi Kivity	ddb1181103	Merge 'load_balance: fix drain with forced capacity-based balancing' from Ferenc Szili When `force_capacity_based_balancing` is enabled and a node is being drained/excluded, the tablet allocator incorrectly aborts balancing due to incomplete tablet stats - even though capacity-based balancing doesn't depend on tablet sizes. The tablet allocator normally waits for complete load stats before balancing. An exception exists for drained+excluded nodes (they're unreachable and won't return stats). However, when forced capacity-based balancing is active, this exception was not being applied, causing the balancer to reject the drain plan. Adjust the condition in `tablet_allocator.cc` so that the "ignore missing data for drained nodes" logic applies regardless of whether capacity-based balancing is forced. Added a Boost unit test that forces capacity-based balancing and verifies a drained/excluded node gets its tablets migrated even when tablet size stats are missing. This bug was introduced in 2026.1, so this needs to be backported to 2026.1 and 2026.2 Fixes: SCYLLADB-1803 Closes scylladb/scylladb#29791 * github.com:scylladb/scylladb: test: boost: add drain test for forced capacity-based balancing service: allow draining with forced capacity-based balancing	2026-05-12 12:38:25 +03:00
Andrzej Jackowski	89261bf759	test: wait for TTL scheduling sanity metric The test samples sl:default runtime before and after setup writes to prove that it measures the scheduling group used by regular CQL writes. The metric is exported in milliseconds, so the effect of a single 200-row batch may not be visible immediately, or may be too small in some environments. Keep the original 200-row table size, but wait up to 30 seconds for the metric to advance. If it does not, retry the same writes before TTL is enabled. The retries update the same keys, so the expiration part of the test still waits for exactly the original number of rows. In 100 local runs with N=200 rows, the observed delta of `ms_statement_before - ms_statement_before_write` was: min=4.0, max=16.0, mean=8.13, and median=8.0. Therefore, it seems possible that in a rare corner case the delta drops all the way to 0. Fixes SCYLLADB-1869 Closes scylladb/scylladb#29797	2026-05-12 12:38:25 +03:00
Avi Kivity	6fca064ac8	Merge 'alternator: a couple of small cleanups suggested by copilot' from Nadav Har'El The first patch improves the input validation of the CONTAINS operator. I believe this is not a critical fix, because RapidJSON already has exception-throwing RAPIDJSON_ASSERT() calls that check for unexpected JSON structure (e.g., something we expect to be a list not actually being a list), but it's cleaner to do these checks explicitly. The second patch just removes an unnecessary call to format() on a constant string. Closes scylladb/scylladb#28506 * github.com:scylladb/scylladb: alternator: remove unneeded call to format() alternator: improve CONTAINS operator's validity checking	2026-05-12 12:38:25 +03:00
Botond Dénes	8d6f031a4a	schema: fix DESCRIBE showing NullCompactionStrategy when compaction is disabled When a table's compaction is disabled via 'enabled': 'false', the DESCRIBE output incorrectly showed NullCompactionStrategy instead of the actual strategy. This happened because schema_properties() called compaction_strategy(), which returns compaction_strategy_type::null when compaction is disabled. Fix it by using configured_compaction_strategy(), which always returns the real strategy type - consistent with how schema_tables.cc serializes it to disk. Fixes SCYLLADB-1353 Closes scylladb/scylladb#29804	2026-05-12 12:38:25 +03:00
Piotr Dulikowski	7c2b1ea0b5	Merge 'view_building: fix tombstone_warn_threshold warnings' from Michał Jadwiszczak `system.view_building_tasks` is a single-partition Raft group0 table (pk = `"view_building"`, CK = timeuuid). When `clean_finished_tasks()` deletes hundreds of finished tasks, the physical rows remain in SSTables until compaction. Any subsequent read of the partition counts every column of every tombstoned row as a dead cell, triggering `tombstone_warn_threshold` warnings in large clusters. Two-part fix: 1. Range tombstones instead of row tombstones (commits 2–3) Instead of one row tombstone per finished task, find the minimum alive task UUID (`min_alive_uuid`) and emit a single range tombstone `[before_all, min_alive_uuid)` covering all tasks below that boundary. This reduces the tombstone count significantly and also benefits future compaction. 2. Bounded scan with `min_task_id` (commits 4–6) Even with range tombstones, physical rows remain until compaction and still count as dead cells during reads. The only way to avoid them is to not read them at all. - Add a `min_task_id timeuuid` static column to `system.view_building_tasks`. - On every GC, write `min_task_id = min_alive_uuid` atomically with the range tombstone (same Raft batch). - On reload, read `min_task_id` first using a static-only partition slice (empty `_row_ranges` + `always_return_static_content`): the SSTable reader stops immediately after the static row before processing any clustering tombstones — zero dead cells counted. - Use `AND id >= min_task_id` as a lower bound for the main task scan, skipping all tombstoned rows. The static-only read and the bounded scan are gated on the `VIEW_BUILDING_TASKS_MIN_TASK_ID` cluster feature so mixed-version clusters fall back to the full scan. The issue is not critical, so the fix shouldn't be backported. 
Fixes SCYLLADB-657 Closes scylladb/scylladb#28929 * github.com:scylladb/scylladb: test/cluster/test_view_building_coordinator: add reproducer for tombstone threshold warning docs: document tombstone avoidance in view_building_tasks view_building: add `task_uuid_generator` to `view_building_task_mutation_builder` view_building: introduce `task_uuid_generator` view_building: store `min_alive_uuid` in view building state view_building: set min_task_id when GC-ing finished tasks view_building: add min_task_id support to view_building_task_mutation_builder view_building: add min_task_id static column and bounded scan to system_keyspace view_building: use range tombstone when GC-ing finished tasks view_building: add range tombstone support to view_building_task_mutation_builder view_building: introduce VIEW_BUILDING_TASKS_MIN_TASK_ID cluster feature	2026-05-12 12:38:25 +03:00
Avi Kivity	cf50f0191a	encryption: fix deprecated input_stream/output_stream usage in KMIP connection Seastar deprecated default-constructing input_stream and output_stream (they are useless in that state), and also deprecated move-assigning them after the fact. Fix by wrapping both fields in std::optional, and using emplace() to construct them in-place once the connected socket is available. It would be nicer to make connect() a static method that returns a connection, but that's a larger change. Closes scylladb/scylladb#29627	2026-05-12 12:38:25 +03:00
Pavel Emelyanov	1c0f8ab66e	Merge 'sstables: introduce --abort-on-malformed-sstable-error' from Botond Dénes When a malformed sstable error occurs, it is usually caused by actual sstable corruption — a cosmic ray, a bad disk write, etc. However, it can also be caused by memory corruption, where a data structure in memory happens to be read as sstable data. In the latter case, having a coredump of the process at the moment of the error is invaluable for post-mortem debugging, since the exception throwing/catching machinery destroys the stack frames that would point to the corruption site. This patch series introduces `--abort-on-malformed-sstable-error`, a new command-line option (with `LiveUpdate` support) that, when set, causes the server to call `std::abort()` instead of throwing an exception whenever any sstable parse error is detected. This covers all code paths: - Direct `throw malformed_sstable_exception(...)` sites (migrated to `throw_malformed_sstable_exception()`) - Direct `throw bufsize_mismatch_exception(...)` sites (migrated to `throw_bufsize_mismatch_exception()`) - `parse_assert()` failures (via `on_parse_error()`) - BTI parse errors (via `on_bti_parse_error()`) The implementation places the flag and helper functions in `sstables/sstables.cc`, next to the existing `on_parse_error()` / `on_bti_parse_error()` infrastructure. The flag defaults to `false`, preserving current behaviour. It is intended to be enabled temporarily when investigating suspected memory corruption. Commit breakdown: 1. Infrastructure: flag, getter/setter, and throw helpers in `sstables/sstables.cc`; config option wired up in `main.cc` 2. `on_parse_error()` and `on_bti_parse_error()` check the new flag 3. All ~50 `throw malformed_sstable_exception(...)` sites migrated 4. 
Both `throw bufsize_mismatch_exception(...)` sites migrated Refs: SCYLLADB-1087 Backport: new feature, no backport Closes scylladb/scylladb#29324 * github.com:scylladb/scylladb: sstables: migrate all bufsize_mismatch_exception throw sites to throw_bufsize_mismatch_exception() sstables: migrate all malformed_sstable_exception throw sites to throw_malformed_sstable_exception() sstables: make on_parse_error() and on_bti_parse_error() respect --abort-on-malformed-sstable-error sstables: disable abort-on-malformed-sstable-error in tests that corrupt sstables on purpose sstables: introduce --abort-on-malformed-sstable-error infrastructure sstables: refactor parse_path() to return std::expected<> instead of throwing	2026-05-12 12:38:25 +03:00
Pavel Emelyanov	150345cc52	Merge 'test: per-bucket isolation for S3/GCS object storage tests' from Ernest Zaslavsky This series adds per-test bucket isolation to all S3 and GCS object storage tests. Previously, every test shared a single pre-created bucket, which meant tests could interfere with each other through leftover objects and could not run concurrently across multiple `test.py` processes without risking collisions. New `create_bucket`, `delete_bucket`, and `delete_bucket_with_objects` methods on `s3::client`, following the existing `make_request` pattern. `create_bucket` handles the `BUCKET_ALREADY_OWNED_BY_YOU` error gracefully. A new `s3_test_fixture` RAII class for C++ Boost tests that creates a uniquely-named bucket on construction (derived from the Boost test name and pid) and tears down everything — objects, bucket, client — on destruction. All S3 tests in `s3_test.cc` are migrated to use it, removing manual `deferred_delete_object` and `deferred_close` boilerplate. The minio server policy is broadened to allow dynamic bucket creation/deletion. A `client::make` overload that accepts a custom `retry_strategy`, used in tests with a fast 1ms retry delay instead of exponential backoff, significantly reducing test runtime for transient errors during bucket lifecycle operations. Python-side (`test/cluster/object_store`): each pytest fixture (`object_storage`, `s3_storage`, `s3_server`) now creates a unique bucket per test function via `create_test_bucket()` and destroys it on teardown. Bucket names are sanitized from the pytest node name with a short UUID suffix for uniqueness. Object storage helpers (`S3Server`, `MinioWrapper`, `GSFront`, `GSServerImpl`, factory functions, CQL helpers, `s3_server` fixture) are extracted from `test/cluster/object_store/conftest.py` into a shared `test/pylib/object_storage.py` module, eliminating duplication across test suites. The conftest becomes a thin re-export wrapper. 
Old class names are preserved as aliases for backward compatibility.

| Test Name | Execution time with test-specific retry strategy (ms) | Original execution time (ms) | Δ (ms) | Speedup |
|--------------------------------------------------------------|----------------:|-------------:|---------:|--------:|
| test_client_upload_file_multi_part_with_remainder_proxy | 19,261 | 61,395 | −42,134 | 3.2× |
| test_client_upload_file_multi_part_without_remainder_proxy | 16,901 | 53,688 | −36,787 | 3.2× |
| test_client_upload_file_single_part_proxy | 3,478 | 6,789 | −3,311 | 2.0× |
| test_client_multipart_copy_upload_proxy | 1,303 | 1,619 | −316 | 1.2× |
| test_client_put_get_object_proxy | 150 | 365 | −215 | 2.4× |
| test_client_readable_file_stream_proxy | 125 | 327 | −202 | 2.6× |
| test_small_object_copy_proxy | 205 | 389 | −184 | 1.9× |
| test_client_put_get_tagging_proxy | 181 | 350 | −169 | 1.9× |
| test_client_multipart_upload_proxy | 1,252 | 1,416 | −164 | 1.1× |
| test_client_list_objects_proxy | 729 | 881 | −152 | 1.2× |
| test_chunked_download_data_source_with_delays_proxy | 830 | 960 | −130 | 1.2× |
| test_client_readable_file_proxy | 148 | 279 | −131 | 1.9× |
| test_client_upload_file_multi_part_with_remainder_minio | 3,358 | 3,170 | +188 | 0.9× |
| test_client_upload_file_multi_part_without_remainder_minio | 3,131 | 2,929 | +202 | 0.9× |
| test_client_upload_file_single_part_minio | 519 | 421 | +98 | 0.8× |
| test_download_data_source_proxy | 180 | 237 | −57 | 1.3× |
| test_client_list_objects_incomplete_proxy | 590 | 641 | −51 | 1.1× |
| test_large_object_copy_proxy | 952 | 991 | −39 | 1.0× |
| test_client_multipart_upload_fallback_proxy | 148 | 185 | −37 | 1.3× |
| test_client_multipart_copy_upload_minio | 641 | 674 | −33 | 1.1× |

No backport needed: this is a test infrastructure improvement with no production code impact beyond
the new `s3::client` methods. Closes scylladb/scylladb#29508 * github.com:scylladb/scylladb: test: extract object storage helpers to test/pylib/object_storage.py test: add per-test bucket isolation to object_store fixtures s3: add client::make overload with custom retry strategy test: add s3_test_fixture and migrate tests to per-bucket isolation s3: add create_bucket and delete_bucket to client	2026-05-12 12:38:24 +03:00
Dimitrios Symonidis	94bc0245f9	sstables, utils/s3: reuse caller-provided file in s3_storage::make_source s3_storage::make_source previously ignored its file f parameter and constructed a fresh s3::client::readable_file per call. The new file's _stats cache was empty, so the first dma_read_bulk issued a HEAD via maybe_update_stats just to learn the object size before the ranged GET -- one ~50 ms RTT per uncached read. The file f passed in by the two callers (sstable::data_stream for Data.db reads and index_reader::make_context for Index.db reads) already wraps the sstable's _data_file or _index_file. Those file objects had their stats populated at sstable open time by update_info_for_opened_data, and they were wrapped with the configured file_io_extensions when opened via open_component. Reusing them is exactly what filesystem_storage::make_source does (one-line make_file_data_source over f), so the s3 path simply matches it. readable_file::size() is also updated to route through maybe_update_stats(), so a .size() call populates the _stats cache the same way .stat() does -- preventing a redundant HEAD on the first subsequent read of components opened with .size() (Index, Partitions, Rows in update_info_for_opened_data). Closes scylladb/scylladb#29766 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-05-12 12:38:24 +03:00
Pavel Emelyanov	896de77b99	docs: Update topology_over_raft.md with `restore` transition kind Add some text about how the new transition works. It doesn't include full feature description, just concentrates on the new transition and the way it interacts with the rest of topology coordinator machinery. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-05-12 10:40:24 +03:00
Pavel Emelyanov	19820910f8	test: Add test for backup vs migration race The test runs a regular backup+restore on a smaller cluster, but before that it starts a tablet migration from one node to another and holds it midway with the help of the block_tablet_streaming injection (even though the tablets have no data and there's nothing to stream, the injection is placed early enough to work). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-05-12 10:40:24 +03:00
Pavel Emelyanov	3bcefa42c5	test: Restore resilience test The test checks that losing one of the cluster's nodes while a restore is running is handled. In particular: - losing the API node makes the task-wait API throw (as expected) - losing the coordinator or a replica node makes the API call fail, because some tablets fail to get restored. If the coordinator is lost, coordinator re-election is triggered, and the new coordinator still notices that a tablet that was replicated to the "old" coordinator failed to get restored, so it fails the restore anyway Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-05-12 10:40:24 +03:00
Pavel Emelyanov	69b8f76a32	sstables_loader: Fail tablet-restore task if not all sstables were downloaded When storage_service::restore_tablets() resolves, it only means that tablet transitions are done, including restore transitions, but not necessarily that they succeeded. So before resolving the restoration task with success, we need to check whether all sstables were downloaded and, if not, resolve the task with an exception. Test included. It uses fault injection to abort downloading of a single sstable early, then checks that the error was properly propagated back to the task-wait API Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-05-12 10:40:24 +03:00
Ernest Zaslavsky	bdc5976bcd	sstables_loader: mark sstables as downloaded after attaching After each SSTable is successfully attached to the local table in download_tablet_sstables(), update its downloaded status in system_distributed.snapshot_sstables to true. This enables tracking restore progress by counting how many SSTables have been downloaded.	2026-05-12 10:40:24 +03:00
Ernest Zaslavsky	0d8de9becd	sstables_loader: return shared_sstable from attach_sstable Change attach_sstable() return type from future<> to future<sstables::shared_sstable>, returning the SSTable that was attached. This will be used to extract the SSTable identifier and first token for updating the download status.	2026-05-12 10:40:24 +03:00
Ernest Zaslavsky	7eb921a142	db: add update_sstable_download_status method Add a method to update the downloaded status of a specific SSTable entry in system_distributed.snapshot_sstables. This will be used by the tablet restore process to mark SSTables as downloaded after they have been successfully attached to the local table.	2026-05-12 10:40:23 +03:00
Ernest Zaslavsky	83ec7e22b9	db: add downloaded column to snapshot_sstables Add a 'downloaded' boolean column to the snapshot_sstables table schema and the corresponding field to the snapshot_sstable_entry struct. Update insert_snapshot_sstable() and get_snapshot_sstables() to write and read this column. This column will be used to track which SSTables have been successfully downloaded during a tablet restore operation. Co-authored-by: Pavel Emelyanov <xemul@scylladb.com>	2026-05-12 10:40:23 +03:00
Ernest Zaslavsky	61c627a7c0	db: extract snapshot_sstables TTL into class constant Move the TTL value used for snapshot_sstables rows from a local variable in insert_snapshot_sstable() to a class-level constant SNAPSHOT_SSTABLES_TTL_SECONDS, making it reusable by other methods.	2026-05-12 10:40:23 +03:00
Pavel Emelyanov	4137211cf4	test: Add a test for tablet-aware restore The test is derived from test_restore_with_streaming_scopes(), with the exception that it doesn't check streaming directions, doesn't check mutations right after creation, and doesn't loop over scoped sub-tests, because there's no scope concept here. It also verifies just two topologies, which seems to be enough. The scopes test has many topologies because of the nature of the scoped restore; with cluster-wide restore, such flexibility is not required. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-05-12 10:40:23 +03:00
Pavel Emelyanov	17384d42e3	tablets: Implement tablet-aware cluster-wide restore This patch adds - Changes in sstables_loader::restore_tablets() method It populates the system_distributed_keyspace.snapshot_sstables table with the information read from the manifest - Implementation of tablet_restore_task_impl::run() method It emplaces a bunch of tablet migrations with "restore" kind - Topology coordinator handling of tablet_transition_stage::restore When seen, the coordinator calls RESTORE_TABLET RPC against all tablet replicas Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-05-12 10:40:23 +03:00
Pavel Emelyanov	39ae59da9c	messaging: Add RESTORE_TABLET RPC verb The topology coordinator will need to call this verb against existing tablet replicas to ask them to restore tablet sstables. This patch adds the RPC verb to do it. It currently returns an empty restore_result to make it "synchronous" -- the co_await send_restore_tablets() won't resolve until the client call finishes. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-05-12 10:40:23 +03:00
Pavel Emelyanov	8514b73f4b	sstables_loader: Add method to download and attach sstables for a tablet Reads the data from the snapshot_sstables table, keeps only the sstables belonging to the current node and the tablet in question, then starts downloading the matched sstables. Extracted from Ernest's PR #28701; it piggy-backs on the refactoring from another of Ernest's PRs, #28773. Will be used by the next patches. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-05-12 10:40:23 +03:00
Pavel Emelyanov	cf21471391	tablets: Add restore_config to tablet_transition_info When doing a cluster-wide restore using the topology coordinator, the coordinator will need to serve a new tablet transition kind -- the restore one. For that, it will need to receive information about where to restore from -- the endpoint and bucket pair. The only place this data can come from is the tablet transition itself, so add the "restore_config" member with this data. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-05-12 10:40:23 +03:00
Pavel Emelyanov	2eaa9035df	sstables_loader: Add restore_tablets task skeleton The new cluster-wide tablets restore API is going to be asynchronous, just like the existing node-local one. For that, task_manager tasks will be used. This patch adds a skeleton for the tablets-restore task with an empty run method. Next patches will populate it. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-05-12 10:40:23 +03:00
Pavel Emelyanov	dcd490666b	test: Add rest_client helper to kick newly introduced API endpoint Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-05-12 10:40:23 +03:00
Ernest Zaslavsky	5f235e105a	api: Add /storage_service/tablets/restore endpoint skeleton Withdrawn from #28701. The endpoint implementation from the PR is going to be reworked, but the swagger description and set/unset placeholders are very useful. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Co-authored-by: Ernest Zaslavsky <ernest.zaslavsky@scylladb.com>	2026-05-12 10:40:23 +03:00
Pavel Emelyanov	d280987f2c	sstables_loader: Add keyspace and table arguments to manifest loading helper When restoring a backup into a keyspace with a different name than the one it had during backup, the snapshot_sstables table must be populated with the _new_ keyspace name, not the one taken from the manifest. The same is true for the table name. This patch makes it possible to override the keyspace/table loaded from the manifest file with the provided values. In the future, it would also be good to check that, if those values are not provided by the user, the values read from different manifest files are the same. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2026-05-12 10:40:23 +03:00
Ernest Zaslavsky	e0f4813c2f	sstables_loader_helpers: just reformat the code Reformat get_sstables_for_tablet to wrap an extremely long line	2026-05-12 10:40:22 +03:00
Ernest Zaslavsky	19554466f6	sstables_loader_helpers: generalize argument and variable names Rename arguments and local variables in get_sstables_for_tablet to avoid references to SSTable-specific terminology. This makes the function more generic and better suited for reuse with different range types.	2026-05-12 10:40:22 +03:00
Ernest Zaslavsky	2e37f9dc90	sstables_loader_helpers: generalize get_sstables_for_tablet Generalize get_sstables_for_tablet by templating the return type so it produces vectors matching the input range’s value type. This makes the function more flexible and prepares it for reuse in tablet‑aware restore.	2026-05-12 10:40:22 +03:00
Ernest Zaslavsky	17b415ccde	sstables_loader_helpers: add token getters for tablet filtering Add getters for the first and last tokens in get_sstables_for_tablet to make the function more generic and suitable for future use in the tablet-aware restore code.	2026-05-12 10:40:22 +03:00
Ernest Zaslavsky	1150f7cf24	sstables_loader_helpers: remove underscores from struct members Remove underscores from minimal_sst_info struct members to comply with our coding guidelines.	2026-05-12 10:40:22 +03:00
Ernest Zaslavsky	aa00048753	sstables_loader: move download_sstable and get_sstables_for_tablet Move the download_sstable and get_sstables_for_tablet static functions from sstables_loader into a new file to make them reusable by the tablet-aware restore code.	2026-05-12 10:40:22 +03:00
Ernest Zaslavsky	991576ed73	sstables_loader: extract single-tablet SST filtering Extract single-tablet range filtering into a new get_sstables_for_tablet function, taken from the existing get_sstables_for_tablets. This will later be reused in the tablet-aware restore code.	2026-05-12 10:40:22 +03:00
Ernest Zaslavsky	b0f6cbb2a4	sstables_loader: make download_sstable static Make the download_sstable function static to prepare it for extraction as a helper function that will later be reused in tablet-aware restore.	2026-05-12 10:40:22 +03:00
Ernest Zaslavsky	60dd7de4b8	sstables_loader: fix formatting of the new `download_sstable` function Just fix formatting of the new `download_sstable` function	2026-05-12 10:40:22 +03:00
Ernest Zaslavsky	9efc658bdd	sstables_loader: extract single SST download into a function Extract the logic for downloading a single SST into a dedicated function and reuse it in download_fully_contained_sstables. This supports upcoming changes that consolidate common code.	2026-05-12 10:40:22 +03:00
Ernest Zaslavsky	fd2043cad8	sstables_loader: add shard_id to minimal_sst_info Add a shard_id member to the minimal_sst_info struct as part of the tablet-aware restore refactoring. This will support upcoming changes that extract common code.	2026-05-12 10:40:22 +03:00
Robert Bindar	c97232bb7b	sstables_loader: add function for parsing backup manifests This change adds functionality for parsing backup manifests and populating system_distributed.snapshot_sstables with the content of the manifests. This change is useful for tablet-aware restore. The function introduced here will be called by the coordinator node when restore starts to populate the snapshot_sstables table with the data that workers need to execute the restore process. Signed-off-by: Robert Bindar <robert.bindar@scylladb.com> Co-authored-by: Pavel Emelyanov <xemul@scylladb.com>	2026-05-12 10:40:22 +03:00
Robert Bindar	f0e8d6c9dd	split utility functions for creating test data from database_test Signed-off-by: Robert Bindar <robert.bindar@scylladb.com>	2026-05-12 10:40:21 +03:00
Robert Bindar	b52e40e512	export make_storage_options_config from lib/test_services Signed-off-by: Robert Bindar <robert.bindar@scylladb.com>	2026-05-12 10:40:21 +03:00
Robert Bindar	9c3abbb8f5	rjson: Add helpers for conversions to dht::token and sstable_id Signed-off-by: Robert Bindar <robert.bindar@scylladb.com>	2026-05-12 10:40:21 +03:00
Robert Bindar	2f19d84ad7	Add system_distributed_keyspace.snapshot_sstables This patch adds the snapshot_sstables table with the following schema: ```cql CREATE TABLE system_distributed.snapshot_sstables ( snapshot_name text, keyspace text, table text, datacenter text, rack text, id uuid, first_token bigint, last_token bigint, toc_name text, prefix text) PRIMARY KEY ((snapshot_name, keyspace, table, datacenter, rack), first_token, id); ``` The table will be populated by the coordinator node during the restore phase (and later on during the backup phase to accommodate live-restore). The content of this table is meant to be consumed by the restore worker nodes, which will use this data to filter sstables and fetch them via file-based download. Fixes SCYLLADB-263 Signed-off-by: Robert Bindar <robert.bindar@scylladb.com>	2026-05-12 10:40:21 +03:00


53837 Commits