Commit Graph

138 Commits

Author SHA1 Message Date
Michał Jadwiszczak
25c176c1b4 sstables_loader: fix missing include
Commit c97232b introduced use of `seastar::util::read_entire_stream()`,
however it didn't included relevant header which is causing compilation
error.

It probably went silently through CI because of precompiled headers.

Refs scylladb#28763

Closes scylladb/scylladb#29901
2026-05-14 15:16:34 +02:00
Pavel Emelyanov
3bcefa42c5 test: Restore resilience test
The test checks that losing one of nodes from the cluster while restore
is handled. In particular:

- losing an API node makes the task waiting API to throw (apparently)
- losing coordinator or replica node makes the API call to fail, because
  some tablets should fail to get restored. If the coordinator is lost,
  it triggers coordinator re-election and new coordinator still notices
  that a tablet that was replicated to "old" coordinator failed to get
  restored and fails the restore anyway

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2026-05-12 10:40:24 +03:00
Pavel Emelyanov
69b8f76a32 sstables_loader: Fail tablet-restore task if not all sstables were downloaded
When the storage_service::restore_tablets() resolves, it only means that
tablet transitions are done, including restore transitions, but not
necessarily that they succeeded. So before resolving the restoration
task with success need to check if all sstables were downloaded and, if
not, resolve the task with exception.

Test included. It uses fault-injection to abort downloading of a single
sstable early, then checks that the error was properly propagated back
to the task waiting API

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2026-05-12 10:40:24 +03:00
Ernest Zaslavsky
bdc5976bcd sstables_loader: mark sstables as downloaded after attaching
After each SSTable is successfully attached to the local table in
download_tablet_sstables(), update its downloaded status in
system_distributed.snapshot_sstables to true. This enables tracking
restore progress by counting how many SSTables have been downloaded.
2026-05-12 10:40:24 +03:00
Ernest Zaslavsky
0d8de9becd sstables_loader: return shared_sstable from attach_sstable
Change attach_sstable() return type from future<> to
future<sstables::shared_sstable>, returning the SSTable that was
attached. This will be used to extract the SSTable identifier and
first token for updating the download status.
2026-05-12 10:40:24 +03:00
Pavel Emelyanov
17384d42e3 tablets: Implement tablet-aware cluster-wide restore
This patch adds

- Changes in sstables_loader::restore_tablets() method

It populates the system_distributed_keyspace.snapshot_sstables table
with the information read from the manifest

- Implementation of tablet_restore_task_impl::run() method

It emplaces a bunch of tablet migrations with "restore" kind

- Topology coordinator handling of tablet_transition_stage::restore

When seen, the coordinator calls RESTORE_TABLET RPC against all tablet
replicas

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2026-05-12 10:40:23 +03:00
Pavel Emelyanov
39ae59da9c messaging: Add RESTORE_TABLET RPC verb
The topology coordinator will need to call this verb against existing
tablet replicas to ask them restore tablet sstables. Here's the RPC verb
to do it.

It now returns an empty restore_result to make it "synchronous" -- the
co_await send_restore_tablets() won't resolve until client call
finishes.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2026-05-12 10:40:23 +03:00
Pavel Emelyanov
8514b73f4b sstables_loader: Add method to download and attach sstables for a tablet
Extracts the data from snapshot_sstables tables and filters only
sstables belonging to current node and tablet in question, then starts
downloading the matched sstables

Extracted from Ernest PR #28701 and piggy-backs the refactoring from
another Ernest PR #28773. Will be used by next patches.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2026-05-12 10:40:23 +03:00
Pavel Emelyanov
2eaa9035df sstables_loader: Add restore_tablets task skeleton
The new cluster-wide tablets restore API is going to be asynchronous,
just like existing node-local one is. For that the task_manager tasks
will be used.

This patch adds a skeleton for tablets-restore task with empty run
method. Next patches will populate it.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2026-05-12 10:40:23 +03:00
Pavel Emelyanov
d280987f2c sstables_loader: Add keyspace and table arguments to manfiest loading helper
When restoring a backup into a keyspace under a different name, than the
one at which it existed during backup, the snapshot_sstables table must
be populated with the _new_ keyspace name, not the one taken from
manifest. Same is true for table name.

This patch makes it possible to override keyspace/table loaded from
manifest file with the provided values. in the future it will also be
good to check that if those values are not provided by user, then values
read from different manifest files are the same.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2026-05-12 10:40:23 +03:00
Ernest Zaslavsky
17b415ccde sstables_loader_helpers: add token getters for tablet filtering
Add getters for the first and last tokens in get_sstables_for_tablet to
make the function more generic and suitable for future use in the
tablet-aware restore code.
2026-05-12 10:40:22 +03:00
Ernest Zaslavsky
1150f7cf24 sstables_loader_helpers: remove underscores from struct members
Remove underscores from minimal_sst_info struct members to comply with
our coding guidelines.
2026-05-12 10:40:22 +03:00
Ernest Zaslavsky
aa00048753 sstables_loader: move download_sstable and get_sstables_for_tablet
Move the download_sstable and get_sstables_for_tablet static functions
from sstables_loader into a new file to make them reusable by the
tablet-aware restore code.
2026-05-12 10:40:22 +03:00
Ernest Zaslavsky
991576ed73 sstables_loader: extract single-tablet SST filtering
Extract single-tablet range filtering into a new
get_sstables_for_tablet function, taken from the existing
get_sstables_for_tablets. This will later be reused in the
tablet-aware restore code.
2026-05-12 10:40:22 +03:00
Ernest Zaslavsky
b0f6cbb2a4 sstables_loader: make download_sstable static
Make the download_sstable function static to prepare it for extraction
as a helper function that will later be reused in tablet-aware restore.
2026-05-12 10:40:22 +03:00
Ernest Zaslavsky
60dd7de4b8 sstables_loader: fix formating of the new download_sstable function
Just fix formatting of the new `download_sstable` function
2026-05-12 10:40:22 +03:00
Ernest Zaslavsky
9efc658bdd sstables_loader: extract single SST download into a function
Extract the logic for downloading a single SST into a dedicated
function and reuse it in download_fully_contained_sstables. This
supports upcoming changes that consolidate common code.
2026-05-12 10:40:22 +03:00
Ernest Zaslavsky
fd2043cad8 sstables_loader: add shard_id to minimal_sst_info
Add a shard_id member to the minimal_sst_info struct as part of the
tablet-aware restore refactoring. This will support upcoming changes
that extract common code.
2026-05-12 10:40:22 +03:00
Robert Bindar
c97232bb7b sstables_loader: add function for parsing backup manifests
This change adds functionality for parsing backup manifests
and populating system_distributed.snapshot_sstables with
the content of the manifests.
This change is useful for tablet-aware restore. The function
introduced here will be called by the coordinator node
when restore starts to populate the snapshot_sstables table
with the data that workers need to execute the restore process.

Signed-off-by: Robert Bindar <robert.bindar@scylladb.com>
Co-authored-by: Pavel Emelyanov <xemul@scylladb.com>
2026-05-12 10:40:22 +03:00
Pavel Emelyanov
90ff7c5de3 code: Add system_distributed_keyspace dependency to sstables_loader
The loader will need to populate and read data from
system_distributed.snapshot_sstables table added recently, so this
dependency is truly needed.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2026-05-12 10:17:40 +03:00
Taras Veretilnyk
784127c40b sstables_loader: synchronously unlink streamed sstables before returning
mark_for_deletion() only set an in-memory flag; the actual file
deletion ran lazily when the last shared_sstable reference dropped,
leaving a window in which a follow-up scan of the upload directory
(e.g. a second 'nodetool refresh --load-and-stream') could observe a
partially-deleted sstable and fail with malformed_sstable_exception.

Force the unlink to complete before stream() returns. For tablet
streaming, partially-contained sstables span multiple per-tablet
batches, so a defer_unlinking flag postpones the unlink until after
all sstables are streamed; for vnodes and fully-contained sstables are streamed
only once and could be removed just after being streamed.

Added a FIXME on object_storage_base::wipe and strengthened the doc on storage::wipe to
make the never-fails contract explicit
2026-04-28 14:52:28 +02:00
Ernest Zaslavsky
e5e6608f20 sstables_loader: prevent use-after-free on table drop during streaming
sstables_loader::load_and_stream holds a replica::table& reference via
the sstable_streamer for the entire streaming operation.  If the table
is dropped concurrently (e.g. DROP TABLE or DROP KEYSPACE), the
reference becomes dangling and the next access crashes with SEGV.

This was observed in a longevity-50gb-12h-master test run where a
keyspace was dropped while load_and_stream was still streaming SSTables
from a previous batch.

Fix by acquiring a stream_in_progress() phaser guard in load_and_stream
before creating the streamer.  table::stop() calls
_pending_streams_phaser.close() which blocks until all outstanding
guards are released, keeping the table alive for the duration of the
streaming operation.

Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-1352

Closes scylladb/scylladb#29403
2026-04-20 07:39:51 +03:00
Pavel Emelyanov
4d352c7cf5 sstables: Remove ignore_component_digest_mismatch from sstable_open_config
The ignore_component_digest_mismatch flag is now initialized at sstable construction
time from sstables_manager::config (which is populated from db::config at boot time).
Remove the flag from sstable_open_config struct and all call sites that were setting
it explicitly.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-04-16 13:49:14 +03:00
Avi Kivity
0ae22a09d4 LICENSE: Update to version 1.1
Updated terms of non-commercial use (must be a never-customer).
2026-04-12 19:46:33 +03:00
Raphael S. Carvalho
05b11a3b82 sstables_loader: use new sstable add path
Use add_new_sstable_and_update_cache() when attaching SSTables
downloaded by the node-scoped local loader.

This is the correct variant for new SSTables: it can unlink the
SSTable on failure to add it, and it can split the SSTable if a
tablet split is in progress. The older
add_sstable_and_update_cache() helper is intended for preexisting
SSTables that are already stable on disk.

Additionally, downloaded SSTables are now left unsealed (TemporaryTOC)
until they are successfully added to the table's SSTable set. The
download path (download_fully_contained_sstables) passes
leave_unsealed=true to create_stream_sink, and attach_sstable opens
the SSTable with unsealed_sstable=true and seals it only inside the
on_add callback — matching the pattern used by stream_blob.cc and
storage_service.cc for tablet streaming.

This prevents a data-resurrection hazard: previously, if the process
crashed between download and attach_sstable, or if attach_sstable
failed mid-loop, sealed (TOC) SSTables would remain in the table
directory and be reloaded by distributed_loader on restart. With
TemporaryTOC, sstable_directory automatically cleans them up on
restart instead.

Fixes  https://scylladb.atlassian.net/browse/SCYLLADB-1085.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#29072
2026-03-23 10:33:04 +03:00
Taras Veretilnyk
7214f5a0b6 sstables: propagate ignore_component_digest_mismatch config to all load sites
Add ignore_component_digest_mismatch option to db::config (default false).
When set, sstable loading logs a warning instead of throwing on component
digest mismatches, allowing a node to start up despite corrupted non-vital
components or bugs in digest calculation.

Propagate the config to all production sstable load paths:
- distributed_loader (node startup, upload dir processing)
- storage_service (tablet storage cloning)
- sstables_loader (load-and-stream, download tasks, attach)
- stream_blob (tablet streaming)
2026-03-10 19:24:05 +01:00
Ernest Zaslavsky
13fb605edb streaming: enable direct download of contained sstables
Instead of streaming fully contained sstables, download them directly
and attach them to their corresponding table. This simplifies the
process and avoids unnecessary streaming overhead.
2026-01-25 13:27:44 +02:00
Ernest Zaslavsky
757e9d0f52 streaming: keep sharded database reference on tablet_sstable_streamer 2026-01-21 16:40:12 +02:00
Ernest Zaslavsky
4ffa070715 streaming: skip streaming empty sstable sets
Avoid invoking streaming when the sstable set is empty. This prevents
unnecessary calls and simplifies the streaming logic.
2026-01-21 16:40:12 +02:00
Ernest Zaslavsky
0fcd369ef2 streaming: inline streamer instance creation
Remove the `make_sstable_streamer` function and inline its usage where
needed. This change allows passing different sets of arguments
directly at the call sites.
2026-01-21 16:40:12 +02:00
Pavel Emelyanov
fb32e1c7ee Merge 'streaming: tablet_sstable_streamer::stream refactoring' from Ernest Zaslavsky
Refactor the way we decide the sstable belong to a tablet, fully or partially to simplify the flow and make it more readable. Also extract the logic and make it testable, add tests to cover changes

The change is purely aesthetic, no need to backport

Closes scylladb/scylladb#27101

* github.com:scylladb/scylladb:
  streaming: remove unnecessary lambda creating sstable token range
  streaming: simplify get_sstables_for_tablets logic
  streaming: switch to range-based for loop
  streaming: drop sstable skip microoptimization in tablet loop
  streaming: replace reverse iterators with reverse view in sstables scan
  streaming: return from get_sstables_for_tablets earlier
  streaming: add get_sstables_by_tablet_range tests
  test,sstables: add helper to set sstable first and last keys
  streaming: refactor get_sstables_for_tablets to make it accessible
  streaming: refactor get_sstables_for_tablets to make it testable
  streaming: refactor tablet_sstable_streamer::stream by extracting SST filtering logic
2025-12-09 10:53:57 +03:00
Ernest Zaslavsky
71834ce7dd streaming: remove unnecessary lambda creating sstable token range
The `sstable_token_range` lambda was only used once to create a token
range for an SSTable. Inline the construction directly where needed,
removing the extra lambda. This simplifies the code without changing
behavior.
2025-12-08 12:30:24 +02:00
Ernest Zaslavsky
df21112c39 streaming: simplify get_sstables_for_tablets logic
Remove the use of the `overlaps` helper and unnest nested conditionals
in get_sstables_for_tablets. Straightforward `before` and `after` checks
are sufficient to decide how each SSTable should be handled.
2025-12-08 12:30:24 +02:00
Ernest Zaslavsky
bd339cc4d8 streaming: switch to range-based for loop
Replace the explicit iterator loop with a range-based for loop. This
simplifies the code, enforces constness, and avoids the unnecessary
use of postfix increment. The behavior remains unchanged,but
readability and maintainability are improved.
2025-12-08 12:30:23 +02:00
Ernest Zaslavsky
91bf23eea1 streaming: drop sstable skip microoptimization in tablet loop
Remove the microoptimization that advanced over SSTables ending before a
tablet range. This approach is misleading since SSTables are not sorted
by their end token, and the extra logic adds complexity with little to
no benefit. The streaming path here is not performance‑critical, so the
simpler loop is preferable.
2025-12-08 12:30:23 +02:00
Ernest Zaslavsky
f925ed176b streaming: replace reverse iterators with reverse view in sstables scan
Use a reverse view over the SSTables vector instead of reverse iterators.
This avoids awkward rbegin/rend usage and the mental overhead of tracking
inverted sort order. With a view, we can use standard begin/end iteration
while preserving the intended scan direction.
2025-12-08 12:30:23 +02:00
Ernest Zaslavsky
68dcd1b1b2 streaming: return from get_sstables_for_tablets earlier
Check if tablets or sstables list is empty and if so, return immediately
2025-12-08 12:30:23 +02:00
Ernest Zaslavsky
6ef7ad9b5a streaming: refactor get_sstables_for_tablets to make it accessible
Create `get_sstables_for_tablets_for_tests` friend free function
for testing purposes. Adding this free function allows
direct testing without requiring the full streamer context.
2025-12-08 12:30:23 +02:00
Ernest Zaslavsky
581b8ace83 streaming: refactor get_sstables_for_tablets to make it testable
Make the `get_sstables_for_tablets` member function `static`. This
is a step toward improved testability, allowing the function to be
invoked directly without requiring a full instance of the streamer.
2025-12-08 12:30:23 +02:00
Pavel Emelyanov
6c115c691f sstables_loader: Provide endpoint type for get_sstables_from_object_store()
Currently the method scans db::config to find one. It has some
drawbacks. First, it's not very nice. Second, it needs to handle the
case when the endpoint is missing, while it relally never is. Third, the
type in config entry is not necessarily set.

It's nicer to get the type from storage manager.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-12-02 11:18:32 +03:00
Ernest Zaslavsky
f0e2941e34 streaming: refactor tablet_sstable_streamer::stream by extracting SST filtering logic
Extract the SST filtering logic into a dedicated member function. This
prepares the code for independent testing without requiring the entire
streamer to be initialized.
2025-11-30 18:27:15 +02:00
Ernest Zaslavsky
1d5f60baac streaming:: add more logging
Start logging all missed streaming options like `scope`, `primary_replica` and `skip_reshape` flags

Fixes: https://github.com/scylladb/scylladb/issues/27299

Closes scylladb/scylladb#27311
2025-11-28 12:50:33 +01:00
Piotr Dulikowski
22f22d183f Merge 'Refine sstables_loader vs database dependency' from Pavel Emelyanov
There are two issues with it.

First, a small RPC helper struct carries database reference on board just to get feature state from it.
Second, sstable_streamer uses database as proxy to feature service.

This PR improves both.

Services dependencies improvement, not need to backport

Closes scylladb/scylladb#26989

* github.com:scylladb/scylladb:
  sstables_loader: Get LOAD_AND_STREAM_ABORT_RPC_MESSAGE from messaging
  sstables_loader: Keep bool on send_meta, not database reference
2025-11-20 16:13:16 +01:00
Ernest Zaslavsky
dedc8bdf71 streaming: fix loop break condition in tablet_sstable_streamer::stream
Correct the loop termination logic that previously caused
certain SSTables to be prematurely excluded, resulting in
lost mutations. This change ensures all relevant SSTables
are properly streamed and their mutations preserved.
2025-11-19 17:32:49 +02:00
Pavel Emelyanov
4d5f7a57ea sstables_loader: Get LOAD_AND_STREAM_ABORT_RPC_MESSAGE from messaging
The feature in question is about the way streaming sink-and-source
operate. Since sink-and-source itself are obtained from messaging
service, the feature is better be fetched from it too.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-11-19 09:35:54 +03:00
Pavel Emelyanov
64e099f03b sstables_loader: Keep bool on send_meta, not database reference
The send_meta helper needs database reference to get feature_service
from it (to check some feature state). That's too much, features are
"immutable" throug the loader lifetime, it's enough to keep the boolean
member on send_meta.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-11-19 09:32:56 +03:00
Robert Bindar
817fdadd49 Improve choice distribution for primary replica
I noticed during tests that `maybe_get_primary_replica`
would not distribute uniformly the choice of primary replica
because `info.replicas` on some shards would have an order whilst
on others it'd be ordered differently, thus making the function choose
a node as primary replica multiple times when it clearly could've
chosen a different nodes.

This patch sorts the replica set before passing it through the
scope filter.

Signed-off-by: Robert Bindar <robert.bindar@scylladb.com>
2025-11-11 09:18:01 +02:00
Robert Bindar
136b45d657 Enable scoped primary replica only streaming
This patch removes the restriction for streaming
to primary replica only within a scope.
Node scope streaming to primary replica is dissallowed.

Fixes #26584

Signed-off-by: Robert Bindar <robert.bindar@scylladb.com>
2025-11-11 09:18:01 +02:00
Robert Bindar
965a16ce6f Support primary_replica_only for native restore API
Current native restore does not support primary_replica_only, it is
hard-coded disabled and this may lead to data amplification issues.

This patch extends the restore REST API to accept a
primary_replica_only parameter and propagates it to
sstables_loader so it gets correctly passed to
load_and_stream.

Fixes #26584

Signed-off-by: Robert Bindar <robert.bindar@scylladb.com>

Signed-off-by: Robert Bindar <robert.bindar@scylladb.com>
2025-11-11 09:17:52 +02:00
Raphael S. Carvalho
7f34366b9d sstables_loader: Don't bypass synchronization with busy topology
The patch c543059f86 fixed the synchronization issue between tablet
split and load-and-stream. The synchronization worked only with
raft topology, and therefore was disabled with gossip.
To do the check, storage_service::raft_topology_change_enabled()
but the topology kind is only available/set on shard 0, so it caused
the synchronization to be bypassed when load-and-stream runs on
any shard other than 0.

The reason the reproducer didn't catch it is that it was restricted
to single cpu. It will now run with multi cpu and catch the
problem observed.

Fixes #22707

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#26730
2025-11-03 18:10:08 +01:00