Commit Graph

75 Commits

Author SHA1 Message Date
Pavel Emelyanov
cf1ffd6086 Merge 'sstables_loader: fix the racing between get_progress() and release_resources()' from Kefu Chai
This change addresses a critical race condition in the sstables_loader where `get_progress()` could access invalid `progress_holder` instances after `release_resources()` destroyed them.

Problem:
- Progress tracking uses two components: `_progress_state` (tracks state) and `_progress_per_shard` (sharded service with actual progress data)
- `get_progress()` first checks if `_progress_state` is initialized, then accumulates progress from `_progress_per_shard`
- As both functions are coroutines, `get_progress()` could be preempted after state check but before accessing `_progress_per_shard`
- If `release_resources()` runs during this preemption, it destroys the `progress_holder` instances in `_progress_per_shard`, causing `get_progress()` to access invalid memory.

Solution:
- Implemented shared/exclusive locking to protect access to both state and sharded progress data
- Multiple `get_progress()` calls can execute in parallel (shared access)
- `release_resources()` acquires exclusive access before modifying resources
- This prevents potential memory corruption and ensures consistent progress reporting

Fixes #23801

---

this change addresses a racing related to tracking the restore progress from S3 using scylla's native API, which is not used in production yet, hence no need to backport.

Closes scylladb/scylladb#23808

* github.com:scylladb/scylladb:
  sstables_loader: fix the indent
  sstables_loader: fix the racing between get_progress() and release_resources()
2025-05-05 15:45:15 +03:00
Raphael S. Carvalho
c77f710a0c sstables: Fix quadratic space complexity in partitioned_sstable_set
Interval map is very susceptible to quadratic space behavior when
it's flooded with many entries overlapping all (or most of)
intervals, since each such entry will have presence on all
intervals it overlaps with.

A trigger we observed was memtable flush storm, which creates many
small "L0" sstables that spans roughly the entire token range.

Since we cannot rely on insertion order, solution will be about
storing sstables with such wide ranges in a vector (unleveled).

There should be no consequence for single-key reads, since upper
layer applies an additional filtering based on token of key being
queried.
And for range scans, there can be an increase in memory usage,
but not significant because the sstables span an wide range and
would have been selected in the combined reader if the range of
scan overlaps with them.

Anyway, this is a protection against storm of memtable flushes
and shouldn't be the common scenario.

It works both with tablets and vnodes, by adjusting the token
range spanned by compaction group accordingly.

Fixes #23634.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2025-04-29 15:47:33 -03:00
Kefu Chai
a2b46cbf45 sstables_loader: fix the indent
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2025-04-22 12:05:55 +08:00
Kefu Chai
6b3ecad467 sstables_loader: fix the racing between get_progress() and release_resources()
This change addresses a critical race condition in the sstables_loader
where `get_progress()` could access invalid `progress_holder` instances
after `release_resources()` destroyed them.

Problem:
- Progress tracking uses two components: `_progress_state` (tracks state)
  and `_progress_per_shard` (sharded service with actual progress data)
- `get_progress()` first checks if `_progress_state` is initialized, then
  accumulates progress from `_progress_per_shard`
- As both functions are coroutines, `get_progress()` could be preempted
  after state check but before accessing `_progress_per_shard`
- If `release_resources()` runs during this preemption, it destroys the
  `progress_holder` instances in `_progress_per_shard`, causing
  `get_progress()` to access invalid memory.

Solution:
- Implemented shared/exclusive locking to protect access to both state
  and sharded progress data
- Multiple `get_progress()` calls can execute in parallel (shared access)
- `release_resources()` acquires exclusive access before modifying resources
- This prevents potential memory corruption and ensures consistent
  progress reporting

Fixes #23801

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2025-04-22 12:05:54 +08:00
Benny Halevy
7969293dcf sstables_loader: use named gate
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-04-12 11:47:00 +03:00
Pavel Emelyanov
832d83ae4b sstables_loader: Do not stop sharded<progress_monitor> unconditionally
The member in question is unconditionally .stop()-ed in task's
release_resources() method, however, it may happen that the thing wasn't
.start()-ed in the first place. Start happens in the middle of the
task's .run() method and there can be several reasons why it can be
skipped -- e.g. the task is aborted early, or collecting sstables from
S3 throws.

fixes: #23231

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#23483
2025-04-02 12:09:02 +03:00
Pavel Emelyanov
24e5c30cc8 sstables: Do not print component filenames on load-and-stream wrap-up
When load-and-stream finishes it may call sstable::unlink() method to
drop the loaded (and streamed) sstable. Before calling it it prints a
log message about its intention that includes component_filenames()
vector. This log message is ugly in several ways.

First, it prints only recognized components, while unlink() method
unlinks all of them, so it's sort of misleading (it doesn't seem that
anyone ever read this message IRL though)

Next, that's the only place that is _that_ verbose about sstable
unlinking. "Common" unlinking paths don't print that much info.

Finally, the log message happen in debug level, so it's hardly ever
appears in any logs, but collecting several filenames takes time.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-03-19 12:45:21 +03:00
Kefu Chai
b448fea260 sstable_loader: fix cross-shard resource cleanup in download_task_impl
Previously, download_task_impl's destructor would destroy per-shard progress
elements on whatever shard the task was destroyed on. In multi-shard
environments, this caused "shared_ptr accessed on non-owner cpu" errors when
attempting to free memory allocated on a different shard.

Fix by:
- Convert progress_per_shard into a sharded service
- Stop the service on owner shards during cleanup using coroutines
- Add operator+= to stream_progress to leverage seastar's built-in adder
  instead of a custom adder struct

Alternative approaches considered:

1. Using foreign_ptr: Rejected as it would require interface changes
   that complicate stream delegation. foreign_ptr manages the underlying
   pointee with another smart pointer but does not expose the smart
   pointer instance in its APIs, making it impossible to use
   shared_ptr<stream_progress> in the interface.
2. Using vector<stream_progress>: Rejected for similar interface
   compatibility reasons.

This solution maintains the existing interfaces while ensuring proper
cross-shard cleanup.

Fixes scylladb/scylladb#22759
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2025-02-14 11:13:58 +08:00
Kefu Chai
1ef2d9d076 tree: migrate from boost::adaptors::transformed to std::views::transform
Replace remaining uses of boost::adaptors::transformed with std::views::transform
to reduce Boost dependencies, following the migration pattern established in
bab12e3a. This change addresses recently merged code that reintroduced Boost
header dependencies through boost::adaptors::transformed usage.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#22365
2025-01-17 16:56:40 +02:00
Kefu Chai
d815d7013c sstables_loader: report progress with the unit of batch
We restore a snapshot of table by streaming the sstables of
the given snapshot of the table using
`sstable_streamer::stream_sstable_mutations()` in batches. This function
reads mutations from a set of sstables, and streams them to the target
nodes. Due to the limit of this function, we are not able to track the
progress in bytes.

Previously, progress tracking used individual sstables as units, which caused
inaccuracies with tablet-distributed tables, where:
- An sstable spanning multiple tablets could be counted multiple times
- Progress reporting could become misleading (e.g., showing "40" progress
  for a table with 10 sstables)

This change introduces a more robust progress tracking method:
- Use "batch" as the unit of progress instead of individual sstables.
  Each batch represents a tablet when restoring a table snapshot if
  the tablet being restored is distributed with tablets. When it comes
  to tables distributed with vnode, each batch represents an sstable.
- Stream sstables for each tablet separately, handling both partially and
  fully contained sstables
- Calculate progress based on the total number of sstables being streamed
- Skip tablet IDs with no owned tokens

For vnode-distributed tables, the number of "batches" directly corresponds
to the number of sstables, ensuring:
- Consistent progress reporting across different table distribution models
- Simplified implementation
- Accurate representation of restore progress

The new approach provides a more reliable and uniform method of tracking
restoration progress across different table distribution strategies.

Also, Corrected the use of `_sstables.size()` in
`sstable_streamer::stream_sstables()`. It addressed a review comment
from Pavel that was inadvertently overlooked during previous rebasing
the commit of 5ab4932f34.

Fixes scylladb/scylladb#21816

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#21841
2025-01-13 09:04:35 +03:00
Pavel Emelyanov
960041d4b4 sstables_loader: Propagate scope from API down
Semi-mechanical change that adds newly introduced "scope" parameter to
all the functions between API methods and the low-level streamer object.
No real functional changes. API methods set it to "all" to keep existing
behavior.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-12-23 19:28:05 +03:00
Pavel Emelyanov
e8201a7897 sstables_loader: Filter tablets based on scope
Loading and streaming tablets has pre-filtering loop that walks the
tablet map sorts sstables into three lists:
- fully contained in one of map ranges
- partially overlapping with the map
- not intersecting with the map

Sstables from the 3rd list is immediately dropped from the process and
for the remaining two core load-and-stream happens.

This filtering deserves more care from the newly introduced scope. When
a tablet replica set doesn't get in the scope, the whole entry can be
disregarded, because load-and-stream will only do its "load" part anyway
and all mutations from it will be ignored.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-12-23 19:28:05 +03:00
Pavel Emelyanov
93aed22cd5 streamer: Disable scoped streaming of primary replica only
There's been some discussions of how primary replica only streaming
schould interact with the scope. There are two options how to consider
this combination:

- find where the primary replica is and handle it if it's within the
  requested sope
- within the requested scope find the primary replica for that subset of
  nodes, then handle it

There's also some itermediate solution: suppoer "primary replica in DC"
and reject all other combinations.

Until decided which way is correct, let's disable this configuration.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-12-23 19:19:17 +03:00
Pavel Emelyanov
30aac0d1da sstables_loader: Introduce streaming scope
Currently load-and-stream sends mutations to whatever node is considered
to be a "replica" for it. One exception is the "primary-replica-only"
flag that can be requested by the user.

This patch introduces a "scope" parameter that limits streaming part in
where it can stream the data to with 4 options:

- all -- current way of doing things, stream to wherever needed
- dc -- only stream to nodes that live in the same datacenter
- rack -- only stream to nodes that live in the same rack
- node -- only "stream" to current node

It's not yet configurable and streamer object initializes itself with
"all" mode. Will be changed later.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-12-23 19:19:17 +03:00
Pavel Emelyanov
7c1eaa427e sstables_loader: Wrap get_endpoints()
Preparational patch. Next will add more code to get_endpoints() that
will need to work for both if/else branches, this change helps having
less churn later.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-12-23 19:19:17 +03:00
Pavel Emelyanov
bb094cc099 Merge 'Make restore task abortable' from Calle Wilund
Fixes #20717

Enables abortable interface and propagates abort_source to all s3 objects used for reading the restore data.

Note: because restore is done on each shard, we have to maintain a per-shard abort source proxy for each, and do a background per-shard abort on abort call. This is synced at the end of "run()".

Abort source is added as an optional parameter to s3 storage and the s3 path in distributed loader.

There is no attempt to "clean up" an aborted restore. As we read on a mutation level from remote sstables, we should not cause incomplete sstables as such, even though we might end up of course with partial data restored.

Closes scylladb/scylladb#21567

* github.com:scylladb/scylladb:
  test_backup: Add restore abort test case
  sstables_loader: Make restore task abortable
  distributed_loader: Add optional abort_source to get_sstables_from_object_store
  s3_storage: Add optional abort_source to params/object
  s3::client: Make "readable_file" abortable
2024-12-19 12:23:33 +03:00
Avi Kivity
f3eade2f62 treewide: relicense to ScyllaDB-Source-Available-1.0
Drop the AGPL license in favor of a source-available license.
See the blog post [1] for details.

[1] https://www.scylladb.com/2024/12/18/why-were-moving-to-a-source-available-license/
2024-12-18 17:45:13 +02:00
Avi Kivity
5a849b0a6a Merge "Move more subsystems to use host ids instead of ips" from Gleb
"
This series converts repair, streaming and node_ops (and some parts of
alternator) to work on host ids instead of ips. This allows to remove
a lot of (but not all) functions that work on ips from effective
replication map.

CI: https://jenkins.scylladb.com/job/scylla-master/job/scylla-ci/13830/

Refs: scylladb/scylladb#21777
"

* 'gleb/move-to-host-id-more' of github.com:scylladb/scylla-dev:
  locator: topology: remove no longer use get_all_ips()
  gossiper: change get_unreachable_nodes to host ids
  locator: drop no longer used ip based functions from effective replication map and friends
  test: move network_topology_strategy_test and token_metadata_test to use host id based APIs
  replica/database: drop usage of ip in favor of host id in get_keyspace_local_ranges
  replica/mutation_dump: use host ids instead of ips
  alternator: move ttl to work with host ids instead of ips
  storage_service: move node_ops code to use host ids instead of host ips
  streaming: move streaming code to use host ids instead of host ips
  repair: move repair code to use host ids instead of host ips
  gossiper: add get_unreachable_host_ids() function
  locator: topology: add more function that return host ids to effective replication map
  locator: add more function that return host ids to effective replication map
2024-12-18 13:48:22 +02:00
Avi Kivity
fe9fcdfe30 task_manager.hh: replace boost ranges with std ranges
Standardize on one range library to reduce dependency load.

Unfortunately, std::views::concat (the replacement for boost::join),
is C++26 only. We use two separate inserts to the result vector to
compensate, and rationalize it by saying that boost::join() is likely
slow due to the need for type-erasure.

Closes scylladb/scylladb#21834
2024-12-16 13:08:02 +02:00
Gleb Natapov
41a57ed2e8 streaming: move streaming code to use host ids instead of host ips
The patch is rather large, but it is a straightforward conversion from
one type to another.
2024-12-15 11:31:11 +02:00
Calle Wilund
cbe255e736 sstables_loader: Make restore task abortable
Fixes #20717

Enables abortable interface and propagates abort_source to
all s3 objects used for reading the restore data.

Note: because restore is done on each shard, we have to maintain
a per-shard abort source proxy for each, and do a background
per-shard abort on abort call. This is synced at the end of "run()"

v2:
* Simplify abortability by using a function-local gate instead.
2024-12-02 12:36:44 +00:00
Kefu Chai
f436edfa22 mutation: remove unused "#include"s
these unused includes are identified by clang-include-cleaner. after
auditing the source files, all of the reports have been confirmed.

please note, because `mutation/mutation.hh` does not include
`seastar/coroutine/maybe_yield.hh` anymore, and quite a few source
files were relying on this header to bring in the declaration of
`maybe_yield()`, we have to include this header in the places where
this symbol is used. the same applies to `seastar/core/when_all.hh`.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-11-29 14:01:44 +08:00
Kefu Chai
5ab4932f34 sstables_loader: Track download progress of download_task_impl
Previously, the progress of download_task_impl launched by the "restore" API
was not tracked. Since restore operations can involve large data transfers,
this makes it difficult for users to monitor progress.

The restore process happens in two sequential steps:
1. Open specified SSTables from object storage
2. Download and stream mutation fragments from the opened SSTables to
   mapped destinations

While both steps contribute to overall progress, they use different units
of measurement, making a unified progress metric challenging. Because
the load-and-stream step (step 2) is the largest time-consuming part of the
restore. This change implements progress tracking for this step as an
initial improvement to provide users with partial visibility into the
restore operation.

Fixes scylladb/scylladb#21427
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-11-28 10:00:45 +08:00
Kefu Chai
e57f674066 sstables_loader: improve batch tracking using ranges library
Replace manual vector iteration with ranges library to preserve batch size
information. When streaming SSTable mutations, we need to track progress
across batches. The previous implementation used a loop to move elements
from the vector's end, but this approach lost the batch size information
since the SSTable set was moved away during streaming.

Now use std::ranges to take elements from the vector's end instead of
manual iteration. This preserves the original batch size, enabling accurate
progress tracking which will be implemented in a follow-up commit.

Technical changes:
- Replace manual vector iteration with ranges::take_view
- Preserve batch size information for progress tracking
- Maintain existing batch processing behavior

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-11-28 10:00:45 +08:00
Kefu Chai
38af123e05 sstables_loader: print streaming progress with moving range
After d1db17d490, the processed SSTable counter remained at 0 while
streaming progress was still being displayed. This fix properly tracks
and displays streaming progress by:

- Moving SSTable counter (`nr_sst_current`) to `sstable_streamer::stream_sstables()`
- Generating UUID at the streaming initialization
- Relocating progress reporting to `stream_sstables()` for accurate tracking

This ensures the progress indicator correctly reflects the actual number
of processed SSTables during streaming operations.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-11-28 10:00:45 +08:00
Kefu Chai
83f55fdb84 sstables_loader: mark sstable_streamer::stream_sstable_mutations() private
the only user of `sstable_streamer::stream_sstable_mutations()` is
`sstable_st6reamer::stream_sstables()`, so mark this member function as
private.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-11-28 10:00:45 +08:00
Kefu Chai
5a39be9b8c sstables_loader: fix indentation in stream_sstable_mutations()
Fix indentation regression from d1db17d490 where the function body of
`sstable_streamer::stream_sstable_mutations()` was left incorrectly
indented after the function was extracted to decouple streaming from
sstable selection.

Pure style fix, no functional changes.

in this change, we correct the indent.

Refs d1db17d490
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-11-28 10:00:45 +08:00
Aleksandra Martyniuk
292d00463a tasks: add task_manager::task::is_user_task method 2024-11-25 14:21:53 +01:00
Pavel Emelyanov
2f9f76fddf sstables_loader: Mark to_replica_set() private
It's not called from outside

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#21210
2024-10-27 22:28:54 +02:00
Kefu Chai
c6bc5b2706 sstable_loader: Remove unused _snapshot_name from download_task_impl
in 787ea4b1, we introduced `_prefix` and `_sstables` member  variables
to `sstables_loader::download_task_impl`, replacing the functionality
of `_snapshot_name`. However, we overlooked removing the now-obsolete
`_snapshot_name` variable.

this commit removes the unused `_snapshot_name` member variable to
improve code cleanliness and prevent potential confusion.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#20969
2024-10-07 10:43:13 +03:00
Botond Dénes
af124993a4 Merge 'Do not remove objects from backup storage after restore' from Pavel Emelyanov
The restore-from-s3 task uses load-and-stream internally which, in turn, unlinks loaded sstables on success. That's not what user expects when it restores from backup, objects should remain in bucket afterwards.

Closes scylladb/scylladb#20947

* github.com:scylladb/scylladb:
  test: Add check that restored-from objects are not removed
  sstables_loader: Dont unlink sstables when restoring from S3
  sstables_loader: Make primary_replica_only bool_class RAII field
2024-10-04 14:59:40 +03:00
Pavel Emelyanov
7389f4275d sstables_loader: Dont unlink sstables when restoring from S3
When load_and_stream() completes, all sstables that were loaded (and
streamed) are unlinked. This is wrong for the restore-from-s3 task, as
removing objects from backup storage is not what user expects.

Fix it by adding a boolean to streamer class, and set it to false (well,
bool_class<>::no) for restore task.

fixes: #20938

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-10-03 10:15:22 +03:00
Pavel Emelyanov
7eb48358e9 sstables_loader: Make primary_replica_only bool_class RAII field
This boolean is currently passed all the way around as pure bool
argument. And it's only needed in a single get_endpoints() method that
calculates the target endpoints.

This patch places this bool on class streamer, so that the call chain
arguments are not polluted, and converts it to bool_class.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-10-03 10:13:37 +03:00
Kefu Chai
787ea4b1d4 treewide: accept list of sstables in "restore" API
before this change, we enumerate the sstables tracked by the
system.sstables table, and restore them when serving
requests to "storage_service/restore" API. this works fine with
"storage_service/backup" API. but this "restore" API cannot be
used as a drop-in replacement of the rclone based API currently
used by scylla-manager.

in order to fill the gap, in this change:

* add the "prefix" parameter for specifying the shared prefix of
  sstables
* add the "sstables" parameter for specifying the list of  TOC
  components of sstables
* remove the "snapshot" parameter, as we don't encode the prefix
  on scylla's end anymore.
* make the "table" parameter mandatory.

Fixes scylladb/scylladb#20461
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-10-01 23:24:56 +08:00
Pavel Emelyanov
11a04bfb66 code: Introduce restore API method
The method starts a task that uses sstables_loader load-and-stream
functionality to bring new sstables into the cluster. The existing
load-and-stream picks up sstables from upload/ directory, the newly
introduced task collects them from S3 bucket and given prefix (that
correspond to the path where backup API method put them).

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-08-28 15:42:49 +03:00
Pavel Emelyanov
1f3f0b1926 sstable_loader: Add sstables::storage_manager dependency
The storage_manager maintains set of clients to configured object
storage(s). The sstables loader  is going to spawn tasks that will talk
to to those storages, thus it needs the storage manager to get the
clients clients from.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-08-27 16:15:41 +03:00
Pavel Emelyanov
06c3c53deb sstable_loader: Maintain task manager module
This service is going to start tasks managed by task manager. For that,
it should have its module set up and registered.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-08-27 16:15:41 +03:00
Pavel Emelyanov
9cf95e8a07 sstable_loader: Out-line constructor
It will grow and become more complicated. Better to have it outside the
header.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-08-27 16:15:41 +03:00
Pavel Emelyanov
ae622d711e sstables-loader: Run loading in its scheduling group
Now the loading code has two different paths, and only one of them
switches sched group. It's cleaner and more natural to switch the sched
group in the loader itself, so that all code paths run in it and don't
care switching.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-05-28 11:07:58 +03:00
Pavel Emelyanov
b728857954 distributed_loader: Remove system_distributed_keyspace and view_update_generator
Now all the code is happy with view_builder and can be shortened

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-05-23 13:41:47 +03:00
Pavel Emelyanov
0d946a5fdf distributed_loader: Propagate view_builder& via process_upload_dir()
Preparation to next patches, they'll make use of this new argument

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-05-23 13:32:28 +03:00
Kefu Chai
a439ebcfce treewide: include fmt/ranges.h and/or fmt/std.h
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we include `fmt/ranges.h` and/or `fmt/std.h`
for formatting the container types, like vector, map
optional and variant using {fmt} instead of the homebrew
formatter based on operator<<.
with this change, the changes adding fmt::formatter and
the changes using ostream formatter explicitly, we are
allowed to drop `FMT_DEPRECATED_OSTREAM` macro.

Refs scylladb#13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-04-19 22:56:16 +08:00
Pavel Emelyanov
1c1004d1bd sstables_loader: Format list of sstables' filenames in place
Loader wants to print set of sstables' names. For that it collects names
into a dedicated vector, then prints it using fmt/ranges facility.

There's a way to achieve the same goal without allocating extra vector
with names -- use fmt::format() and pass it a range converting sstables
into their names.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#18159
2024-04-04 12:09:52 +03:00
Raphael S. Carvalho
6bdb456fad sstables_loader: Fix loader when write selector is previous during tablet migration
The loader is writing to pending replica even when write selector is set
to previous. If migration is reverted, then the writes won't be rolled
back as it assumes pending replicas weren't written to yet. That can
cause data resurrection if tablet is later migrated back into the same
replica.

NOTE: write selector is handled correctly when set to next, because
get_natural_endpoints() will return the next replica set, and none
of the replicas will be considered leaving. And of course, selector
set to both is also handled correctly.

Fixes #17892.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#17902
2024-03-24 01:20:50 +01:00
Raphael S. Carvalho
6115c113fe sstables_loader: Don't discard sstable that is not fully exhausted
Affects load-and-stream for tablets only.

The intention is that only this loop is responsible for detecting
exhausted sstables and then discarding them for next iterations:
        while (sstable_it != _sstables.rend() && exhausted(*sstable_it)) {
            sstable_it++;
        }

But the loop which consumes non exhausted sstables, on behalf of
each tablet, was incorrectly advancing the iterator, despite the
sstable wasn't considered exhausted.

Fixes #17733.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#17899
2024-03-20 09:11:59 +02:00
Raphael S. Carvalho
771cbf9b79 sstables_loader: Stream to pending tablet replica if needed
Even though taking erm blocks migration, it cannot prevent the
load-and-stream to start while a migration is going on, erm
only prevents migration from advancing.

With tablets, new data will be streamed to pending replica too if
the write replica selector, in transition metadata, is set to both.
If migration is at a later stage where only new replica is written
to, then data is streamed only to new replica as selector is set
to next (== new replica set).

primary_replica_only flag is handled by only streaming to pending
if the primary replica is the one leaving through migration.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-02-27 15:17:05 -03:00
Raphael S. Carvalho
ab498489fe sstables_loader: Implement tablet based load-and-stream
Similar treatment to repair is given to load-and-stream.

Jumps into a new streaming session for every tablet, so we guarantee
data will be segregated into tablets co-habiting the same shard.

Fixes #17315.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-02-27 13:04:20 -03:00
Raphael S. Carvalho
b9158e36ef sstables_loader: Virtualize sstable_streamer for tablet
virtualization allows for tablet version of streaming.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-02-27 11:30:14 -03:00
Raphael S. Carvalho
3523cc8063 sstables_loader: Avoid reallocations in vector
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-02-27 11:28:11 -03:00
Raphael S. Carvalho
d1db17d490 sstable_loader: Decouple sstable streaming from selection
That will make it easy to introduce tablet-based load-and-stream.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2024-02-27 11:28:11 -03:00