scylladb

Author	SHA1	Message	Date
Pavel Emelyanov	cf1ffd6086	Merge 'sstables_loader: fix the racing between get_progress() and release_resources()' from Kefu Chai This change addresses a critical race condition in the sstables_loader where `get_progress()` could access invalid `progress_holder` instances after `release_resources()` destroyed them. Problem: - Progress tracking uses two components: `_progress_state` (tracks state) and `_progress_per_shard` (sharded service with actual progress data) - `get_progress()` first checks if `_progress_state` is initialized, then accumulates progress from `_progress_per_shard` - As both functions are coroutines, `get_progress()` could be preempted after state check but before accessing `_progress_per_shard` - If `release_resources()` runs during this preemption, it destroys the `progress_holder` instances in `_progress_per_shard`, causing `get_progress()` to access invalid memory. Solution: - Implemented shared/exclusive locking to protect access to both state and sharded progress data - Multiple `get_progress()` calls can execute in parallel (shared access) - `release_resources()` acquires exclusive access before modifying resources - This prevents potential memory corruption and ensures consistent progress reporting Fixes #23801 --- this change addresses a racing related to tracking the restore progress from S3 using scylla's native API, which is not used in production yet, hence no need to backport. Closes scylladb/scylladb#23808 * github.com:scylladb/scylladb: sstables_loader: fix the indent sstables_loader: fix the racing between get_progress() and release_resources()	2025-05-05 15:45:15 +03:00
Raphael S. Carvalho	c77f710a0c	sstables: Fix quadratic space complexity in partitioned_sstable_set Interval map is very susceptible to quadratic space behavior when it's flooded with many entries overlapping all (or most of) intervals, since each such entry will have presence on all intervals it overlaps with. A trigger we observed was memtable flush storm, which creates many small "L0" sstables that spans roughly the entire token range. Since we cannot rely on insertion order, solution will be about storing sstables with such wide ranges in a vector (unleveled). There should be no consequence for single-key reads, since upper layer applies an additional filtering based on token of key being queried. And for range scans, there can be an increase in memory usage, but not significant because the sstables span an wide range and would have been selected in the combined reader if the range of scan overlaps with them. Anyway, this is a protection against storm of memtable flushes and shouldn't be the common scenario. It works both with tablets and vnodes, by adjusting the token range spanned by compaction group accordingly. Fixes #23634. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2025-04-29 15:47:33 -03:00
Kefu Chai	a2b46cbf45	sstables_loader: fix the indent Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2025-04-22 12:05:55 +08:00
Kefu Chai	6b3ecad467	sstables_loader: fix the racing between get_progress() and release_resources() This change addresses a critical race condition in the sstables_loader where `get_progress()` could access invalid `progress_holder` instances after `release_resources()` destroyed them. Problem: - Progress tracking uses two components: `_progress_state` (tracks state) and `_progress_per_shard` (sharded service with actual progress data) - `get_progress()` first checks if `_progress_state` is initialized, then accumulates progress from `_progress_per_shard` - As both functions are coroutines, `get_progress()` could be preempted after state check but before accessing `_progress_per_shard` - If `release_resources()` runs during this preemption, it destroys the `progress_holder` instances in `_progress_per_shard`, causing `get_progress()` to access invalid memory. Solution: - Implemented shared/exclusive locking to protect access to both state and sharded progress data - Multiple `get_progress()` calls can execute in parallel (shared access) - `release_resources()` acquires exclusive access before modifying resources - This prevents potential memory corruption and ensures consistent progress reporting Fixes #23801 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2025-04-22 12:05:54 +08:00
Benny Halevy	7969293dcf	sstables_loader: use named gate Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2025-04-12 11:47:00 +03:00
Pavel Emelyanov	832d83ae4b	sstables_loader: Do not stop sharded<progress_monitor> unconditionally The member in question is unconditionally .stop()-ed in task's release_resources() method, however, it may happen that the thing wasn't .start()-ed in the first place. Start happens in the middle of the task's .run() method and there can be several reasons why it can be skipped -- e.g. the task is aborted early, or collecting sstables from S3 throws. fixes: #23231 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#23483	2025-04-02 12:09:02 +03:00
Pavel Emelyanov	24e5c30cc8	sstables: Do not print component filenames on load-and-stream wrap-up When load-and-stream finishes it may call sstable::unlink() method to drop the loaded (and streamed) sstable. Before calling it it prints a log message about its intention that includes component_filenames() vector. This log message is ugly in several ways. First, it prints only recognized components, while unlink() method unlinks all of them, so it's sort of misleading (it doesn't seem that anyone ever read this message IRL though) Next, that's the only place that is _that_ verbose about sstable unlinking. "Common" unlinking paths don't print that much info. Finally, the log message happen in debug level, so it's hardly ever appears in any logs, but collecting several filenames takes time. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2025-03-19 12:45:21 +03:00
Kefu Chai	b448fea260	sstable_loader: fix cross-shard resource cleanup in download_task_impl Previously, download_task_impl's destructor would destroy per-shard progress elements on whatever shard the task was destroyed on. In multi-shard environments, this caused "shared_ptr accessed on non-owner cpu" errors when attempting to free memory allocated on a different shard. Fix by: - Convert progress_per_shard into a sharded service - Stop the service on owner shards during cleanup using coroutines - Add operator+= to stream_progress to leverage seastar's built-in adder instead of a custom adder struct Alternative approaches considered: 1. Using foreign_ptr: Rejected as it would require interface changes that complicate stream delegation. foreign_ptr manages the underlying pointee with another smart pointer but does not expose the smart pointer instance in its APIs, making it impossible to use shared_ptr<stream_progress> in the interface. 2. Using vector<stream_progress>: Rejected for similar interface compatibility reasons. This solution maintains the existing interfaces while ensuring proper cross-shard cleanup. Fixes scylladb/scylladb#22759 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2025-02-14 11:13:58 +08:00
Kefu Chai	1ef2d9d076	tree: migrate from boost::adaptors::transformed to std::views::transform Replace remaining uses of boost::adaptors::transformed with std::views::transform to reduce Boost dependencies, following the migration pattern established in `bab12e3a`. This change addresses recently merged code that reintroduced Boost header dependencies through boost::adaptors::transformed usage. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#22365	2025-01-17 16:56:40 +02:00
Kefu Chai	d815d7013c	sstables_loader: report progress with the unit of batch We restore a snapshot of table by streaming the sstables of the given snapshot of the table using `sstable_streamer::stream_sstable_mutations()` in batches. This function reads mutations from a set of sstables, and streams them to the target nodes. Due to the limit of this function, we are not able to track the progress in bytes. Previously, progress tracking used individual sstables as units, which caused inaccuracies with tablet-distributed tables, where: - An sstable spanning multiple tablets could be counted multiple times - Progress reporting could become misleading (e.g., showing "40" progress for a table with 10 sstables) This change introduces a more robust progress tracking method: - Use "batch" as the unit of progress instead of individual sstables. Each batch represents a tablet when restoring a table snapshot if the tablet being restored is distributed with tablets. When it comes to tables distributed with vnode, each batch represents an sstable. - Stream sstables for each tablet separately, handling both partially and fully contained sstables - Calculate progress based on the total number of sstables being streamed - Skip tablet IDs with no owned tokens For vnode-distributed tables, the number of "batches" directly corresponds to the number of sstables, ensuring: - Consistent progress reporting across different table distribution models - Simplified implementation - Accurate representation of restore progress The new approach provides a more reliable and uniform method of tracking restoration progress across different table distribution strategies. Also, Corrected the use of `_sstables.size()` in `sstable_streamer::stream_sstables()`. It addressed a review comment from Pavel that was inadvertently overlooked during previous rebasing the commit of `5ab4932f34`. Fixes scylladb/scylladb#21816 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#21841	2025-01-13 09:04:35 +03:00
Pavel Emelyanov	960041d4b4	sstables_loader: Propagate scope from API down Semi-mechanical change that adds newly introduced "scope" parameter to all the functions between API methods and the low-level streamer object. No real functional changes. API methods set it to "all" to keep existing behavior. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-12-23 19:28:05 +03:00
Pavel Emelyanov	e8201a7897	sstables_loader: Filter tablets based on scope Loading and streaming tablets has pre-filtering loop that walks the tablet map sorts sstables into three lists: - fully contained in one of map ranges - partially overlapping with the map - not intersecting with the map Sstables from the 3rd list is immediately dropped from the process and for the remaining two core load-and-stream happens. This filtering deserves more care from the newly introduced scope. When a tablet replica set doesn't get in the scope, the whole entry can be disregarded, because load-and-stream will only do its "load" part anyway and all mutations from it will be ignored. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-12-23 19:28:05 +03:00
Pavel Emelyanov	93aed22cd5	streamer: Disable scoped streaming of primary replica only There's been some discussions of how primary replica only streaming schould interact with the scope. There are two options how to consider this combination: - find where the primary replica is and handle it if it's within the requested sope - within the requested scope find the primary replica for that subset of nodes, then handle it There's also some itermediate solution: suppoer "primary replica in DC" and reject all other combinations. Until decided which way is correct, let's disable this configuration. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-12-23 19:19:17 +03:00
Pavel Emelyanov	30aac0d1da	sstables_loader: Introduce streaming scope Currently load-and-stream sends mutations to whatever node is considered to be a "replica" for it. One exception is the "primary-replica-only" flag that can be requested by the user. This patch introduces a "scope" parameter that limits streaming part in where it can stream the data to with 4 options: - all -- current way of doing things, stream to wherever needed - dc -- only stream to nodes that live in the same datacenter - rack -- only stream to nodes that live in the same rack - node -- only "stream" to current node It's not yet configurable and streamer object initializes itself with "all" mode. Will be changed later. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-12-23 19:19:17 +03:00
Pavel Emelyanov	7c1eaa427e	sstables_loader: Wrap get_endpoints() Preparational patch. Next will add more code to get_endpoints() that will need to work for both if/else branches, this change helps having less churn later. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-12-23 19:19:17 +03:00
Pavel Emelyanov	bb094cc099	Merge 'Make restore task abortable' from Calle Wilund Fixes #20717 Enables abortable interface and propagates abort_source to all s3 objects used for reading the restore data. Note: because restore is done on each shard, we have to maintain a per-shard abort source proxy for each, and do a background per-shard abort on abort call. This is synced at the end of "run()". Abort source is added as an optional parameter to s3 storage and the s3 path in distributed loader. There is no attempt to "clean up" an aborted restore. As we read on a mutation level from remote sstables, we should not cause incomplete sstables as such, even though we might end up of course with partial data restored. Closes scylladb/scylladb#21567 * github.com:scylladb/scylladb: test_backup: Add restore abort test case sstables_loader: Make restore task abortable distributed_loader: Add optional abort_source to get_sstables_from_object_store s3_storage: Add optional abort_source to params/object s3::client: Make "readable_file" abortable	2024-12-19 12:23:33 +03:00
Avi Kivity	f3eade2f62	treewide: relicense to ScyllaDB-Source-Available-1.0 Drop the AGPL license in favor of a source-available license. See the blog post [1] for details. [1] https://www.scylladb.com/2024/12/18/why-were-moving-to-a-source-available-license/	2024-12-18 17:45:13 +02:00
Avi Kivity	5a849b0a6a	Merge "Move more subsystems to use host ids instead of ips" from Gleb " This series converts repair, streaming and node_ops (and some parts of alternator) to work on host ids instead of ips. This allows to remove a lot of (but not all) functions that work on ips from effective replication map. CI: https://jenkins.scylladb.com/job/scylla-master/job/scylla-ci/13830/ Refs: scylladb/scylladb#21777 " * 'gleb/move-to-host-id-more' of github.com:scylladb/scylla-dev: locator: topology: remove no longer use get_all_ips() gossiper: change get_unreachable_nodes to host ids locator: drop no longer used ip based functions from effective replication map and friends test: move network_topology_strategy_test and token_metadata_test to use host id based APIs replica/database: drop usage of ip in favor of host id in get_keyspace_local_ranges replica/mutation_dump: use host ids instead of ips alternator: move ttl to work with host ids instead of ips storage_service: move node_ops code to use host ids instead of host ips streaming: move streaming code to use host ids instead of host ips repair: move repair code to use host ids instead of host ips gossiper: add get_unreachable_host_ids() function locator: topology: add more function that return host ids to effective replication map locator: add more function that return host ids to effective replication map	2024-12-18 13:48:22 +02:00
Avi Kivity	fe9fcdfe30	task_manager.hh: replace boost ranges with std ranges Standardize on one range library to reduce dependency load. Unfortunately, std::views::concat (the replacement for boost::join), is C++26 only. We use two separate inserts to the result vector to compensate, and rationalize it by saying that boost::join() is likely slow due to the need for type-erasure. Closes scylladb/scylladb#21834	2024-12-16 13:08:02 +02:00
Gleb Natapov	41a57ed2e8	streaming: move streaming code to use host ids instead of host ips The patch is rather large, but it is a straightforward conversion from one type to another.	2024-12-15 11:31:11 +02:00
Calle Wilund	cbe255e736	sstables_loader: Make restore task abortable Fixes #20717 Enables abortable interface and propagates abort_source to all s3 objects used for reading the restore data. Note: because restore is done on each shard, we have to maintain a per-shard abort source proxy for each, and do a background per-shard abort on abort call. This is synced at the end of "run()" v2: * Simplify abortability by using a function-local gate instead.	2024-12-02 12:36:44 +00:00
Kefu Chai	f436edfa22	mutation: remove unused "#include"s these unused includes are identified by clang-include-cleaner. after auditing the source files, all of the reports have been confirmed. please note, because `mutation/mutation.hh` does not include `seastar/coroutine/maybe_yield.hh` anymore, and quite a few source files were relying on this header to bring in the declaration of `maybe_yield()`, we have to include this header in the places where this symbol is used. the same applies to `seastar/core/when_all.hh`. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-11-29 14:01:44 +08:00
Kefu Chai	5ab4932f34	sstables_loader: Track download progress of download_task_impl Previously, the progress of download_task_impl launched by the "restore" API was not tracked. Since restore operations can involve large data transfers, this makes it difficult for users to monitor progress. The restore process happens in two sequential steps: 1. Open specified SSTables from object storage 2. Download and stream mutation fragments from the opened SSTables to mapped destinations While both steps contribute to overall progress, they use different units of measurement, making a unified progress metric challenging. Because the load-and-stream step (step 2) is the largest time-consuming part of the restore. This change implements progress tracking for this step as an initial improvement to provide users with partial visibility into the restore operation. Fixes scylladb/scylladb#21427 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-11-28 10:00:45 +08:00
Kefu Chai	e57f674066	sstables_loader: improve batch tracking using ranges library Replace manual vector iteration with ranges library to preserve batch size information. When streaming SSTable mutations, we need to track progress across batches. The previous implementation used a loop to move elements from the vector's end, but this approach lost the batch size information since the SSTable set was moved away during streaming. Now use std::ranges to take elements from the vector's end instead of manual iteration. This preserves the original batch size, enabling accurate progress tracking which will be implemented in a follow-up commit. Technical changes: - Replace manual vector iteration with ranges::take_view - Preserve batch size information for progress tracking - Maintain existing batch processing behavior Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-11-28 10:00:45 +08:00
Kefu Chai	38af123e05	sstables_loader: print streaming progress with moving range After `d1db17d490`, the processed SSTable counter remained at 0 while streaming progress was still being displayed. This fix properly tracks and displays streaming progress by: - Moving SSTable counter (`nr_sst_current`) to `sstable_streamer::stream_sstables()` - Generating UUID at the streaming initialization - Relocating progress reporting to `stream_sstables()` for accurate tracking This ensures the progress indicator correctly reflects the actual number of processed SSTables during streaming operations. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-11-28 10:00:45 +08:00
Kefu Chai	83f55fdb84	sstables_loader: mark sstable_streamer::stream_sstable_mutations() private the only user of `sstable_streamer::stream_sstable_mutations()` is `sstable_st6reamer::stream_sstables()`, so mark this member function as private. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-11-28 10:00:45 +08:00
Kefu Chai	5a39be9b8c	sstables_loader: fix indentation in stream_sstable_mutations() Fix indentation regression from `d1db17d490` where the function body of `sstable_streamer::stream_sstable_mutations()` was left incorrectly indented after the function was extracted to decouple streaming from sstable selection. Pure style fix, no functional changes. in this change, we correct the indent. Refs `d1db17d490` Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-11-28 10:00:45 +08:00
Aleksandra Martyniuk	292d00463a	tasks: add task_manager::task::is_user_task method	2024-11-25 14:21:53 +01:00
Pavel Emelyanov	2f9f76fddf	sstables_loader: Mark to_replica_set() private It's not called from outside Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#21210	2024-10-27 22:28:54 +02:00
Kefu Chai	c6bc5b2706	sstable_loader: Remove unused _snapshot_name from download_task_impl in `787ea4b1`, we introduced `_prefix` and `_sstables` member variables to `sstables_loader::download_task_impl`, replacing the functionality of `_snapshot_name`. However, we overlooked removing the now-obsolete `_snapshot_name` variable. this commit removes the unused `_snapshot_name` member variable to improve code cleanliness and prevent potential confusion. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#20969	2024-10-07 10:43:13 +03:00
Botond Dénes	af124993a4	Merge 'Do not remove objects from backup storage after restore' from Pavel Emelyanov The restore-from-s3 task uses load-and-stream internally which, in turn, unlinks loaded sstables on success. That's not what user expects when it restores from backup, objects should remain in bucket afterwards. Closes scylladb/scylladb#20947 * github.com:scylladb/scylladb: test: Add check that restored-from objects are not removed sstables_loader: Dont unlink sstables when restoring from S3 sstables_loader: Make primary_replica_only bool_class RAII field	2024-10-04 14:59:40 +03:00
Pavel Emelyanov	7389f4275d	sstables_loader: Dont unlink sstables when restoring from S3 When load_and_stream() completes, all sstables that were loaded (and streamed) are unlinked. This is wrong for the restore-from-s3 task, as removing objects from backup storage is not what user expects. Fix it by adding a boolean to streamer class, and set it to false (well, bool_class<>::no) for restore task. fixes: #20938 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-10-03 10:15:22 +03:00
Pavel Emelyanov	7eb48358e9	sstables_loader: Make primary_replica_only bool_class RAII field This boolean is currently passed all the way around as pure bool argument. And it's only needed in a single get_endpoints() method that calculates the target endpoints. This patch places this bool on class streamer, so that the call chain arguments are not polluted, and converts it to bool_class. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-10-03 10:13:37 +03:00
Kefu Chai	787ea4b1d4	treewide: accept list of sstables in "restore" API before this change, we enumerate the sstables tracked by the system.sstables table, and restore them when serving requests to "storage_service/restore" API. this works fine with "storage_service/backup" API. but this "restore" API cannot be used as a drop-in replacement of the rclone based API currently used by scylla-manager. in order to fill the gap, in this change: * add the "prefix" parameter for specifying the shared prefix of sstables * add the "sstables" parameter for specifying the list of TOC components of sstables * remove the "snapshot" parameter, as we don't encode the prefix on scylla's end anymore. * make the "table" parameter mandatory. Fixes scylladb/scylladb#20461 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-10-01 23:24:56 +08:00
Pavel Emelyanov	11a04bfb66	code: Introduce restore API method The method starts a task that uses sstables_loader load-and-stream functionality to bring new sstables into the cluster. The existing load-and-stream picks up sstables from upload/ directory, the newly introduced task collects them from S3 bucket and given prefix (that correspond to the path where backup API method put them). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-08-28 15:42:49 +03:00
Pavel Emelyanov	1f3f0b1926	sstable_loader: Add sstables::storage_manager dependency The storage_manager maintains set of clients to configured object storage(s). The sstables loader is going to spawn tasks that will talk to to those storages, thus it needs the storage manager to get the clients clients from. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-08-27 16:15:41 +03:00
Pavel Emelyanov	06c3c53deb	sstable_loader: Maintain task manager module This service is going to start tasks managed by task manager. For that, it should have its module set up and registered. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-08-27 16:15:41 +03:00
Pavel Emelyanov	9cf95e8a07	sstable_loader: Out-line constructor It will grow and become more complicated. Better to have it outside the header. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-08-27 16:15:41 +03:00
Pavel Emelyanov	ae622d711e	sstables-loader: Run loading in its scheduling group Now the loading code has two different paths, and only one of them switches sched group. It's cleaner and more natural to switch the sched group in the loader itself, so that all code paths run in it and don't care switching. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-28 11:07:58 +03:00
Pavel Emelyanov	b728857954	distributed_loader: Remove system_distributed_keyspace and view_update_generator Now all the code is happy with view_builder and can be shortened Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-23 13:41:47 +03:00
Pavel Emelyanov	0d946a5fdf	distributed_loader: Propagate view_builder& via process_upload_dir() Preparation to next patches, they'll make use of this new argument Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-05-23 13:32:28 +03:00
Kefu Chai	a439ebcfce	treewide: include fmt/ranges.h and/or fmt/std.h before this change, we rely on the default-generated fmt::formatter created from operator<<, but fmt v10 dropped the default-generated formatter. in this change, we include `fmt/ranges.h` and/or `fmt/std.h` for formatting the container types, like vector, map optional and variant using {fmt} instead of the homebrew formatter based on operator<<. with this change, the changes adding fmt::formatter and the changes using ostream formatter explicitly, we are allowed to drop `FMT_DEPRECATED_OSTREAM` macro. Refs scylladb#13245 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-04-19 22:56:16 +08:00
Pavel Emelyanov	1c1004d1bd	sstables_loader: Format list of sstables' filenames in place Loader wants to print set of sstables' names. For that it collects names into a dedicated vector, then prints it using fmt/ranges facility. There's a way to achieve the same goal without allocating extra vector with names -- use fmt::format() and pass it a range converting sstables into their names. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#18159	2024-04-04 12:09:52 +03:00
Raphael S. Carvalho	6bdb456fad	sstables_loader: Fix loader when write selector is previous during tablet migration The loader is writing to pending replica even when write selector is set to previous. If migration is reverted, then the writes won't be rolled back as it assumes pending replicas weren't written to yet. That can cause data resurrection if tablet is later migrated back into the same replica. NOTE: write selector is handled correctly when set to next, because get_natural_endpoints() will return the next replica set, and none of the replicas will be considered leaving. And of course, selector set to both is also handled correctly. Fixes #17892. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes scylladb/scylladb#17902	2024-03-24 01:20:50 +01:00
Raphael S. Carvalho	6115c113fe	sstables_loader: Don't discard sstable that is not fully exhausted Affects load-and-stream for tablets only. The intention is that only this loop is responsible for detecting exhausted sstables and then discarding them for next iterations: while (sstable_it != _sstables.rend() && exhausted(*sstable_it)) { sstable_it++; } But the loop which consumes non exhausted sstables, on behalf of each tablet, was incorrectly advancing the iterator, despite the sstable wasn't considered exhausted. Fixes #17733. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes scylladb/scylladb#17899	2024-03-20 09:11:59 +02:00
Raphael S. Carvalho	771cbf9b79	sstables_loader: Stream to pending tablet replica if needed Even though taking erm blocks migration, it cannot prevent the load-and-stream to start while a migration is going on, erm only prevents migration from advancing. With tablets, new data will be streamed to pending replica too if the write replica selector, in transition metadata, is set to both. If migration is at a later stage where only new replica is written to, then data is streamed only to new replica as selector is set to next (== new replica set). primary_replica_only flag is handled by only streaming to pending if the primary replica is the one leaving through migration. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2024-02-27 15:17:05 -03:00
Raphael S. Carvalho	ab498489fe	sstables_loader: Implement tablet based load-and-stream Similar treatment to repair is given to load-and-stream. Jumps into a new streaming session for every tablet, so we guarantee data will be segregated into tablets co-habiting the same shard. Fixes #17315. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2024-02-27 13:04:20 -03:00
Raphael S. Carvalho	b9158e36ef	sstables_loader: Virtualize sstable_streamer for tablet virtualization allows for tablet version of streaming. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2024-02-27 11:30:14 -03:00
Raphael S. Carvalho	3523cc8063	sstables_loader: Avoid reallocations in vector Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2024-02-27 11:28:11 -03:00
Raphael S. Carvalho	d1db17d490	sstable_loader: Decouple sstable streaming from selection That will make it easy to introduce tablet-based load-and-stream. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2024-02-27 11:28:11 -03:00

1 2

75 Commits