scylladb/sstables_loader.hh
Kefu Chai d815d7013c sstables_loader: report progress with the unit of batch
We restore a table snapshot by streaming its sstables in batches using
`sstable_streamer::stream_sstable_mutations()`. This function reads
mutations from a set of sstables and streams them to the target nodes.
Because of this function's limitations, we cannot track progress in
bytes.

Previously, progress tracking used individual sstables as units, which caused
inaccuracies with tablet-distributed tables, where:
- An sstable spanning multiple tablets could be counted multiple times
- Progress reporting could become misleading (e.g., showing "40" progress
  for a table with 10 sstables)
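
The over-count can be reproduced with a small model. This is only a sketch,
not the loader's code: `per_sstable_progress` and the spans-per-sstable input
are hypothetical, and the assumption that every sstable spans exactly four
tablets is chosen only to reproduce the "40" figure above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of the old scheme: tablets_spanned[i] is the number of
// tablets that sstable i overlaps. Progress advanced once per
// (sstable, tablet) pair streamed, so a spanning sstable counted repeatedly.
inline std::size_t per_sstable_progress(const std::vector<std::size_t>& tablets_spanned) {
    std::size_t completed = 0;
    for (std::size_t n : tablets_spanned) {
        completed += n;
    }
    return completed;
}
```

With 10 sstables that each span 4 tablets, the counter reaches 40 even though
only 10 sstables exist.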

This change introduces a more robust progress tracking method:
- Use "batch" as the unit of progress instead of individual sstables.
  When the table being restored is distributed with tablets, each batch
  represents a tablet. When the table is distributed with vnodes, each
  batch represents an sstable.
- Stream sstables for each tablet separately, handling both partially and
  fully contained sstables
- Calculate progress based on the total number of sstables being streamed
- Skip tablet IDs with no owned tokens
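
The batch-based accounting listed above could look roughly like the sketch
below. It is illustrative only: `progress_sketch` is a trimmed stand-in for
`stream_progress`, while `tablet_batch` and `stream_in_batches` are
hypothetical names, and the actual streaming call is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Trimmed stand-in for stream_progress (same start/advance contract).
struct progress_sketch {
    float total = 0.f;
    float completed = 0.f;
    void start(float amount) { assert(amount >= 0); total = amount; }
    void advance(float amount) {
        assert(amount >= 0);
        completed += amount;
        assert(completed <= total);
    }
};

// Hypothetical description of one tablet's batch.
struct tablet_batch {
    bool owns_tokens;           // false: nothing to stream for this tablet
    std::size_t sstable_count;  // batch size; does not affect the unit of progress
};

// One unit of progress per streamed batch; tablets without tokens are skipped.
inline progress_sketch stream_in_batches(const std::vector<tablet_batch>& tablets) {
    std::size_t batches = 0;
    for (const auto& t : tablets) {
        if (t.owns_tokens) {
            ++batches;
        }
    }
    progress_sketch progress;
    progress.start(static_cast<float>(batches));
    for (const auto& t : tablets) {
        if (!t.owns_tokens) {
            continue;
        }
        // Streaming of this batch's sstables would happen here.
        progress.advance(1);
    }
    return progress;
}
```

Because an sstable that spans several tablets only contributes to the batches
it belongs to, `completed` can never exceed `total`, which is the invariant
`stream_progress::advance()` asserts.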

For vnode-distributed tables, the number of "batches" directly corresponds
to the number of sstables, ensuring:
- Consistent progress reporting across different table distribution models
- Simplified implementation
- Accurate representation of restore progress

The new approach provides a more reliable and uniform method of tracking
restoration progress across different table distribution strategies.

Also, correct the use of `_sstables.size()` in
`sstable_streamer::stream_sstables()`. This addresses a review comment
from Pavel that was inadvertently overlooked when rebasing commit
5ab4932f34.

Fixes scylladb/scylladb#21816

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#21841
2025-01-13 09:04:35 +03:00


/*
 * Copyright (C) 2021-present ScyllaDB
 */
/*
 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
 */
#pragma once
#include <cassert>
#include <vector>
#include <seastar/core/sharded.hh>
#include "schema/schema_fwd.hh"
#include "sstables/shared_sstable.hh"
#include "tasks/task_manager.hh"
using namespace seastar;
namespace replica {
class database;
}
namespace sstables { class storage_manager; }
namespace netw { class messaging_service; }
namespace db {
namespace view {
class view_builder;
}
}
struct stream_progress {
    float total = 0.;
    float completed = 0.;
    virtual ~stream_progress() = default;
    void start(float amount) {
        assert(amount >= 0);
        total = amount;
    }
    virtual void advance(float amount) {
        // we should not move backward
        assert(amount >= 0);
        completed += amount;
        assert(completed <= total);
    }
};
// The handler of the 'storage_service/load_new_ss_tables' endpoint which, in
// turn, is the target of the 'nodetool refresh' command.
// Gets sstables from the upload directory and makes them available in the
// system. Built on top of the distributed_loader functionality.
class sstables_loader : public seastar::peering_sharded_service<sstables_loader> {
public:
    enum class stream_scope { all, dc, rack, node };
    class task_manager_module : public tasks::task_manager::module {
    public:
        task_manager_module(tasks::task_manager& tm) noexcept : tasks::task_manager::module(tm, "sstables_loader") {}
    };
private:
    sharded<replica::database>& _db;
    netw::messaging_service& _messaging;
    sharded<db::view::view_builder>& _view_builder;
    shared_ptr<task_manager_module> _task_manager_module;
    sstables::storage_manager& _storage_manager;
    seastar::scheduling_group _sched_group;
    // Note that this is obviously only valid for the current shard. Users of
    // this facility should elect a shard to be the coordinator based on any
    // given objective criteria.
    //
    // It shouldn't be impossible to actively serialize two callers if the need
    // ever arises.
    bool _loading_new_sstables = false;
    future<> load_and_stream(sstring ks_name, sstring cf_name,
            table_id, std::vector<sstables::shared_sstable> sstables,
            bool primary_replica_only, bool unlink_sstables, stream_scope scope,
            shared_ptr<stream_progress> progress);
public:
    sstables_loader(sharded<replica::database>& db,
            netw::messaging_service& messaging,
            sharded<db::view::view_builder>& vb,
            tasks::task_manager& tm,
            sstables::storage_manager& sstm,
            seastar::scheduling_group sg);
    future<> stop();
    /**
     * Load new SSTables not currently tracked by the system
     *
     * This can be called, for instance, after copying a batch of SSTables to a CF directory.
     *
     * This should not be called in parallel for the same keyspace / column family, and doing
     * so will throw an std::runtime_error.
     *
     * @param ks_name the keyspace in which to search for new SSTables.
     * @param cf_name the column family in which to search for new SSTables.
     * @return a future<> that resolves when the operation finishes.
     */
    future<> load_new_sstables(sstring ks_name, sstring cf_name,
            bool load_and_stream, bool primary_replica_only, stream_scope scope);
    /**
     * Download new SSTables not currently tracked by the system from object store
     */
    future<tasks::task_id> download_new_sstables(sstring ks_name, sstring cf_name,
            sstring prefix, std::vector<sstring> sstables,
            sstring endpoint, sstring bucket, stream_scope scope);
    class download_task_impl;
};