The cache of the hints and batchlog flush makes the exact repair time
check difficult in the test. Disabling it for two repair tests
that check the exact repair time.
The repair time returned by repair_service::repair_tablet considers the
hints and batchlog flush time, so it could be used for the tombstone gc
purpose.
The commit b39ca29b3c introduced detection of admission-waiter
anomaly and dumps permit diagnostics as soon as the semaphore did
not admit readers even though it could.
Later on, the commit bf3d0b3543 introduces the optimization where
the admission check is moved to the fiber processing the _read_list.
Since the semaphore no longer admits readers as soon as it can,
dumping diagnostic errors is not necessary as the situation is not
abnormal.
Closesscylladb/scylladb#22344
For several years now, we have seen a strange, and very rare, flakiness
in Alternator tests described in issue #17564: We see all the test pass,
pytest declares them to have passed, and while Python is existing, it
crashes with a signal 11 (SIGSEGV). Because this happens exclusively in
test/alternator and never in the test/cqlpy, we suspect that something
that the test/alternator leaves behind but test/cqlpy does not, causes
some race and crashes during shutdown.
The immediate suspect is the boto3 library, or rather, the urllib3 library
which it uses. This is more-or-less the only thing that test/alternator
does which test/cqlpy doesn't. The urllib3 library keeps around pools of
reusable connections, and it's possible (although I don't actually have any
proof for it) that these open connections may cause a crash during shutdown.
So in this patch I add to the "dynamodb" and "dynamodbstreams" fixtures
(which all Alternator tests use to connect to the server), a teardown which
calls close() for the boto3 client object. This close() call percolates
down to calling clear() on urllib3's PoolManager. Hopefully, this will
make some difference in the chance to crash during shutdown - and if it
doesn't, it won't hurt.
Refs #17564Closesscylladb/scylladb#22341
cmake doesn't set a `-ffile-prefix-map` for source files. Among other things,
this results in absolute paths in Scylla logs:
```
Jan 11 09:59:11.462214 longevity-tls-50gb-3d-master-db-node-2dcd4a4a-5 scylla[16339]: scylla: /jenkins/workspace/scylla-master/next/scylla/utils/refcounted.hh:23: utils::refcounted::~refcounted(): Assertion `_count == 0' failed.
```
And it results in absolute paths in gdb, which makes it a hassle to get gdb to display
source code during debugging. (A build-specific `substitute-path` has to be
configured for that).
There is a `-file-prefix-map` rule for `CMAKE_BINARY_DIR`,
but it's wrong.
Patch dbb056f4f7, which added it,
was misguided.
What we want is to strip the leading components of paths up to
the repository directory, both in __FILE__ macros and in debug info.
For example, we want to convert /home/michal/scylla/replica/table.cc to
replica/table.cc or ./replica/table.cc, both in Scylla logs and in gdb.
What the current rule does is it maps `/home/michal/scylla/build` to `.`,
which is wrong: it doesn't do anything about the paths outside of `build`,
which are the ones we actually care about.
This patch fixes the problem.
Closesscylladb/scylladb#22311
wasm32-wasi has been removed in Rust 1.84 (Jan 5th, 2025). if one
compiles the tree with Rust 1.84 or up, following build failure is
expected:
```
[2/305] Building WASM /home/kefu/dev/scylladb/build/wasm/return_input.wasm
FAILED: wasm/return_input.wasm /home/kefu/dev/scylladb/build/wasm/return_input.wasm
cd /home/kefu/dev/scylladb/test/resource/wasm/rust && /usr/bin/cargo build --target=wasm32-wasi --example=return_input --locked --manifest-path=Cargo.toml --target-dir=/home/kefu/dev/scylladb/build/test/resource/wasm/rust && wasm-opt /home/kefu/dev/scylladb/build/test/resource/wasm/rust/wasm32-wasi//debug/examples/return_input.wasm -Oz -o /home/kefu/dev/scylladb/build/wasm/return_input.wasm && wasm-strip /home/kefu/dev/scylladb/build/wasm/return_input.wasm
error: failed to run `rustc` to learn about target-specific information
Caused by:
process didn't exit successfully: `rustc - --crate-name ___ --print=file-names --target wasm32-wasi --crate-type bin --crate-type rlib --crate-type dylib --crate-type cdylib --crate-type staticlib --crate-type proc-macro --print=sysroot --print=split-debuginfo --print=crate-name --print=cfg` (exit status: 1)
--- stderr
error: Error loading target specification: Could not find specification for target "wasm32-wasi". Run `rustc --print target-list` for a list of built-in targets
```
in order to workaround this issue, let's check for supported target,
and use wasm32-wasip1 if wasm32-wasi is not listed as the supported
target.
Refs #20878
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#22320
While this isn't strictly needed for anything, messaging_service
is supposed to clear its RPC connection objects on stop,
for debuggability reasons.
But a recent change in this area broke that.
std::bind creates copies of its arguments, so the `m.clear()`
statement in stop_client() only clears a copy of the vector of shared pointers,
instead of clearing the original vector. This patch fixes that.
Fixes#22245Closesscylladb/scylladb#22333
Currently, tests are reusing the cluster. This leads to the situation
when test passes and leaves the cluster broken, that the next tests will
try to clean up the Scylla working directory during starting the node.
Timeout for starting is set to two minutes by default and sometimes
cleaning the mess after several tests can take more time, so tests fails
during adding the node to the cluster. Current PR marks the cluster
dirty after the test, so no need to clean the Scylla working directory.
The disadvantage of this way is increasing the time for tests execution.
Observable increase is approximately one minutes for one repeat in dev
mode: 22 min 35s vs. 23 min 41s.
Closesscylladb/scylladb#22274
When a node is bootstrapped and joined a cluster as a non-voter and changes it's role to a voter, errors can occur while committing a new Raft record, for instance, if the Raft leader changes during this time. These errors are not critical and should not cause a node crash, as the action can be retried.
Fixesscylladb/scylladb#20814
Backport: This issue occurs frequently and disrupts the CI workflow to some extent. Backports are needed for versions 6.1 and 6.2.
Closesscylladb/scylladb#22253
* github.com:scylladb/scylladb:
raft: refactor `remove_from_raft_config` to use a timed `modify_config` call.
raft: Refactor functions using `modify_config` to use a common wrapper for retrying.
raft: Handle non-critical config update errors in when changing status to voter.
test: Add test to check that a node does not fail on unknown commit status error when starting up.
raft: Add run_op_with_retry in raft_group0.
When adding a new view for building, first write the status to the
system tables and then add the view building step that will start
building it.
Otherwise, if we start building it before the status is written to the
table, it may happen that we complete building the view, write the
SUCCESS status, and then overwrite it with the STARTED status. The
view_build_status table will remain in incorrect state indicating the
view building is not complete.
Fixes#20638
The PR contains few additional small fixes in separate commits related to the view build status table.
It addresses flakiness issues in tests that use the view build status table to determine when view building is complete. The table may be in incorrect state due to these issues, having a row with status STARTED when it actually finished building the view, which will cause us to wait in `wait_for_view` until it timeouts.
For testing I used a test similar to `test_view_build_status_with_replace_node`, but it only creates the views and calls `wait_for_view`. Without these commits it failed in 4/1024 runs, and with the commits it passed 2048/2048.
backport to fix the bugs that affects previous versions and improve CI stability
Closesscylladb/scylladb#22307
* github.com:scylladb/scylladb:
view_builder: hold semaphore during entire startup
view_builder: pass view name by value to write_view_build_status
view_builder: write status to tables before starting to build
In this PR, we pair draining the view builder with its start.
To better understand what was done and why, let's first look at the
situation before this commit and the context of it:
(a) The following things happened in order:
1. The view builder would be constructed.
2. Right after that, a deferred lambda would be created to stop the
view builder during shutdown.
3. group0_service would be started.
4. A deferred lambda stopping group0_service would be created right
after that.
5. The view builder would be started.
(b) Because the view builder depends on group0_client, it couldn't be
started before starting group0_service. On the other hand, other
services depend on the view builder, e.g. the stream manager. That
makes changing the order of initialization a difficult problem,
so we want to avoid doing that unless we're sure it's the right
choice.
(c) Since the view builder uses group0_client, there was a possibility
of running into a segmentation fault issue in the following
scenario:
1. A call to `view_builder::mark_view_build_success()` is issued.
2. We stop group0_service.
3. `view_builder::mark_view_build_success()` calls
`announce_with_raft()`, which leads to a use-after-free because
group0_service has already been destroyed.
This very scenario took place in scylladb/scylladb#20772.
Initially, we decided to solve the issue by initializing
group0_service a bit earlier (scylladb/scylladb@7bad8378c7).
Unfortunately, it led to other issues described in scylladb/scylladb#21534,
so we revert that patch. These changes are the second attempt
to the problem where we want to solve it in a safer manner.
The solution we came up with is to pair the start of the view builder
with a deferred lambda that deinitializes it by calling
`view_builder::drain()`. No other component of the system should be
able to use the view builder anymore, so it's safe to do that.
Furthermore, that pairing makes the analysis of
initialization/deinitialization order much easier. We also solve the
aformentioned use-after-free issue because the view builder itself
will no longer attempt to use group0_client.
Note that we still pair a deferred lambda calling `view_builder::stop()`
with the construction of the view builder; that function will also call
`view_builder::drain()`. Another notable thing is `view_builder::drain()`
may be called earlier by `storage_service::do_drain()`. In other words,
these changes cover the situation when Scylla runs into a problem when
starting up.
Backport: The patch I'm reverting made it to 6.2, so we want to backport this one there too.
Fixesscylladb/scylladb#20772Fixesscylladb/scylladb#21534Closesscylladb/scylladb#21909
* github.com:scylladb/scylladb:
test/topology_custom: Add test for Scylla with disabled view building
main, view: Pair view builder drain with its start
Revert "main,cql_test_env: start group0_service before view_builder"
To avoid potential hangs during the `remove_from_raft_config` operation, use a timed `modify_config` call.
This ensures the operation doesn't get stuck indefinitely.
for retrying.
There are several places in `raft_group0` where almost identical code is
used for retrying `modify_config` in case of `commit_status_unknown`
error. To avoid code duplication all these places were changed to
use a new wrapper `run_op_with_retry`.
to voter.
When a node is bootstrapped and joins a cluster as a non-voter, errors can occur while committing
a new Raft record, for instance, if the Raft leader changes during this time. These errors are not
critical and should not cause a node crash, as the action can be retried.
Fixesscylladb/scylladb#20814
Currently, our relocatable package doesn't contains p11-kit-trust.so
since it dynamically loaded, not showing on "ldd" results
(Relocatable packaging script finds dependent libraries by "ldd").
So we need to add it on create-relocatable-pacakge.py.
Also, we have two more problems:
1. p11 module load path is defined as "/usr/lib64/pkcs11", not
referencing to /opt/scylladb/libreloc
(and also RedHat variants uses different path than Debian variants)
2. ca-trust-source path is configured on build time (on Fedora),
it compatible with RedHat variants but not compatible with Debian
variants
To solve these problems, we need to override default p11-kit
configuration.
To do so, we need to add an configuration file to
/opt/scylladb/share/pkcs11/modules/p11-kit-trust.module.
Also, ofcause p11-kit doesn't reference /opt/scylladb by default, we
need to override load path by p11_kit_override_system_files().
On the configuration file, we can specify module load path by "modules: <path>",
and also we can specify ca-trust-source path by "x-init-reservied: paths=<path>".
Fixesscylladb/scylladb#13904Closesscylladb/scylladb#22302
error when starting up.
Test that a node is starting successfully if while joining a cluster and becoming a voter, it
receives an unknown commit status error.
Test for scylladb/scylladb#20814
Since when calling `modify_config` it's quite often we need to do
retries, to avoid code duplication, a function wrapper that allows
a function to be called with automatic retries in case of failures
was added.
- use "foo not in bar" instead of "not foo in bar"
- test/pylib: use foo instead of `'{}'.format(foo)`
---
it's a cleanup, hence no need to backport.
Closesscylladb/scylladb#22066
* github.com:scylladb/scylladb:
test/pylib: use `foo` instead of `'{}'.format(foo)`
test/pylib: use "foo not in bar" instead of "not foo in bar"
As part of #18750, we added a CQL statement CREATE ROLE WITH SALTED HASH that prevented hashing a password when creating a role, effectively leading to inserting a hash given by the user directly into the database. In #21350, we noticed that Cassandra had implemented a CQL statement of similar semantics but different syntax. We decided to rename Scylla's statement to be compatible with Cassandra. Unfortunately, we didn't notice one more difference between what we had in Scylla and what was part of Cassandra.
Scylla's statement was originally supposed to only be used when restoring the schema and the user needn't have to be aware of its existence at all: the database produced a sequence of CQL statements that the user saved to a file and when a need to restore the schema arose, they would execute the contents of the file. That's why that although we documented the feature, it was only done in the necessary places. Those that weren't related to the backup & restore procedure were deliberately skipped.
Cassandra, on the other hand, added the statement for a different purpose (for details, see the relevant issue) and it was supposed to be used by the user by design. The statement is also documented as such.
Since we want to preserve compatibility with Cassandra, we document the statement and its semantics in the user documentation, explicitly implying that it can be used by the user.
We also add a test verifying that logging in works correctly.
Fixesscylladb/scylladb#21691
Backport: not needed. The relevant code didn't make it to 6.2 or any previous version of OSS.
Closesscylladb/scylladb#21752
* github.com:scylladb/scylladb:
docs: Update documentation on CREATE ROLE WITH HASHED PASSWORD
test/boost: Add test for creating roles with hashed passwords
these unused includes were identifier by clang-include-cleaner. after
auditing these source files, all of the reports have been confirmed.
please note, because quite a few source files relied on
`utils/to_string.hh` to pull in the specialization of
`fmt::formatter<std::optional<T>>`, after removing
`#include <fmt/std.h>` from `utils/to_string.hh`, we have to
include `fmt/std.h` directly.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Previously, we hardwire the container to a previous frozen toolchain
image. but at the time of writing, the tree does not compile in the
specified toolchain image anymore, after the required building
environment is updated, and toolchain was updated accordingly.
in order to improve the maintability, let's reuse `read-toolchain.yaml`
job which reads `tools/toolchain/image`, so we don't have to hardwire
the container used for building the tree with the latest seastar. this
should address the build failure surfaced recently.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#22287
The "--experimental" option was removed in commit f6cca741ea. Using this
deprecated option now causes Scylla to fail with the error:
```
error: the argument ('on') for option '--experimental-features' is invalid
```
So, in this change, let's update the docker entry point script to use
`--experimental-features` command line option instead. The related
document is updated accordingly.
Fixesscylladb/scylladb#22207
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#22283
memtable_flush_period test sets the flush period to 200ms and checks
whether the data is flushed after 500ms.
When flush period is set, the timer is armed with the given value.
On expiration, memtables are flushed and then the timer is rearmed.
There is no certainty that during 500ms the flush finishes, though.
Check if after 500ms flush has started. Wait until there is an sstable.
Fixes: #21965.
Closesscylladb/scylladb#22162
In main.cc storage_service is started before and stopped after
repair_service. storage_service keeps a reference to sharded
repair_service and calls its methods, but nothing ensures that
repair_service's local instance would be alive for the whole
execution of the method.
Add a gate to repair_service and enter it in storage_service
before executing methods on local instances of repair_service.
Fixes: #21964.
Closesscylladb/scylladb#22145
In Scylla there are two options that control IO bandwidth limit -- the /storage_service/(compaction|stream)_throughput REST API endpoints. The endpoints are partially implemented and have no counterparts in the nodetool.
This set implements the missing bits and adds tests for new functionality.
Closesscylladb/scylladb#21877
* github.com:scylladb/scylladb:
nodetool: Implement [gs]etstreamthroughput commands
nodetool: Implement [gs]etcompationthroughput commands
test: Add validation of how IO-updating endpoints work
api: Implement /storage_service/(stream|compaction)_throughput endpoints
api: Disqualify const config reference
api: Implement /storage_service/stream_throughput endpoint
api: Move stream throughput set/get endpoints from storage service block
api: Move set_compaction_throughput_mb_per_sec to config block
util: Include fmt/ranges.h in config_file.hh
Guard the whole view builder startup routine by holding the semaphore
until it's done instead of releasing it early, so that it's not
intercepted by migration notifications.
The function write_view_build_status takes two lambda functions and
chooses which of them to run depending on the upgrade state. It might
run both of them.
The parameters ks_name and view_name should be passed by value instead
of by reference because they are moved inside each lambda function.
Otherwise, if both lambdas are run, the second call operates on invalid
values that were moved.
When adding a new view for building, first write the status to the
system tables and then add the view building step that will start
building it.
Otherwise, if we start building it before the status is written to the
table, it may happen that we complete building the view, write the
SUCCESS status, and then overwrite it with the STARTED status. The
view_build_status table will remain in incorrect state indicating the
view building is not complete.
Fixesscylladb/scylladb#20638
They will be useful for hosts and DCs selection for the repair
scheduler. It is not implemented yet. Adding it earlier, so we do not
need to change the system tabler later.
Closesscylladb/scylladb#21985
This series adds WCU support for the Alternator update item.
This motivation behind it, is to have a rough estimation of what a similar operation would have taken from WCU perspective if used with DynamoDB.
The calculation is done while minimal overhead is the prime objective, the results are values that is less or equal to what it would have been in DynamoDB
** New feature, no need to backport. **
Closesscylladb/scylladb#21999
* github.com:scylladb/scylladb:
alternator/test_returnconsumedcapacity.py: update item
alternator/executor.cc: Add WCU for update_item
This update addresses an issue in the mutation diff calculation
algorithm used during read repair. Previously, the algorithm used
`token` as the hashmap key. Since `token` is calculated basing on the
Murmur3 hash function, it could generate duplicate values for different
partition keys, causing corruption in the affected rows' values.
Fixesscylladb/scylladb#19101
Since the issue affects all the relevant scylla versions, backport to: 6.1, 6.2
Closesscylladb/scylladb#21996
* github.com:scylladb/scylladb:
storage_proxy/read_repair: Remove redundant 'schema' parameter from `data_read_resolver::resolve` function.
storage_proxy/read_repair: Use `partition_key` instead of `token` key for mutation diff calculation hashmap.
test: Add test case for checking read repair diff calculation when having conflicting keys.
Since we manage ip to id mapping directly in gossiper now we need to
load the mapping on boot. We already do it anyway, but only due to a bug
which checks raft topology mode config before it is set, so the code
thinks that it is in the gossiper mode and loads peers table into the
gossiper and token metadata. Fix the bug and load peers into the gossiper
only since token metadata is managed by raft.
The series also removes address map related test that no longer checks
anything and replace it with unit test.
It also adds the dc/rack check to "join node" rpc. The check is done
during shadow round now, but for it to work it requires dc/rack to be
propagated through the gossiper and we want to eventually drop it.
Ref: scylladb/scylladb#21777
* 'load-peers' of https://github.com/gleb-cloudius/scylla:
topology coordinator: reject replace request if topology does not match
gossiper: fix the logic of shadow_round parameter
storage_service: do not add endpoint to the gossiper during topology loading.
storage_service: load peers into gossiper on boot in raft topology mode
storage_service: set raft topology change mode before using it in join_cluster
locator: drop inet_address usage to figure out per dc/rack replication
test: drop test_old_ip_notification_repro.py
test: address_map: check generation handling during entry addition
Developers using asyncio.gather() often assume that it waits for all futures (awaitables) givens.
But this isn't true when the return_exceptions parameter is False, which is the default.
In that case, as soon as one future completes with an exception, the gather() call will return this exception immediately, and some of the finished tasks may continue to run in the background.
This is bad for applications that use gather() to ensure that a list of background tasks has all completed.
So such applications must use asyncio.gather() with return_exceptions=True, to wait for all given futures to complete either successfully or unsuccessfully.
Closesscylladb/scylladb#22252
Said fields in statistics are of type
`disk_array<uint32_t, disk_string<uint16_t>>` and currently are handled
as array of regular strings. However these fields store exploded
clustering keys, so the elements store binary data and converting to
string can yield invalid UTF-8 characters that certain JSON parsers (jq,
or python's json) can choke on. Fix this by treating them as binary and
using `to_hex()` to convert them to string. This requires some massaging
of the json_dumper: passing field offset to all visit() methods and
using a caller-provided disk-string to sstring converter to convert disk
strings to sstring, so in the case of statistics, these fields can be
intercepted and properly handled.
While at it, the type of these fields is also fixed in the
documentation.
Before:
"min_column_names": [
"��Z���\u0011�\u0012ŷ4^��<",
"�2y\u0000�}\u007f"
],
"max_column_names": [
"��Z���\u0011�\u0012ŷ4^��<",
"}��B\u0019l%^"
],
After:
"min_column_names": [
"9dd55a92bc8811ef12c5b7345eadf73c",
"80327900e2827d7f"
],
"max_column_names": [
"9dd55a92bc8811ef12c5b7345eadf73c",
"7df79242196c255e"
],
Fixes: #22078Closesscylladb/scylladb#22225
scylla-sstable tries to read scylla.yaml via the following sequence:
1) Use user-provided location is provided (--scylla-yaml-file parameter)
2) Use the environment variables SCYLLA_HOME and/or SCYLLA_CONF if set
3) Use the default location ./conf/scylla.yaml
Step 3 is fine on dev machines, where the binaries are usually invoked
from scylla.git, which does have conf/scylla.yaml, but it doesn't work
on production machines, where the default location for scylla.yaml is
/etc/scylla/scylla.yaml. To reduce friction when used on production
machines, add another fallback in case (3) fails, which tries to read
scylla.yaml from /etc/scylla/scylla.yaml location.
Fixes: scylladb/scylladb#22202Closesscylladb/scylladb#22241
When destroying a test cluster, ScyllaCluster.stop() calls ScyllaServer.stop()
for each running server. Previously, non-zero exit status codes from scylla
servers were silently ignored during test teardown.
This change modifies the logging behavior to print the exit status code when
a scylla server exits with a non-zero status. This helps developers quickly
identify potential issues or unexpected terminations during test runs.
Differences in handling:
- Before: Non-zero exit codes were not logged
- After: Non-zero exit codes are printed, providing visibility into
server termination errors
This improvement aids in diagnosing intermittent test failures or
unexpected server shutdowns during test execution.
Refs #21742
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#21934
Previously, we created a vector<utils_json::histogram> and returned it
by copying into a future. Since histogram is a JSON representation of
ihistogram, it can be heavyweight, making the vector copy overhead
significant.
Now we move the vector into the returned future instead of copying it,
eliminating the deep copy overhead. The APIs backed by this function
are marked deprecated, so this performance improvement is not that
important.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#22004
In 0b0e661a85, we brought abseil back as a submodule, and we
added absl::headers as an interface library for importing
abseil headers' include directory. And:
```console
$ patchelf --print-rpath build/RelWithDebInfo/scylla
/home/kefu/dev/scylla/idl/absl::headers
```
In this change, we remove `absl::headers` from
`target_link_directories()` as it's an interface library that only
provides header files, not linkable libraries. This fixes the incorrect
inclusion of absl::headers in the rpath of the scylla executable.
Additionally, remove abseil library dependencies from the idl target
since none of the idl source files directly include abseil headers.
After this change,
```console
$ patchelf --print-rpath build/RelWithDebInfo/scylla
```
the output of `pathelf` is now empty.
Fixes#22265
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#22266
We restore a snapshot of table by streaming the sstables of
the given snapshot of the table using
`sstable_streamer::stream_sstable_mutations()` in batches. This function
reads mutations from a set of sstables, and streams them to the target
nodes. Due to the limit of this function, we are not able to track the
progress in bytes.
Previously, progress tracking used individual sstables as units, which caused
inaccuracies with tablet-distributed tables, where:
- An sstable spanning multiple tablets could be counted multiple times
- Progress reporting could become misleading (e.g., showing "40" progress
for a table with 10 sstables)
This change introduces a more robust progress tracking method:
- Use "batch" as the unit of progress instead of individual sstables.
Each batch represents a tablet when restoring a table snapshot if
the tablet being restored is distributed with tablets. When it comes
to tables distributed with vnode, each batch represents an sstable.
- Stream sstables for each tablet separately, handling both partially and
fully contained sstables
- Calculate progress based on the total number of sstables being streamed
- Skip tablet IDs with no owned tokens
For vnode-distributed tables, the number of "batches" directly corresponds
to the number of sstables, ensuring:
- Consistent progress reporting across different table distribution models
- Simplified implementation
- Accurate representation of restore progress
The new approach provides a more reliable and uniform method of tracking
restoration progress across different table distribution strategies.
Also, Corrected the use of `_sstables.size()` in
`sstable_streamer::stream_sstables()`. It addressed a review comment
from Pavel that was inadvertently overlooked during previous rebasing
the commit of 5ab4932f34.
Fixesscylladb/scylladb#21816
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#21841
Before this commit, there doesn't seem to have been a test verifying that
starting and shutting down Scylla behave correctly when the configuration
option `view_building` is set to false. In these changes, we add one.