Copy `commitlog_test.py` from scylla-dtest test suite and make it works with `test.py`
As a part of the porting process, remove unused imports and markers, remove non-next_gating tests and tests marked with `skip`, 'skip_if', and `xfail` markers.
test.py uses `commitlog` directory instead of dtest's `commitlogs`.
Also, add `commitlog_segment_size_in_mb: 32` option to test_stop_failure_policy to make _provoke_commitlog_failure
work.
Tests `test_total_space_limit_of_commitlog_with_large_limit` and `test_total_space_limit_of_commitlog_with_medium_limit` use too much disk space and have too big execution time. Keep them in scylla-dtest for now.
Enable the test in `suite.yaml` (run in dev mode only.)
Additional modifications to test.py/dtest shim code:
- add ScyllaCluster.flush() method
- add ScyllaNode.stress() method
- add tools/files.py::corrupt_file() function
- add tools/data.py::run_query_with_data_processing() function
- copy some assertions from dtest
Also add missed mode restriction for auth_test.py file.
Closesscylladb/scylladb#24946
* github.com:scylladb/scylladb:
test.py: dtest: remove slow and greedy tests from commitlog_test.py
test.py: dtest: make commitlog_test.py run using test.py
test.py: dtest: add ScyllaCluster.flush() method
test.py: dtest: add ScyllaNode.stress() method
test.py: dtest: add tools/data.py::run_query_with_data_processing() function
test.py: dtest: add tools/files.py::corrupt_file() function
test.py: dtest: copy some assertions from dtest
test.py: dtest: copy unmodified commitlog_test.py
This PR extends the `tmpdir` class with an option to preserve the directory if the destructor is called during stack unwinding. It also uses this feature in KMIP tests, where the tmpdir contains PyKMIP server logs, which may be useful when diagnosing test failures.
Fixes#25339.
Not so important to be backported.
Closesscylladb/scylladb#25367
* github.com:scylladb/scylladb:
encryption_at_rest_test: Preserve tmpdir from failing KMIP tests
test/lib: Add option to preserve tmpdir on exception
The central idea of incremental repair is to allow repair participants
to select and repair only a portion of the dataset to speed up the
repair process. All repair participants must utilize an identical
selection method to repair and synchronize the same selected dataset.
There are two primary selection methods: time-based and file-based. The
time-based method selects data within a specified time frame. It is
versatile but it is less efficient because it requires reading all of
the dataset and omitting data beyond the time frame. The file-based
method selects data from unrepaired SSTables and is more efficient
because it allows the entire SSTable to be omitted. This document patch
implements the file-based selection method.
Incremental repair will only be supported for tablet tables; it will not
be supported for vnode tables. On one hand, the legacy vnode is less
important to support. On the other hand, the incremental repair for
vnode is much harder to implement. With vnodes, a SSTalbe could contain
data for multiple vnode ranges. When a given vnode range is repaired,
only a portion of the SSTable is repaired. This complicates the
manipulation of SSTables significantly during both repair and
compaction. With tablets, an entire tablet is repaired so that a
sstable is either fully repaired or not repaired which is a huge
simplification.
This patch uses the repaired_at from sstables::statistics component to
mark a sstable as repaired. It uses a virtual clock as the repair
timestamp, i.e., using a monotonically increasing number for the
repaired_at field of a SSTable and sstables_repaired_at column in
system.tablets table. Notice that when a sstable is not repaired, the
repaired_at field will be set to the default value 0 by default. The
being_repaired in memory field of a SSTable is used to explicitly mark
that a SSTable is being selected. The following variables are used for
incremental repair:
The repaired_at on disk field of a SSTable is used.
- A 64-bit number increases sequentially
The sstables_repaired_at is added to the system.tablets table.
- repaired_at <= sstables_repaired_at means the sstable is repaired
The being_repaired in memory field of a SSTable is added.
- A repair UUID tells which sstable has participated in the repair
Initial test results:
1) Medium dataset results
Node amount: 3
Instance type: i4i.2xlarge
Disk usage per node: ~500GB
Cluster pre-populated with ~500GB of data before starting repairs job.
Results for Repair Timings:
The regular repair run took 210 mins.
Incremental repair 1st run took 183 mins, 2nd and 3rd runs took around 48s
The speedup is: 183 mins / 48s = 228X
2) Small dataset results
Node amount: 3
Instance type: i4i.2xlarge
Disk usage per node: ~167GB
Cluster pre-populated with ~167GB of data before starting the repairs job.
Regular repair 1st run took 110s, 2nd and 3rd runs took 110s.
Incremental repair 1st run took 110 seconds, 2nd and 3rd run took 1.5 seconds.
The speedup is: 110s / 1.5s = 73X
3) Large dataset results
Node amount: 6
Instance type: i4i.2xlarge, 3 racks
50% of base load, 50% read/write
Dataset == Sum of data on each node
Dataset Non-incremental repair (minutes)
1.3 TiB 31:07
3.5 TiB 25:10
5.0 TiB 19:03
6.3 TiB 31:42
Dataset Incremental repair (minutes)
1.3 TiB 24:32
3.0 TiB 13:06
4.0 TiB 5:23
4.8 TiB 7:14
5.6 TiB 3:58
6.3 TiB 7:33
7.0 TiB 6:55
Fixes#22472Closesscylladb/scylladb#24291
* github.com:scylladb/scylladb:
replica: Introduce get_compaction_reenablers_and_lock_holders_for_repair
compaction: Move compaction_reenabler to compaction_reenabler.hh
topology_coordinator: Make rpc::remote_verb_error to warning level
repair: Add metrics for sstable bytes read and skipped from sstables
test.py: Disable incremental for test_tombstone_gc_for_streaming_and_repair
test.py: Add tests for tablet incremental repair
repair: Add tablet incremental repair support
compaction: Add tablet incremental repair support
feature_service: Add TABLET_INCREMENTAL_REPAIR feature
tablet_allocator: Add tablet_force_tablet_count_increase and decrease
repair: Add incremental helpers
sstable: Add being_repaired to sstable
sstables: Add set_repaired_at to metadata_collector
mutation_compactor: Introduce add operator to compaction_stats
tablet: Add sstables_repaired_at to system.tablets table
test: Fix drain api in task_manager_client.py
When using automatic rust build tools in IDE,
the files generated in `rust/target/` directory
has been treated by git as unstaged changes.
After the change, the generated files will not
pollute the git changes interface.
Closesscylladb/scylladb#25389
endpoint_filter() is used by batchlog to select nodes to replicate
to.
It contains an unordered_multimap data structure that maps rack names
to nodes.
It misuses std::unordered_map::bucket_count() to count the number of
racks. While values that share a key in a multimap will definitly
be in the same bucket, it's possible for values that don't share a
key to share a bucket. Therefore bucket_count() undercounts the
number of racks.
Fix this by using a more accurate data structure: a map of a set.
The patch changes validated.bucket_count() to validated.size()
and validated.size() to a new variable nr_validated.
The patch does cause an extra two allocations per rack (one for the
unordered_map node, one for the unordered_set bucket vector), but
this is only used for logged batches, so it is amortized over all
the mutations in the logged batch.
Closesscylladb/scylladb#25493
When the user disables CDC on a table, the CDC log table is not removed.
Instead, it's detached from the base table, and it functions as a normal
table (with some differences). If that log table lives up to the point
when the user re-enabled CDC on the base table, instead of creating a new
log table, the old one is re-attached to the base.
For more context on that, see commit:
scylladb/scylladb@adda43edc7.
In this commit, we add validation tests that check whether the changes
on the base table after disabling CDC are reflected on the log table
after re-enabling CDC. The definition of the log table should be the same
as if CDC had never been disabled.
Closesscylladb/scylladb#25071
This pull request introduces minor code refactoring and aesthetic improvements to the S3 client and its associated test suite. The changes focus on enhancing readability, consistency, and maintainability without altering any functional behavior.
No backport is required, as the modifications are purely cosmetic and do not impact functionality or compatibility.
Closesscylladb/scylladb#25490
* github.com:scylladb/scylladb:
s3_client: relocate `req` creation closer to usage
s3_client: reformat long logging lines for readability
s3_test: extract file writing code to a function
Flush failure with seastar::named_gate_closed_exception is expected
if a respective compaction group was already stopped.
Lower the severity of a log in dirty_memory_manager::flush_one
for this exception.
Fixes: https://github.com/scylladb/scylladb/issues/25037.
Closesscylladb/scylladb#25355
Currently, when a container or smart pointer holds a const payload
type, utils::clear_gently does not detect the object's clear_gently
method as the method is non-const and requires a mutable object,
as in the following example in class tablet_metadata:
```
using tablet_map_ptr = foreign_ptr<lw_shared_ptr<const tablet_map>>;
using table_to_tablet_map = std::unordered_map<table_id, tablet_map_ptr>;
```
That said, when a container is cleared gently the elements it holds
are destroyed anyhow, so we'd like to allow to clear them gently before
destruction.
This change still doesn't allow directly calling utils::clear_gently
an const objects.
And respective unit tests.
Fixes#24605Fixed#25026
* This is an optimization that is not strictly required to backport (as https://github.com/scylladb/scylladb/pull/24618 dealt with clear_gently of `tablet_map_ptr = foreign_ptr<lw_shared_ptr<const tablet_map>>` well enough)
Closesscylladb/scylladb#24606
* github.com:scylladb/scylladb:
utils: stall_free: detect clear_gently method of const payload types
utils: stall_free: clear gently a foreign shared ptr only when use_count==1
Tests test_total_space_limit_of_commitlog_with_large_limit and
test_total_space_limit_of_commitlog_with_medium_limit use too much
disk space and have too big execution time. Keep them in
scylla-dtest for now.
As a part of the porting process, remove unused imports and
markers, remove non-next_gating tests and tests marked with
`skip`, 'skip_if', and `xfail` markers.
test.py uses `commitlog` directory instead of dtest's
`commitlogs`.
Remove test_stop_failure_policy test because the way how it
provoke commitlog failure (change file permission) doesn't
work on CI.
Enable the test in suite.yaml (run in dev mode only)
Implement repetition of files using `pytest_collect_file` hook: run file collection as many times as needed to cover all `--mode`/`--repeat` combinations. Store build mode and run ID to the stash of repeated item.
Some additional changes done:
- Add `TestSuiteConfig` class to handle all operations with `test_config.yaml`
- Add support for `run_first` option in `test_config.yaml`
- Move disabled test logic to `pytest_collect_file` hook.
These changes allow to to remove custom logic for `--mode`, `--repeat`, and disabled tests in the code for C++ tests and prepare for switching of Python/CQLApproval/Topology tests to pytest runner.
Also, this PR includes required refactoring changes and fixes:
- Simplify support of C++ tests: remove redundant facade abstraction and put all code into 3 files: `base.py`, `boost.py`, and `unit.py`
- Remove unused imports in `test.py`
- Use the constant for `"suite.yaml"` string
- Some test suites have own test runners based on pytest, and they don't need all stuff we use for `test.py`. Move all code related to `test.py` framework to `test/pylib/runner.py` and use it as a plugin conditionally (by using `SCYLLA_TEST_RUNNER` env variable.)
- Add `cwd` parameter to `run_process()` methods in `resource_gather` module to avoid using of `os.chdir()` (and sort parameters in the same order as in `subprocess.Popen`.)
- `extra_scylla_cmdline_options` is a list of commandline arguments and, actually, each argument should be a separate item. Few configuration files have `--reactor-backend` option added in the format which doesn't follow this rule.
This PR is a refactoring step for https://github.com/scylladb/scylladb/pull/25443Closesscylladb/scylladb#25465
* github.com:scylladb/scylladb:
test.py: pytest: support --mode/--repeat in a common way for all tests
test.py: pytest: streamline suite configuration handling
test.py: refactor: remove unused imports in test.py
test.py: fix run with bare pytest after merge of scylladb/scylladb#24573
test.py: refactor: move framework-related code to test.pylib.runner
test.py: resource_gather: add cwd parameter to run_process()
test.py: refactor: use proper format for extra_scylla_cmdline_options
This is yet another part in the BTI index project.
Overarching issue: https://github.com/scylladb/scylladb/issues/19191
Previous part: https://github.com/scylladb/scylladb/pull/25396
Next part: implementing sstable index writers and readers on top of the abstract trie writers/readers.
The new code added in this PR isn't used outside of tests yet, but it's posted as a separate PR for reviewability.
This series provides translation routines for ring positions and clustering positions
from Scylla's native in-memory structures to BTI's byte-comparable encoding.
This translation is performed whenever a new decorated key or clustering block
are added to a BTI index, and whenever a BTI index is queried for a range of positions.
For a description of the encoding, see
fad1f74570/src/java/org/apache/cassandra/utils/bytecomparable/ByteComparable.md (multi-component-sequences-partition-or-clustering-keys-tuples-bounds-and-nulls)
The translation logic, with all the fragment awareness, lazy
evaluation and avoidable copies, is fairly bloated for the common cases
of simple and small keys. This is a potential optimization target for later.
No backports needed, new functionality.
Closesscylladb/scylladb#25506
* github.com:scylladb/scylladb:
sstables/trie: add BTI key translation routines
tests/lib: extract generate_all_strings to test/lib
tests/lib: extract nondeterministic_choice_stack to test/lib
sstables/trie/trie_traversal: extract comparable_bytes_iterator to its own file
sstables/mx: move clustering_info from writer.cc to types.hh
sstables/trie: allow `comparable_bytes_iterator` to return a mutable span
dht/ring_position: add ring_position_view::weight()
This could happen in case the peer node is in shutdown. This is not
something we can not recovery. The log level should be warning instead
of error which our dtest catches for failure of a test.
This was observed in test_repair_one_node_alter_rf dtest.
scylla_repair_inc_sst_skipped_bytes: Total number of bytes skipped from
sstables for incremental repair on this shard.
scylla_repair_inc_sst_read_bytes : Total number of bytes read from
sstables for incremental repair on this shard.
The following tests are added for tablet incremental repair:
- Basic incremental repair
- Basic incremental repair with error
- Minor compaction and incremental repair
- Major compaction and incremental repair
- Scrub compaction and incremental repair
- Cleanup/Upgrade compaction and incremental repair
- Tablet split and incremental repair
- Tablet merge and incremental repair
The central idea of incremental repair is to allow repair participants
to select and repair only a portion of the dataset to speed up the
repair process. All repair participants must utilize an identical
selection method to repair and synchronize the same selected dataset.
There are two primary selection methods: time-based and file-based. The
time-based method selects data within a specified time frame. It is
versatile but it is less efficient because it requires reading all of
the dataset and omitting data beyond the time frame. The file-based
method selects data from unrepaired SSTables and is more efficient
because it allows the entire SSTable to be omitted. This document patch
implements the file-based selection method.
Incremental repair will only be supported for tablet tables; it will not
be supported for vnode tables. On one hand, the legacy vnode is less
important to support. On the other hand, the incremental repair for
vnode is much harder to implement. With vnodes, a SSTalbe could contain
data for multiple vnode ranges. When a given vnode range is repaired,
only a portion of the SSTable is repaired. This complicates the
manipulation of SSTables significantly during both repair and
compaction. With tablets, an entire tablet is repaired so that a
sstable is either fully repaired or not repaired which is a huge
simplification.
This patch uses the repaired_at from sstables::statistics component to
mark a sstable as repaired. It uses a virtual clock as the repair
timestamp, i.e., using a monotonically increasing number for the
repaired_at field of a SSTable and sstables_repaired_at column in
system.tablets table. Notice that when a sstable is not repaired, the
repaired_at field will be set to the default value 0 by default. The
being_repaired in memory field of a SSTable is used to explicitly mark
that a SSTable is being selected. The following variables are used for
incremental repair:
The repaired_at on disk field of a SSTable is used.
- A 64-bit number increases sequentially
The sstables_repaired_at is added to the system.tablets table.
- repaired_at <= sstables_repaired_at means the sstable is repaired
The being_repaired in memory field of a SSTable is added.
- A repair UUID tells which sstable has participated in the repair
Initial test results:
1) Medium dataset results
Node amount: 3
Instance type: i4i.2xlarge
Disk usage per node: ~500GB
Cluster pre-populated with ~500GB of data before starting repairs job.
Results for Repair Timings:
The regular repair run took 210 mins.
Incremental repair 1st run took 183 mins, 2nd and 3rd runs took around 48s
The speedup is: 183 mins / 48s = 228X
2) Small dataset results
Node amount: 3
Instance type: i4i.2xlarge
Disk usage per node: ~167GB
Cluster pre-populated with ~167GB of data before starting the repairs job.
Regular repair 1st run took 110s, 2nd and 3rd runs took 110s.
Incremental repair 1st run took 110 seconds, 2nd and 3rd run took 1.5 seconds.
The speedup is: 110s / 1.5s = 73X
3) Large dataset results
Node amount: 6
Instance type: i4i.2xlarge, 3 racks
50% of base load, 50% read/write
Dataset == Sum of data on each node
Dataset Non-incremental repair (minutes)
1.3 TiB 31:07
3.5 TiB 25:10
5.0 TiB 19:03
6.3 TiB 31:42
Dataset Incremental repair (minutes)
1.3 TiB 24:32
3.0 TiB 13:06
4.0 TiB 5:23
4.8 TiB 7:14
5.6 TiB 3:58
6.3 TiB 7:33
7.0 TiB 6:55
Fixes#22472
This patch addes incremental_repair support in compaction.
- The sstables are split into repaired and unrepaired set.
- Repaired and unrepaired set compact sperately.
- The repaired_at from sstable and sstables_repaired_at from
system.tablets table are used to decide if a sstable is repaired or
not.
- Different compactions tasks, e.g., minor, major, scrub, split, are
serialized with tablet repair.
Implement repetition of files using pytest_collect_file hook: run
file collection as many times as needed to cover all --mode/--repeat
combinations. Also move disabled test logic to this hook.
Store build mode and run_id in pytest item stashes.
Simplify support of C++ tests: remove redundant facade abstraction and put
all code into 3 files: base.py, boost.py, and unit.py
Add support for `run_first` option in test_config.yaml
To run tests with bare pytest command we need to have almost the
same set of options as test.py because we reuse code from test.py.
scylladb/scylladb#24573 added `--pytest-arg` option to test.py but
not to test/conftest.py which breaks running Python tests using
bare pytest command.
Some test suites have own test runners based on pytest, and they
don't need all stuff we use for test.py. Move all code related to
test.py framework to test/pylib/runner.py and use it as a plugin
conditionally (by using TEST_RUNNER variable.)
`extra_scylla_cmdline_options` is a list of commandline arguments
and, actually, each argument should be a separate item. Few configuration
files have `--reactor-backend` option added in the format which doesn't
follow this rule.
This file provides translation routines for ring positions and clustering positions
from Scylla's native in-memory structures to BTI's byte-comparable encoding.
This translation is performed whenever a new decorated key or clustering block
are added to a BTI index, and whenever a BTI index is queried for a range of positions.
For a description of the encoding, see
fad1f74570/src/java/org/apache/cassandra/utils/bytecomparable/ByteComparable.md (multi-component-sequences-partition-or-clustering-keys-tuples-bounds-and-nulls)
The translation logic, with all the fragment awareness, lazy
evaluation and avoidable copies, is fairly bloated for the common cases
of simple and small keys. This is a potential optimization target for later.
Before these changes, the logs in hinted handoff often didn't provide
crucial information like the identifier of the node that hints were
being sent to. Also, some of the logs were misleading and referred to
other places in the code than the one where an exception or some other
situation really occurred.
We modify those logs, extending them by more valuable information
and fixing existing issues. What's more, all of the logs in
`hint_endpoint_manager` and `hint_sender` follow a consistent format
now:
```
<class_name>[<destination host ID>]:<function_name>: <message>
```
This way, we should always have AT LEAST the basic information.
Fixesscylladb/scylladb#25466
Backport:
There is no risk in backporting these changes. They only have
impact on the logs. On the other hand, they might prove helpful
when debugging an issue in hinted handoff.
Closesscylladb/scylladb#25470
* github.com:scylladb/scylladb:
db/hints: Add new logs
db/hints: Adjust log levels
db/hints: Improve logs
The test creates all driver sessions by itself. As a consequence, all
sessions use the default request timeout of 10s. This can be too low for
the debug mode, as observed in scylladb/scylla-enterprise#5601.
In this commit, we change the test to use `cluster_con`, so that the
sessions have the request timeout set to 200s from now on.
Fixesscylladb/scylla-enterprise#5601
This commit changes only the test and is a CI stability improvement,
so it should be backported all the way to 2024.2. 2024.1 doesn't have
this test.
Closesscylladb/scylladb#25510
follow-up PR after fast fix https://github.com/scylladb/scylladb/pull/25394
should be merged only after - https://github.com/scylladb/scylla-pkg/pull/5414
Since boost tests run via pure pytest, we can finally run tests using
-k=EXPRESSION pytest argument. This expression will be applied to the "test
function". So it will be possible to run: subset of test functions that match patterns across all boosts tests(functions)
arguments --skip and -k are mutually exclusive
due to -k extends --skip functionality
examples:
```
./build/release/test/boost/auth_passwords_test --list_content
passwords_are_salted*
correct_passwords_authenticate*
incorrect_passwords_do_not_authenticate*
./test.py --mode=dev -k="correct" -vv test/boost/auth_passwords_test.cc
PASSED test/boost/auth_passwords_test.cc::incorrect_passwords_do_not_authenticate.dev.1
PASSED test/boost/auth_passwords_test.cc::correct_passwords_authenticate.dev.1
./test.py --mode=dev -k="not incorrect and not passwords_are_salted" -vv test/boost/auth_passwords_test.cc
PASSED test/boost/auth_passwords_test.cc::correct_passwords_authenticate.dev.1
./test.py --mode=dev --skip=incorrect --skip=passwords_are_salted -vv test/boost/auth_passwords_test.cc
PASSED test/boost/auth_passwords_test.cc::correct_passwords_authenticate.dev.1
./test.py --mode=dev -k="correct and not incorrect" -vv test/boost/auth_passwords_test.cc
ASSED test/boost/auth_passwords_test.cc::correct_passwords_authenticate.dev.1
```
Closesscylladb/scylladb#25400
* github.com:scylladb/scylladb:
test.py: add -k=EXPRESSION pytest argument support for boost tests.
test.py: small refactoring of how boost test arguments make
The test_drop_quarantined_sstables test could fail due to a race between
compaction and quarantining of SSTables. If compaction selects
an SSTable before it is moved to quarantine, and change_state is called during
compaction, the SSTable may already be removed, resulting in a
std::filesystem_error due to missing files.
This patch resolves the issue by wrapping the quarantine operation inside
run_with_compaction_disabled(). This ensures compaction is paused on the
compaction group view while SSTables are being quarantined, preventing the
race.
Additionally, updates the test to quarantine up to 1/5 SSTables instead
of one randomly and increases the number of sstables genereted to improve
test scenario.
Fixesscylladb/scylladb#25487Closesscylladb/scylladb#25494
Users with single-column partition keys that contain colon characters
were unable to use certain REST APIs and 'nodetool' commands, because the
API split key by colon regardless of the partition key schema.
Affected commands:
- 'nodetool getendpoints'
- 'nodetool getsstables'
Affected endpoints:
- '/column_family/sstables/by_key'
- '/storage_service/natural_endpoints'
Refs: #16596 - This does not fully fix the issue, as users with compound
keys will face the issue if any column of the partition key contains
a colon character.
Closesscylladb/scylladb#24829
Enable runtime updates of vector_store_uri configuration without
requiring server restart.
This allows to dynamically enable, disable, or switch the vector search service endpoint on the fly.
To improve the clarity the seastar::experimental::http::client is now wrapped in a private http_client class that also holds the host, address, and port information.
Tests have been added to verify that the client correctly handles transitions between enabled/disabled states and successfully switches traffic to a new endpoint after a configuration update.
Closes: VECTOR-102
No backport is needed as this is a new feature.
Closesscylladb/scylladb#25208
* github.com:scylladb/scylladb:
service/vector_store_client: Add live configuration update support
test/boost/vector_store_client_test.cc: Refactor vector store client test
service/vector_store_client: Refactor host_port struct created
service/vector_store_client: Refactor HTTP request creation
This change includes basic optimizations to
locator::describe_ring, mainly caching the per-endpoint information in an unordered_map instead of looking them up in every inner-loop.
This yields an improvement of 20% in cpu time.
With 45 nodes organized as 3 dcs, 3 racks per dc, 5 nodes per rack, 256 tokens per node, yielding 11520 ranges and 9 replicas per range, describe_ring took Before: 30 milliseconds (2.6 microseconds per range) After: 24 milliseconds (2.1 microseconds per range)
Add respective unit test for vnode keyspace
and for tablets.
Fixes#24887
* backport up to 2025.1 as describe_ring slowness was hit in the field with large clusters
Closesscylladb/scylladb#24889
* github.com:scylladb/scylladb:
locator: util: optimize describe_ring
locator: util: construct_range_to_endpoint_map: pass is_vnode=true to get_natural_replicas
vnode_effective_replication_map: do_get_replicas: throw internal error if token not found in map
locator: effective_replication_map: get_natural_replicas: get is_vnode param
test: cluster: test_repair: add test_vnode_keyspace_describe_ring