The handler in question, when called for a tablets-enabled keyspace, returns ranges that are inconsistent with those from system.tablets. Like this:
system.tablets:
```
TabletReplicas(last_token=-4611686018427387905, replicas=[('e43ce450-2834-4137-92b7-379bb37684d1', 0), ('67c82fc2-8ef9-4dd9-8cf6-c7f9372ce207', 0)])
TabletReplicas(last_token=-1, replicas=[('22c84cba-d8d0-4d20-8d46-eb90865bb612', 0), ('67c82fc2-8ef9-4dd9-8cf6-c7f9372ce207', 1)])
TabletReplicas(last_token=4611686018427387903, replicas=[('22c84cba-d8d0-4d20-8d46-eb90865bb612', 1), ('67c82fc2-8ef9-4dd9-8cf6-c7f9372ce207', 1)])
TabletReplicas(last_token=9223372036854775807, replicas=[('e43ce450-2834-4137-92b7-379bb37684d1', 1), ('22c84cba-d8d0-4d20-8d46-eb90865bb612', 0)])
```
range_to_endpoint_map:
```
{'key': ['-9069053676502949657', '-8925522303269734226'], 'value': ['127.110.40.2', '127.110.40.3']}
{'key': ['-8925522303269734226', '-8868737574445419305'], 'value': ['127.110.40.2', '127.110.40.3']}
...
{'key': ['-337928553869203886', '-288500562444694340'], 'value': ['127.110.40.1', '127.110.40.3']}
{'key': ['-288500562444694340', '105026475358661740'], 'value': ['127.110.40.1', '127.110.40.3']}
{'key': ['105026475358661740', '611365860935890281'], 'value': ['127.110.40.1', '127.110.40.3']}
...
{'key': ['8307064440200319556', '9117218379311179096'], 'value': ['127.110.40.2', '127.110.40.1']}
{'key': ['9117218379311179096', '9125431458286674075'], 'value': ['127.110.40.2', '127.110.40.1']}
```
Not only does the number of ranges differ, the separating tokens also fail to match (e.g. tokens -2 and 0 belong to different tablets according to system.tablets, but fall into the same "range" in the API result).
The source of confusion is that even though storage_service::get_range_to_address_map() is given the correct e.r.m. pointer from the table, it still works with token_metadata::sorted_tokens(). The fix: when the e.r.m. is per-table, the tokens should be taken from token_metadata's tablet_map (compare this to storage_service::effective_ownership(), which grabs tokens differently for the vnode/tablet cases).
This PR fixes the mentioned problem and adds a validation test. The test also checks the /storage_service/describe_ring endpoint, which happens to return a correct set of values.
The API is very ancient, so the bug is present in all versions with tablets.
Fixes #26331
Closes scylladb/scylladb#26231
* github.com:scylladb/scylladb:
test: Add validation of data returned by /storage_service endpoints
test,lib: Add range_to_endpoint_map() method to rest client
api: Indentation fix after previous patches
storage_service: Get tablet tokens if e.r.m. is per-table
storage_service,api: Get e.r.m. inside get_range_to_address_map()
storage_service: Calculate tokens on stack
Before this patch, the test `cluster/test_alternator::test_localnodes_joining_nodes` was one of the slowest tests in the test/cluster framework, taking over two minutes to run.
As comments in the test already acknowledged, there was no good reason why this test had to be so slow. The test needed to, intentionally, boot a server which took a long time (2 minutes) to fail its boot. But it didn't really need to wait for this failure - the right thing to do was to just kill the server at the end of the test. But we just didn't have the test-framework API to do it. So in this series, the first patch introduces the missing API, and the second patch uses it to fix test_localnodes_joining_nodes to kill the (unsuccessfully) booting server.
After this patch, the test takes just 7 seconds to run.
This is a test speedup only, so there is no real need to backport it - old releases get fewer test runs anyway, and the latency of those runs is less important.
Closes scylladb/scylladb#25312
* github.com:scylladb/scylladb:
test/cluster: greatly speed up test_localnodes_joining_nodes
test/pylib: add the ability to stop currently-starting servers
1. Remove dumping cluster logs and print only the link to the log.
2. Fail the test (to fail CI and not ignore the problem) and mark the cluster as dirty (to avoid affecting subsequent tests) in case setup/teardown fails.
3. Add two cqlpy tests that fail after applying step 2 to the dirties_cluster list, so the cluster is discarded afterward.
Closes scylladb/scylladb#26183
Some tests need the ability to abruptly stop a server in the test
cluster before it fully booted - e.g., because the test knows (and
perhaps even expects) that the boot is hung. But before this patch,
manager.server_stop() could only kill servers in "running" state.
This patch adds to pylib tracking of "starting" servers - servers which
we are starting but which haven't finished booting - whose list can be
returned by manager.starting_servers(). The manager.server_stop
function can now kill a server which is just starting, not only
"running" servers.
To avoid breaking existing tests, manager.all_servers() continues to
return just running and stopped servers - not "starting" servers.
Note that when a starting server is killed, it is not listed as stopped -
it behaves like a normal failure to add the server, not like a
server which successfully joined the cluster and was later stopped.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Instead of re-inventing empty-parameter handling, use the built-in
keep_blank_values=True parameter of urllib.parse.parse_qs().
It correctly handles the case where the `=` is present but no value
follows - the syntax used by the new query_params in
seastar::http::request.
Also add an exception to build_POST_response(); better than a cryptic
message about encode() not being callable on NoneType.
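For illustration, here is the difference keep_blank_values makes on a seastar-style query string (a standalone sketch, not the actual rest-client code):

```python
from urllib.parse import parse_qs

# A query string in the style produced by seastar::http::request:
# "flush=" carries the '=' but no value, "verbose" has no '=' at all.
qs = "name=foo&flush=&verbose"

# The default behaviour silently drops both blank parameters.
assert parse_qs(qs) == {"name": ["foo"]}

# keep_blank_values=True preserves them with an empty-string value.
assert parse_qs(qs, keep_blank_values=True) == {
    "name": ["foo"],
    "flush": [""],
    "verbose": [""],
}
```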
test_two_tablets_concurrent_repair_and_migration_repair_writer_level waits
for the first node that logs info about repair_writer, using asyncio.wait.
The done group is never awaited, so we never learn about the error.
The test itself is incorrect and the log about repair_writer is never
printed. We never learn about that either, and the test finishes "successfully"
after the 10-minute timeout.
Fix the test:
- disable hinted handoff;
- repair tablets of the whole table:
- new table is added so that concurrent migration is possible;
- use wait_for_first_completed that awaits done group;
- do some cleanups.
Remove nightly mark.
Fixes: #26148.
Closes scylladb/scylladb#26209
Sometimes it's convenient to use slashes in injection names,
for example my_component/my_method/my_condition. Without quote()
we get 'handler not found' error from Scylla.
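The effect of quoting can be shown with the standard library directly (the injection name below is the hypothetical example from above):

```python
from urllib.parse import quote

# Hypothetical injection name containing slashes.
name = "my_component/my_method/my_condition"

# quote() keeps '/' unescaped by default (safe='/'), so without an explicit
# safe="" the slashes survive and the path-based handler lookup fails with
# 'handler not found'.
assert quote(name) == name  # '/' left as-is by default
assert quote(name, safe="") == "my_component%2Fmy_method%2Fmy_condition"
```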
Add another level of verbosity: quiet.
Before this change it was used as the default, but it does not provide enough information.
These changes should be coupled with the pytest-sugar plugin to get the intended information at each level.
Invoke pytest as a module, instead of a separate process, to get access to the terminal and be able to use it interactively.
Framework change only, so backporting to 2025.3.
Fixes: #25403
Closes scylladb/scylladb#25698
* github.com:scylladb/scylladb:
test.py: add additional level of verbosity for output
test.py: start pytest as a module instead of subprocess
This patch introduces a new `incremental_mode` parameter to the tablet
repair REST API, providing more fine-grained control over the
incremental repair process.
Previously, incremental repair was on and could not be turned off. This
change allows users to select from three distinct modes:
- `regular`: This is the default mode. It performs a standard
incremental repair, processing only unrepaired sstables and skipping
those that are already repaired. The repair state (`repaired_at`,
`sstables_repaired_at`) is updated.
- `full`: This mode forces the repair to process all sstables, including
those that have been previously repaired. This is useful when a full
data validation is needed without disabling the incremental repair
feature. The repair state is updated.
- `disabled`: This mode completely disables the incremental repair logic
for the current repair operation. It behaves like a classic
(pre-incremental) repair, and it does not update any incremental
repair state (`repaired_at` in sstables or `sstables_repaired_at` in
the system.tablets table).
The implementation includes:
- Adding the `incremental_mode` parameter to the
`/storage_service/repair/tablet` API endpoint.
- Updating the internal repair logic to handle the different modes.
- Adding a new test case to verify the behavior of each mode.
- Updating the API documentation and developer documentation.
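The three modes described above can be summarized as a selection-and-state-update policy. The sketch below is illustrative only (the field names are hypothetical, not Scylla internals or the actual repair code):

```python
# Illustrative policy table for the three incremental_mode values
# (field names are hypothetical, not taken from the Scylla source).
POLICIES = {
    # Default: skip already-repaired sstables, maintain repair state.
    "regular":  {"skip_repaired_sstables": True,  "update_repair_state": True},
    # Process every sstable, but still maintain incremental-repair state.
    "full":     {"skip_repaired_sstables": False, "update_repair_state": True},
    # Classic pre-incremental repair: read everything, update no state.
    "disabled": {"skip_repaired_sstables": False, "update_repair_state": False},
}

def plan_repair(sstables, mode):
    """Return the sstables a repair in `mode` would process."""
    policy = POLICIES[mode]
    if policy["skip_repaired_sstables"]:
        return [s for s in sstables if not s["repaired"]]
    return list(sstables)
```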
Fixes #25605
Closes scylladb/scylladb#25693
Following up on 6129411a5e,
improve test_vnode_keyspace_describe_ring by verifying that the
endpoints listed by describe_ring match those returned by the
`natural_endpoints` API (for random tokens).
The latter are calculated using an independent code path
directly from the effective_replication_map.
* test exists currently only on master, no backport required
Closes scylladb/scylladb#25610
* github.com:scylladb/scylladb:
test/cluster/test_repair: test_vnode_keyspace_describe_ring: verify that describe_ring results agree with natural_endpoints
test/pylib/rest_client: add natural_endpoints function
The storage submodule contains tests that require mounted volumes
to be executed. The volumes are created automatically with the
`volumes_factory` fixture.
The tests in this suite are executed with the custom launcher
`unshare -mr pytest`.
Test scenarios (when one node reaches critical disk utilization):
1. Reject user table writes
2. Disable/Enabled compaction
3. Reject split compactions
4. New split compactions not triggered
5. Abort tablet repair
6. Disable/Enabled incoming tablet migrations
7. Restart a node while a tablet split is triggered
Currently, workdir is set in the ScyllaCluster constructor and does
not take into account that the value can be overridden via cmdline
arguments. When this happens, some data (logs, configs) is
stored under one path and the rest (data files) under a different one.
The patch allows overriding the value when passed via cmdline arguments,
leading to all files being stored under the same path.
After changing the type of the `recovery_leader` config option from
`sstring` to `UUID` in #25032, setting `recovery_leader` to an empty
string became an incorrect way to unset it. The following error started
to appear in the recovery procedure tests:
```
init - marshaling error: UUID string size mismatch: '' : recovery_leader
```
We unset `recovery_leader` properly in this PR. To do it, we introduce
a simple way to remove config options in tests.
Backport is unneeded. This error was harmless, and Scylla ignored
`recovery_leader` after logging the error as expected by the tests.
Closes scylladb/scylladb#25365
* github.com:scylladb/scylladb:
test: properly unset recovery_leader in the recovery procedure tests
test: manager_client: allow removing a config option
test: manager_client: add docstring to server_update_config
There is a stash item, REPEATED_FILES, for directory items, which is used to cut
recursion. But if multiple tests from one directory are added to the ./test.py
command line, this solution prevents handling the non-first tests correctly,
because the file was already collected for the first one. Change the behavior
to store in the stash not all repeated files but only the files which are in
the process of repetition. Rename the stash item to REPEATING_FILES to reflect
this change.
Closes scylladb/scylladb#25611
CI can run several test.py sessions on different machines (builders) for one build, and to avoid being overwritten, the .db file with metrics needs a unique name: add host_id, as we already do for the .xml report in `run_pytest()`.
Also add host_id columns to the metric tables in case we ever aggregate .db files.
Add a host_id suffix to `toxiproxy_server.log` for the same reason.
Fixes: https://github.com/scylladb/scylladb/issues/25462
Closes scylladb/scylladb#25542
* github.com:scylladb/scylladb:
test.py: add host_id suffix to toxiproxy_server.log
test.py: metrics: add host_id suffix to .db file
CI can run several test.py sessions on different machines (builders) for
one build, and to avoid being overwritten, the .db file with metrics
needs a unique name: add host_id as we already do for the .xml report in
run_pytest().
Also add host_id columns to the metric tables in case we ever
aggregate .db files.
The central idea of incremental repair is to allow repair participants
to select and repair only a portion of the dataset to speed up the
repair process. All repair participants must utilize an identical
selection method to repair and synchronize the same selected dataset.
There are two primary selection methods: time-based and file-based. The
time-based method selects data within a specified time frame. It is
versatile but it is less efficient because it requires reading all of
the dataset and omitting data beyond the time frame. The file-based
method selects data from unrepaired SSTables and is more efficient
because it allows the entire SSTable to be omitted. This patch
implements the file-based selection method.
Incremental repair will only be supported for tablet tables; it will not
be supported for vnode tables. On one hand, the legacy vnode is less
important to support. On the other hand, incremental repair for
vnodes is much harder to implement. With vnodes, an SSTable could contain
data for multiple vnode ranges. When a given vnode range is repaired,
only a portion of the SSTable is repaired. This complicates the
manipulation of SSTables significantly during both repair and
compaction. With tablets, an entire tablet is repaired, so an
sstable is either fully repaired or not repaired at all, which is a huge
simplification.
This patch uses the repaired_at field from the sstables::statistics component
to mark an sstable as repaired. It uses a virtual clock as the repair
timestamp, i.e., a monotonically increasing number for the
repaired_at field of an SSTable and the sstables_repaired_at column in the
system.tablets table. Note that when an sstable is not repaired, the
repaired_at field is set to the default value 0. The in-memory
being_repaired field of an SSTable is used to explicitly mark
that an SSTable has been selected. The following variables are used for
incremental repair:
- The on-disk repaired_at field of an SSTable: a 64-bit number that
  increases sequentially.
- The sstables_repaired_at column added to the system.tablets table:
  repaired_at <= sstables_repaired_at means the sstable is repaired.
- The in-memory being_repaired field added to an SSTable: a repair UUID
  that tells which sstables have participated in the repair.
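The selection rule above can be sketched as a predicate. This is a hedged illustration: the names come from the description in this message, not from the actual C++ code:

```python
def is_repaired(repaired_at: int, sstables_repaired_at: int) -> bool:
    # repaired_at == 0 is the default for an sstable that has never been
    # repaired; otherwise the sstable is repaired iff its virtual-clock
    # stamp is not newer than the tablet's sstables_repaired_at.
    return repaired_at != 0 and repaired_at <= sstables_repaired_at

def select_unrepaired(sstables, sstables_repaired_at):
    """File-based selection: pick only sstables not yet covered by the
    tablet's sstables_repaired_at virtual timestamp."""
    return [s for s in sstables
            if not is_repaired(s["repaired_at"], sstables_repaired_at)]
```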
Initial test results:
1) Medium dataset results
Node amount: 3
Instance type: i4i.2xlarge
Disk usage per node: ~500GB
Cluster pre-populated with ~500GB of data before starting repairs job.
Results for Repair Timings:
The regular repair run took 210 mins.
Incremental repair 1st run took 183 mins, 2nd and 3rd runs took around 48s
The speedup is: 183 mins / 48s = 228X
2) Small dataset results
Node amount: 3
Instance type: i4i.2xlarge
Disk usage per node: ~167GB
Cluster pre-populated with ~167GB of data before starting the repairs job.
Regular repair 1st run took 110s, 2nd and 3rd runs took 110s.
Incremental repair 1st run took 110 seconds, 2nd and 3rd run took 1.5 seconds.
The speedup is: 110s / 1.5s = 73X
3) Large dataset results
Node amount: 6
Instance type: i4i.2xlarge, 3 racks
50% of base load, 50% read/write
Dataset == Sum of data on each node
Dataset Non-incremental repair (minutes)
1.3 TiB 31:07
3.5 TiB 25:10
5.0 TiB 19:03
6.3 TiB 31:42
Dataset Incremental repair (minutes)
1.3 TiB 24:32
3.0 TiB 13:06
4.0 TiB 5:23
4.8 TiB 7:14
5.6 TiB 3:58
6.3 TiB 7:33
7.0 TiB 6:55
Fixes #22472
Closes scylladb/scylladb#24291
* github.com:scylladb/scylladb:
replica: Introduce get_compaction_reenablers_and_lock_holders_for_repair
compaction: Move compaction_reenabler to compaction_reenabler.hh
topology_coordinator: Make rpc::remote_verb_error to warning level
repair: Add metrics for sstable bytes read and skipped from sstables
test.py: Disable incremental for test_tombstone_gc_for_streaming_and_repair
test.py: Add tests for tablet incremental repair
repair: Add tablet incremental repair support
compaction: Add tablet incremental repair support
feature_service: Add TABLET_INCREMENTAL_REPAIR feature
tablet_allocator: Add tablet_force_tablet_count_increase and decrease
repair: Add incremental helpers
sstable: Add being_repaired to sstable
sstables: Add set_repaired_at to metadata_collector
mutation_compactor: Introduce add operator to compaction_stats
tablet: Add sstables_repaired_at to system.tablets table
test: Fix drain api in task_manager_client.py
The following tests are added for tablet incremental repair:
- Basic incremental repair
- Basic incremental repair with error
- Minor compaction and incremental repair
- Major compaction and incremental repair
- Scrub compaction and incremental repair
- Cleanup/Upgrade compaction and incremental repair
- Tablet split and incremental repair
- Tablet merge and incremental repair
Implement repetition of files using pytest_collect_file hook: run
file collection as many times as needed to cover all --mode/--repeat
combinations. Also move disabled test logic to this hook.
Store build mode and run_id in pytest item stashes.
Simplify support of C++ tests: remove redundant facade abstraction and put
all code into 3 files: base.py, boost.py, and unit.py
Add support for `run_first` option in test_config.yaml
To run tests with the bare pytest command we need almost the
same set of options as test.py, because we reuse code from test.py.
scylladb/scylladb#24573 added the `--pytest-arg` option to test.py but
not to test/conftest.py, which breaks running Python tests with the
bare pytest command.
Some test suites have their own test runners based on pytest, and they
don't need all the machinery we use for test.py. Move all code related to
the test.py framework to test/pylib/runner.py and use it as a plugin
conditionally (via the TEST_RUNNER variable).
In the ScyllaMetrics `get` function, when requesting the value for a
specific shard, it is expected to return the sum of all values of
metrics for that shard that match the labels.
However, it returned the value of the first matching line it found
instead of summing all matching lines.
For example, if we have two lines for one shard like:
some_metric{scheduling_group_name="compaction",shard="0"} 1
some_metric{scheduling_group_name="sl:default",shard="0"} 2
The result of this call would be 1 instead of 3:
get('some_metric', shard="0")
We fix this to sum all matching lines.
The filtering of lines by labels is fixed to allow specifying only some
of the labels. Previously, for a line to match the filter, either the
filter had to be empty, or all the labels in the metric line had to be
specified in the filter parameter and match its value - which is
unexpected and breaks when more labels are added.
We also simplify the function signature and the implementation - instead
of having the shard as a separate parameter, it can be specified as a
label, like any other label.
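A minimal sketch of the fixed matching-and-summing logic (the real parser in ScyllaMetrics handles the full Prometheus text exposition format; this toy only covers simple `name{labels} value` lines):

```python
import re

def metric_sum(lines, name, **labels):
    """Sum every sample of `name` whose label set contains the given
    labels as a subset (partial match). Returns None if nothing matched."""
    total, found = 0.0, False
    for line in lines:
        m = re.match(r'(\w+)\{([^}]*)\}\s+(\S+)$', line)
        if not m or m.group(1) != name:
            continue
        line_labels = dict(
            (k, v.strip('"'))
            for k, v in (kv.split("=", 1) for kv in m.group(2).split(","))
        )
        # Partial match: every label in the filter must be present and equal.
        if all(line_labels.get(k) == v for k, v in labels.items()):
            total += float(m.group(3))
            found = True
    return total if found else None

lines = [
    'some_metric{scheduling_group_name="compaction",shard="0"} 1',
    'some_metric{scheduling_group_name="sl:default",shard="0"} 2',
]
assert metric_sum(lines, "some_metric", shard="0") == 3.0
```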
In commit 44a1daf we added the ability to read Scylla system tables with Alternator. This feature is useful, among other things, in tests that want to read Scylla's configuration through the system table system.config. But tests often want to modify system.config, e.g., to temporarily reduce some threshold to make tests shorter. Until now, this was not possible.
This series adds support for writing to system tables through Alternator, along with examples of tests using this capability (and utility functions to make it easy).
Because the ability to write to system tables may have non-obvious security consequences, it is turned off by default and needs to be enabled with a new configuration option, "alternator_allow_system_table_write".
No backports are necessary - this feature is only intended for tests. We may later decide to backport if we want to backport new tests, but I think the probability we'll want to do this is low.
Fixes #12348
Closes scylladb/scylladb#19147
* github.com:scylladb/scylladb:
test/alternator: utility functions for changing configuration
alternator: add optional support for writing to system table
test/alternator: reduce duplicated code
Currently, there is no simple way to remove an option from the server's
config file in tests. One example when this is needed is removing the
`recovery_leader` option on all servers during the recovery procedure.
In this commit, we add a new method to `ManagerClient` that removes
an option from the given server's config file.
The previous way of executing repeats was to launch pytest for each repeat.
That was resource-consuming, since each time pytest re-did discovery
of the tests. Now all repeats are done inside one pytest process.
Backport to 2025.3 is needed, since this functionality is framework-only, and 2025.3 is affected by the slow repeats as well.
Closes scylladb/scylladb#25073
* github.com:scylladb/scylladb:
test.py: add repeats in pytest
test.py: add directories and filename to the log files
test.py: rename log sink file for boost tests
test.py: better error handling in boost facade
In commit 44a1daf we added the ability to read system tables through
the DynamoDB API (actually, the Scan and Query requests only).
This ability is useful for tests, and can also be useful to users who
want to read information that is only available through system tables.
This patch adds support also for *writing* into system tables. This will
be useful for Alternator tests, where we want to temporarily change
some live-updatable configuration option - and so far haven't been
able to do that like we did in some cql-pytest tests.
For reasons explained in issue #23218, only superuser roles are allowed to
write to system tables - it is not enough for the role to be granted
MODIFY permissions on the system table or on ALL KEYSPACES. Moreover,
the ability to modify system tables carries special risks, so this
patch only allows writes to the system tables if a new configuration
option "alternator_allow_system_table_write" is turned on. This option is
turned off by default.
This patch also includes a test for this new configuration-writing
capability. The test scripts test/alternator/run and test.py now
run Scylla with alternator_allow_system_table_write turned on, but
the new test can also run without this option, and will be skipped
in that case (to allow running the test suite against some manually-
run instance of Scylla).
Fixes: #12348
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
With the current implementation, if pytest is killed, it will not be
able to write the stdout from the boost test. With the new approach, the
output is updated while the test executes, instead of being written at the
end of the test.
Closes scylladb/scylladb#25260
This commit introduces a config option 'tablet_load_stats_refresh_interval_in_seconds'
that allows overriding the default value without using error injection.
Fixes scylladb/scylladb#24641
Closes scylladb/scylladb#24746
Tests sometimes fail in ScyllaCluster.add_server on the
'replaced_srv.host_id' line because host_id is not resolved yet. In
this commit we introduce functions try_get_host_id and get_host_id
that resolve it when needed.
Closes scylladb/scylladb#25177
This PR implements solution proposed in scylladb/scylladb#24481
Instead of terminating connections immediately, the shutdown now proceeds in two stages: first closing the receive (input) side to stop new requests, then waiting for all active requests to complete before fully closing the connections.
The updated shutdown process is as follows:
1. Initial Shutdown Phase
* Close the accept gate to block new incoming connections.
* Abort all accept() calls.
* For all active connections:
* Close only the input side of the connection to prevent new requests.
* Keep the output side open to allow responses to be sent.
2. Drain Phase
* Wait for all in-progress requests to either complete or fail.
3. Final Shutdown Phase
* Fully close all connections.
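The ordering of the three phases can be modeled with a toy asyncio server. This is a sketch of the sequencing only; the real implementation is Seastar C++ in generic_server, and all names below are illustrative:

```python
import asyncio

class ToyServer:
    """Toy model of the two-stage shutdown (illustrative, not the
    generic_server API)."""
    def __init__(self):
        self.accepting = True
        self.connections = []

    def accept(self, duration=0.01):
        assert self.accepting, "accept gate is closed"
        conn = {"input_open": True, "closed": False, "response": None}
        async def handle():
            await asyncio.sleep(duration)   # simulate an in-flight request
            conn["response"] = "sent"       # output side still open
        conn["task"] = asyncio.ensure_future(handle())
        self.connections.append(conn)
        return conn

    async def shutdown(self):
        # 1. Initial phase: close the accept gate, shut only the input side.
        self.accepting = False
        for conn in self.connections:
            conn["input_open"] = False      # no new requests on this conn
        # 2. Drain phase: wait for in-progress requests to complete or fail.
        await asyncio.gather(*(c["task"] for c in self.connections),
                             return_exceptions=True)
        # 3. Final phase: fully close all connections.
        for conn in self.connections:
            conn["closed"] = True

async def demo():
    srv = ToyServer()
    conn = srv.accept()
    await srv.shutdown()
    return conn

conn = asyncio.run(demo())
# The in-flight request got its response before the connection fully closed.
assert conn["response"] == "sent" and conn["closed"]
```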
Fixes scylladb/scylladb#24481
Closes scylladb/scylladb#24499
* https://github.com/scylladb/scylladb:
test: Set `request_timeout_on_shutdown_in_seconds` to `request_timeout_in_ms`, decrease request timeout.
generic_server: Two-step connection shutdown.
transport: cosmetic change, remove extra blanks
transport: Handle sleep aborted exception in sleep_until_timeout_passes
generic_server: replace empty destructor with `= default`
generic_server: refactor connection::shutdown to use `shutdown_input` and `shutdown_output`
generic_server: add `shutdown_input` and `shutdown_output` functions to `connection` class.
test: Add test for query execution during CQL server shutdown
The previous way of executing repeats was to launch pytest for each repeat.
That was resource-consuming, since each time pytest re-did discovery
of the tests. Now all repeats are done inside one pytest process.
Currently, only the test function name is used for output and log files. For better
clarity, add the relative path from the test directory and the file name
without extension to these file names.
Before:
test_aggregate_avg.1.log
test_aggregate_avg_stdout.1.log
After:
boost.aggregate_fcts_test.test_aggregate_avg.1.log
boost.aggregate_fcts_test.test_aggregate_avg_stdout.3.log
If a test was not executed for some reason - for example, an unknown parameter was passed to it - but the boost framework was able to finish correctly, the log file will have data, but it will be parsed into an empty list. This raised an exception during pytest execution rather than producing test output. This change handles that situation.
decrease request timeout.
In debug mode, queries may sometimes take longer than the default 30 seconds.
To address this, the timeout value `request_timeout_on_shutdown_in_seconds`
during tests is aligned with other request timeouts.
Change the request timeout for tests from 180s to 90s, since we must keep the request
timeout during shutdown significantly lower than the graceful shutdown timeout (2m);
otherwise a request timeout would cause a graceful shutdown timeout and fail a test.
When repairing a partition with many rows, we can store many fragments
in a repair_row_on_wire object which is sent as a rpc stream message.
This could cause reactor stalls when the rpc stream compression is
turned on, because the compression compresses the whole message without
any split and compression.
This patch solves the problem at the higher level by reducing the
message size that is sent to the rpc stream.
Tests are added to make sure the message split works.
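The idea of bounding the per-message size can be illustrated generically (a sketch only; the actual split happens on repair_row_on_wire fragments in C++, and the names below are hypothetical):

```python
def split_into_batches(rows, max_batch_bytes):
    """Greedily pack rows into batches whose total size stays at or under
    max_batch_bytes, so each rpc stream message - and the compression run
    over it - works on a bounded buffer instead of the whole partition."""
    batches, current, current_size = [], [], 0
    for row in rows:
        size = len(row)
        if current and current_size + size > max_batch_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(row)
        current_size += size
    if current:
        batches.append(current)
    return batches

# Five 4-byte rows with an 8-byte cap -> batches of 2, 2 and 1 rows.
assert split_into_batches([b"aaaa"] * 5, 8) == [
    [b"aaaa", b"aaaa"], [b"aaaa", b"aaaa"], [b"aaaa"],
]
```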
Fixes #24808
We're enabling the configuration option `rf_rack_valid_keyspaces`
in all Python test suites. All relevant tests have been adjusted
to work with it enabled.
That encompasses the following suites:
* alternator,
* broadcast_tables,
* cluster (already enabled in scylladb/scylladb@ee96f8dcfc),
* cql,
* cqlpy (already enabled in scylladb/scylladb@be0877ce69),
* nodetool,
* rest_api.
Two remaining suites that use tests written in Python, redis and scylla_gdb,
are not affected, at least not directly.
The redis suite requires creating an instance of Scylla manually, and the tests
don't do anything that could violate the restriction.
The scylla_gdb suite focuses on testing the capabilities of scylla-gdb.py, but
even then it reuses the `run` file from the cqlpy suite.
Fixes scylladb/scylladb#25126
Closes scylladb/scylladb#24617