Trie-based sstable indexes are supposed to be (hopefully)
a better default than the old BIG indexes.
Make them the new default.
If we change our mind, this change can be reverted later.
Returns the sstable version currently chosen for use in for new sstables.
We are adding it because some tests want to know what format they are
writing (tests using upgradesstable, tests which check stats that only
apply to one of the index types, etc).
(Currently they are using `highest_supported_sstable_format` for this
purpose, which is inappropriate, and will become invalid if a non-latest
format is the default).
There are two checks for live endpoints performed in test_gossiper.py,
but one of those sits in test_gossiper_unreachable_endpoints somehow.
This patch moves live endpoints check into live endpoints test.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#28224
To fix the problem, we need to remove the first, redundant definition of
test_gossiper_unreachable_endpoints (lines 19-24). The second definition
(lines 25-40) should be retained as it has more substantial test logic.
No other code changes or imports are needed, as the test logic is
preserved fully in the retained definition.
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
Closesscylladb/scylladb#27632
In the current state pytest do not support the order of execution, so this parameter is removed. There is no big need in this due to the differences what pytest and test.py counted test. pytest run test functions in the threads, while test.py executed test files in the threads. That's why pytest's way is more granular and allows to fill threads better.
Remove skip node, since it already added as a pytest mark for each test in the file.
Remove pool_size, since this is not used by pytest at all. Pytest uses
xdist to set the amount of threads instead of pool_size used by test.py
With this commit test.py will lose ability to run tests by itself always bypassing execution to the pytest.
NOTE: this is a breaking change. From this commit, several directories
with tests will require a path to the file to launch the test.
Affected directories
test/alternator
test/broadcast_tables
test/cql
test/cqlpy
test/rest_api
Function skip_mode works only on function and only in cluster test. This if OK
when we need to skip one test, but it's not possible to use it with pytestmark
to automatically mark all tests in the file. The goal of this PR is to migrate
skip_mode to be dynamic pytest.mark that can be used as ordinary mark.
Closesscylladb/scylladb#27853
[avi: apply to test/cluster/test_tablets.py::test_table_creation_wakes_up_balancer]
This reverts commit b0643f8959, reversing
changes made to e8b0f8faa9.
The change forgot to update
sstables_manager::get_highest_supported_format(), which results in
/system/highest_supported_sstable_version still returning me, confusing
and breaking tests.
Fixes: scylladb/scylla-dtest#6435Closesscylladb/scylladb#27379
Trie-based sstable indexes are supposed to be (hopefully)
a better default than the old BIG indexes.
Make them the new default.
If we change our mind, this change can be reverted later.
Returns the sstable version currently chosen for use in for new sstables.
We are adding it because some tests want to know what format they are
writing (tests using upgradesstable, tests which check stats that only
apply to one of the index types, etc).
(Currently they are using `highest_supported_sstable_format` for this
purpose, which is inappropriate, and will become invalid if a non-latest
format is the default).
Adds a test case for the `/storage_service/natural_endpoints/v2/{keyspace}` endpoint,
verifying that it correctly resolves natural endpoints for a composite partition key
passed as multiple `key_component` query parameters.
Previously, the send() method in RestApiSession only supported one value per query parameter key.
This patch updates it to support passing lists of values, allowing the same key to appear multiple
times in the query string (e.g. ?key=value1&key=value2).
Some test suites have own test runners based on pytest, and they
don't need all stuff we use for test.py. Move all code related to
test.py framework to test/pylib/runner.py and use it as a plugin
conditionally (by using TEST_RUNNER variable.)
Users with single-column partition keys that contain colon characters
were unable to use certain REST APIs and 'nodetool' commands, because the
API split key by colon regardless of the partition key schema.
Affected commands:
- 'nodetool getendpoints'
- 'nodetool getsstables'
Affected endpoints:
- '/column_family/sstables/by_key'
- '/storage_service/natural_endpoints'
Refs: #16596 - This does not fully fix the issue, as users with compound
keys will face the issue if any column of the partition key contains
a colon character.
Closesscylladb/scylladb#24829
Added a new POST endpoint `/storage_service/drop_quarantined_sstables` to the REST API.
This endpoint allows dropping all quarantined SSTables either globally or
for a specific keyspace and tables.
Optional query parameters `keyspace` and `tables` (comma-separated table names) can be
provided to limit the scope of the operation.
Fixesscylladb/scylladb#19061
The test has two major problems
1. Wrongly computed time windows. Data was not spread across two 1-minute
windows causing the test to generate even three sstables instead
of two
2. Timestamp was not propagated to the prepared CQL statements. So
in fact, a current time was used implicitly
3. Because of the incorrect timestamp issue, the remaining tests
testing purged tombstones were affected as well.
Fixes https://github.com/scylladb/scylladb/issues/24532Closesscylladb/scylladb#24609
test_repair_task_progress checks the progress of children of root
repair task. However, nothing ensures that the children are
already created.
Wait until at least one child of a root repair task is created.
Fixes: #24556.
Closesscylladb/scylladb#24560
Add the `host` fixture which uses `PythonTest.run_ctx()` context manager
to setup and teardown ScyllaDB node if `--test-py-init` argument is used.
Otherwise, this fixture returns a value of `--host` CLI argument.
Use dynamic scope provided by `testpy_test_fixture_scope()` function
instead of `session` to maintain compatibility with test.py and ./run
scripts.
Add 3 supplementary functions to `test.pylib.suite.python`:
`add_host_option()` (which adds `--host` options to pytest session),
`add_cql_connection_options()` (which adds `--port`, and `--ssl`),
and `--add-s3-options` (which adds options related to S3 connection.)
Each function decorated with `@cache` decorator to be executed once per
pytest session and avoid CLI options duplication for runs which
executes `alternator`, `cqlpy`, `rest_api`, or `broadcast_tables`
in one pytest session.
The class was introduced to facilitate path and query parameters parsing from requests, but in fact it's mostly dead code.
First, the class introduces the concept of "mandatory" parameters which are seastar path params. If missing, the parameter validation throws, but in all cases where this option is used in scylla it's impossible to get empty path param -- if the parameter is missing seastar returns 404 (not found) before calling handler.
Second, the req_params::get<T>() doesn't work for anything but string argument (or types such that optional<T> can be implicitly casted to optional<sstring>). And it's in fact only used to get sstrings, so it compiles and works so far.
The remaining ability to parse bool from string is partially duplicated by the validate_bool() method. Using plain method to parse string to bool is less code than req_params introduce.
One (arguably) useful thing req_params do it validate the incoming request _not_ to contain unknown query parameters. However, quite a few endpoints use this, most of them just cherry-pick parameters they want and ignore the others. There's already a comprehensive description of accepted parameters for each endpoint in api-doc/ and req_params duplicate it. Good validation code should rely on api-doc/, not on its partial copy.
Having said that, this PR introduces validate_bool_x() helper to do req_params-like parsing of strings to bools, patches existing handlers to use existing parameters parsing facilities (such as validate_keyspace() and parse_table_infos()) and drops the req_params.
Closesscylladb/scylladb#24159
* github.com:scylladb/scylladb:
api: Drop class req_params
api: Stop using req_params in parse_scrub_options
api: Stop using req_params in tasks::force_keyspace_compaction_async
api: Stop using req_params in ss::force_keyspace_compaction
api: Stop using req_params in ss::force_compaction
api: Stop using req_params in cf::force_major_compaction
api: Add validate_bool_x() helper
The "keyspace" and "cf" pair of options are now parsed similarly to how
recently changed ss::force_keyspace_compaction handler does.
The "scrub_mode" query param is saved directly into sstring variable and
its presense is checked by .empty() call. If the parameter is missing,
the request::get_query_param() would return empty string, so the change
is correct.
The "skip_corrupted" is boolean option, other options are already parsed
by hand, without the help of req_params facilities.
There's a test that validates the work of req_params::process() of scrub
endpoint -- it passes "invalid" options. This test is temporarily
removed according to the PR description.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The pair of column_family/metrics/(total|live)_disk_space_used/{name}
reports the disk usage by sstables. The test creates table, populates,
flushes and checks that the size corresonds to what stat(2) reports for
the respective files.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This will result in new tables having at least 10 tablet replicas per
shard by default.
We want this to reduce tablet load imbalance due to differences in
tablet count per shard, where some shards have 1 tablet and some
shards have 2 tablets. With higher tablet count per shard, this
difference-by-one is less relevant.
Fixes#21967
In some tests, we explicity set the initial scale to 1 as some of the
existing tests assume 1 compaction group per shard.
test.py uses a lower default. Having many tablets per shard slows down
certain topology operations like decommission/replace/removenode,
where the running time is proportional to tablet count, not data size,
because constant cost (latency) of migration dominates. This latency
is due to group0 operations and barriers. This is especially
pronounced in debug mode. Scheduler allows at most 2 migrations per
shard, so this latency becomes a determining factor for decommission
speed.
To avoid this problem in tests, we use lower default for tablet count per
shard, 2 in debug/dev mode and 4 in release mode. Alternatively, we
could compensate by allowing more concurrency when migrating small
tablets, but there's no infrastructure for that yet.
I observed that with 10 tablets per shard, debug-mode
topology_custom.mv/test_mv_topology_change starts to time-out during
removenode (30 s).
Currently, /task_manager/task_status_recursive/{task_id} and
/task_manager/task_status/{task_id} unregister queries task if it
has already finished.
The status should not disappear after being queried. Do not unregister
finished task when its status or recursive status is queried.
In the following patches, get_status won't be unregistering finished
tasks. However, tests need a functionality to drop a task, so that
they could manipulate only with the tasks for operations that were
invoked by these tests.
Add /task_manager/drain/{module} to unregister all finished tasks
from the module. Add respective nodetool command.
Adds core integration of the audit subsystem into Scylla's main process flow. Changes include:
- Import audit subsystem header
- Initialize audit system during server startup using configuration and token metadata
- Start audit system after API server initialization with query processor and memory manager
- Add proper shutdown sequence for audit system using RAII pattern
- Add error handling for audit system initialization failures
The audit system is now properly integrated into Scylla's lifecycle, ensuring:
- Correct initialization order relative to other subsystems
- Proper resource cleanup during shutdown
- Graceful error handling for initialization failures
In Scylla there are two options that control IO bandwidth limit -- the /storage_service/(compaction|stream)_throughput REST API endpoints. The endpoints are partially implemented and have no counterparts in the nodetool.
This set implements the missing bits and adds tests for new functionality.
Closesscylladb/scylladb#21877
* github.com:scylladb/scylladb:
nodetool: Implement [gs]etstreamthroughput commands
nodetool: Implement [gs]etcompationthroughput commands
test: Add validation of how IO-updating endpoints work
api: Implement /storage_service/(stream|compaction)_throughput endpoints
api: Disqualify const config reference
api: Implement /storage_service/stream_throughput endpoint
api: Move stream throughput set/get endpoints from storage service block
api: Move set_compaction_throughput_mb_per_sec to config block
util: Include fmt/ranges.h in config_file.hh
We still have a number of issues to be solved for views with tablets.
Until they are fixed, we should prevent users from creating them,
and use the vnode-based views instead.
This patch prepares the feature for enabling views with tablets. The
feature is disabled by default, but currently it has no effect.
After all tests are adjusted to use the feature, we should depend
on the feature for deciding whether we can create materialized views
in tablet-enabled keyspaces.
The unit tests are adjusted to enable this feature explicitly, and it's
also added to the scylla sstable tool config - this tool treats all
tables as if they were tablet-based (surprisingly, with SimpleStrategy),
so for it to work on views, the new feature must be enabled.
Refs scylladb/scylladb#21832Closesscylladb/scylladb#21833
The /column_family/compaction_strategy has GET and POST implemented, the
latter changes the strategy on the table.
Unknown strategy name implicitly renders internal server error code by
catching exception from compaction_strategy::type() that tries to
convert strategy name string to strategy enum class type.
This is to finish validation of #21533
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#21569
There are now four of those and these are all the same in the way they
interpret the value parameter (though it's named differently)
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
When users start an operation asynchronously with API, they are expected to check the operation's status. Hence, the status should be kept in task manager for reasonable time after the operation is done. The operations that are started internally usually don't need to stay in task manager for that long.
Add api_task_ttl that will be used for tasks started with API. By default it's 1 hour. The time for which non-API tasks stay in task manager isn't changed.
Fixes: #21499.
Refs: #21425.
No backport needed - previous versions may use task_ttl
Closesscylladb/scylladb#21505
* github.com:scylladb/scylladb:
test: add test to check user_task_ttl
tasks: api: move make_task method
docs: nodetool: update backup and restore commands docs
docs: update task manager docs
nodetool: add nodetool tasks user-ttl command
node_ops: use user task ttl for node ops virtual task
tasks: use user_task_ttl for tasks started by user
api: task_manager: add /task_manager/user_ttl to get and set user task ttl
tasks: add task_manager::task::is_user_task method
tasks: keep updateable_value of task_ttl in task manager
db: config: add user_task_ttl_seconds named value
Stop taking snapshots of MVs and allow taking snapshot of individual tables, now one can take a snapshot of any base table, any view or index. Also add tests to cover new cases both boost test (using cc code) and pytest (using the API)
Also, update documentation to reflect the change
fixes: #21339fixes: #20760Closesscylladb/scylladb#21433
Python and Python developers don't like directory names to include a
minus sign, like "cql-pytest". In this patch we rename test/cql-pytest
to test/cqlpy, and also change a few references in other code (e.g., code
that used test/cql-pytest/run.py) and also references to this test suite
in documentation and comments.
Arguably, the word "test" was always redundant in test/cql-pytest, and
I want to leave the "py" in test/cqlpy to emphasize that it's Python-based
tests, contrasting with test/cql which are CQL-request-only approval
tests.
Fixes#20846
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
For a table with NullCompactionStrategy and
TimeWindowCompactionStrategy, the test
- inserts a bunch of data and flushes the table
- deletes/update some data, delete a range of data and flushes
the table
- Triggers a major compaction and calls for compactionhistory
to retrieve and validate the histogram
This option was silently broken when --enable-tablet's default changed
from false to true. The reason is that when --vnodes is passed, run only
removes --enable-tablets=true from scylla's command line. With the new
default this is not enough, we need to explicitely disable tablets to
override the default.
Closesscylladb/scylladb#20462
Increase pool size changes were recently reverted because of the flakiness for the test_gossip_boot test. Test started
to fail on adding the node to the cluster without any issues in the Scylla log file. In test logs it looked like the
installation process for the new node just hanged. After investigating the problem, I've found out that the issue is that
test.py was draining the io_executor pool for cleaning the directory during install that was set to eight workers. So
to fix the issue, io_executor pool should be increased to more or less the same ratio as it was: doubled cluster pool size.
Closesscylladb/scylladb#20276
Virtual tasks are supported by get_task_status, abort_task and
wait_task.
Task status returned by get_task_status and wait_task:
- contains task_kind to indicate whether it's virtual (cluster) or
regular (node) task;
- children list apart from task_id contains node address of the task.