Commit Graph

37 Commits

Author SHA1 Message Date
Petr Gusev
33e9ea4a0f test.py: add universalasync_typed_wrap
The universalasync.wrap function doesn't preserve the
type information, which confuses the VS Code Pylance
plugin and makes code navigation hard.

In this commit we fix the problem by adding a typed
wrapped around universalasync.wrap.

Fixes: scylladb/scylladb#26639
2025-10-22 11:32:37 +02:00
Andrei Chekun
24d17c3ce5 test.py: rewrite the wait_for_first_completed
Rewrite wait_for first_completed to return only first completed task guarantee
of awaiting(disappearing) all cancelled and finished tasks
Use wait_for_first_completed to avoid false pass tests in the future and issues
like #26148
Use gather_safely to await tasks and removing warning that coroutine was
not awaited

Closes scylladb/scylladb#26435
2025-10-22 01:13:43 +03:00
Avi Kivity
744015cf26 test.py: allow cmake configuration and ./configure.py configuration to coexist
Cmake emits its build.ninja into build/, while configure.py emits
build.ninja into ./. test.py uses this difference to choose the directory
structure to test.

The problem is that vscode will randomly call cmake to understand the
directory structure, so we end up with both build.ninja set up.

Invert the logic to look for ./build.ninja to determine the mode (instead
of build/build.ninja which can exist even if the user uses traditional
configuration).

It can still happen that a stray ./build.ninja exists (for example due
to switching branches), but that is rarer than having vscode auto-create
it.

Closes scylladb/scylladb#24269
2025-06-03 16:46:41 +03:00
Botond Dénes
51025de755 test/cluster: extract execute_with_tracing() into pylib/util.py
To allow reuse in other tests.
2025-05-02 01:53:35 -04:00
Andrei Chekun
bf2a9e267e test.py: change the parameter for get_modes_to_run()
Change the parameter for get_modes_to_run() from session to config to
narrow the scope, and prepare it to later use in method that do not have
access to the session, but have access to the config object
2025-04-22 18:58:33 +02:00
Botond Dénes
e5afd9b5fb test/pylib/utils: wait_for_cql_and_get_hosts(): sort hosts
Such that a given index in the return hosts refers to the same
underlying Scylla instance, as the same index in the passed-in nodes
list. This is what users of this method intuitively expect, but
currently the returned hosts list is unordered (has random order).
2025-04-08 00:11:36 -04:00
Evgeniy Naydanov
9bba59631f test.py: add xdist worker ID to log filenames
When run tests in parallel we need to ensure that filenames
are unique by adding xdist worker ID to them.
2025-03-30 03:19:30 +00:00
Evgeniy Naydanov
c4ae4e247a test.py: refactor paths constants and options
Add path constants to `test` module and use them in different test suites
instead of own dups of the same code:

 - TOP_SRC_DIR : ScyllaDB's source code root directory
 - TEST_DIR : the directory with test.py tests and libs
 - BUILD_DIR : directory with ScyllaDB's build artefacts

Add TestSuite.log_dir attribute as a ScyllaDB's build mode subdir of a path
provided using `--tmpdir` CLI argument.  Don't use `tmpdir` name because it
mixed up with pytest's built-in fixture and `--tmpdir` option itself.

Change default value for `--tmdir` from `./testlog` to `TOP_SRC_DIR/testlog`

Refactor `ResourceGather*` classes to use path from a `test` object instead of
providing it separately.

Move modes constants to `test` module and remove duplications.

Move `prepare_dirs()` and `start_3rd_party_services()` from `pylib.util` to
`pylib.suite.base` to avoid circular imports (with little refactoring to
use `pathlib.Path` instead of `str` as paths.)

Also, in some places refactor to use f-strings for formatting.
2025-03-30 03:19:29 +00:00
Andrei Chekun
1ea7b99385 test.py: Refactor test/pylib/cpp/ldap
Rename and move prepare_instance from ldap tests directory to
pylib/ldap_server.
2025-03-20 11:41:30 +01:00
Andrei Chekun
33e53565c4 test.py: move starting LDAP service to dedicate method
Move starting LDAP to the method where the rest of the services are
started. This will unify the way of starting the 3rd party services.
Fix LDAP tests flakiness due not possible to connect to LDAP server
Add catching stdout and stderr of toxiproxy-cli in case of errors
2025-03-20 11:37:04 +01:00
Andrei Chekun
a20d848c01 test.py: Refactor test/conftest.py
Move functions responsible for preparation of the environment to the util file.
This is extracted from https://github.com/scylladb/scylladb/pull/22894 to make it easier to work together.

Closes scylladb/scylladb#23221
2025-03-17 11:31:00 +02:00
Piotr Dulikowski
1b6fb95efc test.py: move gather_safely to pylib utils
The gather_safely function was originally defined in the
test.pylib.scylla_cluster module, but it is a generic concurrency
combinator which is not tied to the concept of Scylla clusters at all.
Move it to test.pylib.util to make this fact more clear.
2025-02-17 12:47:13 +01:00
Andrei Chekun
8ef840a1c5 test.py: Add the possibility to run boost test from pytest
Add the possibility to run boost test from pytest.
Boost facade based on code from https://github.com/pytest-dev/pytest-cpp, but enhanced and rewritten to suite better.
2025-02-07 21:40:25 +01:00
Andrei Chekun
4addc039e5 test.py: Add discovery for C++ tests for pytest
Code based on https://github.com/pytest-dev/pytest-cpp. Updated, customized, enhanced to suit current needs.
Modify generate report to not modify the names, since it will break
xdist way of working. Instead modification will be done in post collect
but before executing the tests.
2025-02-07 19:44:06 +01:00
Andrei Chekun
108ef5856f test.py: Move get configured modes to common lib
This will allow using this method inside the test module for pytest launching the boost and unit tests
2025-02-07 19:38:53 +01:00
Avi Kivity
f3eade2f62 treewide: relicense to ScyllaDB-Source-Available-1.0
Drop the AGPL license in favor of a source-available license.
See the blog post [1] for details.

[1] https://www.scylladb.com/2024/12/18/why-were-moving-to-a-source-available-license/
2024-12-18 17:45:13 +02:00
Abhinav
39dfd2d7ac test: move wait_for_first_completed to pylib/util.py
This function is needed in a new test added in the next commit and this
refactoring avoids code duplication.
2024-10-25 13:26:42 +05:30
Michael Litvak
3ca5dd537f test/pylib: use view_build_status_v2 table in wait_for_view
Change the util function wait_for_view to read the view build status
from the system.view_build_status_v2 table which replaces
system_distributed.view_build_status.
The old table can still be used but it is less efficient because it's
implemented as a virtual table which reads from the v2 table, so it's
better to read directly from the v2 table. This can cause slowness in
tests.
The additional util function wait_for_view_v1 reads from the old table.
This may be needed in upgrade tests if the v2 table is not available
yet.
2024-09-05 15:42:35 +03:00
Emil Maskovsky
9ab25e5cbf test: raft: replace the use of read_barrier work-around
Replaced the old `read_barrier` helper from "test/pylib/util.py"
by the new helper from "test/pylib/rest_client.py" that is calling
the newly introduced direct REST API.

Replaced in all relevant tests and decommissioned the old helper.

Introduced a new helper `get_host_api_address` to retrieve the host API
address - which in come cases can be different from the host address
(e.g. if the RPC address is changed).

Fixes: scylladb/scylladb#19662

Closes scylladb/scylladb#19739
2024-07-19 19:20:44 +02:00
Nadav Har'El
96dff367f8 Merge 'storage_proxy: update view update backlog on correct shard when writing' from Wojciech Mitros
This series is another approach of https://github.com/scylladb/scylladb/pull/18646 and https://github.com/scylladb/scylladb/pull/19181. In this series we only change where the view backlog gets
updated - we do not assure that the view update backlog returned in a response is necessarily the backlog
that increased due to the corresponding write, the returned backlog may be outdated up to 10ms. Because
 this series does not include this change, it's considerably less complex and it doesn't modify the common
write patch, so no particular performance considerations were needed in that context. The issue being fixed
is still the same, the full description can be seen below.

When a replica applies a write on a table which has a materialized view
it generates view updates. These updates take memory which is tracked
by `database::_view_update_concurrency_sem`, separate on each shard.
The fraction of units taken from the semaphore to the semaphore limit
is the shard's view update backlog. Based on these backlogs, we want
to estimate how busy a node is with its view updates work. We do that
by taking the max backlog across all shards.
To avoid excessive cross-shard operations, the node's (max) backlog isn't
calculated each time we need it, but up to 1 time per 10ms (the `_interval`) with an optimization where the backlog of the calculating shard is immediately up-to-date (we don't need cross-shard operations for it):
```
update_backlog node_update_backlog::fetch() {
    auto now = clock::now();
    if (now >= _last_update.load(std::memory_order_relaxed) + _interval) {
        _last_update.store(now, std::memory_order_relaxed);
        auto new_max = boost::accumulate(
                _backlogs,
                update_backlog::no_backlog(),
                [] (const update_backlog& lhs, const per_shard_backlog& rhs) {
                    return std::max(lhs, rhs.load());
                });
        _max.store(new_max, std::memory_order_relaxed);
        return new_max;
    }
    return std::max(fetch_shard(this_shard_id()), _max.load(std::memory_order_relaxed));
}
```
For the same reason, even when we do calculate the new node's backlog,
we don't read from the `_view_update_concurrency_sem`. Instead, for
each shard we also store a update_backlog atomic which we use for
calculation:
```
    struct per_shard_backlog {
        // Multiply by 2 to defeat the prefetcher
        alignas(seastar::cache_line_size * 2) std::atomic<update_backlog> backlog = update_backlog::no_backlog();
        need_publishing need_publishing = need_publishing::no;

        update_backlog load() const {
            return backlog.load(std::memory_order_relaxed);
        }
    };
 std::vector<per_shard_backlog> _backlogs;
```
Due to this distinction, the update_backlog atomic need to be updated
separately, when the `_view_update_concurrency_sem` changes.
This is done by calling `storage_proxy::update_view_update_backlog`, which reads the `_view_update_concurrency_sem` of the shard (in `database::get_view_update_backlog`)
and then calls node`_update_backlog::add` where the read backlog
is stored in the atomic:
```
void storage_proxy::update_view_update_backlog() {
    _max_view_update_backlog.add(get_db().local().get_view_update_backlog());
}
void node_update_backlog::add(update_backlog backlog) {
    _backlogs[this_shard_id()].backlog.store(backlog, std::memory_order_relaxed);
    _backlogs[this_shard_id()].need_publishing = need_publishing::yes;
}
```
For this implementation of calculating the node's view update backlog to work,
we need the atomics to be updated correctly when the semaphores of corresponding
shards change.

The main event where the view update backlog changes is an incoming write
request. That's why when handling the request and preparing a response
we update the backlog calling `storage_proxy::get_view_update_backlog` (also
because we want to read the backlog and send it in the response):
backlog update after local view updates (`storage_proxy::send_to_live_endpoints` in `mutate_begin`)
```
 auto lmutate = [handler_ptr, response_id, this, my_address, timeout] () mutable {
     return handler_ptr->apply_locally(timeout, handler_ptr->get_trace_state())
             .then([response_id, this, my_address, h = std::move(handler_ptr), p = shared_from_this()] {
         // make mutation alive until it is processed locally, otherwise it
         // may disappear if write timeouts before this future is ready
         got_response(response_id, my_address, get_view_update_backlog());
     });
 };
backlog update after remote view updates (storage_proxy::remote::handle_write)

 auto f = co_await coroutine::as_future(send_mutation_done(netw::messaging_service::msg_addr{reply_to, shard}, trace_state_ptr,
         shard, response_id, p->get_view_update_backlog()));
```
Now assume that on a certain node we have a write request received on shard A,
which updates a row on shard B (A!=B). As a result, shard B will generate view
updates and consume units from its `_view_update_concurrency_sem`, but will
not update its atomic in `_backlogs` yet. Because both shards in the example
are on the same node, shard A will perform a local write calling `lmutate` shown
above. In the `lmutate` call, the `apply_locally` will initiate the actual write on
shard B and the `storage_proxy::update_view_update_backlog` will be called back
on shard A. In no place will the backlog atomic on shard B get updated even
though it increased in size due to the view updates generated there.
Currently, what we calculate there doesn't really matter - it's only used for the
MV flow control delays, so currently, in this scenario, we may only overload
a replica causing failed replica writes which will be later retried as hints. However,
when we add MV admission control, the calculated backlog will be the difference
between an accepted and a rejected request.

Fixes: https://github.com/scylladb/scylladb/issues/18542

Without admission control (https://github.com/scylladb/scylladb/pull/18334), this patch doesn't affect much, so I'm marking it as backport/none

Closes scylladb/scylladb#19341

* github.com:scylladb/scylladb:
  test: add test for view backlog not being updated on correct shard
  test: move auxiliary methods for waiting until a view is built to util
  mv: update view update backlog when it increases on correct shard
2024-07-04 11:40:09 +03:00
Wojciech Mitros
c4f5659c11 test: move auxiliary methods for waiting until a view is built to util
In many materialized view tests we need to wait until a view is built before
actually working on it, future tests will also need it. In existing tests
we use the same, duplicated method for achieving that.
In this patch the method is deduplicated and moved to pylib/util.py
and existing tests are modified to use it instead.
2024-07-03 23:18:52 +02:00
Marcin Maliszkiewicz
d08a80b34f test: extend unique_name with random sufix
This reduces collision risk in an unlikely
and incorrect setup where tests would be
run concurrently by multiple processes.
2024-06-27 09:28:02 +02:00
Tomasz Grabiec
3e1ba4c859 test: pylib: Extract start_writes() load generator utility 2024-05-29 10:02:56 +02:00
Kamil Braun
ec1f574b3a test/pylib: util: silence exception from refresh_nodes
Driver's `refresh_nodes` function may throw an exception if we call it
in the middle of driver reconnecting. Silence it.

Fixes scylladb/scylladb#17616

Closes scylladb/scylladb#17620
2024-03-04 17:50:16 +02:00
Evgeniy Naydanov
10eebe3c66 test: use different IP addresses for listen and RPC addresses
Scylla can be configured to use different IPs for the internode communication
and client connections.  This test allocates and configure unique IP addresses
for the client connections (`rpc_address`) for 2-nodes cluster.

Two scenarios tested:
  1) Change RPC IPs sequentially
  2) Change RPC IPs simultaneously

Closes scylladb/scylladb#15965
2023-12-17 18:00:09 +02:00
Alejo Sanchez
8fceb7b7a0 test/pylib: fix function attribute
Instead of globally hardcoding an attribute, set it in the function
itself.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2023-07-18 13:33:46 +02:00
Piotr Dulikowski
e7c355e84f test: introduce get_supported_features/get_enabled_features
Introduces two helper functions that allow getting information about
supported/enabled features on a node, according to its system tables.

As a bonus, the `wait_for_feature` function is refactored to use
`get_enabled_features`.
2023-06-12 13:28:16 +02:00
Piotr Dulikowski
56d3d8b9e2 test: move wait_for_feature to pylib utils
The `wait_for_feature` can be useful, and will be used, in other test
suites than `topology_raft_disabled`, so it is moved to the common pylib
utils.
2023-06-12 10:09:00 +02:00
Kamil Braun
64dc76db55 test: pylib: fix read_barrier implementation
The previous implementation didn't actually do a read barrier, because
the statement failed on an early prepare/validate step which happened
before read barrier was even performed.

Change it to a statement which does not fail and doesn't perform any
schema change but requires a read barrier.

This breaks one test which uses `RandomTables.verify_schema()` when only
one node is alive, but `verify_schema` performs a read barrier. Unbreak
it by skipping the read barrier in this case (it makes sense in this
particular test).

Closes #13933
2023-05-18 18:30:11 +02:00
Kamil Braun
3f3dcf451b test: pylib: random_tables: perform read barrier in verify_schema
`RandomTables.verify_schema` is often called in topology tests after
performing a schema change. It compares the schema tables fetched from
some node to the expected latest schema stored by the `RandomTables`
object.

However there's no guarantee that the latest schema change has already
propagated to the node which we query. We could have performed the
schema change on a different node and the change may not have been
applied yet on all nodes.

To fix that, pick a specific node and perform a read barrier on it, then
use that node to fetch the schema tables.

Fixes #13788

Closes #13789
2023-05-08 13:21:10 +02:00
Petr Gusev
330d1d5163 scylla_cluster.py: fix read_last_line
This is a follow-up to #13399, the patch
addresses the issues mentioned there:
* linesep can be split between blocks;
* linesep can be part of UTF-8 sequence;
* avoid excessively long lines, limit to 512 chars;
* the logic of the function made simpler and more
maintainable.
2023-05-05 12:57:36 +04:00
Petr Gusev
8a5e211c30 scylla_cluster.py: move read_last_line to util.py
We want to add tests for read_last_line, so we
move it to make this simper.
2023-05-05 12:51:25 +04:00
Kamil Braun
c959ec455a test/pylib: move some utility functions to util.py
They were used in test_raft_upgrade, but we want to use them in other
test files too.
2023-01-17 12:28:00 +01:00
Kamil Braun
4f7e5ee963 test/pylib: prefix cluster/manager logs with the current test name
The log file produced by test.py combines logs coming from multiple
concurrent test runs. Each test has its own log file as well, but this
"global" log file is useful when debugging problems with topology tests,
since many events related to managing clusters are stored there.

Make the logs easier to read by including information about the test case
that's currently performing operations such as adding new servers to
clusters and so on. This includes the mode, test run name and the name
of the test case.

We do this by using custom `Logger` objects (instead of calling
`logging.info` etc. which uses the root logger) with `LoggerAdapter`s
that include the prefixes. A bit of boilerplate 'plumbing' through
function parameters is required but it's mostly straightforward.

This doesn't apply to all events, e.g. boost test cases which don't
setup a "real" Scylla cluster. These events don't have additional
prefixes.

Example:
```

17:41:43.531 INFO> [dev/topology.test_topology.1] Cluster ScyllaCluster(name: 7a414ffc-903c-11ed-bafb-f4d108a9e4a3, running: ScyllaServer(1, 127.40.246.1, 29c4ec73-8912-45ca-ae19-8bfda701a6b5), ScyllaServer(4, 127.40.246.4, 75ae2afe-ff9b-4760-9e19-cd0ed8d052e7), ScyllaServer(7, 127.40.246.7, 67a27df4-be63-4b4c-a70c-aeac0506304f), stopped: ) adding server...
17:41:43.531 INFO> [dev/topology.test_topology.1] installing Scylla server in /home/kbraun/dev/scylladb/testlog/dev/scylla-10...
17:41:43.603 INFO> [dev/topology.test_topology.1] starting server at host 127.40.246.10 in scylla-10...
17:41:43.614 INFO> [dev/topology.test_topology.2] Cluster ScyllaCluster(name: 7a497fce-903c-11ed-bafb-f4d108a9e4a3, running: ScyllaServer(2, 127.40.246.2, f59d3b1d-efbb-4657-b6d5-3fa9e9ef786e), ScyllaServer(5, 127.40.246.5, 9da16633-ce53-4d32-8687-e6b4d27e71eb), ScyllaServer(9, 127.40.246.9, e60c69cd-212d-413b-8678-dfd476d7faf5), stopped: ) adding server...
17:41:43.614 INFO> [dev/topology.test_topology.2] installing Scylla server in /home/kbraun/dev/scylladb/testlog/dev/scylla-11...
17:41:43.670 INFO> [dev/topology.test_topology.2] starting server at host 127.40.246.11 in scylla-11...
```
2023-01-11 10:09:39 +01:00
Kamil Braun
0f49813312 test/pylib: util: configurable period in wait_for 2022-11-16 14:01:50 +01:00
Petr Gusev
44f48bea0f raft: test_remove_node_with_concurrent_ddl
The test runs remove_node command with background ddl workload.
It was written in an attempt to reproduce scylladb#11228 but seems to have
value on its own.

The if_exists parameter has been added to the add_table
and drop_table functions, since the driver could retry
the request sent to a removed node, but that request
might have already been completed.

Function wait_for_host_known waits until the information
about the node reaches the destination node. Since we add
new nodes at each iteration in main, this can take some time.

A number of abort-related options was added
SCYLLA_CMDLINE_OPTIONS as it simplifies
nailing down problems.

Closes #11734
2022-11-04 17:16:35 +01:00
Konstantin Osipov
26fa9336d1 test.py: add common utility functions to test/pylib 2022-05-25 20:26:42 +03:00