There's a nasty scenario in which this search plays a bad joke.
When CI picks up a new branch and notices that a test has changed, it
spawns a custom job with test.py --repeat 100 $changed_test_name in
it. Next, when test.py tries opt-in test name matching, it uses the
wildcard search and can pull extra unwanted tests into the run.
To solve this, the case-selection syntax is extended. Now if the caller
specifies `suite/test::*` as the test, the test file is selected by exact
name match, but no specific test case is selected: the `*` makes it
run all cases.
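A minimal sketch of how such an argument could be split. `NameSpec` and `parse_test_spec` are hypothetical names used for illustration only, not the actual test.py code:

```python
from typing import NamedTuple, Optional


class NameSpec(NamedTuple):
    pattern: str          # the suite/test part of the argument
    case: Optional[str]   # case to pass down to the test, None means "all cases"
    exact: bool           # True when the test file must match by exact name


def parse_test_spec(arg: str) -> NameSpec:
    """Split 'suite/test[::case]' (illustrative helper, not test.py code)."""
    if "::" not in arg:
        # Plain name: legacy non-exact selection, all cases.
        return NameSpec(arg, None, False)
    pattern, case = arg.split("::", 1)
    # '*' selects the file by exact name but does not narrow down the case.
    return NameSpec(pattern, None if case == "*" else case, True)
```

With this shape a CI job can pass `suite/test::*` to get exact file matching while still running every case in the file.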
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#18704
This PR resolves the double counting of test results for topology tests. The duplicate no longer appears in the consolidated report.
Another fix makes it clearer which test failed: the test case name in the report is enriched with the mode and run id, which makes it unique across the run.
The scope of this change is:
1. Modify the test name to include the run id
2. Add handlers to collect the test.py and pytest logs related to a test in one file, rather than logs for the full suite
3. Stop aggregating topology tests on a suite level in JUnit results
4. Add a link to the logs related to the failed tests in JUnit results, so it will be easier to navigate to all logs related to a test
5. Gather logs related to the failed test in one directory to make log investigation easier
Ref: scylladb/scylladb#17851
Closes scylladb/scylladb#18277
As part of the unification process, alternator tests are migrated to the PythonTestSuite instead of using the RunTestSuite. The main idea is to have one suite, so it will be easier to maintain and to introduce new features.
Introduce the prepare_sql option for suite.yaml to make it possible to run CQL statements as a precondition for the test suite.
Related: https://github.com/scylladb/scylladb/issues/18188
Closes scylladb/scylladb#18442
By default, the suite name in the junit files generated by pytest
is `pytest` for all suites instead of the actual suite, e.g. `topology_experimental_raft`.
With this change, the junit files use the real suite name.
This change doesn't affect the Test Report in Jenkins, but it
came up as part of another task, publishing the test results to
elasticsearch (https://github.com/scylladb/scylla-pkg/pull/3950),
where we parse the XMLs and need the correct suite name.
Closes scylladb/scylladb#18172
To create the list of tests to run there's a loop that first collects all
tests from suites, then filters the list in two ways -- excludes the
opted-out tests (disabled and matching the skip pattern) or leaves
only the opted-in ones (those specified as positional arguments).
This patch keeps both pieces of list-checking code close to each other
so that the intent is explicitly clear.
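A rough sketch of what keeping both checks side by side could look like. `select_tests` is a hypothetical helper, and the matching rules (substring match for skip patterns, prefix match for positional arguments) are assumptions made for the example:

```python
from typing import Iterable, List


def select_tests(all_tests: Iterable[str], disabled: Iterable[str],
                 skip_patterns: Iterable[str], opt_in: Iterable[str]) -> List[str]:
    """Apply the opt-out and opt-in checks next to each other (illustrative)."""
    disabled = set(disabled)
    skip_patterns = list(skip_patterns)
    opt_in = list(opt_in)
    selected = []
    for name in all_tests:
        # Opt-out: disabled tests and tests matching a skip pattern.
        if name in disabled or any(p in name for p in skip_patterns):
            continue
        # Opt-in: when positional arguments are given, keep only matching tests.
        if opt_in and not any(name.startswith(p) for p in opt_in):
            continue
        selected.append(name)
    return selected
```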
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Fixes #16912
By default, ScyllaDB stores the maintenance socket in the workdir. By default, test.py uses testlog/{mode}/scylla-# as the ScyllaDB workdir. The usual location for cloning the repo is the user's home folder. In some cases this makes the socket path too long, and the tests start to fail. The simple fix is to move the maintenance socket to the /tmp folder to eliminate that possibility.
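To illustrate the length problem, here is a small sketch. It relies on Linux limiting `sockaddr_un.sun_path` to 108 bytes including the terminating NUL; the helper names and the socket file name are hypothetical and not taken from test.py:

```python
import os
import tempfile

# Linux limits sockaddr_un.sun_path to 108 bytes, including the NUL terminator.
UNIX_PATH_MAX = 108


def fits_unix_socket(path: str) -> bool:
    """True if the path is short enough for a UNIX-domain socket on Linux."""
    return len(os.fsencode(path)) < UNIX_PATH_MAX


def maintenance_socket_path(server_id: int) -> str:
    """Hypothetical naming scheme: a short path under the temp directory."""
    return os.path.join(tempfile.gettempdir(), f"scylla-{server_id}-maintenance.sock")
```

A workdir nested deep under a long home directory can fail the `fits_unix_socket` check, while a path under /tmp stays well below the limit.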
Closes scylladb/scylladb#17941
Fixes #17569
Tests do not close their file descriptors after they finish. This makes it impossible to continue testing, since the default limit on open files in Linux is 1024. The issue is easy to reproduce with the following command:
```
$ ./test.py --mode debug test_native_transport --repeat 1500
```
With the fix applied, all tests pass with the following command:
```
$ ./test.py --mode debug test_native_transport --repeat 10000
```
Closes scylladb/scylladb#17798
summarize_tests() is only used to summarize boost tests, so reflect
this fact in its name. we will need to summarize the tests which
generate JUnit XML as well, so this change also prepares for a
follow-up change to implement a new summarize helper.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#17746
Boost tests support case-by-case execution and always turn it on -- when
run, boost test is split into parallel-running sub-tests each with the
specific case name.
This patch tunes this, so that when a test is run like
test.py boost/testname::casename
No parallel-execution happens, but instead just the needed casename is
run. Example of selection:
test.py --mode=${mode} boost/bptree_test::test_cookie_find
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Before the change, when a test failed because of some error
in the `cql_test_env.cc`, we were getting:
```
error: boost/virtual_table_test: failed to parse XML output '/home/piotrs/src/scylla2/testlog/debug/xml/boost.virtual_table_test.test_system_config_table_read.1.xunit.xml': no element found: line 1, column 0
```
After the change we're getting:
```
error: boost/virtual_table_test: Empty testcase XML output, possibly caused by a crash in the cql_test_env.cc, details: '/home/piotrs/src/scylla2/testlog/debug/xml/boost.virtual_table_test.test_system_config_table_read.1.xunit.xml': no element found: line 1, column 0
```
Closes scylladb/scylladb#17679
Today's test.py allows filtering tests to run with the `test.py --options name` syntax. The "name" argument is then considered to be some prefix, and when iterating tests only those whose name starts with that prefix are collected and executed. This has two problems.
Minor: since it is prefix filtering, running e.g. topology_custom/test_tablets will run test_tablets _and_ test_tablets_migration from it. There's no way to exclude the latter from this selection. It's not common, but careful file name selection is welcome for better ~~user~~ testing experience.
Major: most test files in the topology and python suites contain many cases, and some are extremely long. When the intent is to run a single, potentially fast, test case, one needs to either wait or patch the test .py file by hand to somehow exclude the unwanted test cases.
This PR adds the ability to run an individual test case with test.py. The new syntax is `test.py --options name::case`. If the "::case" part is present, two changes apply.
First, the test file selection is done by name match, not by prefix match. So running topology_custom/test_tablets will _not_ select test_tablets_migration from it.
Second, the "::case" part is appended to the pytest execution so that it collects and runs only the specified test case.
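The two changes can be sketched as follows. `file_matches` and `pytest_target` are hypothetical helpers written for illustration; only the `path::case` form that pytest itself accepts for selecting a single test is taken as given:

```python
from typing import Optional


def file_matches(test_name: str, pattern: str, exact: bool) -> bool:
    """Prefix match for plain names, exact match when '::case' was given."""
    return test_name == pattern if exact else test_name.startswith(pattern)


def pytest_target(path: str, case: Optional[str] = None) -> str:
    """Append '::case' so that pytest collects only that test case."""
    return f"{path}::{case}" if case else path
```

With prefix matching `topology_custom/test_tablets` selects both files from the example above, while exact matching selects only `test_tablets`.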
Closes scylladb/scylladb#17481
* github.com:scylladb/scylladb:
test.py: Add test-case splitting in 'name' selection
test.py: Add casename argument to PythonTest
We decrease the server's request timeouts in topology tests so that
they are lower than the driver's timeout. Before, the driver could
time out its request before the server handled it successfully.
This problem caused scylladb/scylladb#15924.
A high server's request timeout can slow down the topology tests
(see the new comment in `make_scylla_conf`). We make the timeout
dependent on the testing mode to not slow down tests for no reason.
We don't touch the driver's request timeout. Decreasing it in some
modes would require too much effort for almost no improvement.
Fixes scylladb/scylladb#15924
When filtering a test by 'name' consider that name can be in a
'test::case' format. If so, get the left part to be the filter and the
right part to be the case name to be passed down to test itself.
Later, when pytest is started, the case name (if not None) is appended
to the pytest execution, thus making it run only the specified
test case, not the whole test file.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
And propagate it from add_test() helper. For now keep it None, next
patch will bring more sense to this place
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Support skipping multiple patterns by allowing them to be passed via
multiple '--skip' arguments to test.py.
Example : `test.py --skip=topology --skip=sstables`
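One way to accept a repeated option is argparse's `append` action. The sketch below is only an illustration: the `skip_patterns` destination name and the substring matching rule are assumptions, not necessarily what test.py does:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="test.py")
    # action="append" collects every occurrence of --skip into one list.
    parser.add_argument("--skip", action="append", dest="skip_patterns",
                        default=[], metavar="PATTERN",
                        help="skip tests matching PATTERN; may be given several times")
    return parser


def is_skipped(test_name: str, skip_patterns) -> bool:
    # Assumption: a pattern matches when it occurs anywhere in the test name.
    return any(p in test_name for p in skip_patterns)
```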
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
Closes scylladb/scylladb#17220
this change is a cleanup.
so it only returns tests, to be more symmetric with `junit_tests()`.
this allows us to drop the dummy `get_test_case()` in `PythonTestSuite`.
as only the BoostTest will be asked for `get_test_case()` after this
change.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#16961
before this change, all "tool" test suites use "pytest" to launch their
tests. but some of the tests might need a dedicated namespace so they
do not interfere with each other. fortunately, "unshare(1)" allows us
to run a program in new namespaces.
in this change, we add a "launcher" option to "tool" test suites, so
that these tests can run with the specified "launcher" instead of using
"pytest". if "launcher" is not specified, its default value of
"pytest" is used.
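A sketch of how a launcher value could be turned into a command line, assuming the option holds a shell-like string. `build_command` is a hypothetical helper, and the `unshare -rn pytest` value in the usage note is only an example:

```python
import shlex
from typing import List, Sequence


def build_command(test_path: str, launcher: str = "pytest",
                  extra_args: Sequence[str] = ()) -> List[str]:
    """Prefix the test with the launcher, falling back to plain pytest."""
    return [*shlex.split(launcher), test_path, *extra_args]
```

For instance, `build_command("test_tools.py", "unshare -rn pytest")` would run pytest inside new user and network namespaces.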
Refs #16542
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
test.py already supports the routing of coverage data into a
predetermined folder under the `tmpdir` logs folder. This patch builds
on that and leverages the code coverage processing libraries to produce
test coverage lcov files and a coverage summary at the end of the run.
The reason for not generating the full report (which can be achieved
with a one liner through the `coverage_utils.py` cli) is that it is
assumed that unit testing is not necessarily the "last stop" in the
testing process and it might need to be joined with other coverage
information that is created at other testing stages (for example dtest).
The result of this patch is that when running test.py with one of the
coverage options (`--coverage` / `--coverage-mode`) it will perform
another step of processing and aggregating the profiling information
created.
Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
We aim to support code coverage reporting as part of our development
process. To this end, we will need the ability to "route" the dumped
profiles from scylla and unit tests to a predetermined location.
We can consider profile data as logged data that should persist after
tests have been run.
For this we add two supported options to test.py:
--coverage - which means that all suites on all modes will participate in
coverage.
--coverage-mode - which can be used to "turn on" coverage support only
for some of the modes in this run.
The strategy chosen is to save the profile data in
`tmpdir`/mode/coverage/%m.profraw (ref:
https://clang.llvm.org/docs/SourceBasedCodeCoverage.html#running-the-instrumented-program)
This means that for every suite the profiling data of each object is
going to be merged into the same file (llvm claims to lock the file so
concurrency is fine).
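A sketch of how such routing could be set up through the `LLVM_PROFILE_FILE` environment variable, which the LLVM profiling runtime reads. The `coverage_env` helper is hypothetical; the path layout follows the one stated above:

```python
import os


def coverage_env(tmpdir: str, mode: str, base_env=None) -> dict:
    """Return a copy of the environment with LLVM_PROFILE_FILE set."""
    env = dict(os.environ if base_env is None else base_env)
    # %m is expanded by the LLVM profiling runtime to a per-binary signature,
    # so repeated runs of the same binary merge into one .profraw file.
    env["LLVM_PROFILE_FILE"] = os.path.join(tmpdir, mode, "coverage", "%m.profraw")
    return env
```

The resulting dictionary can be passed as `env=` when spawning the instrumented test executables.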
More resolution than the suite level seems to not give us anything
useful (at least not at the moment). Moreover, it can also be achieved
by running a single test.
Data at the suite level will help us to detect suites that don't generate
coverage data at all and to fix this or to skip generating the profiles
for them.
Also added support for a 'coverage' parameter in the `suite.yaml` file,
which can be used to disable coverage for a specific suite. This
parameter defaults to True, but if a suite is known to not generate
profiles, or the suite profile data is not needed or obfuscates the
result, it can be set to false in order to cancel profile routing and
processing for this suite.
Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
CMake puts `build.ninja` under `build`, so use it if it exists, and
fall back to current directory otherwise.
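The lookup can be sketched like this. `find_ninja_dir` is a hypothetical helper name, not the function used in test.py:

```python
import os


def find_ninja_dir(source_root: str = ".") -> str:
    """Return 'build' if it holds build.ninja, else the source root."""
    cmake_dir = os.path.join(source_root, "build")
    if os.path.exists(os.path.join(cmake_dir, "build.ninja")):
        return cmake_dir
    return source_root
```

The returned directory can then be handed to `ninja -C <dir>`.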
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
use ninja() to build target using `ninja`. since CMake puts
`build.ninja` under "build", while `configure.py` puts it under
the root source directory, this change prepares us for a follow-up
change to build with build/build.ninja.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
use path_to() to find the path to the directory under build directory.
this change helps to find the executables built using CMake as well.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
because scylla build mode and CMAKE_BUILD_TYPE are not identical,
let's define `all_modes` as a dict so we can look it up.
this change prepares for a follow-up commit which adds a path
resolver which supports both build system generators: the plain
`configure.py` and CMake driven by `configure.py`.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
test.py inherits its env from the user, which is the right thing:
some python modules, e.g. logging, do accept env-based configuration.
However, test.py also starts subprocesses, i.e. tests, which start
scylladb instances. And when the instance is started without an explicit
configuration file, SCYLLA_CONF from user environment can be used.
If this scylla.conf contains funny parameters, e.g. unsupported
configuration options, the tests may break in an unexpected way.
Avoid this by resetting the respective env keys in test.py.
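A minimal sketch of the idea. Only `SCYLLA_CONF` is named in this message; the `RESET_ENV_KEYS` tuple and the `sanitized_env` helper are hypothetical:

```python
import os

# Keys that must not leak from the user's shell into spawned tests.
# Only SCYLLA_CONF is named in the text above; extend the tuple as needed.
RESET_ENV_KEYS = ("SCYLLA_CONF",)


def sanitized_env(base_env=None) -> dict:
    """Copy the environment and drop the keys listed in RESET_ENV_KEYS."""
    env = dict(os.environ if base_env is None else base_env)
    for key in RESET_ENV_KEYS:
        env.pop(key, None)
    return env
```

Everything else, such as logging-related variables, is still inherited from the user.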
Fixes gh-16583
Closes scylladb/scylladb#16577
This reverts commit 7c7baf71d5.
If `stop_gracefully` times out during test teardown phase, it crashes
the test framework reporting multiple errors, for example:
```
12:35:52 /jenkins/workspace/scylla-master/next/scylla/test/pylib/artifact_registry.py:41: RuntimeWarning: coroutine 'PythonTestSuite.get_cluster_factory.<locals>.create_cluster.<locals>.stop' was never awaited
12:35:52 self.exit_artifacts = {}
12:35:52 RuntimeWarning: Enable tracemalloc to get the object allocation traceback
12:35:52 Stopping server ScyllaServer(470, 127.126.216.43, a577f4c7-d5e6-4bdb-8d37-727c11e2a8df) gracefully took longer than 60s
12:35:52 Traceback (most recent call last):
12:35:52 File "/usr/lib64/python3.11/asyncio/tasks.py", line 500, in wait_for
12:35:52 return fut.result()
12:35:52 ^^^^^^^^^^^^
12:35:52 File "/usr/lib64/python3.11/asyncio/subprocess.py", line 137, in wait
12:35:52 return await self._transport._wait()
12:35:52 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
12:35:52 File "/usr/lib64/python3.11/asyncio/base_subprocess.py", line 230, in _wait
12:35:52 return await waiter
12:35:52 ^^^^^^^^^^^^
12:35:52 asyncio.exceptions.CancelledError
12:35:52
12:35:52 The above exception was the direct cause of the following exception:
12:35:52
12:35:52 Traceback (most recent call last):
12:35:52 File "/jenkins/workspace/scylla-master/next/scylla/test/pylib/scylla_cluster.py", line 521, in stop_gracefully
12:35:52 await asyncio.wait_for(wait_task, timeout=STOP_TIMEOUT_SECONDS)
12:35:52 File "/usr/lib64/python3.11/asyncio/tasks.py", line 502, in wait_for
12:35:52 raise exceptions.TimeoutError() from exc
12:35:52 TimeoutError
12:35:52
12:35:52 During handling of the above exception, another exception occurred:
12:35:52
12:35:52 Traceback (most recent call last):
12:35:52 File "/jenkins/workspace/scylla-master/next/scylla/./test.py", line 1615, in workaround_python26789
12:35:52 code = await main()
12:35:52 ^^^^^^^^^^^^
12:35:52 File "/jenkins/workspace/scylla-master/next/scylla/./test.py", line 1582, in main
12:35:52 await run_all_tests(signaled, options)
12:35:52 File "/jenkins/workspace/scylla-master/next/scylla/./test.py", line 1359, in run_all_tests
12:35:52 await reap(done, pending, signaled)
12:35:52 File "/jenkins/workspace/scylla-master/next/scylla/./test.py", line 1342, in reap
12:35:52 result = coro.result()
12:35:52 ^^^^^^^^^^^^^
12:35:52 File "/jenkins/workspace/scylla-master/next/scylla/./test.py", line 201, in run
12:35:52 await test.run(options)
12:35:52 File "/jenkins/workspace/scylla-master/next/scylla/./test.py", line 957, in run
12:35:52 async with get_cluster_manager(self.mode + '/' + self.uname, self.suite.clusters, test_path) as manager:
12:35:52 File "/usr/lib64/python3.11/contextlib.py", line 211, in __aexit__
12:35:52 await anext(self.gen)
12:35:52 File "/jenkins/workspace/scylla-master/next/scylla/test/pylib/scylla_cluster.py", line 1330, in get_cluster_manager
12:35:52 await manager.stop()
12:35:52 File "/jenkins/workspace/scylla-master/next/scylla/test/pylib/scylla_cluster.py", line 1024, in stop
12:35:52 await self.clusters.put(self.cluster, is_dirty=True)
12:35:52 File "/jenkins/workspace/scylla-master/next/scylla/test/pylib/pool.py", line 104, in put
12:35:52 await self.destroy(obj)
12:35:52 File "/jenkins/workspace/scylla-master/next/scylla/./test.py", line 368, in recycle_cluster
12:35:52 await cluster.stop_gracefully()
12:35:52 File "/jenkins/workspace/scylla-master/next/scylla/test/pylib/scylla_cluster.py", line 689, in stop_gracefully
12:35:52 await asyncio.gather(*(server.stop_gracefully() for server in self.running.values()))
12:35:52 File "/jenkins/workspace/scylla-master/next/scylla/test/pylib/scylla_cluster.py", line 527, in stop_gracefully
12:35:52 raise RuntimeError(
12:35:52 RuntimeError: Stopping server ScyllaServer(470, 127.126.216.43, a577f4c7-d5e6-4bdb-8d37-727c11e2a8df) gracefully took longer than 60s
12:35:58 sys:1: RuntimeWarning: coroutine 'PythonTestSuite.get_cluster_factory.<locals>.create_cluster.<locals>.uninstall' was never awaited
12:35:58 sys:1: RuntimeWarning: coroutine 'PythonTestSuite.get_cluster_factory.<locals>.create_cluster.<locals>.stop' was never awaited
```
This change moves existing suites that create a cluster through the
testing infra to be stopped and uninstalled gracefully.
The motivation, besides the obvious advantage of testing our stop
sequence, is that it will pave the way for applying code coverage support
to all tests (not only standalone unit and boost test executables).
testing:
Ran all tests 10 times in a row in dev mode.
Ran all tests once in release mode
Ran all tests once in debug mode
Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
before this change, pytest does not populate its suite's
`scylla_env` down to the forked pytest child process. this works
if the test does not care about the env variables in `scylla_env`.
but object_store is an exception, as it launches scylla instances
by itself. so, without the help of `scylla_env`, `run.find_scylla()`
always finds the newest file globbed by `build/*/scylla`. this is not
always what we expect. on the contrary, if we launch object_store's
pytest using `test.py`, there are good chances that object_store
ends up testing the wrong scylla executable if we have multiple
builds under `build/*/scylla`.
so, in this change, we populate `self.suite.scylla_env` down to
the child process created by `PythonTest`, so that all pytest
based tests can have access to their suite's env variables.
in addition to the 'SCYLLA' env variable, they also include
the env variables required by LLVM code coverage instrumentation.
this is also nice to have.
Fixes #15679
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#15682
ScyllaCluster can be marked as 'dirty' which means that the cluster is
in unusable state (after test) and shouldn't be re-used by other tests
launched by test.py. For now this is only implemented via the cluster
manager class which is only available for topology tests.
Add a less flexible short-cut for cql-pytest-s via suite.yaml marking.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
In this PR a simple test for fencing is added. It exercises the data
plane, meaning if it somehow happens that the node has a stale topology
version, then requests from this node will get an error 'stale
topology'. The test just decrements the node version manually through
CQL, so it's quite artificial. To test a more real-world scenario we
need to allow the topology change fiber to sometimes skip unavailable
nodes. Now the algorithm fails and retries indefinitely in this case.
The PR also adds some logs, and removes one seemingly redundant topology
version increment, see the commit messages for details.
Closes #14901
* github.com:scylladb/scylladb:
test_fencing: add test_fence_hints
test.py: output the skipped tests
test.py: add skip_mode decorator and fixture
test.py: add mode fixture
hints: add debug log for dropped hints
hints: send_one_hint: extend the scope of file_send_gate holder
pylib: add ScyllaMetrics
hints manager: add send_errors counter
token_metadata: add debug logs
fencing: add simple data plane test
random_tables.py: add counter column type
raft topology: don't increment version when transitioning to node_state::normal
The pytest option -rs forces it to print
all the skipped tests along with
the reasons. Without this option we
can't tell why certain tests were skipped;
maybe some of them should no longer be skipped.
The option was introduced to bootstrap the project. It's still
useful for testing, but that translates into maintaining an
additional option and code that will not be really used
outside of testing. A possible option is to later map the
option in boost tests to initial_tablets, which may yield
the same effect for testing.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
since MinioServer finds a free port by itself, there is no need to
provide it with an IP address anymore -- we can always use
127.0.0.1.
so, in this change, we just drop the HostRegistry parameter passed
to the constructor of MinioServer, and pass the host address in place
of it.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
before this change, we would have a report in Jenkins like:
```
[Info] - 1 out of 3 times failed: failed.
== [File] - test/boost/commitlog_test.cc
== [Line] - 298
[Info] - passed: release=1, dev=1
== [File] - test/boost/commitlog_test.cc
== [Line] - 298
[Info] - failed: debug=1
== [File] - test/boost/commitlog_test.cc
== [Line] - 298
```
the first section is rendered from an `Info` tag
created by `test.py`. but the trailing "failed" does not
help in this context, as we already understand it's failing.
so, in this change, it is dropped.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes #14546
Low CPU utilization is a major contributor to high test time.
Low CPU utilization can happen due to tests sleeping, or lack
of concurrency due to Amdahl's law.
Utilization is computed by dividing the utilized CPU by the available
CPU (CPU count times wall time).
Example output:
Found 134 tests.
================================================================================
[N/TOTAL] SUITE MODE RESULT TEST
------------------------------------------------------------------------------
[134/134] boost dev [ PASS ] boost.json_cql_query_test.test_unpack_decimal.1
------------------------------------------------------------------------------
CPU utilization: 4.8%
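The utilization formula above can be sketched as follows. The function names are hypothetical; the `measure` helper reads CPU time for this process and its reaped children from `os.times()`:

```python
import os
import time


def cpu_utilization(used_cpu_seconds: float, cpu_count: int, wall_seconds: float) -> float:
    """Utilized CPU divided by available CPU (CPU count times wall time)."""
    available = cpu_count * wall_seconds
    return used_cpu_seconds / available if available > 0 else 0.0


def measure(run) -> float:
    """Run a callable and return the CPU utilization observed while it ran."""
    t0, w0 = os.times(), time.monotonic()
    run()
    t1, w1 = os.times(), time.monotonic()
    # The first four fields are user, system, children_user, children_system.
    used = sum(t1[:4]) - sum(t0[:4])
    return cpu_utilization(used, os.cpu_count() or 1, w1 - w0)
```

For example, 48 CPU-seconds consumed on 10 CPUs over 100 seconds of wall time gives a utilization of 0.048, i.e. 4.8%.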
Closes #14251
there are chances that a Boost::test test fails to generate a
valid XML file after the test finishes. and
xml.etree.ElementTree.parse() throws when parsing it.
see https://github.com/scylladb/scylla-pkg/issues/3196
before this change, the exception is not handled, and test.py
aborts in this case. this does not help and could be misleading.
after this change, the exception is handled and printed.
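A sketch of the handling described above. `load_boost_xml` is a hypothetical helper and the message wording is illustrative; `xml.etree.ElementTree.ParseError` is the exception raised for malformed or empty input:

```python
import xml.etree.ElementTree as ET


def load_boost_xml(path: str):
    """Return the parsed tree, or None when the file is not valid XML."""
    try:
        return ET.parse(path)
    except ET.ParseError as e:
        # An empty or truncated file ends up here instead of aborting the run.
        print(f"error: failed to parse XML output '{path}': {e}")
        return None
```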
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes #14180
before this change, when consolidating the boost's XML logger file,
we just practically concatenate all the tests' logger files into a single
one. sometimes, we run the tests multiple times, and these runs share
the same TestSuite and TestCase tags. this has two consequences,
1. there is a chance that a test has both successful and failed
runs. but jenkins' "Test Results" page cannot identify the failed
run, it just picks a random run when one clicks for the detail of
the run, as it takes the TestCase's name as part of its identifier,
and we have multiple of them if the argument passed to the --repeat
option is greater than 1 -- this is the case when we promote the
"next" branch.
2. the testReport page of Jenkins' xUnit plugin created for the "next"
job is 3 times as large as the one for the regular "scylla-ci" run,
as all tests are repeated 3 times. but what we really care about is
the history of a certain test, not a certain run of it.
in this change, we just pick a representative run of a test if it is
repeated multiple times and add a "Message" tag for including the
summary of the runs. this should address the problems above:
1. the failed tests always stand out so we can always pinpoint them with
Jenkins's "Test Results" page.
2. the tests are deduped by their names.
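a rough sketch of the selection logic, under the assumption that each run is a `(test_name, mode, passed)` tuple. the `consolidate` helper and the summary wording are illustrative only and do not produce the actual XML:

```python
from collections import defaultdict


def consolidate(runs):
    """Map each test name to (representative_run, summary_message)."""
    by_name = defaultdict(list)
    for run in runs:
        by_name[run[0]].append(run)
    result = {}
    for name, group in by_name.items():
        failed = [r for r in group if not r[2]]
        # Prefer a failed run so that failures always stand out.
        representative = failed[0] if failed else group[0]
        if failed:
            summary = f"{len(failed)} out of {len(group)} times failed"
        else:
            summary = f"passed {len(group)} times"
        result[name] = (representative, summary)
    return result
```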
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes #14069
If the executable of a matching unit or boost test is not executable,
warn to console and skip.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
Closes #13982
If the executable of a matching unit or boost test is not present, warn
to console and skip.
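Both checks (missing file and missing execute permission) can be sketched with the standard library. `check_test_executable` is a hypothetical helper and the warning text is illustrative:

```python
import os


def check_test_executable(path: str) -> bool:
    """Return True if the test binary can be run, else warn and return False."""
    if not os.path.isfile(path):
        print(f"warning: test executable {path} not found, skipping")
        return False
    if not os.access(path, os.X_OK):
        print(f"warning: {path} is not executable, skipping")
        return False
    return True
```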
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
Closes #13949
Separate cluster_size into a cluster section and specify this value as
initial_size.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
Closes #13440