480 Commits

Author SHA1 Message Date
Patryk Jędrzejczak
9dfb26428b test: topology: decrease the server's request timeouts
We decrease the server's request timeouts in topology tests so that
they are lower than the driver's timeout. Before, the driver could
time out its request before the server handled it successfully.
This problem caused scylladb/scylladb#15924.

A high server's request timeout can slow down the topology tests
(see the new comment in `make_scylla_conf`). We make the timeout
dependent on the testing mode to not slow down tests for no reason.

We don't touch the driver's request timeout. Decreasing it in some
modes would require too much effort for almost no improvement.

Fixes scylladb/scylladb#15924
2024-02-29 18:37:38 +01:00
Lakshmi Narayanan Sreethar
f8f8d64982 test.py: support skipping multiple test patterns
Support skipping multiple patterns by allowing them to be passed via
multiple '--skip' arguments to test.py.

Example : `test.py --skip=topology --skip=sstables`
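
A minimal sketch of how repeated `--skip` flags could be collected, assuming `argparse` with `action="append"`. The function name, the `dest` name and the empty-list default are assumptions for illustration, not taken from test.py.

```python
import argparse


def parse_skip_patterns(argv):
    """Collect every --skip value into a list (empty when none are given)."""
    parser = argparse.ArgumentParser()
    # action="append" adds one list entry per occurrence of the flag.
    # dest and default are hypothetical choices for this sketch.
    parser.add_argument("--skip", action="append", dest="skip_patterns", default=[])
    return parser.parse_args(argv).skip_patterns


print(parse_skip_patterns(["--skip=topology", "--skip=sstables"]))
```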

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>

Closes scylladb/scylladb#17220
2024-02-13 17:32:03 +02:00
Kefu Chai
f5e3a2d98e test.py: add boost_tests() to suite
this change is a cleanup.

add `boost_tests()` so that it only returns tests, to be more symmetric
with `junit_tests()`. this allows us to drop the dummy `get_test_case()`
in `PythonTestSuite`, as only `BoostTest` will be asked for
`get_test_case()` after this change.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16961
2024-01-31 13:43:21 +02:00
Kefu Chai
ee28cf2285 test.py: s/defalt/default/
this typo was identified by codespell

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16980
2024-01-25 16:54:07 +02:00
Kefu Chai
35b3c51f40 test.py: add "launcher" option support
before this change, all "tool" test suites use "pytest" to launch their
tests. but some of the tests might need a dedicated namespace so they
do not interfere with each other. fortunately, "unshare(1)" allows us
to run a program in new namespaces.

in this change, we add a "launcher" option to "tool" test suites, so
that these tests can run with the specified "launcher" instead of using
"pytest". if "launcher" is not specified, its default value of
"pytest" is used.

Refs #16542
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
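
The "launcher" fallback described in this commit might look roughly like the sketch below. The `build_command` helper and the shape of the suite configuration dict are hypothetical, and the `unshare` flags in the usage line are only an illustrative choice.

```python
import shlex


def build_command(suite_cfg, test_args):
    """Return the argv used to launch a "tool" test (hypothetical helper)."""
    # Fall back to plain pytest when the suite does not name a launcher.
    launcher = suite_cfg.get("launcher", "pytest")
    # A launcher may itself carry arguments, so split it like a shell would.
    return shlex.split(launcher) + list(test_args)


print(build_command({"launcher": "unshare --net --map-root-user pytest"}, ["test_tools.py"]))
print(build_command({}, ["test_tools.py"]))
```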
2024-01-25 20:28:01 +08:00
Kefu Chai
a9851cf834 test.py: replace "$foo is False" with "not $foo"
for better readability.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16960
2024-01-24 15:21:53 +02:00
Lakshmi Narayanan Sreethar
a1867986e7 test.py: deduce correct path for unit tests when built with cmake
Fix the path deduction for unit test executables when the source code is
built with cmake.

Fixes #16906

Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>

Closes scylladb/scylladb#16907
2024-01-22 10:03:44 +02:00
Eliran Sinvani
c7dff1b81b test.py: support code coverage
test.py already supports routing coverage data into a
predetermined folder under the `tmpdir` logs folder. This patch extends
that and leverages the code coverage processing libraries to produce
test coverage lcov files and a coverage summary at the end of the run.
The reason for not generating the full report (which can be achieved
with a one liner through the `coverage_utils.py` cli) is that it is
assumed that unit testing is not necessarily the "last stop" in the
testing process and it might need to be joined with other coverage
information that is created at other testing stages (for example dtest).

The result of this patch is that when running test.py with one of the
coverage options (`--coverage` / `--coverage-mode`) it will perform
another step of processing and aggregating the profiling information
created.

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
2024-01-18 11:11:34 +02:00
Eliran Sinvani
f4b6c9074a test.py: support --coverage and --coverage-mode
We aim to support code coverage reporting as part of our development
process. To this end, we need the ability to "route" the dumped
profiles from scylla and the unit tests to a predetermined location.
We can consider profile data as logged data that should persist after
tests have been run.

For this we add two supported options to test.py:
--coverage - which means that all suites on all modes will participate in
             coverage.
--coverage-mode - which can be used to "turn on" coverage support only
                  for some of the modes in this run.

The strategy chosen is to save the profile data in
`tmpdir`/mode/coverage/%m.profraw (ref:
https://clang.llvm.org/docs/SourceBasedCodeCoverage.html#running-the-instrumented-program)
This means that for every suite the profiling data of each object is
going to be merged into the same file (llvm claims to lock the file so
concurrency is fine).
More resolution than the suite level seems to not give us anything
useful (at least not at the moment). Moreover, it can also be achieved
by running a single test.
Data at the suite level will help us detect suites that don't generate
coverage data at all, so that we can fix this or skip generating the
profiles for them.
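
As a rough sketch of the routing described above, the environment for an instrumented process could be prepared as follows. `LLVM_PROFILE_FILE` and the `%m` pattern come from the linked clang documentation; the `coverage_env` helper and its parameters are assumptions for illustration.

```python
import os


def coverage_env(tmpdir, mode, base_env=None):
    """Return an environment that routes raw profiles to tmpdir/mode/coverage."""
    env = dict(os.environ if base_env is None else base_env)
    profile_dir = os.path.join(tmpdir, mode, "coverage")
    # %m expands to the instrumented binary's signature, so runs of the same
    # binary are merged into one raw profile file by the LLVM runtime.
    env["LLVM_PROFILE_FILE"] = os.path.join(profile_dir, "%m.profraw")
    return env


print(coverage_env("testlog", "dev", base_env={})["LLVM_PROFILE_FILE"])
```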

Also add support for a 'coverage' parameter in the `suite.yaml` file,
which can be used to disable coverage for a specific suite. This
parameter defaults to True, but if a suite is known not to generate
profiles, or its profile data is not needed or obfuscates the result,
it can be set to false in order to cancel profile routing and
processing for this suite.

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
2024-01-18 11:11:34 +02:00
Kefu Chai
382a5e2d0c test.py: build using build/build.ninja when it exists
CMake puts `build.ninja` under `build`, so use it if it exists, and
fall back to current directory otherwise.
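
The lookup order described here can be sketched as below. The `find_build_ninja` name is hypothetical, and the sketch takes the source root as a parameter instead of relying on the current directory.

```python
from pathlib import Path


def find_build_ninja(source_root):
    """Prefer build/build.ninja (CMake layout), else fall back to the root."""
    root = Path(source_root)
    cmake_file = root / "build" / "build.ninja"
    if cmake_file.exists():
        return cmake_file
    # configure.py puts build.ninja directly under the source root.
    return root / "build.ninja"
```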

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-01-10 10:01:02 +08:00
Kefu Chai
6674e87842 test.py: extract ninja()
use ninja() to build target using `ninja`. since CMake puts
`build.ninja` under "build", while `configure.py` puts it under
the root source directory, this change prepares us for a follow-up
change to build with build/build.ninja.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-01-10 10:01:02 +08:00
Kefu Chai
5fda822c4e test.py: extract path_to()
use path_to() to find the path to the directory under build directory.

this change helps to find the executables built using CMake as well.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-01-10 10:01:02 +08:00
Kefu Chai
0b11ae9fe6 test.py: define all_modes as a dict of mode:CMAKE_BUILD_TYPE
because the scylla build mode and CMAKE_BUILD_TYPE are not identical,
let's define `all_modes` as a dict so we can look it up.
this change prepares for a follow-up commit which adds a path
resolver which supports both build system generators: the plain
`configure.py` and CMake driven by `configure.py`.
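
A sketch of what such a lookup table could look like. The specific build-type strings below are illustrative assumptions, not values confirmed by this commit message, and `cmake_build_type` is a hypothetical helper.

```python
# Maps a scylla build mode to a CMAKE_BUILD_TYPE name.
# The values are illustrative assumptions for this sketch.
all_modes = {
    "debug": "Debug",
    "release": "RelWithDebInfo",
    "dev": "Dev",
}


def cmake_build_type(mode):
    """Translate a scylla build mode into its CMake build type."""
    try:
        return all_modes[mode]
    except KeyError:
        raise ValueError(f"unknown build mode: {mode!r}") from None


print(cmake_build_type("release"))
```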

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-01-10 10:01:02 +08:00
Kefu Chai
47d8edc0fc test.py: s/asyncio.get_event_loop()/asyncio.get_running_loop()/
the latter raises a RuntimeError if there is no running event loop,
while the former gets one from the default policy in this case.
in the use cases in test.py, there is always a running event loop,
when `asyncio.get_event_loop()` gets called. so let's use
the preferred `asyncio.get_running_loop()`.
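
The difference can be seen in a small standalone sketch (not code from test.py): `asyncio.get_running_loop()` succeeds inside a coroutine and raises `RuntimeError` when no loop is running.

```python
import asyncio


async def inside_loop():
    # Within a coroutine there is always a running loop to return.
    return asyncio.get_running_loop()


def outside_loop():
    """Return True when get_running_loop() refuses to run without a loop."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        return True
    return False


loop = asyncio.run(inside_loop())
print(isinstance(loop, asyncio.AbstractEventLoop), outside_loop())
```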

see https://docs.python.org/3/library/asyncio-eventloop.html#asyncio.get_event_loop

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16398
2024-01-04 08:39:49 +02:00
Konstantin Osipov
246da8884a test.py: override SCYLLA_* env keys
test.py inherits its env from the user, which is the right thing:
some python modules, e.g. logging, do accept env-based configuration.

However, test.py also starts subprocesses, i.e. tests, which start
scylladb instances. And when the instance is started without an explicit
configuration file, SCYLLA_CONF from user environment can be used.

If this scylla.conf contains funny parameters, e.g. unsupported
configuration options, the tests may break in an unexpected way.

Avoid this by resetting the respective env keys in test.py.
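
One way to do this, shown as a hedged sketch only: drop every inherited `SCYLLA_*` key and then set the values the test run controls. The `scrubbed_env` helper and the override values are hypothetical; the commit message does not spell out which keys are reset or how.

```python
def scrubbed_env(base_env, overrides):
    """Copy base_env without inherited SCYLLA_* keys, then apply overrides."""
    # Keep everything else, e.g. env-based configuration of Python modules.
    env = {k: v for k, v in base_env.items() if not k.startswith("SCYLLA_")}
    env.update(overrides)
    return env


user_env = {"PATH": "/usr/bin", "SCYLLA_CONF": "/home/user/odd-conf"}
print(scrubbed_env(user_env, {"SCYLLA_CONF": "/tmp/test-conf"}))
```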

Fixes gh-16583

Closes scylladb/scylladb#16577
2023-12-31 13:02:49 +02:00
Yaniv Kaul
0b0a3ee7fc Typos: fix typos in code
Last batch, hopefully. Using codespell, went over the docs and fixed some typos.

Refs: https://github.com/scylladb/scylladb/issues/16255
Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>

Closes scylladb/scylladb#16388
2023-12-13 10:45:21 +02:00
Yaron Kaikov
c3ee53f3be test.py: enable xml validation
Following https://github.com/scylladb/scylladb/issues/4774#issuecomment-1752089862

Adding back xml validation

Closes: https://github.com/scylladb/scylla-pkg/issues/3441

Closes scylladb/scylladb#16198
2023-11-29 09:02:36 +02:00
Kamil Braun
3bcee6a981 Revert "Merge 'Change all tests to shut down gracefully on shutdown' from Eliran Sinvani"
This reverts commit 7c7baf71d5.

If `stop_gracefully` times out during test teardown phase, it crashes
the test framework reporting multiple errors, for example:
```
12:35:52  /jenkins/workspace/scylla-master/next/scylla/test/pylib/artifact_registry.py:41: RuntimeWarning: coroutine 'PythonTestSuite.get_cluster_factory.<locals>.create_cluster.<locals>.stop' was never awaited
12:35:52    self.exit_artifacts = {}
12:35:52  RuntimeWarning: Enable tracemalloc to get the object allocation traceback
12:35:52  Stopping server ScyllaServer(470, 127.126.216.43, a577f4c7-d5e6-4bdb-8d37-727c11e2a8df) gracefully took longer than 60s
12:35:52  Traceback (most recent call last):
12:35:52    File "/usr/lib64/python3.11/asyncio/tasks.py", line 500, in wait_for
12:35:52      return fut.result()
12:35:52             ^^^^^^^^^^^^
12:35:52    File "/usr/lib64/python3.11/asyncio/subprocess.py", line 137, in wait
12:35:52      return await self._transport._wait()
12:35:52             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
12:35:52    File "/usr/lib64/python3.11/asyncio/base_subprocess.py", line 230, in _wait
12:35:52      return await waiter
12:35:52             ^^^^^^^^^^^^
12:35:52  asyncio.exceptions.CancelledError
12:35:52
12:35:52  The above exception was the direct cause of the following exception:
12:35:52
12:35:52  Traceback (most recent call last):
12:35:52    File "/jenkins/workspace/scylla-master/next/scylla/test/pylib/scylla_cluster.py", line 521, in stop_gracefully
12:35:52      await asyncio.wait_for(wait_task, timeout=STOP_TIMEOUT_SECONDS)
12:35:52    File "/usr/lib64/python3.11/asyncio/tasks.py", line 502, in wait_for
12:35:52      raise exceptions.TimeoutError() from exc
12:35:52  TimeoutError
12:35:52
12:35:52  During handling of the above exception, another exception occurred:
12:35:52
12:35:52  Traceback (most recent call last):
12:35:52    File "/jenkins/workspace/scylla-master/next/scylla/./test.py", line 1615, in workaround_python26789
12:35:52      code = await main()
12:35:52             ^^^^^^^^^^^^
12:35:52    File "/jenkins/workspace/scylla-master/next/scylla/./test.py", line 1582, in main
12:35:52      await run_all_tests(signaled, options)
12:35:52    File "/jenkins/workspace/scylla-master/next/scylla/./test.py", line 1359, in run_all_tests
12:35:52      await reap(done, pending, signaled)
12:35:52    File "/jenkins/workspace/scylla-master/next/scylla/./test.py", line 1342, in reap
12:35:52      result = coro.result()
12:35:52               ^^^^^^^^^^^^^
12:35:52    File "/jenkins/workspace/scylla-master/next/scylla/./test.py", line 201, in run
12:35:52      await test.run(options)
12:35:52    File "/jenkins/workspace/scylla-master/next/scylla/./test.py", line 957, in run
12:35:52      async with get_cluster_manager(self.mode + '/' + self.uname, self.suite.clusters, test_path) as manager:
12:35:52    File "/usr/lib64/python3.11/contextlib.py", line 211, in __aexit__
12:35:52      await anext(self.gen)
12:35:52    File "/jenkins/workspace/scylla-master/next/scylla/test/pylib/scylla_cluster.py", line 1330, in get_cluster_manager
12:35:52      await manager.stop()
12:35:52    File "/jenkins/workspace/scylla-master/next/scylla/test/pylib/scylla_cluster.py", line 1024, in stop
12:35:52      await self.clusters.put(self.cluster, is_dirty=True)
12:35:52    File "/jenkins/workspace/scylla-master/next/scylla/test/pylib/pool.py", line 104, in put
12:35:52      await self.destroy(obj)
12:35:52    File "/jenkins/workspace/scylla-master/next/scylla/./test.py", line 368, in recycle_cluster
12:35:52      await cluster.stop_gracefully()
12:35:52    File "/jenkins/workspace/scylla-master/next/scylla/test/pylib/scylla_cluster.py", line 689, in stop_gracefully
12:35:52      await asyncio.gather(*(server.stop_gracefully() for server in self.running.values()))
12:35:52    File "/jenkins/workspace/scylla-master/next/scylla/test/pylib/scylla_cluster.py", line 527, in stop_gracefully
12:35:52      raise RuntimeError(
12:35:52  RuntimeError: Stopping server ScyllaServer(470, 127.126.216.43, a577f4c7-d5e6-4bdb-8d37-727c11e2a8df) gracefully took longer than 60s
12:35:58  sys:1: RuntimeWarning: coroutine 'PythonTestSuite.get_cluster_factory.<locals>.create_cluster.<locals>.uninstall' was never awaited
12:35:58  sys:1: RuntimeWarning: coroutine 'PythonTestSuite.get_cluster_factory.<locals>.create_cluster.<locals>.stop' was never awaited
```
2023-11-09 12:30:35 +01:00
Eliran Sinvani
2a45fed0cf test.py: move to a graceful termination of nodes on teardown
This change moves existing suites which create clusters through the
testing infra to be stopped and uninstalled gracefully.
The motivation, besides the obvious advantage of testing our stop
sequence is that it will pave the way for applying code coverage support
to all tests (not only standalone unit and boost test executables).

testing:
	Ran all tests 10 times in a row in dev mode.
	Ran all tests once in release mode
	Ran all tests once in debug mode

Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
2023-10-31 13:12:49 +02:00
Kefu Chai
19e724822d test.py: pass self.suite.scylla_env to pytest process
before this change, pytest does not propagate its suite's
`scylla_env` down to the forked pytest child process. this works
if the test does not care about the env variables in `scylla_env`.
but object_store is an exception, as it launches scylla instances
by itself. so, without the help of `scylla_env`, `run.find_scylla()`
always finds the newest file globbed by `build/*/scylla`. this is not
always what we expect: if we launch object_store's
pytest using `test.py`, there is a good chance that object_store
ends up testing the wrong scylla executable if we have multiple
builds under `build/*/scylla`.

so, in this change, we propagate `self.suite.scylla_env` down to
the child process created by `PythonTest`, so that all pytest
based tests can have access to their suite's env variables.
in addition to the 'SCYLLA' env variable, they also include
the env variables required by LLVM code coverage instrumentation.
this is also nice to have.

Fixes #15679
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#15682
2023-10-17 09:27:12 +03:00
Botond Dénes
56f7b2f45d test.py: add ToolTestSuite and ToolTest
A test suite for python pytests, testing tools, and hence not needing a
scylla cluster setup for them.
2023-09-14 05:25:14 -04:00
Pavel Emelyanov
375b8c6213 test.py: Add suite option to auto-dirty cluster after test
ScyllaCluster can be marked as 'dirty' which means that the cluster is
in unusable state (after test) and shouldn't be re-used by other tests
launched by test.py. For now this is only implemented via the cluster
manager class which is only available for topology tests.

Add a less flexible short-cut for cql-pytest-s via suite.yaml marking.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-09-11 17:37:48 +03:00
Kamil Braun
cdc3cd2b79 Merge 'raft: add fencing tests' from Petr Gusev
In this PR a simple test for fencing is added. It exercises the data
plane, meaning if it somehow happens that the node has a stale topology
version, then requests from this node will get an error 'stale
topology'. The test just decrements the node version manually through
CQL, so it's quite artificial. To test a more real-world scenario we
need to allow the topology change fiber to sometimes skip unavailable
nodes. Now the algorithm fails and retries indefinitely in this case.

The PR also adds some logs, and removes one seemingly redundant topology
version increment, see the commit messages for details.

Closes #14901

* github.com:scylladb/scylladb:
  test_fencing: add test_fence_hints
  test.py: output the skipped tests
  test.py: add skip_mode decorator and fixture
  test.py: add mode fixture
  hints: add debug log for dropped hints
  hints: send_one_hint: extend the scope of file_send_gate holder
  pylib: add ScyllaMetrics
  hints manager: add send_errors counter
  token_metadata: add debug logs
  fencing: add simple data plane test
  random_tables.py: add counter column type
  raft topology: don't increment version when transitioning to node_state::normal
2023-08-22 16:28:21 +02:00
Petr Gusev
3ccd2abad4 test.py: output the skipped tests
The pytest option -rs forces it to print
all the skipped tests along with
the reasons. Without this option we
can't tell why certain tests were skipped;
maybe some of them no longer should be.
2023-08-22 15:48:40 +04:00
Petr Gusev
a639d161e6 test.py: add mode fixture
Sometimes a test wants to know what mode
it is running in so that e.g. it can skip
itself in some of them.
2023-08-22 15:48:40 +04:00
Raphael S. Carvalho
b578d6643f Kill scylla option to configure number of compaction groups
The option was introduced to bootstrap the project. It's still
useful for testing, but that translates into maintaining an
additional option and code that will not be really used
outside of testing. A possible option is to later map the
option in boost tests to initial_tablets, which may yield
the same effect for testing.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-08-16 18:23:53 -03:00
Kefu Chai
0c0a59bf62 test: stop using HostRegistry in MinioServer
since MinioServer finds a free port by itself, there is no need to
provide it with an IP address anymore -- we can always use
127.0.0.1.

so, in this change, we just drop the HostRegistry parameter passed
to the constructor of MinioServer, and pass the host address in place
of it.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2023-08-09 23:40:22 +08:00
Konstantin Osipov
df97135583 test.py: forward the optional property file when creating a server
To support multi-DC tests we need to provide a property
file when creating a server.
Forward it from the test client to test.py.

Closes #14683
2023-08-02 13:45:19 +02:00
Alejo Sanchez
13e31eaeca test.py: show mode and suite name when listing tests
For --list, show also mode and suite name.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>

Closes #14729
2023-07-18 09:06:47 +03:00
Kefu Chai
a871de33e6 test.py: remove redundant message in report
before this change, we would have a report in Jenkins like:

```
[Info] - 1 out of 3 times failed: failed.
 == [File] - test/boost/commitlog_test.cc
 == [Line] - 298

[Info] - passed: release=1, dev=1
 == [File] - test/boost/commitlog_test.cc
 == [Line] - 298

[Info] - failed: debug=1
 == [File] - test/boost/commitlog_test.cc
 == [Line] - 298
```

the first section is rendered from an `Info` tag,
created by `test.py`. but the ending "failed" does not
help in this context, as we already understand it's failing.
so, in this change, it is dropped.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #14546
2023-07-13 11:31:13 +03:00
Avi Kivity
3b3f28fc12 test.py: report CPU utilization
Low CPU utilization is a major contributor to high test time.
Low CPU utilization can happen due to tests sleeping, or lack
of concurrency due to Amdahl's law.

Utilization is computed by dividing the utilized CPU by the available
CPU (CPU count times wall time).
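
The formula can be written out as a small sketch. The function name and the sample numbers are illustrative only; the numbers are chosen to reproduce the 4.8% shown in the example output.

```python
def cpu_utilization(cpu_seconds, wall_seconds, cpu_count):
    """Utilized CPU divided by available CPU (CPU count times wall time)."""
    available = cpu_count * wall_seconds
    if available <= 0:
        return 0.0
    return cpu_seconds / available


# E.g. 38.4 CPU-seconds used on 8 CPUs over 100 s of wall time.
print(f"CPU utilization: {cpu_utilization(38.4, 100.0, 8) * 100:.1f}%")
```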

Example output:

Found 134 tests.
================================================================================
[N/TOTAL]   SUITE    MODE   RESULT   TEST
------------------------------------------------------------------------------
[134/134]   boost     dev   [ PASS ] boost.json_cql_query_test.test_unpack_decimal.1
------------------------------------------------------------------------------
CPU utilization: 4.8%

Closes #14251
2023-06-18 19:33:02 +03:00
Kefu Chai
c123f4644a test.py: do not abort if fails to parse an XML logger file
there are chances that a Boost::test test fails to generate a
valid XML file after the test finishes. and
xml.etree.ElementTree.parse() throws when parsing it.
see https://github.com/scylladb/scylla-pkg/issues/3196

before this change, the exception is not handled, and test.py
aborts in this case. this does not help and could be misleading.

after this change, the exception is handled and printed.
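
A minimal sketch of this kind of handling, assuming the failure surfaces as `xml.etree.ElementTree.ParseError`. The `load_boost_log` helper is hypothetical and is not the code from this commit.

```python
import xml.etree.ElementTree as ET


def load_boost_log(path):
    """Return the root element of an XML logger file, or None if it is invalid."""
    try:
        return ET.parse(path).getroot()
    except ET.ParseError as exc:
        # Report the problem and carry on instead of aborting the whole run.
        print(f"failed to parse XML logger file {path}: {exc}")
        return None
```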

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #14180
2023-06-08 11:02:01 +03:00
Kefu Chai
421331a20b test.py: consolidate multiple runs of the same test
before this change, when consolidating the boost's XML logger file,
we practically just concatenate all the tests' logger files into a single
one. sometimes, we run the tests multiple times, and these runs share
the same TestSuite and TestCase tags. this has two consequences:

1. there is a chance that a test has both successful and failed
   runs. but jenkins' "Test Results" page cannot identify the failed
   run; it just picks a random run when one clicks for the details of
   the run, as it takes the TestCase's name as part of its identifier,
   and we have multiple of them if the argument passed to the --repeat
   option is greater than 1 -- this is the case when we promote the
   "next" branch.
2. the testReport page of Jenkins' xUnit plugin created for the "next"
   job is 3 times as large as the one for the regular "scylla-ci" run,
   as all tests are repeated 3 times. but what we really care about is
   the history of a certain test, not a certain run of it.

in this change, we just pick a representative run of a test if it is
repeated multiple times and add a "Message" tag for including the
summary of the runs. this should address the problems above:

1. the failed tests always stand out so we can always pinpoint them with
   Jenkins's "Test Results" page.
2. the tests are deduped by name.
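
The selection rule above can be sketched on plain dicts instead of XML elements. The data shape, the `consolidate` name and the wording of the summary are assumptions for illustration; the real change works on the Boost XML logger output.

```python
def consolidate(runs):
    """Keep one representative run per test name, preferring a failed run."""
    by_name = {}
    for run in runs:
        by_name.setdefault(run["name"], []).append(run)

    result = []
    for name, group in by_name.items():
        failed = [r for r in group if not r["passed"]]
        # A failed run stands out; otherwise any run represents the test.
        representative = failed[0] if failed else group[0]
        message = f"{len(failed)} out of {len(group)} times failed"
        result.append({**representative, "message": message})
    return result


runs = [
    {"name": "commitlog_test", "mode": "release", "passed": True},
    {"name": "commitlog_test", "mode": "debug", "passed": False},
    {"name": "commitlog_test", "mode": "dev", "passed": True},
]
print(consolidate(runs))
```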

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes #14069
2023-06-04 13:15:46 +03:00
Alejo Sanchez
2050a1a125 test.py: warn and skip for missing unit/boost tests
If the executable of a matching unit or boost test is not executable,
warn to console and skip.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>

Closes #13982
2023-05-29 23:03:24 +03:00
Alejo Sanchez
1940016cd1 test.py: warn and skip for missing unit/boost tests
If the executable of a matching unit or boost test is not present, warn
to console and skip.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>

Closes #13949
2023-05-22 12:49:32 +03:00
Alejo Sanchez
19687b54f1 test/pytest: yaml configuration cluster section
Separate cluster_size into a cluster section and specify this value as
initial_size.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>

Closes #13440
2023-05-15 09:48:39 +02:00
Pavel Emelyanov
fe70333c19 test: Auto-skip object-storage test cases if run from shell
In case an sstable unit test case is run individually, it would fail
with exception saying that S3_... environment is not set. It's better to
skip the test-case rather than fail. If someone wants to run it from
shell, it will have to prepare S3 server (minio/AWS public bucket) and
provide proper environment for the test-case.

refs: #13569

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes #13755
2023-05-04 14:15:18 +03:00
Pavel Emelyanov
6dbe41d277 test.py: Equip it with minio server
When test.py starts it activates a minio server inside test-dir and
configures an anonymous bucket for test cases to run on

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-04-10 16:43:01 +03:00
Nadav Har'El
ef50e4022c test: drop our "pytest" wrapper script
When Fedora 37 came out, we discovered that its "pytest" script started
to run Python with the "-s" option, which caused problems for packages
installed personally via pip. We fixed this by adding our own wrapper
script test/pytest.

But this bug (https://bugzilla.redhat.com/show_bug.cgi?id=2152171) was
already fixed in Fedora 37, and the new version already reached our
dbuild. So we no longer need this wrapper script. Let's remove it.

Fixes #12412

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes #13083
2023-03-08 07:31:37 +02:00
Nadav Har'El
2653865b34 Merge 'test.py: improve test failure handling' from Kamil Braun
Improve logging by printing the cluster at the end of each test.

Stop performing operations like attempting queries or dropping keyspaces on dirty clusters. Dirty clusters might be completely dead and these operations would only cause more "errors" to happen after a failed test, making it harder to find the real cause of failure.

Mark cluster as dirty when a test that uses it fails - after a failed test, we shouldn't assume that the cluster is in a usable state, so we shouldn't reuse it for another test.

Rely on the `is_dirty` flag in `PythonTest`s and `CQLApprovalTest`s, similarly to what `TopologyTest`s do.

Closes #12652

* github.com:scylladb/scylladb:
  test.py: rely on ScyllaCluster.is_dirty flag for recycling clusters
  test/topology: don't drop random_tables keyspace after a failed test
  test/pylib: mark cluster as dirty after a failed test
  test: pylib, topology: don't perform operations after test on a dirty cluster
  test/pylib: print cluster at the end of test
2023-02-12 12:13:25 +02:00
Kamil Braun
d991f71910 test.py: rely on ScyllaCluster.is_dirty flag for recycling clusters
`TopologyTest`s (used by `topology/` suite and friends) already relied
on the `is_dirty` flag stored in `ScyllaCluster` thanks to
`ScyllaClusterManager` (which passes the flag when returning a cluster
to the pool).

But `PythonTest`s (cql-pytest/ suite) and `CQLApprovalTest`s (cql/
suite) had different ways to decide whether a cluster should be
recycled. For example, `PythonTest` would recycle a cluster if
`after_test` raised an exception. This depended on a post-condition
check made by `after_test`: it would query the number of keyspaces and
throw an exception if it was different than when the test started. If
the cluster (which for `PythonTest` is always single-node) was dead,
this query would fail.

However, we modified the behavior of `after_test` in earlier commits -
it no longer performs the post-condition check on dirty clusters. So
it's also no longer reliable to use the exception raised by `after_test`
to decide that we should recycle the cluster.

Unify the behavior of `PythonTest` and `CQLApprovalTest` with what
`TopologyTest` does - using the `is_dirty` flag to decide that we should
recycle a cluster. Thanks to earlier commits, this flag is set to `True`
whenever a test fails, so it should cover most cases where we want to
recycle a cluster. (The only case not currently covered is if a
non-dirty cluster crashes after we perform the keyspace post-condition
check, which seems quite improbable.)

Note that this causes us to recycle clusters more often in these tests:
previously, when a `PythonTest` or `CQLApprovalTest` failed, but the
cluster was still alive and the post-condition check passed, we would
use the cluster for the next test. Now we recycle a cluster whenever a
test that used it fails.
2023-02-03 11:49:35 +01:00
Kamil Braun
a9dbd89478 test/pylib: mark cluster as dirty after a failed test
We don't expect the cluster to be functioning at all after a failed
test. The whole cluster might have crashed, for example. In these
situations the framework would report multiple errors (one for the
actual failure, another for a failed post-condition check because the
cluster was down) which would only obscure the report and make debugging
harder. It's also not safe in general to reuse the cluster in another
test - if the previous test failed, we should not assume that it's in a
valid state.

Therefore, mark the cluster as dirty after a failed test. This will let
us recycle the cluster based on the dirty flag and it will disable
post-condition check after a failed test (which is only done on
non-dirty clusters).

To implement this in topology tests, we use the
`pytest_runtest_makereport` hook which executes after a test finishes
but before fixtures finish. There we store a test-failed flag in a stash
provided by pytest, then access the flag in the `manager` fixture.
2023-02-02 16:35:55 +01:00
Raphael S. Carvalho
e3923a9caf test.py: Add option to run scylla tests with multiple compaction groups
The tests can now optionally run with multiple groups via option
--x-log2-compaction-groups.

This includes boost tests and the ones which run against either
one (e.g. cql) or many instances (e.g. topology).

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
2023-02-01 20:17:16 -03:00
Kamil Braun
858803cc2c test/pylib: pool: replace steal with put(is_dirty=True)
The pool usage was kind of awkward previously: if the user of a pool
decided that a previously borrowed object should no longer be used,
it was their responsibility to destroy the object (releasing associated
resources and so on) and then call `steal()` on the pool to free space
for a new object.

Change the interface. Now the `Pool` constructor takes a `destroy`
function in addition to the `build` function. The user calls the
function `put` to return both objects that are still usable and those
that aren't. For the latter, they set `is_dirty=True`. The pool will
'destroy' the object with the provided function, which could mean e.g.
releasing associated resources.

For example, instead of:
```
if self.cluster.is_dirty:
    self.clusters.stop()
    self.clusters.release_ips()
    self.clusters.steal()
else:
    self.clusters.put(self.cluster)
```
we can now use:
```
self.clusters.put(self.cluster, is_dirty=self.cluster.is_dirty)
```
(assuming that `self.clusters` is a pool constructed with a `destroy`
function that stops the cluster and releases its IPs.)

Also extend the interface of the context manager obtained by
`instance()` - the user must now pass a flag `dirty_on_exception`. If
the context manager exits due to an exception and that flag was `True`,
the object will be considered dirty. The dirty flag can also be set
manually on the context manager. For example:
```
async with (cm := pool.instance(dirty_on_exception=True)) as server:
    cm.dirty = await run_test(test, server)
    # It will also be considered dirty if run_test throws an exception
```
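
To make the `build`/`destroy`/`put(is_dirty=...)` contract concrete, here is a deliberately simplified sketch. It is not the `Pool` from test/pylib: it has no size limit, no waiting and no `instance()` context manager, and the names in `demo()` are hypothetical.

```python
import asyncio


class Pool:
    """Toy pool: reuse clean objects, destroy dirty ones."""

    def __init__(self, build, destroy):
        self._build = build
        self._destroy = destroy
        self._idle = []

    async def get(self):
        # Reuse an idle object when possible, otherwise build a new one.
        if self._idle:
            return self._idle.pop()
        return await self._build()

    async def put(self, obj, is_dirty=False):
        # Dirty objects are handed to destroy(); clean ones become reusable.
        if is_dirty:
            await self._destroy(obj)
        else:
            self._idle.append(obj)


async def demo():
    destroyed = []
    counter = 0

    async def build():
        nonlocal counter
        counter += 1
        return f"cluster-{counter}"

    async def destroy(obj):
        destroyed.append(obj)

    pool = Pool(build, destroy)
    first = await pool.get()
    await pool.put(first)                  # clean, goes back to the pool
    again = await pool.get()               # the same object is reused
    await pool.put(again, is_dirty=True)   # dirty, handed to destroy()
    fresh = await pool.get()               # a new object has to be built
    return first, again, fresh, destroyed


print(asyncio.run(demo()))
```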
2023-01-26 11:58:00 +01:00
Alejo Sanchez
f236d518c6 test.py: manual cluster handling for PythonSuite
Instead of complex async with logic, use manual cluster pool handling.

Revert the discard() logic in Pool from a recent commit.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2023-01-24 11:38:17 +01:00
Alejo Sanchez
a6059e4bb7 test.py: stop cluster if PythonSuite fails to start
If cluster fails to start, stop it.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2023-01-24 11:36:49 +01:00
Alejo Sanchez
dec0c1d9f6 test.py: minor fix for failed PythonSuite test
Even though test can't fail both before and after, make the logic
explicit in case code changes in the future.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2023-01-24 11:36:49 +01:00
Alejo Sanchez
51e84508ee test.py: handle broken clusters for Python suite
If the after test check fails (!is_after_test_ok), discard the cluster
and raise exception so context manager (pool) does not recycle it.

Ignore Pool exception re-raised by the context manager.

Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
2023-01-19 21:43:50 +01:00
Kamil Braun
4f7e5ee963 test/pylib: prefix cluster/manager logs with the current test name
The log file produced by test.py combines logs coming from multiple
concurrent test runs. Each test has its own log file as well, but this
"global" log file is useful when debugging problems with topology tests,
since many events related to managing clusters are stored there.

Make the logs easier to read by including information about the test case
that's currently performing operations such as adding new servers to
clusters and so on. This includes the mode, test run name and the name
of the test case.

We do this by using custom `Logger` objects (instead of calling
`logging.info` etc. which uses the root logger) with `LoggerAdapter`s
that include the prefixes. A bit of boilerplate 'plumbing' through
function parameters is required but it's mostly straightforward.
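
A minimal sketch of the prefixing technique with the standard `logging.LoggerAdapter`. The `PrefixAdapter` class, the `make_test_logger` helper and the logger name are hypothetical, not the names used in test/pylib.

```python
import logging


class PrefixAdapter(logging.LoggerAdapter):
    """Prepend a fixed prefix, e.g. mode and test name, to every message."""

    def process(self, msg, kwargs):
        return f"[{self.extra['prefix']}] {msg}", kwargs


def make_test_logger(base, mode, test_name):
    # One adapter per test, all sharing the same underlying Logger object.
    return PrefixAdapter(base, {"prefix": f"{mode}/{test_name}"})


logging.basicConfig(level=logging.INFO, format="%(levelname)s> %(message)s")
log = make_test_logger(logging.getLogger("pylib-sketch"), "dev", "topology.test_topology.1")
log.info("starting server at host %s", "127.40.246.10")
```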

This doesn't apply to all events, e.g. boost test cases which don't
set up a "real" Scylla cluster. These events don't have additional
prefixes.

Example:
```

17:41:43.531 INFO> [dev/topology.test_topology.1] Cluster ScyllaCluster(name: 7a414ffc-903c-11ed-bafb-f4d108a9e4a3, running: ScyllaServer(1, 127.40.246.1, 29c4ec73-8912-45ca-ae19-8bfda701a6b5), ScyllaServer(4, 127.40.246.4, 75ae2afe-ff9b-4760-9e19-cd0ed8d052e7), ScyllaServer(7, 127.40.246.7, 67a27df4-be63-4b4c-a70c-aeac0506304f), stopped: ) adding server...
17:41:43.531 INFO> [dev/topology.test_topology.1] installing Scylla server in /home/kbraun/dev/scylladb/testlog/dev/scylla-10...
17:41:43.603 INFO> [dev/topology.test_topology.1] starting server at host 127.40.246.10 in scylla-10...
17:41:43.614 INFO> [dev/topology.test_topology.2] Cluster ScyllaCluster(name: 7a497fce-903c-11ed-bafb-f4d108a9e4a3, running: ScyllaServer(2, 127.40.246.2, f59d3b1d-efbb-4657-b6d5-3fa9e9ef786e), ScyllaServer(5, 127.40.246.5, 9da16633-ce53-4d32-8687-e6b4d27e71eb), ScyllaServer(9, 127.40.246.9, e60c69cd-212d-413b-8678-dfd476d7faf5), stopped: ) adding server...
17:41:43.614 INFO> [dev/topology.test_topology.2] installing Scylla server in /home/kbraun/dev/scylladb/testlog/dev/scylla-11...
17:41:43.670 INFO> [dev/topology.test_topology.2] starting server at host 127.40.246.11 in scylla-11...
```
2023-01-11 10:09:39 +01:00
Kamil Braun
ff2c030bf9 test.py: include mode in ScyllaClusterManager logs
The logs often mention the test run and the current test case in a given
run, such as `test_topology.1` and
`test_topology.1::test_add_server_add_column`. However, if we run
test.py in multiple modes, the different modes might be running the same
test case and the logs become confusing. To disambiguate, prefix the
test run/case names with the mode name.

Example:
```
Leasing Scylla cluster ScyllaCluster(name: 7a414ffc-903c-11ed-bafb-f4d108a9e4a3, running: ScyllaServer(1, 127.40.246.1, 29c4ec73-8912-45ca-ae19-8bfda701a6b5), ScyllaServer(4, 127.40.246.4, 75ae2afe-ff9b-4760-9e19-cd0ed8d052e7), ScyllaServer(7, 127.40.246.7, 67a27df4-be63-4b4c-a70c-aeac0506304f), stopped: ) for test dev/topology.test_topology.1::test_add_server_add_column
```
2023-01-10 17:41:54 +01:00