Test identifiers are unique across the whole run, but this makes them less
useful in the Jenkins Test Result Analyzer view. For example,
counter_test can be counter_test.432 in one run and counter_test.442
in another. Jenkins considers them different and so we don't see
a trend.
Limit the id uniqueness within a test case, so that we'll have
counter_test.{1, 2, 3} consistently. Those tests will be grouped
together so we can see pass/fail trends.
Closes #11946
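A minimal sketch of per-test-case id allocation, as described above. The class and method names (`PerTestIdAllocator`, `next_id`) are hypothetical and not taken from test.py; the point is only that the counter is keyed by test name instead of being shared by all tests.

```python
from collections import defaultdict
from typing import DefaultDict


class PerTestIdAllocator:
    """Hands out ids that are unique within one test name (hypothetical helper)."""

    def __init__(self) -> None:
        # One counter per test name instead of one global counter.
        self._last_id: DefaultDict[str, int] = defaultdict(int)

    def next_id(self, test_name: str) -> str:
        self._last_id[test_name] += 1
        return f"{test_name}.{self._last_id[test_name]}"


allocator = PerTestIdAllocator()
counter_ids = [allocator.next_id("counter_test") for _ in range(3)]
other_id = allocator.next_id("schema_test")
print(counter_ids, other_id)
```

With a scheme like this, the ids of `counter_test` do not depend on how many other tests ran before it, so they stay stable between runs.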
`ScyllaServer`s were constructed without IP addresses. They leased an IP
address from `HostRegistry` and released them in `uninstall`.
This responsibility has now been moved into `ScyllaCluster`, which leases
an IP address for a server before constructing it, and passes it to the
constructor. It releases the addresses of its servers when uninstalling
itself.
This will allow the cluster to reuse the IP address of an existing
server in that cluster when adding a new server which wants to replace
the existing one. Instead of leasing a new address, it will pass
the existing IP address to the new server's constructor.
The refactor is also nice in that it establishes an invariant for
`ScyllaServer`, simplifying reasoning about the class: now it has
an `ip_addr` field at all times.
`host_registry` was moved from `ScyllaServer` to `ScyllaCluster`.
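The ownership change can be illustrated with a toy model. This is a sketch under assumptions, not the actual test/pylib code: `Registry`, `Server` and `Cluster` are simplified stand-ins for `HostRegistry`, `ScyllaServer` and `ScyllaCluster`, and the method names `lease`, `release` and `add_server` are made up for the example.

```python
from typing import Dict, Optional, Set


class Registry:
    """Toy stand-in for `HostRegistry`: leases loopback addresses."""

    def __init__(self) -> None:
        self._counter = 1
        self.leased: Set[str] = set()

    def lease(self) -> str:
        self._counter += 1
        ip_addr = f"127.0.0.{self._counter}"
        self.leased.add(ip_addr)
        return ip_addr

    def release(self, ip_addr: str) -> None:
        self.leased.discard(ip_addr)


class Server:
    def __init__(self, ip_addr: str) -> None:
        # Invariant: a server has an address from construction onwards.
        self.ip_addr = ip_addr


class Cluster:
    def __init__(self, registry: Registry) -> None:
        self.registry = registry
        self.servers: Dict[str, Server] = {}

    def add_server(self, replace: Optional[Server] = None) -> Server:
        # Reuse the replaced server's address, otherwise lease a new one.
        if replace is not None:
            ip_addr = replace.ip_addr
        else:
            ip_addr = self.registry.lease()
        server = Server(ip_addr)
        self.servers[ip_addr] = server
        return server

    def uninstall(self) -> None:
        # The cluster, not the server, returns addresses to the registry.
        for ip_addr in self.servers:
            self.registry.release(ip_addr)
        self.servers.clear()


registry = Registry()
cluster = Cluster(registry)
old = cluster.add_server()
new = cluster.add_server(replace=old)
leased_while_running = set(registry.leased)
cluster.uninstall()
```

Because the cluster decides which address a server gets, the replace case needs no special support in the server class.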
The `ScyllaCluster` constructor takes a function `create_server` which
itself takes 3 parameters now. Soon it will take a 4th. The list of
parameters is repeated at the constructor definition and at the call
site of the constructor; with many parameters this becomes tiresome.
Refactor the list of parameters to a `NamedTuple`.
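As an illustration of the pattern, the sketch below bundles the parameters into one `NamedTuple`. The field names (`cluster_name`, `seeds`, `ip_addr`) are assumptions made for the example and may not match the real parameter list.

```python
from typing import Callable, List, NamedTuple


class CreateServerParams(NamedTuple):
    # Hypothetical fields; adding a 4th later only touches this class.
    cluster_name: str
    seeds: List[str]
    ip_addr: str


def create_server(params: CreateServerParams) -> str:
    """Stand-in factory that just describes the server it would create."""
    return f"{params.cluster_name}@{params.ip_addr} seeds={params.seeds}"


# The callable's type mentions the tuple once instead of every parameter.
factory: Callable[[CreateServerParams], str] = create_server
description = factory(CreateServerParams("cluster-1", [], "127.0.0.2"))
print(description)
```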
`self.artifacts` was calling `ScyllaServer.stop` and
`ScyllaServer.uninstall`. Now it calls `ScyllaCluster.stop` and
`ScyllaCluster.uninstall`, which underneath stops/uninstalls
servers in this cluster.
We must be a bit more careful now in case installing/starting a
server inside a cluster fails: there are no server cleanup artifacts,
and a server is added to cluster's `running` map only after
`install_and_start` finishes (until that happens,
`ScyllaCluster.stop/uninstall` won't catch this server).
So handle failures explicitly in `install_and_start`.
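A rough sketch of the failure handling described above, using simplified synchronous stand-ins (`Server`, `Cluster`, `fail_on_start` and the `events` list are all hypothetical). It only shows the idea: a server that is not yet in the `running` map has to clean up after itself.

```python
from typing import Dict, List


class Server:
    def __init__(self, name: str, fail_on_start: bool = False) -> None:
        self.name = name
        self.fail_on_start = fail_on_start
        self.events: List[str] = []

    def stop(self) -> None:
        self.events.append("stop")

    def uninstall(self) -> None:
        self.events.append("uninstall")

    def install_and_start(self) -> None:
        try:
            self.events.append("install")
            if self.fail_on_start:
                raise RuntimeError(f"{self.name} failed to start")
            self.events.append("start")
        except Exception:
            # The cluster does not know this server yet, so its
            # stop/uninstall would miss it: clean up here and re-raise.
            self.stop()
            self.uninstall()
            raise


class Cluster:
    def __init__(self) -> None:
        self.running: Dict[str, Server] = {}

    def add_server(self, server: Server) -> None:
        server.install_and_start()
        # Registered only after a successful start.
        self.running[server.name] = server


cluster = Cluster()
good, bad = Server("good"), Server("bad", fail_on_start=True)
cluster.add_server(good)
try:
    cluster.add_server(bad)
    failed = False
except RuntimeError:
    failed = True
```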
This commit does not logically change how the tests are running (every
started server belongs to some cluster, so it will be cleaned up),
but it's an important refactor.
It will allow us to move IP address (de)allocation code outside
`ScyllaServer`, into `ScyllaCluster`, which in turn will allow us to
implement node replace operation for the case where we want to reuse
the replaced node's IP.
Also, `ScyllaCluster.uninstall` was unused before this change, now it's
used.
Some tests may take longer than a few seconds to run. We want to
mark such tests in some way, so that we can run them selectively.
This patch proposes to use pytest markers for this. The markers
from the test.py command line are passed to pytest
as is via the -m parameter.
By default, the marker filter is not applied and all tests
will be run without exception. To exclude e.g. slow tests
you can write --markers 'not slow'.
The --markers parameter is currently only supported
by Python tests, other tests ignore it. We intend to
support this parameter for other types of tests in the future.
Another possible improvement is not to run suites for which
all tests have been filtered out by markers. The markers are
currently handled by pytest, which means that the logic in
test.py (e.g., running a scylla test cluster) will be run
for such suites.
Closes #11713
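A minimal sketch of how a marker expression could be forwarded to pytest. The helper `build_pytest_args` is hypothetical; the only real interface used is pytest's `-m` option, which accepts a marker expression such as `not slow`.

```python
from typing import List, Optional


def build_pytest_args(test_path: str, markers: Optional[str]) -> List[str]:
    """Build a pytest command line, forwarding the marker filter as-is."""
    args = ["pytest", test_path]
    if markers:
        # No filter by default: without -m, pytest runs every test.
        args += ["-m", markers]
    return args


all_tests = build_pytest_args("test_topology.py", None)
fast_only = build_pytest_args("test_topology.py", "not slow")
print(all_tests, fast_only)
```

A test opts in to the filter with `@pytest.mark.slow` on the test function.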
Modern (as of Fedora 37) pytest has the "-sP" flags in the Python command
line, as found in /usr/bin/pytest. This means it will reject the
site-packages directory, where we install the Scylla Python driver. This
causes all the tests to fail.
Work around it by supplying an alternative pytest script that does not
have this change.
Closes #11764
Instead of `test.py.log`, use:
`test.py.dev.log`
when running with `--mode dev`,
`test.py.dev-release.log`
when running with `--mode dev --mode release`,
and so on.
This is useful in Jenkins, which runs test.py multiple times in
different modes; a later run would overwrite a previous run's
`test.py.log` file. With this change we can preserve the log files of
all of these runs.
Closes #11678
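The naming rule can be sketched as follows. The function name `log_file_name` is made up for the example, and keeping the modes in command-line order is an assumption.

```python
from typing import List


def log_file_name(modes: List[str]) -> str:
    """Derive the log file name from the selected build modes."""
    return "test.py." + "-".join(modes) + ".log"


dev_log = log_file_name(["dev"])
dev_release_log = log_file_name(["dev", "release"])
print(dev_log, dev_release_log)
```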
Fix the type of `create_server`, rename `topology_for_class` to `get_cluster_factory`, simplify the suite definitions and parameters passed to `get_cluster_factory`
Closes #11590
* github.com:scylladb/scylladb:
test.py: replace `topology` with `cluster_size` in Topology tests
test.py: rename `topology_for_class` to `get_cluster_factory`
test/pylib: ScyllaCluster: fix create_server parameter type
First, a reminder of a few basic concepts in Scylla:
- "topology" is a mapping: for each node, its DC and Rack.
- "replication strategy" is a method of calculating replica sets in
a cluster. It is not a cluster-global property; each keyspace can have
a different replication strategy. A cluster may have multiple
keyspaces.
- "cluster size" is the number of nodes in a cluster.
Replication strategy is orthogonal to topology. Cluster size can be
derived from topology and is also orthogonal to replication strategy.
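To make the relationship concrete, here is a small sketch that models a topology as the mapping defined above. The node names and the `Location` type are hypothetical.

```python
from typing import Dict, NamedTuple


class Location(NamedTuple):
    dc: str
    rack: str


# Topology: for each node, its DC and rack.
topology: Dict[str, Location] = {
    "node1": Location("dc1", "rack1"),
    "node2": Location("dc1", "rack2"),
    "node3": Location("dc2", "rack1"),
}

# Cluster size follows from the topology; no replication strategy
# is involved in computing it.
cluster_size = len(topology)
print(cluster_size)
```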
test.py was conflating these three concepts. For some reason,
Topology suites were specifying a "topology" parameter which contained
replication strategy details - having nothing to do with topology. Also
it's unclear why a test suite would specify anything to do with
replication strategies - after all, a test may create keyspaces with
different replication strategies, and a suite may contain multiple
different tests.
Get rid of the "topology" parameter, replace it with a simple
"cluster_size". In the future we may re-introduce it when we actually
implement the possibility to start clusters with custom topologies
(which involves configuring the snitch etc.) Simplify the test.py code.
The previous name had nothing to do with what the function calculated
and returned (it returned a `create_cluster` function; the standard name
for a function that constructs objects would be 'factory', so
`get_cluster_factory` is an appropriate name for a function that returns
cluster factories).
The only usage of `ScyllaCluster` constructor passed a `create_server`
function which expected a `List[str]` for the second parameter, while
the constructor specified that the function should expect an
`Optional[List[str]]`. There was no reason for the latter, we can easily
fix this type error.
Also give a type hint for the `create_cluster` function in
`PythonTestSuite.topology_for_class`. This is actually what caught the
type error.
Start ScyllaClusterManager within error handling so the ScyllaCluster
logs are available in case of error starting up.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
Previously, if the suite.yaml file provided
`extra_scylla_config_options` but didn't provide values for `authorizer`
or `authenticator` inside the config options, the harness wouldn't give
any defaults for these keys. It would only provide defaults for these
keys if suite.yaml didn't specify `extra_scylla_config_options` at all.
It makes sense to give the user the ability to provide extra options
while relying on harness defaults for `authenticator` and `authorizer`
if the user doesn't care about them.
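The intended behaviour amounts to merging the suite's options over the harness defaults. This is a sketch, not the harness code: `effective_config` is a hypothetical helper, the default values are the ones named elsewhere in this log, and `experimental` is only an example key.

```python
from typing import Dict, Optional

# Defaults the harness falls back to when the suite gives no value.
DEFAULTS: Dict[str, str] = {
    "authenticator": "PasswordAuthenticator",
    "authorizer": "CassandraAuthorizer",
}


def effective_config(extra: Optional[Dict[str, str]]) -> Dict[str, str]:
    """Suite-provided options win; missing keys keep their defaults."""
    return {**DEFAULTS, **(extra or {})}


no_extra = effective_config(None)
partial = effective_config({"authenticator": "AllowAllAuthenticator"})
unrelated = effective_config({"experimental": "true"})
print(partial)
```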
Enable pytest log capture for Python suite. This will help debugging
issues in remote machines.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
- Remove `ScyllaCluster.__getitem__()` (pending request by @kbr- in a previous pull request); to do this, remove all direct access to servers from caller code
- Increase Python driver timeouts (req by @nyh)
- Improve `ManagerClient` API requests: use `http+unix://<sockname>/<resource>` instead of `http://localhost/<resource>` and callers of the helper method only pass the resource
- Improve lint and type hints
Closes #11305
* github.com:scylladb/scylladb:
test.py: remove ScyllaCluster.__getitem__()
test.py: ScyllaCluster check keyspace with any server
test.py: ScyllaCluster server error log method
test.py: ScyllaCluster read_server_log()
test.py: save log point for all running servers
test.py: ScyllaCluster provide endpoint
test.py: build host param after before_test
test.py: manager client disable lint warnings
test.py: scylla cluster lint and type hint fixes
test.py: increase more timeouts
test.py: ManagerClient improve API HTTP requests
Provide server error logs to caller (test.py).
Avoids direct access to list of servers.
To be done later: pick the failed server. For now it just provides the
log of one server.
While there, fix type hints.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
Instead of accessing the first server, now test.py asks ScyllaCluster
for the server log.
In a later commit, ScyllaCluster will pick the appropriate server.
Also removes another direct access to the list of servers we want to get
rid of.
For error reporting, the current position in the log is marked before
each test. Previously, only the log of the first server was marked. Now
it's done for all running servers.
While there, remove direct access to servers in test.py.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
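One possible way to mark log positions for several servers is to remember each log file's size before the test and read from that offset afterwards. The helpers below (`mark_logs`, `read_since`) and the file names are assumptions for illustration, not the test.py implementation.

```python
import tempfile
from pathlib import Path
from typing import Dict


def mark_logs(logs: Dict[str, Path]) -> Dict[str, int]:
    """Remember the current end of every running server's log."""
    return {server: path.stat().st_size for server, path in logs.items()}


def read_since(path: Path, mark: int) -> str:
    """Return only what was logged after the mark was taken."""
    with path.open("rb") as log:
        log.seek(mark)
        return log.read().decode()


with tempfile.TemporaryDirectory() as tmp:
    logs = {name: Path(tmp, f"{name}.log") for name in ("server1", "server2")}
    for path in logs.values():
        path.write_text("startup noise\n")
    marks = mark_logs(logs)  # taken before the test runs
    with logs["server2"].open("a") as log:
        log.write("ERROR during test\n")
    new_output = {name: read_since(logs[name], marks[name]) for name in logs}
```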
If no server started, there is no server in the cluster list. So only
build the pytest --host param after before_test check is done.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
Preparing for topology tests with changing clusters, run before and
after checks per test case.
Change the scope of pytest fixtures to function, as we need them per
test case.
Add server and client API logic.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
Add an API via Unix socket to Manager so pytests can query information
about the cluster. Requests are managed by ManagerClient helper class.
The socket is placed inside a unique temporary directory for the
Manager (as a safe temporary socket filename is not possible in Python).
The initial API services are: manager up, cluster up, whether the
cluster is dirty, CQL port, configured replicas (RF), and list of host ids.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
Instead of only using last started server as seed, use all started
servers as seed for new servers.
This also avoids tracking last server's state.
Pass empty list instead of None.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
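A small sketch of the seed selection described above. `Cluster`, `started_ips` and `add_server` are simplified, hypothetical names; the sketch only shows that the seed list is a copy of all started servers and is an empty list, never `None`, for the first one.

```python
from typing import List


class Cluster:
    def __init__(self) -> None:
        self.started_ips: List[str] = []

    def add_server(self, ip_addr: str) -> List[str]:
        """Start a server and return the seeds it was given."""
        # Every already-started server is a seed; the first server
        # gets an empty list rather than None.
        seeds = list(self.started_ips)
        self.started_ips.append(ip_addr)
        return seeds


cluster = Cluster()
seeds_per_server = [cluster.add_server(f"127.0.0.{i}") for i in (2, 3, 4)]
print(seeds_per_server)
```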
Preparing to cycle modified (dirty) clusters and to use multiple clusters
per topology pytest, introduce Topology tests and a Manager class to
handle clusters.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
For scylla servers, keep default PasswordAuthenticator and
CassandraAuthorizer but allow this to be configurable per test suite.
Use AllowAll* for topology test suite.
Disabling authentication avoids complications later for topology tests,
as the system_auth keyspace starts with RF=1 and tests take down nodes.
The keyspace would need to change RF and run repair. Using AllowAll
avoids this problem altogether.
A different cql fixture is created without auth for topology tests.
Topology tests require servers without auth from scylla.yaml conf.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
The code in test.py using a ScyllaCluster is getting a server id and
taking logs from only the first server.
If there is a failure in another server it's not reported properly.
And CQL connection will go only to the first server.
Also, it might be better to have ScyllaCluster handle these matters
and be more opaque.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
Previously, if pytest itself failed (e.g. bad import or unexpected
parameter), there was no output file but test.py tried to copy it and
failed.
Change the logic of handling the output file to first check if the
file is there. Then if it's worth keeping it, *move* it to the test
directory for easier comparison and maintenance. Else, if it's not worth
keeping, discard it.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
Closes #11193
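The check-then-move logic can be sketched like this. `handle_output` and its `keep` flag are hypothetical names, and what makes a file "worth keeping" is left to the caller.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Optional


def handle_output(output: Path, test_dir: Path, keep: bool) -> Optional[Path]:
    """Move the output next to the test if worth keeping, else discard it."""
    if not output.exists():
        # pytest itself failed before producing any output.
        return None
    if keep:
        destination = test_dir / output.name
        shutil.move(str(output), str(destination))
        return destination
    output.unlink()
    return None


with tempfile.TemporaryDirectory() as tmp:
    work, test_dir = Path(tmp, "work"), Path(tmp, "test")
    work.mkdir()
    test_dir.mkdir()
    missing = handle_output(work / "missing.out", test_dir, keep=True)
    kept_src = work / "kept.out"
    kept_src.write_text("diff")
    kept = handle_output(kept_src, test_dir, keep=True)
    kept_ok = kept is not None and kept.exists() and not kept_src.exists()
    dropped_src = work / "dropped.out"
    dropped_src.write_text("same")
    dropped = handle_output(dropped_src, test_dir, keep=False)
    dropped_ok = dropped is None and not dropped_src.exists()
```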
Fix mixing of log filename and log summary in error reporting for
CQLApprovalTest and PythonTest.
Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>
Closes #11125
cql-pytest contains subdirectories with tests
ported from Cassandra. It's desirable to preserve
the same layout and file names for these tests as in
the original source tree. To do that, add support for recursive
search of tests to PythonTestSuite. The log files for
the tests which are found recursively are created in subdirs
of the test tmpdir.
While implementing the feature, switch to using pathlib,
since a) it supports rglob (recursive glob) and b) it
was requested in one of the earlier reviews.
Closes #11018
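A short sketch of recursive discovery with `pathlib`. The `test_*.py` pattern, the directory names and the helpers `find_tests` and `log_path` are assumptions for the example; `Path.rglob` is the standard recursive glob.

```python
import tempfile
from pathlib import Path
from typing import List


def find_tests(suite_dir: Path, pattern: str = "test_*.py") -> List[Path]:
    """Find test files recursively, as paths relative to the suite."""
    return sorted(p.relative_to(suite_dir) for p in suite_dir.rglob(pattern))


def log_path(tmpdir: Path, test: Path) -> Path:
    """Mirror the test's subdirectory under the test tmpdir."""
    return tmpdir / test.with_suffix(".log")


with tempfile.TemporaryDirectory() as tmp:
    suite = Path(tmp, "cql-pytest")
    (suite / "cassandra_tests").mkdir(parents=True)
    (suite / "test_a.py").write_text("")
    (suite / "cassandra_tests" / "test_b.py").write_text("")
    (suite / "cassandra_tests" / "helper.py").write_text("")
    found = [p.as_posix() for p in find_tests(suite)]
    logs = [log_path(Path("tmpdir"), Path(p)).as_posix() for p in found]
```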
Add test and server logs, as well as the unidiff, to
XML output. This makes jenkins reports nicer.
While on it, debug & fix bugs in handling of flaky tests:
- the reset would reset a flaky test even after the last attempt
fails, so it would be impossible to see what happened to it
- the args needed to be reset as well, since execution modifies
them
- we would say that we're going to retry the flaky test when in
fact it was the last attempt to run it and no more retries were
planned
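The three fixes listed above can be illustrated with a simplified retry loop. Everything here (`FlakyTest`, `run_with_retries`, the argument values) is hypothetical; the sketch shows a reset only between attempts, arguments restored on reset, and a "retrying" message only when a retry really follows.

```python
from typing import List


class FlakyTest:
    def __init__(self, outcomes: List[bool]) -> None:
        self._outcomes = list(outcomes)
        self._initial_args = ["--smp", "2"]  # made-up arguments
        self.args = list(self._initial_args)
        self.resets = 0

    def reset(self) -> None:
        # Execution modifies the args, so restore them as well.
        self.args = list(self._initial_args)
        self.resets += 1

    def run(self) -> bool:
        self.args.append("--extra")  # simulate execution changing args
        return self._outcomes.pop(0)


def run_with_retries(test: FlakyTest, max_attempts: int, log: List[str]) -> bool:
    for attempt in range(1, max_attempts + 1):
        if test.run():
            return True
        if attempt < max_attempts:
            log.append(f"attempt {attempt} failed, retrying")
            test.reset()
        else:
            # Last attempt: keep the state so the failure can be inspected.
            log.append(f"attempt {attempt} failed, giving up")
    return False


always_fails, fail_log = FlakyTest([False, False, False]), []
failed_result = run_with_retries(always_fails, 3, fail_log)
passes_second, pass_log = FlakyTest([False, True]), []
passed_result = run_with_retries(passes_second, 3, pass_log)
```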
Today, if you want to reproduce a rare condition using a previously
reported RNG seed, you cannot use test.py, which provides useful
infrastructure; you have to run the tests manually instead.
So let's extend test.py to allow optional forwarding of RNG seed to
boost tests only, as other suites don't support the seed option.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20220615223657.142110-1-raphaelsc@scylladb.com>
The idea is that a flaky test can be marked as flaky
rather than disabled to make sure it passes in CI.
This reduces chances of a regression being added
while the flakiness is being resolved and the number
of disabled tests doesn't grow.
Introduce a reset() hierarchy, which is similar to __init__(),
i.e. it allows resetting test execution state before retrying a test.
Useful for retrying flaky tests.
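A sketch of what such a hierarchy might look like. The class names and state fields (`BaseTest`, `UnitTest`, `success`, `output`, `case_results`) are hypothetical; the pattern is that each class resets the state it owns and chains to its parent, the same way `__init__()` does.

```python
from typing import List


class BaseTest:
    def __init__(self, name: str) -> None:
        self.name = name  # identity survives resets
        self.reset()

    def reset(self) -> None:
        """Reset the execution state owned by the base class."""
        self.success = False
        self.output = ""


class UnitTest(BaseTest):
    def reset(self) -> None:
        """Chain to the parent, then reset subclass state."""
        super().reset()
        self.case_results: List[str] = []


test = UnitTest("counter_test")
test.success, test.output = True, "ok"
test.case_results.append("case1: passed")
test.reset()  # back to the freshly constructed state before a retry
```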
If the output is not a tty, verbose is set automatically.
If the output is a tty, one has to request --verbose.
However, a part of test.py verbosity was ignoring --verbose
and looking only at the terminal type.
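The rule can be captured in one place so that every part of the output uses the same decision. `effective_verbose` is a hypothetical helper; in practice the `is_tty` argument could come from `sys.stdout.isatty()`.

```python
def effective_verbose(requested: bool, is_tty: bool) -> bool:
    """Verbose when asked for explicitly, or when output is not a tty."""
    return requested or not is_tty


tty_default = effective_verbose(requested=False, is_tty=True)
tty_verbose = effective_verbose(requested=True, is_tty=True)
piped_default = effective_verbose(requested=False, is_tty=False)
print(tty_default, tty_verbose, piped_default)
```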