cql-pytest contains subdirectories with tests
ported from Cassandra. It's desirable to preserve
the same layout and file names for these tests as in
the original source tree. To do that, add support for recursive
search of tests to PythonTestSuite. The log files for
the tests which are found recursively are created in subdirs
of the test tmpdir.
While implementing the feature, switch to using pathlib,
since a) it supports rglob (recursive glob) and b) it
was requested in one of the earlier reviews.
Closes#11018
Add test and server logs, as well as the unidiff, to
XML output. This makes jenkins reports nicer.
While on it, debug & fix bugs in handling of flaky tests:
- the reset would reset a flaky test even after the last attempt
fails, so it would be impossible to see what happened to it
- the args needed to be reset as well, since execution modifies
them
- we would say that we're going to retry the flaky test when in
fact it was the last attempt to run it and no more retries were
planned
Today, if you want to reproduce a rare condition using the same RNG seed
reported, you cannot use test.py which provides useful infrastructure
and will have to run the tests manually instead.
So let's extend test.py to allow optional forwarding of RNG seed to
boost tests only, as other suites don't support the seed option.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Message-Id: <20220615223657.142110-1-raphaelsc@scylladb.com>
The idea is that a flaky test can be marked as flaky
rather than disabled to make sure it passes in CI.
This reduces chances of a regression being added
while the flakiness is being resolved and the number
of disabled tests doesn't grow.
Introduce reset() hierarchy, which is similar to __init__(),
i.e. allows to reset test execution state before retrying it.
Useful for retrying flaky tests.
If output is a not a tty, verbose is set automatically.
If the output is a tty, one has to request --verbose.
However, a part of test.py verbosity was ignoring --verbose
and looking only at the terminal type.
Before this patch, approval tests (test/cql/*) were using a C++
application called cql_repl, which is a seastar app running
Scylla, reading commands from the standard input and producing
results in Json format in the standard output. The rationale for
this was to avoid running a standalone Scylla which could leak
more resources such as open sockets.
Now that other suites already start and stop Scylla servers, it
makes more sense to run CQL commands in approval tests against an
existing running server. It saves us from building a one more
binary and allows to better format the output. Specifically, we
would like to see Scylla output in tabular format in approval
tests, which is difficult to do when C++ formatting libraries
are used.
Track running tests in the suite.
Cleanup after each suite (after all tests
in the suite end).
Cleanup all artifacts before exit. Don't drop server logs if
there is at least one failed test.
Search for test cases in parallel.
This speeds up the search for test cases from 30 to 4-5
seconds in absence of test case cache and from 4 to 3
seconds if case cache is present.
test.py runs each unit test's test case in a separate process.
The list of test cases is built at start, by running --list-cases
for each unit test. The output is cached, so that if one uses --repeat
option, we don't list the cases again and again.
The cache, however, was only useful for --repeat, because it was only
caching the last tests' output, not all tests output, so if I, for example,
run tests like:
./test.py foo bar foo
.. the cache was unused. Make the cache global which simplifies its
logic and makes it work in more cases.
To run tests in a given mode we will need to start off scylla
clusters, which we would want to pool and reuse between many tests.
TestSuite class was designed to share resources of common tests.
One can't pool together scylla servers compiled with different
tests, so create an own TestSuite instance for each mode.
It's good practice to use linters and style formatters for
all scripted languages. Python community is more strict
about formatting guidelines than others, and using
formatters (like flake8 or black) is almost universally
accepted.
test.py was adhering to flake8 standards at some point,
but later this was spoiled by random commits.
Instead of lengthy blurbs, switch to single-line, machine-readable
standardized (https://spdx.dev) license identifiers. The Linux kernel
switched long ago, so there is strong precedent.
Three cases are handled: AGPL-only, Apache-only, and dual licensed.
For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0),
reasoning that our changes are extensive enough to apply our license.
The changes we applied mechanically with a script, except to
licenses/README.md.
Closes#9937
The test suite names seen by Jenkins are suboptimal: there is
no distinction between modes, and the ".cc" suffix of file names
is interpreted as a class name, which is converted to a tree node
that must be clicked to expand. Massage the names to remove
unnecessary information and add the mode.
Closes#9696
The recent parallelization of boost unit tests caused an increase
in xml result files. This is challenging to Jenkins, since it
appears to use rpc-over-ssh to read the result files, and as a result
it takes more than an hour to read all result files when the Jenkins
main node is not on the same continent as the agent.
To fix this, merge the result files in test.py and leave one result
file per mode. Later we can leave one result file overall (integrating
the mode into the testsuite name), but that can wait.
Tested on a local Jenkins instance (just reading the result files,
not the entire build).
Closes#9668
The option accepts taskset-style cpulist and limits the launched tests
respectively. When specified, the default number of jobs is adjusted
accordingly, if --jobs is given it overrides this "default" as expected.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Next patch will need to know if the --jobs option was specified or the
caller is OK with the default. One way to achieve it is to keep 0 as the
default and set the default value afterwards.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
There were few missing bits before making this the default.
- default max number of AIOs, now tests are run with the greatly
reduced value
- 1.5 hours single case from database_test, now it's split and
scales with --parallel-cases
- suite add_test methods called in a loop for --repeat options,
patch #1 from this set fixes it
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The add_test method of a siute can be called several times in a
row e.g. in case of --repeat option or because there are more
than one custom_args entries in the suite.yaml file. In any case
it's pointless to re-collect the test cases by launching the
test binary again, it's much faster (and 100% safe) to keep the
list of cases from the previous call and re-use it if the test
name matches.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Seastar's default limit of 10,000 iocbs per shard is too low for
some workload (it places an upper bound on the number of idle
connections, above which a crash occurs). Use the new Seastar
feature to raise the default to 50000.
Also multiply the global reservation by 5, and round it upwards
so the number is less weird. This prevents io_setup() from failing.
For tests, the reservation is reduced since they don't create large
numbers of connections. This reduces surprise test failures when they
are run on machines that haven't been adjusted.
Fixes#9051Closes#9052
The parallelizm is acheived by listing the content of each (boost)
test and by adding a test for each case found appending the
'--run_test={case_name}' option.
Also few tests (logallog and memtable) have cases that depend on
each other (the former explicitly stated this in the head comment),
so these are marked as "no_parallel_cases" in the suite.yaml file.
In dev mode tests need 2m:5s to run by default. With parallelizm
(and updated long-running tests list) -- 1m 35s.
In debug mode there are 6 slow _cases_ that overrun 30 minutes.
They finish last and deserve some special (incremental) care. All
the other tests run ~1h by default vs ~25m in parallel.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This means adding the casename argument to its describing class
and handling it:
1. appending to the shortname
2. adding the --run_test= argument to boost args
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The method in question is in charge of creating a single
entry in the list of tests to be run. The BoostTestSuite's
method is about to create several entries and this patch
prepares it for this:
- makes it distinguish individual arguments
- lets it select the test.id value itself
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
When running tests in parallel-cases mode the test.uname must
include the case name to make different log and xml files for
different runs and to show which exact case is run when shown
by the tabular-output. At the same time the test shortname
identifies the binary with the whole test.
This patch makes class Test treat the shortname argument as
a dot-separated string where the 0th component is the binary
with the test and the rest is how test identifies itself.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This change solves several issues that would arise with the
case-by-case run.
First, the currently printed name is "$binary_name.$id". For
case-by-case run the binary name would coinside for many cases
and it will be inconvenient to identify the test case. So
the tests uname is printed instead.
Second, the tests uname doesn't contain suite name (unlike the
test binary name which does), so this patch also adds the
explicit suite name back as a separate column (like MODE)
Third, the testname + casename string length will be far above
the allocated 50 characters, so the test name is moved at the
tail of the line.
Fourth, the total number of cases is 2100+, the field of 7
characters is not enough to print it, so it's extended.
Finally the test.py output would look like this for parallel run:
================================================================================
[N/TOTAL] SUITE MODE RESULT TEST
------------------------------------------------------------------------------
[1/2108] raft dev [ PASS ] etcd_test.test_progress_leader.40 0.06s
[2/2108] raft dev [ PASS ] etcd_test.test_vote_from_any_state.45 0.03s
[3/2108] raft dev [ PASS ] etcd_test.test_progress_flow_control.43 0.04s
[4/2108] raft dev [ PASS ] etcd_test.test_progress_resume_by_append_resp.41 0.05s
[5/2108] raft dev [ PASS ] etcd_test.test_leader_election_overwrite_newer_logs.44 0.04s
[6/2108] raft dev [ PASS ] etcd_test.test_progress_paused.42 0.05s
[7/2108] raft dev [ PASS ] etcd_test.test_log_replication_2.47 0.06s
...
or like this for regular:
================================================================================
[N/TOTAL] SUITE MODE RESULT TEST
------------------------------------------------------------------------------
[1/184] raft dev [ PASS ] fsm_test.41 0.06s
[2/184] raft dev [ PASS ] etcd_test.40 0.06s
[3/184] cql dev [ PASS ] cassandra_cql_test.2 1.87s
[4/184] unit dev [ PASS ] btree_stress_test.30 1.82s
...
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Since fce124bd90 ("Merge "Introduce flat_mutation_reader_v2" from
Tomasz") tests involving mutation_reader are a lot slower due to
the new API testing. On slower machines it's enough to time out.
Work underway to improve the situation, and it will also revert back
to the original timing once the flat_mutation_reader_v2 work is done,
but meanwhile, increase the timeout.
Closes#9046
Instead of attempting to universally set the proper environment
necessary for tests to generate profiling data such that coverage.py can
process it, allow each Test subclass to set up the environment as needed
by the specific Test variant.
With this we now have support for all current test types, including cql,
cql-pytest and alternator tests.
* Add ability to skip tests in individual modes using "skip_in_<mode>".
* Add ability to allow tests in specific modes using "run_in_<mode>".
* Rename "skip_in_debug_mode" to "skip_in_debug_modes", because there
is an actual mode named "debug" and this is confusing.
Signed-off-by: Michael Livshin <michael.livshin@scylladb.com>
Add support for the newly added coverage mode. When --mode=coverage,
also invoke the coverage generation report script to produce a coverage
report after having run the tests.
There are still some rough edges, alternator and cql tests don't work.
8a8589038c ("test: increase quota for tests to 6GB") increased
the quota for tests from 2GB to 6GB. I later found that the increased
requirement is related to the page size: Address Sanitizer allocates
at least a page per object, and so if the page size is larger the
memory requirement is also larger.
Make use of this by only increasing the quota if the page size
is greater than 4096 (I've only seen 4096 and 65536 in the wild).
This allows greater parallelism when the page size is small.
Closes#8371
Tests are short-lived and use a small amount of data. They
are also often run repeatly, and the data is deleted immediately
after the test. This is a good scenario for using the kernel page
cache, as it can cache read-only data from test to test, and avoid
spilling write data to disk if it is deleted quickly.
Acknowledge this by using the new --kernel-page-cache option for
tests.
This is expected to help on large machines, where the disk can be
overloaded. Smaller machines with NVMe disks probably will not see
a difference.
Closes#8347
The patch which introduces build-dependent testing
has a regression: it quietly filters out all tests
which are not part of ninja output. Since ninja
doesn't build any CQL tests (including CQL-pytest),
all such tests were quietly disabled.
Fix the regression by only doing the filtering
in unit and boost test suites.
test: dev (unit), dev + --build-raft
Message-Id: <20201119224008.185250-1-kostja@scylladb.com>