scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-23 01:50:35 +00:00

Author	SHA1	Message	Date
Patryk Jędrzejczak	9dfb26428b	test: topology: decrease the server's request timeouts We decrease the server's request timeouts in topology tests so that they are lower than the driver's timeout. Before, the driver could time out its request before the server handled it successfully. This problem caused scylladb/scylladb#15924. A high server's request timeout can slow down the topology tests (see the new comment in `make_scylla_conf`). We make the timeout dependent on the testing mode to not slow down tests for no reason. We don't touch the driver's request timeout. Decreasing it in some modes would require too much effort for almost no improvement. Fixes scylladb/scylladb#15924	2024-02-29 18:37:38 +01:00
Lakshmi Narayanan Sreethar	f8f8d64982	test.py: support skipping multiple test patterns Support skipping multiple patterns by allowing them to be passed via multiple '--skip' arguments to test.py. Example : `test.py --skip=topology --skip=sstables` Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com> Closes scylladb/scylladb#17220	2024-02-13 17:32:03 +02:00
Kefu Chai	f5e3a2d98e	test.py: add `boost_tests()` to suite this change is a cleanup. so it only returns tests, to be more symmetric with `junit_tests()`. this allows us to drop the dummy `get_test_case()` in `PythonTestSuite`. as only the BoostTest will be asked for `get_test_case()` after this change. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16961	2024-01-31 13:43:21 +02:00
Kefu Chai	ee28cf2285	test.py: s/defalt/default/ this typo was identified by codespell Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16980	2024-01-25 16:54:07 +02:00
Kefu Chai	35b3c51f40	test.py: add "launcher" option support before this change, all "tool" test suites use "pytest" to launch their tests. but some of the tests might need a dedicated namespace so they do not interfere with each other. fortunately, "unshare(1)" allows us to run a program in new namespaces. in this change, we add a "launcher" option to "tool" test suites, so that these tests can run with the specified "launcher" instead of using "pytest". if "launcher" is not specified, its default value of "pytest" is used. Refs #16542 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-01-25 20:28:01 +08:00
Kefu Chai	a9851cf834	test.py: replace "$foo is False" with "not $foo" for better readability. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16960	2024-01-24 15:21:53 +02:00
Lakshmi Narayanan Sreethar	a1867986e7	test.py: deduce correct path for unit tests when built with cmake Fix the path deduction for unit test executables when the source code is built with cmake. Fixes #16906 Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com> Closes scylladb/scylladb#16907	2024-01-22 10:03:44 +02:00
Eliran Sinvani	c7dff1b81b	test.py: support code coverage test.py already supports the routing of coverage data into a predetermined folder under the `tmpdir` logs folder. This patch extends that and leverages the code coverage processing libraries to produce test coverage lcov files and a coverage summary at the end of the run. The reason for not generating the full report (which can be achieved with a one liner through the `coverage_utils.py` cli) is that it is assumed that unit testing is not necessarily the "last stop" in the testing process and it might need to be joined with other coverage information that is created at other testing stages (for example dtest). The result of this patch is that when running test.py with one of the coverage options (`--coverage` / `--coverage-mode`) it will perform another step of processing and aggregating the profiling information created. Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>	2024-01-18 11:11:34 +02:00
Eliran Sinvani	f4b6c9074a	test.py: support --coverage and --coverage-mode We aim to support code coverage reporting as part of our development process. To this end, we will need the ability to "route" the dumped profiles from scylla and unit tests to a predetermined location. We can consider profile data as logged data that should persist after tests have been run. For this we add two supported options to test.py: --coverage - which means that all suites on all modes will participate in coverage. --coverage-mode - which can be used to "turn on" coverage support only for some of the modes in this run. The strategy chosen is to save the profile data in `tmpdir`/mode/coverage/%m.profraw (ref: https://clang.llvm.org/docs/SourceBasedCodeCoverage.html#running-the-instrumented-program) This means that for every suite the profiling data of each object is going to be merged into the same file (llvm claims to lock the file so concurrency is fine). More resolution than the suite level seems to not give us anything useful (at least not at the moment). Moreover, it can also be achieved by running a single test. Data at the suite level will help us to detect suites that don't generate coverage data at all and to fix this or to skip generating the profiles for them. Also added support of 'coverage' parameter in the `suite.yaml` file, which can be used to disable coverage for a specific suite. This parameter defaults to True, but if a suite is known to not generate profiles, or the suite profile data is not needed or obfuscates the result, it can be set to false in order to cancel profiles routing and processing for this suite. Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>	2024-01-18 11:11:34 +02:00
Kefu Chai	382a5e2d0c	test.py: build using build/build.ninja when it exists CMake puts `build.ninja` under `build`, so use it if it exists, and fall back to current directory otherwise. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-01-10 10:01:02 +08:00
Kefu Chai	6674e87842	test.py: extract ninja() use ninja() to build target using `ninja`. since CMake puts `build.ninja` under "build", while `configure.py` puts it under the root source directory, this change prepares us for a follow-up change to build with build/build.ninja. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-01-10 10:01:02 +08:00
Kefu Chai	5fda822c4e	test.py: extract path_to() use path_to() to find the path to the directory under build directory. this change helps to find the executables built using CMake as well. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-01-10 10:01:02 +08:00
Kefu Chai	0b11ae9fe6	test.py: define all_modes as a dict of mode:CMAKE_BUILD_TYPE because scylla build mode and CMAKE_BUILD_TYPE is not identical, let's define `all_modes` as a dict so we can look it up. this change prepares for a follow-up commit which adds a path resolver which support both build system generator: the plain `configure.py` and CMake driven by `configure.py`. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-01-10 10:01:02 +08:00
Kefu Chai	47d8edc0fc	test.py: s/asyncio.get_event_loop()/asyncio.get_running_loop()/ the latter raises a RuntimeError if there is no running event loop, while the former gets one from the default policy in this case. in the use cases in test.py, there is always a running event loop when `asyncio.get_event_loop()` gets called. so let's use the preferred `asyncio.get_running_loop()`. see https://docs.python.org/3/library/asyncio-eventloop.html#asyncio.get_event_loop Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#16398	2024-01-04 08:39:49 +02:00
Konstantin Osipov	246da8884a	test.py: override SCYLLA_* env keys test.py inherits its env from the user, which is the right thing: some python modules, e.g. logging, do accept env-based configuration. However, test.py also starts subprocesses, i.e. tests, which start scylladb instances. And when the instance is started without an explicit configuration file, SCYLLA_CONF from user environment can be used. If this scylla.conf contains funny parameters, e.g. unsupported configuration options, the tests may break in an unexpected way. Avoid this by resetting the respective env keys in test.py. Fixes gh-16583 Closes scylladb/scylladb#16577	2023-12-31 13:02:49 +02:00
Yaniv Kaul	0b0a3ee7fc	Typos: fix typos in code Last batch, hopefully. Using codespell, went over the docs and fixed some typos. Refs: https://github.com/scylladb/scylladb/issues/16255 Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com> Closes scylladb/scylladb#16388	2023-12-13 10:45:21 +02:00
Yaron Kaikov	c3ee53f3be	test.py: enable xml validation Following https://github.com/scylladb/scylladb/issues/4774#issuecomment-1752089862 Adding back xml validation Closes: https://github.com/scylladb/scylla-pkg/issues/3441 Closes scylladb/scylladb#16198	2023-11-29 09:02:36 +02:00
Kamil Braun	3bcee6a981	Revert "Merge 'Change all tests to shut down gracefully on shutdown' from Eliran Sinvani" This reverts commit `7c7baf71d5`. If `stop_gracefully` times out during test teardown phase, it crashes the test framework reporting multiple errors, for example: ``` 12:35:52 /jenkins/workspace/scylla-master/next/scylla/test/pylib/artifact_registry.py:41: RuntimeWarning: coroutine 'PythonTestSuite.get_cluster_factory.<locals>.create_cluster.<locals>.stop' was never awaited 12:35:52 self.exit_artifacts = {} 12:35:52 RuntimeWarning: Enable tracemalloc to get the object allocation traceback 12:35:52 Stopping server ScyllaServer(470, 127.126.216.43, a577f4c7-d5e6-4bdb-8d37-727c11e2a8df) gracefully took longer than 60s 12:35:52 Traceback (most recent call last): 12:35:52 File "/usr/lib64/python3.11/asyncio/tasks.py", line 500, in wait_for 12:35:52 return fut.result() 12:35:52 ^^^^^^^^^^^^ 12:35:52 File "/usr/lib64/python3.11/asyncio/subprocess.py", line 137, in wait 12:35:52 return await self._transport._wait() 12:35:52 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 12:35:52 File "/usr/lib64/python3.11/asyncio/base_subprocess.py", line 230, in _wait 12:35:52 return await waiter 12:35:52 ^^^^^^^^^^^^ 12:35:52 asyncio.exceptions.CancelledError 12:35:52 12:35:52 The above exception was the direct cause of the following exception: 12:35:52 12:35:52 Traceback (most recent call last): 12:35:52 File "/jenkins/workspace/scylla-master/next/scylla/test/pylib/scylla_cluster.py", line 521, in stop_gracefully 12:35:52 await asyncio.wait_for(wait_task, timeout=STOP_TIMEOUT_SECONDS) 12:35:52 File "/usr/lib64/python3.11/asyncio/tasks.py", line 502, in wait_for 12:35:52 raise exceptions.TimeoutError() from exc 12:35:52 TimeoutError 12:35:52 12:35:52 During handling of the above exception, another exception occurred: 12:35:52 12:35:52 Traceback (most recent call last): 12:35:52 File "/jenkins/workspace/scylla-master/next/scylla/./test.py", line 1615, in workaround_python26789 12:35:52 code = 
await main() 12:35:52 ^^^^^^^^^^^^ 12:35:52 File "/jenkins/workspace/scylla-master/next/scylla/./test.py", line 1582, in main 12:35:52 await run_all_tests(signaled, options) 12:35:52 File "/jenkins/workspace/scylla-master/next/scylla/./test.py", line 1359, in run_all_tests 12:35:52 await reap(done, pending, signaled) 12:35:52 File "/jenkins/workspace/scylla-master/next/scylla/./test.py", line 1342, in reap 12:35:52 result = coro.result() 12:35:52 ^^^^^^^^^^^^^ 12:35:52 File "/jenkins/workspace/scylla-master/next/scylla/./test.py", line 201, in run 12:35:52 await test.run(options) 12:35:52 File "/jenkins/workspace/scylla-master/next/scylla/./test.py", line 957, in run 12:35:52 async with get_cluster_manager(self.mode + '/' + self.uname, self.suite.clusters, test_path) as manager: 12:35:52 File "/usr/lib64/python3.11/contextlib.py", line 211, in __aexit__ 12:35:52 await anext(self.gen) 12:35:52 File "/jenkins/workspace/scylla-master/next/scylla/test/pylib/scylla_cluster.py", line 1330, in get_cluster_manager 12:35:52 await manager.stop() 12:35:52 File "/jenkins/workspace/scylla-master/next/scylla/test/pylib/scylla_cluster.py", line 1024, in stop 12:35:52 await self.clusters.put(self.cluster, is_dirty=True) 12:35:52 File "/jenkins/workspace/scylla-master/next/scylla/test/pylib/pool.py", line 104, in put 12:35:52 await self.destroy(obj) 12:35:52 File "/jenkins/workspace/scylla-master/next/scylla/./test.py", line 368, in recycle_cluster 12:35:52 await cluster.stop_gracefully() 12:35:52 File "/jenkins/workspace/scylla-master/next/scylla/test/pylib/scylla_cluster.py", line 689, in stop_gracefully 12:35:52 await asyncio.gather(*(server.stop_gracefully() for server in self.running.values())) 12:35:52 File "/jenkins/workspace/scylla-master/next/scylla/test/pylib/scylla_cluster.py", line 527, in stop_gracefully 12:35:52 raise RuntimeError( 12:35:52 RuntimeError: Stopping server ScyllaServer(470, 127.126.216.43, a577f4c7-d5e6-4bdb-8d37-727c11e2a8df) gracefully took longer than 
60s 12:35:58 sys:1: RuntimeWarning: coroutine 'PythonTestSuite.get_cluster_factory.<locals>.create_cluster.<locals>.uninstall' was never awaited 12:35:58 sys:1: RuntimeWarning: coroutine 'PythonTestSuite.get_cluster_factory.<locals>.create_cluster.<locals>.stop' was never awaited ```	2023-11-09 12:30:35 +01:00
Eliran Sinvani	2a45fed0cf	test.py: move to a graceful termination of nodes on teardown This change moves existing suites which create clusters through the testing infra to be stopped and uninstalled gracefully. The motivation, besides the obvious advantage of testing our stop sequence, is that it will pave the way for applying code coverage support to all tests (not only standalone unit and boost test executables). testing: Ran all tests 10 times in a row in dev mode. Ran all tests once in release mode Ran all tests once in debug mode Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>	2023-10-31 13:12:49 +02:00
Kefu Chai	19e724822d	test.py: pass self.suite.scylla_env to pytest process before this change, pytest does not populate its suite's `scylla_env` down to the forked pytest child process. this works if the test does not care about the env variables in `scylla_env`. but object_store is an exception, as it launches scylla instances by itself. so, without the help of `scylla_env`, `run.find_scylla()` always finds the newest file globbed by `build//scylla`. this is not always what we expect. on the contrary, if we launch object_store's pytest using `test.py`, there are good chances that object_store ends up testing a wrong scylla executable if we have multiple builds under `build//scylla`. so, in this change, we populate `self.suite.scylla_env` down to the child process created by `PythonTest`, so that all pytest based tests can have access to their suite's env variables. in addition to the 'SCYLLA' env variable, they also include the env variables required by LLVM code coverage instrumentation. this is also nice to have. Fixes #15679 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#15682	2023-10-17 09:27:12 +03:00
Botond Dénes	56f7b2f45d	test.py: add ToolTestSuite and ToolTest A test suite for python pytests, testing tools, and hence not needing a scylla cluster setup for them.	2023-09-14 05:25:14 -04:00
Pavel Emelyanov	375b8c6213	test.py: Add suite option to auto-dirty cluster after test ScyllaCluster can be marked as 'dirty' which means that the cluster is in unusable state (after test) and shouldn't be re-used by other tests launched by test.py. For now this is only implemented via the cluster manager class which is only available for topology tests. Add a less flexible short-cut for cql-pytest-s via suite.yaml marking. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-09-11 17:37:48 +03:00
Kamil Braun	cdc3cd2b79	Merge 'raft: add fencing tests' from Petr Gusev In this PR a simple test for fencing is added. It exercises the data plane, meaning if it somehow happens that the node has a stale topology version, then requests from this node will get an error 'stale topology'. The test just decrements the node version manually through CQL, so it's quite artificial. To test a more real-world scenario we need to allow the topology change fiber to sometimes skip unavailable nodes. Now the algorithm fails and retries indefinitely in this case. The PR also adds some logs, and removes one seemingly redundant topology version increment, see the commit messages for details. Closes #14901 * github.com:scylladb/scylladb: test_fencing: add test_fence_hints test.py: output the skipped tests test.py: add skip_mode decorator and fixture test.py: add mode fixture hints: add debug log for dropped hints hints: send_one_hint: extend the scope of file_send_gate holder pylib: add ScyllaMetrics hints manager: add send_errors counter token_metadata: add debug logs fencing: add simple data plane test random_tables.py: add counter column type raft topology: don't increment version when transitioning to node_state::normal	2023-08-22 16:28:21 +02:00
Petr Gusev	3ccd2abad4	test.py: output the skipped tests pytest option -rs forces it to print all the skipped tests along with the reasons. Without this option we can't tell why certain tests were skipped, maybe some of them shouldn't already.	2023-08-22 15:48:40 +04:00
Petr Gusev	a639d161e6	test.py: add mode fixture Sometimes a test wants to know what mode it is running in so that e.g. it can skip itself in some of them.	2023-08-22 15:48:40 +04:00
Raphael S. Carvalho	b578d6643f	Kill scylla option to configure number of compaction groups The option was introduced to bootstrap the project. It's still useful for testing, but that translates into maintaining an additional option and code that will not be really used outside of testing. A possible option is to later map the option in boost tests to initial_tablets, which may yield the same effect for testing. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-08-16 18:23:53 -03:00
Kefu Chai	0c0a59bf62	test: stop using HostRegistry in MinioServer since MinioServer finds a free port by itself, there is no need to provide it an IP address anymore -- we can always use 127.0.0.1. so, in this change, we just drop the HostRegistry parameter passed to the constructor of MinioServer, and pass the host address in place of it. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2023-08-09 23:40:22 +08:00
Konstantin Osipov	df97135583	test.py: forward the optional property file when creating a server To support multi-DC tests we need to provide a property file when creating a server. Forward it from the test client to test.py. Closes #14683	2023-08-02 13:45:19 +02:00
Alejo Sanchez	13e31eaeca	test.py: show mode and suite name when listing tests For --list, show also mode and suite name. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com> Closes #14729	2023-07-18 09:06:47 +03:00
Kefu Chai	a871de33e6	test.py: remove redundant message in report before this change, we would have a report in Jenkins like: ``` [Info] - 1 out of 3 times failed: failed. == [File] - test/boost/commitlog_test.cc == [Line] - 298 [Info] - passed: release=1, dev=1 == [File] - test/boost/commitlog_test.cc == [Line] - 298 [Info] - failed: debug=1 == [File] - test/boost/commitlog_test.cc == [Line] - 298 ``` the first section is rendered from an `Info` tag, created by `test.py`. but the ending "failed" does not help in this context, as we already understand it's failing. so, in this change, it is dropped. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #14546	2023-07-13 11:31:13 +03:00
Avi Kivity	3b3f28fc12	test.py: report CPU utilization Low CPU utilization is a major contributor to high test time. Low CPU utilization can happen due to tests sleeping, or lack of concurrency due to Amdahl's law. Utilization is computed by dividing the utilized CPU by the available CPU (CPU count times wall time). Example output: Found 134 tests. ================================================================================ [N/TOTAL] SUITE MODE RESULT TEST ------------------------------------------------------------------------------ [134/134] boost dev [ PASS ] boost.json_cql_query_test.test_unpack_decimal.1 ------------------------------------------------------------------------------ CPU utilization: 4.8% Closes #14251	2023-06-18 19:33:02 +03:00
Kefu Chai	c123f4644a	test.py: do not abort if fails to parse an XML logger file there are chances that a Boost::test test fails to generate a valid XML file after the test finishes. and xml.etree.ElementTree.parse() throws when parsing it. see https://github.com/scylladb/scylla-pkg/issues/3196 before this change, the exception is not handled, and test.py aborts in this case. this does not help and could be misleading. after this change, the exception is handled and printed. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #14180	2023-06-08 11:02:01 +03:00
Kefu Chai	421331a20b	test.py: consolidate multiple runs of the same test before this change, when consolidating the boost's XML logger file, we just practically concatenate all the tests' logger files into a single one. sometimes, we run the tests multiple times, and these runs share the same TestSuite and TestCase tags. this has two consequences, 1. there is a chance that a test has both successful and failed runs. but jenkins' "Test Results" page cannot identify the failed run, it just picks a random run when one clicks for the details of the run, as it takes the TestCase's name as part of its identifier. and we have multiple of them if the argument passed to the --repeat option is greater than 1 -- this is the case when we promote the "next" branch. 2. the testReport page of Jenkins' xUnit plugin created for the "next" job is 3 times as large as the one for the regular "scylla-ci" run, as all tests are repeated 3 times. but what we really care about is the history of a certain test, not a certain run of it. in this change, we just pick a representative run of a test if it is repeated multiple times and add a "Message" tag for including the summary of the runs. this should address the problems above: 1. the failed tests always stand out so we can always pinpoint them with Jenkins's "Test Results" page. 2. the tests are deduped by name. Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes #14069	2023-06-04 13:15:46 +03:00
Alejo Sanchez	2050a1a125	test.py: warn and skip for missing unit/boost tests If the executable of a matching unit or boost test is not executable, warn to console and skip. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com> Closes #13982	2023-05-29 23:03:24 +03:00
Alejo Sanchez	1940016cd1	test.py: warn and skip for missing unit/boost tests If the executable of a matching unit or boost test is not present, warn to console and skip. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com> Closes #13949	2023-05-22 12:49:32 +03:00
Alejo Sanchez	19687b54f1	test/pytest: yaml configuration cluster section Separate cluster_size into a cluster section and specify this value as initial_size. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com> Closes #13440	2023-05-15 09:48:39 +02:00
Pavel Emelyanov	fe70333c19	test: Auto-skip object-storage test cases if run from shell In case an sstable unit test case is run individually, it would fail with exception saying that S3_... environment is not set. It's better to skip the test-case rather than fail. If someone wants to run it from shell, it will have to prepare S3 server (minio/AWS public bucket) and provide proper environment for the test-case. refs: #13569 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes #13755	2023-05-04 14:15:18 +03:00
Pavel Emelyanov	6dbe41d277	test.py: Equip it with minio server When test.py starts it activates a minio server inside test-dir and configures an anonymous bucket for test cases to run on Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2023-04-10 16:43:01 +03:00
Nadav Har'El	ef50e4022c	test: drop our "pytest" wrapper script When Fedora 37 came out, we discovered that its "pytest" script started to run Python with the "-s" option, which caused problems for packages installed personally via pip. We fixed this by adding our own wrapper script test/pytest. But this bug (https://bugzilla.redhat.com/show_bug.cgi?id=2152171) was already fixed in Fedora 37, and the new version already reached our dbuild. So we no longer need this wrapper script. Let's remove it. Fixes #12412 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes #13083	2023-03-08 07:31:37 +02:00
Nadav Har'El	2653865b34	Merge 'test.py: improve test failure handling' from Kamil Braun Improve logging by printing the cluster at the end of each test. Stop performing operations like attempting queries or dropping keyspaces on dirty clusters. Dirty clusters might be completely dead and these operations would only cause more "errors" to happen after a failed test, making it harder to find the real cause of failure. Mark cluster as dirty when a test that uses it fails - after a failed test, we shouldn't assume that the cluster is in a usable state, so we shouldn't reuse it for another test. Rely on the `is_dirty` flag in `PythonTest`s and `CQLApprovalTest`s, similarly to what `TopologyTest`s do. Closes #12652 * github.com:scylladb/scylladb: test.py: rely on ScyllaCluster.is_dirty flag for recycling clusters test/topology: don't drop random_tables keyspace after a failed test test/pylib: mark cluster as dirty after a failed test test: pylib, topology: don't perform operations after test on a dirty cluster test/pylib: print cluster at the end of test	2023-02-12 12:13:25 +02:00
Kamil Braun	d991f71910	test.py: rely on ScyllaCluster.is_dirty flag for recycling clusters `TopologyTest`s (used by `topology/` suite and friends) already relied on the `is_dirty` flag stored in `ScyllaCluster` thanks to `ScyllaClusterManager` (which passes the flag when returning a cluster to the pool). But `PythonTest`s (cql-pytest/ suite) and `CQLApprovalTest`s (cql/ suite) had different ways to decide whether a cluster should be recycled. For example, `PythonTest` would recycle a cluster if `after_test` raised an exception. This depended on a post-condition check made by `after_test`: it would query the number of keyspaces and throw an exception if it was different than when the test started. If the cluster (which for `PythonTest` is always single-node) was dead, this query would fail. However, we modified the behavior of `after_test` in earlier commits - it no longer performs the post-condition check on dirty clusters. So it's also no longer reliable to use the exception raised by `after_test` to decide that we should recycle the cluster. Unify the behavior of `PythonTest` and `CQLApprovalTest` with what `TopologyTest` does - using the `is_dirty` flag to decide that we should recycle a cluster. Thanks to earlier commits, this flag is set to `True` whenever a test fails, so it should cover most cases where we want to recycle a cluster. (The only case not currently covered is if a non-dirty cluster crashes after we perform the keyspace post-condition check, which seems quite improbable.) Note that this causes us to recycle clusters more often in these tests: previously, when a `PythonTest` or `CQLApprovalTest` failed, but the cluster was still alive and the post-condition check passed, we would use the cluster for the next test. Now we recycle a cluster whenever a test that used it fails.	2023-02-03 11:49:35 +01:00
Kamil Braun	a9dbd89478	test/pylib: mark cluster as dirty after a failed test We don't expect the cluster to be functioning at all after a failed test. The whole cluster might have crashed, for example. In these situations the framework would report multiple errors (one for the actual failure, another for a failed post-condition check because the cluster was down) which would only obscure the report and make debugging harder. It's also not safe in general to reuse the cluster in another test - if the previous test failed, we should not assume that it's in a valid state. Therefore, mark the cluster as dirty after a failed test. This will let us recycle the cluster based on the dirty flag and it will disable post-condition check after a failed test (which is only done on non-dirty clusters). To implement this in topology tests, we use the `pytest_runtest_makereport` hook which executes after a test finishes but before fixtures finish. There we store a test-failed flag in a stash provided by pytest, then access the flag in the `manager` fixture.	2023-02-02 16:35:55 +01:00
Raphael S. Carvalho	e3923a9caf	test.py: Add option to run scylla tests with multiple compaction groups The tests can now optionally run with multiple groups via option --x-log2-compaction-groups. This includes boost tests and the ones which run against either one (e.g. cql) or many instances (e.g. topology). Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2023-02-01 20:17:16 -03:00
Kamil Braun	858803cc2c	test/pylib: pool: replace `steal` with `put(is_dirty=True)` The pool usage was kind of awkward previously: if the user of a pool decided that a previously borrowed object should no longer be used, it was their responsibility to destroy the object (releasing associated resources and so on) and then call `steal()` on the pool to free space for a new object. Change the interface. Now the `Pool` constructor takes a `destroy` function in addition to the `build` function. The user calls the function `put` to return both objects that are still usable and those that aren't. For the latter, they set `is_dirty=True`. The pool will 'destroy' the object with the provided function, which could mean e.g. releasing associated resources. For example, instead of: ``` if self.cluster.is_dirty: self.clusters.stop() self.clusters.release_ips() self.clusters.steal() else: self.clusters.put(self.cluster) ``` we can now use: ``` self.clusters.put(self.cluster, is_dirty=self.cluster.is_dirty) ``` (assuming that `self.clusters` is a pool constructed with a `destroy` function that stops the cluster and releases its IPs.) Also extend the interface of the context manager obtained by `instance()` - the user must now pass a flag `dirty_on_exception`. If the context manager exits due to an exception and that flag was `True`, the object will be considered dirty. The dirty flag can also be set manually on the context manager. For example: ``` async with (cm := pool.instance(dirty_on_exception=True)) as server: cm.dirty = await run_test(test, server) # It will also be considered dirty if run_test throws an exception ```	2023-01-26 11:58:00 +01:00
Alejo Sanchez	f236d518c6	test.py: manual cluster handling for PythonSuite Instead of complex async with logic, use manual cluster pool handling. Revert the discard() logic in Pool from a recent commit. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2023-01-24 11:38:17 +01:00
Alejo Sanchez	a6059e4bb7	test.py: stop cluster if PythonSuite fails to start If cluster fails to start, stop it. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2023-01-24 11:36:49 +01:00
Alejo Sanchez	dec0c1d9f6	test.py: minor fix for failed PythonSuite test Even though test can't fail both before and after, make the logic explicit in case code changes in the future. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2023-01-24 11:36:49 +01:00
Alejo Sanchez	51e84508ee	test.py: handle broken clusters for Python suite If the after test check fails (!is_after_test_ok), discard the cluster and raise exception so context manager (pool) does not recycle it. Ignore Pool exception re-raised by the context manager. Signed-off-by: Alejo Sanchez <alejo.sanchez@scylladb.com>	2023-01-19 21:43:50 +01:00
Kamil Braun	4f7e5ee963	test/pylib: prefix cluster/manager logs with the current test name The log file produced by test.py combines logs coming from multiple concurrent test runs. Each test has its own log file as well, but this "global" log file is useful when debugging problems with topology tests, since many events related to managing clusters are stored there. Make the logs easier to read by including information about the test case that's currently performing operations such as adding new servers to clusters and so on. This includes the mode, test run name and the name of the test case. We do this by using custom `Logger` objects (instead of calling `logging.info` etc. which uses the root logger) with `LoggerAdapter`s that include the prefixes. A bit of boilerplate 'plumbing' through function parameters is required but it's mostly straightforward. This doesn't apply to all events, e.g. boost test cases which don't setup a "real" Scylla cluster. These events don't have additional prefixes. Example: ``` 17:41:43.531 INFO> [dev/topology.test_topology.1] Cluster ScyllaCluster(name: 7a414ffc-903c-11ed-bafb-f4d108a9e4a3, running: ScyllaServer(1, 127.40.246.1, 29c4ec73-8912-45ca-ae19-8bfda701a6b5), ScyllaServer(4, 127.40.246.4, 75ae2afe-ff9b-4760-9e19-cd0ed8d052e7), ScyllaServer(7, 127.40.246.7, 67a27df4-be63-4b4c-a70c-aeac0506304f), stopped: ) adding server... 17:41:43.531 INFO> [dev/topology.test_topology.1] installing Scylla server in /home/kbraun/dev/scylladb/testlog/dev/scylla-10... 17:41:43.603 INFO> [dev/topology.test_topology.1] starting server at host 127.40.246.10 in scylla-10... 17:41:43.614 INFO> [dev/topology.test_topology.2] Cluster ScyllaCluster(name: 7a497fce-903c-11ed-bafb-f4d108a9e4a3, running: ScyllaServer(2, 127.40.246.2, f59d3b1d-efbb-4657-b6d5-3fa9e9ef786e), ScyllaServer(5, 127.40.246.5, 9da16633-ce53-4d32-8687-e6b4d27e71eb), ScyllaServer(9, 127.40.246.9, e60c69cd-212d-413b-8678-dfd476d7faf5), stopped: ) adding server... 
17:41:43.614 INFO> [dev/topology.test_topology.2] installing Scylla server in /home/kbraun/dev/scylladb/testlog/dev/scylla-11... 17:41:43.670 INFO> [dev/topology.test_topology.2] starting server at host 127.40.246.11 in scylla-11... ```	2023-01-11 10:09:39 +01:00
Kamil Braun	ff2c030bf9	test.py: include mode in ScyllaClusterManager logs The logs often mention the test run and the current test case in a given run, such as `test_topology.1` and `test_topology.1::test_add_server_add_column`. However, if we run test.py in multiple modes, the different modes might be running the same test case and the logs become confusing. To disambiguate, prefix the test run/case names with the mode name. Example: ``` Leasing Scylla cluster ScyllaCluster(name: 7a414ffc-903c-11ed-bafb-f4d108a9e4a3, running: ScyllaServer(1, 127.40.246.1, 29c4ec73-8912-45ca-ae19-8bfda701a6b5), ScyllaServer(4, 127.40.246.4, 75ae2afe-ff9b-4760-9e19-cd0ed8d052e7), ScyllaServer(7, 127.40.246.7, 67a27df4-be63-4b4c-a70c-aeac0506304f), stopped: ) for test dev/topology.test_topology.1::test_add_server_add_column ```	2023-01-10 17:41:54 +01:00
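
The pool interface described in commit 858803cc2c above (a `Pool` built from a `build` and a `destroy` function, with `put(obj, is_dirty=...)` deciding between reuse and destruction) can be illustrated with a minimal sketch. This is not the code in `test/pylib/pool.py`: the class body, the `max_size` parameter, the `get` method name and the demo helpers are assumptions made for illustration only.

```python
import asyncio
from typing import Awaitable, Callable, Generic, TypeVar

T = TypeVar("T")


class Pool(Generic[T]):
    """Hypothetical toy pool: at most `max_size` objects are out or cached."""

    def __init__(self, max_size: int,
                 build: Callable[[], Awaitable[T]],
                 destroy: Callable[[T], Awaitable[None]]) -> None:
        self.build = build
        self.destroy = destroy
        self.free: list[T] = []
        # Each acquired slot corresponds to one borrowed object.
        self.slots = asyncio.Semaphore(max_size)

    async def get(self) -> T:
        await self.slots.acquire()
        if self.free:
            return self.free.pop()
        try:
            return await self.build()
        except BaseException:
            # Building failed, so nothing was borrowed: give the slot back.
            self.slots.release()
            raise

    async def put(self, obj: T, is_dirty: bool) -> None:
        try:
            if is_dirty:
                # The pool, not the caller, releases the object's resources.
                await self.destroy(obj)
            else:
                self.free.append(obj)
        finally:
            self.slots.release()


async def demo() -> list[str]:
    events: list[str] = []
    counter = 0

    async def build() -> str:
        nonlocal counter
        counter += 1
        events.append(f"build cluster-{counter}")
        return f"cluster-{counter}"

    async def destroy(name: str) -> None:
        events.append(f"destroy {name}")

    pool = Pool(1, build, destroy)
    c = await pool.get()
    await pool.put(c, is_dirty=False)  # clean: kept for reuse
    c = await pool.get()               # reuses cluster-1, no new build
    await pool.put(c, is_dirty=True)   # dirty: destroyed by the pool
    c = await pool.get()               # a fresh cluster-2 is built
    await pool.put(c, is_dirty=False)
    return events


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The point of the design, as the commit message puts it, is that the caller no longer has to stop the cluster, release its IPs and call `steal()` by hand; a single `put` call carries the dirty flag and the pool does the rest.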


480 Commits
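
Commit f8f8d64982 above lets `--skip` be passed several times (`test.py --skip=topology --skip=sstables`). One common way to collect a repeatable option with `argparse` is `action="append"`, sketched below. Whether test.py is implemented this way is an assumption; the `skip_patterns` destination and the substring matching in `should_skip` are hypothetical and only illustrate the idea.

```python
import argparse


def parse_args(argv: list[str]) -> argparse.Namespace:
    parser = argparse.ArgumentParser(prog="test.py")
    # action="append" gathers every occurrence of --skip into one list.
    parser.add_argument("--skip", action="append", dest="skip_patterns",
                        default=[], metavar="PATTERN",
                        help="skip tests matching PATTERN (may be repeated)")
    return parser.parse_args(argv)


def should_skip(test_name: str, patterns: list[str]) -> bool:
    # Hypothetical matching rule: a test is skipped if any pattern
    # occurs as a substring of its name.
    return any(pattern in test_name for pattern in patterns)


if __name__ == "__main__":
    opts = parse_args(["--skip=topology", "--skip=sstables"])
    for name in ["topology/test_fencing", "boost/sstables_test", "cql/lwt"]:
        print(name, "skipped" if should_skip(name, opts.skip_patterns) else "run")
```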