mirror of
https://github.com/scylladb/scylladb.git
synced 2026-05-02 06:05:53 +00:00
`ScyllaClusterManager` is used to run a sequence of test cases from a single test file. Between two consecutive tests, if the previous test left the cluster 'dirty', meaning the cluster cannot be reused, it would free up space in the pool (using `steal`), stop the cluster, then get a new cluster from the pool. Between the `steal` and the `get`, a concurrent test run (with its own instance of `ScyllaClusterManager` would start, because there was free space in the pool. This resulted in undesirable behavior when we ran tests with `--repeat X` for a large `X`: we would start with e.g. 4 concurrent runs of a test file, because the pool size was 4. As soon as one of the runs freed up space in the pool, we would start another concurrent run. Soon we'd end up with 8 concurrent runs. Then 16 concurrent runs. And so on. We would have a large number of concurrent runs, even though the original 4 runs didn't finish yet. All of these concurrent runs would compete waiting on the pool, and waiting for space in the pool would take longer and longer (the duration is linear w.r.t number of concurrent competing runs). Tests would then time out because they would have to wait too long. Fix that by using the new `replace_dirty` function introduced to the pool. This function frees up space by returning a dirty cluster and then immediately takes it away to be used for a new cluster. Thanks to this, we will only have at most as many concurrent runs as the pool size. For example with --repeat 8 and pool size 4, we would run 4 concurrent runs and start the 5th run only when one of the original 4 runs finishes, then the 6th run when a second run finishes and so on. The fix is preceded by a refactor that replaces `steal` with `put(is_dirty=True)` and a `destroy` function passed to the pool (now the pool is responsible for stopping the cluster and releasing its IPs). Fixes #11757 Closes #12549 * github.com:scylladb/scylladb: test/pylib: scylla_cluster: ensure there's space in the cluster pool when running a sequence of tests test/pylib: pool: introduce `replace_dirty` test/pylib: pool: replace `steal` with `put(is_dirty=True)`