There are two tests, test_read_all and test_read_with_partition_row_limits, which assert on every page, as well
as at the end, that there are no misses whatsoever. This is incorrect, because it is possible that on a given page, not all shards participate and thus there won't be a saved reader on every shard. On the subsequent page, a shard without a reader may produce a miss. This is fine. Refine the asserts to check that there are at most as many misses as there are
shards without readers on them.
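The refined expectation can be sketched as below. This is a minimal Python illustration only; the real test is C++, and `max_allowed_misses`, `check_misses` and the `saved_readers` list are hypothetical names introduced here.

```python
def max_allowed_misses(saved_readers):
    """Upper bound on querier cache misses for the next page.

    saved_readers[i] is True if shard i saved a reader on the previous page.
    A shard without a saved reader may legitimately produce one miss.
    """
    return sum(1 for has_reader in saved_readers if not has_reader)


def check_misses(misses, saved_readers):
    # The old check was effectively `misses == 0`, which is too strict
    # when some shards did not participate in the previous page.
    return misses <= max_allowed_misses(saved_readers)
```

With four shards of which two saved a reader, up to two misses are accepted and a third one fails the check.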
Fixes: https://github.com/scylladb/scylladb/issues/14087
Closes scylladb/scylladb#15806
* github.com:scylladb/scylladb:
test/boost/multishard_mutation_query_test: fix querier cache misses expectations
test/lib/test_utils: add require_* variants for all comparators
After 146e49d0dd (Rewrap keyspace population loop) the datadir layout is no longer needed by sstables boot-time loader and finally directories can be omitted for S3-backed keyspaces. Tables of that keyspace don't touch/remove their datadirs either (snapshots still don't work for S3)
fixes: #13020
Closes scylladb/scylladb#16007
* github.com:scylladb/scylladb:
test/object_store: Check that keyspace directory doesn't appear
sstables/storage: Do storage init/destroy based on storage options
replica/{ks|cf}: Move storage init/destroy to sstables manager
database: Add get_sstables_manager(bool_class is_system) method
This reverts commit 7c7baf71d5.
If `stop_gracefully` times out during test teardown phase, it crashes
the test framework reporting multiple errors, for example:
```
12:35:52 /jenkins/workspace/scylla-master/next/scylla/test/pylib/artifact_registry.py:41: RuntimeWarning: coroutine 'PythonTestSuite.get_cluster_factory.<locals>.create_cluster.<locals>.stop' was never awaited
12:35:52 self.exit_artifacts = {}
12:35:52 RuntimeWarning: Enable tracemalloc to get the object allocation traceback
12:35:52 Stopping server ScyllaServer(470, 127.126.216.43, a577f4c7-d5e6-4bdb-8d37-727c11e2a8df) gracefully took longer than 60s
12:35:52 Traceback (most recent call last):
12:35:52 File "/usr/lib64/python3.11/asyncio/tasks.py", line 500, in wait_for
12:35:52 return fut.result()
12:35:52 ^^^^^^^^^^^^
12:35:52 File "/usr/lib64/python3.11/asyncio/subprocess.py", line 137, in wait
12:35:52 return await self._transport._wait()
12:35:52 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
12:35:52 File "/usr/lib64/python3.11/asyncio/base_subprocess.py", line 230, in _wait
12:35:52 return await waiter
12:35:52 ^^^^^^^^^^^^
12:35:52 asyncio.exceptions.CancelledError
12:35:52
12:35:52 The above exception was the direct cause of the following exception:
12:35:52
12:35:52 Traceback (most recent call last):
12:35:52 File "/jenkins/workspace/scylla-master/next/scylla/test/pylib/scylla_cluster.py", line 521, in stop_gracefully
12:35:52 await asyncio.wait_for(wait_task, timeout=STOP_TIMEOUT_SECONDS)
12:35:52 File "/usr/lib64/python3.11/asyncio/tasks.py", line 502, in wait_for
12:35:52 raise exceptions.TimeoutError() from exc
12:35:52 TimeoutError
12:35:52
12:35:52 During handling of the above exception, another exception occurred:
12:35:52
12:35:52 Traceback (most recent call last):
12:35:52 File "/jenkins/workspace/scylla-master/next/scylla/./test.py", line 1615, in workaround_python26789
12:35:52 code = await main()
12:35:52 ^^^^^^^^^^^^
12:35:52 File "/jenkins/workspace/scylla-master/next/scylla/./test.py", line 1582, in main
12:35:52 await run_all_tests(signaled, options)
12:35:52 File "/jenkins/workspace/scylla-master/next/scylla/./test.py", line 1359, in run_all_tests
12:35:52 await reap(done, pending, signaled)
12:35:52 File "/jenkins/workspace/scylla-master/next/scylla/./test.py", line 1342, in reap
12:35:52 result = coro.result()
12:35:52 ^^^^^^^^^^^^^
12:35:52 File "/jenkins/workspace/scylla-master/next/scylla/./test.py", line 201, in run
12:35:52 await test.run(options)
12:35:52 File "/jenkins/workspace/scylla-master/next/scylla/./test.py", line 957, in run
12:35:52 async with get_cluster_manager(self.mode + '/' + self.uname, self.suite.clusters, test_path) as manager:
12:35:52 File "/usr/lib64/python3.11/contextlib.py", line 211, in __aexit__
12:35:52 await anext(self.gen)
12:35:52 File "/jenkins/workspace/scylla-master/next/scylla/test/pylib/scylla_cluster.py", line 1330, in get_cluster_manager
12:35:52 await manager.stop()
12:35:52 File "/jenkins/workspace/scylla-master/next/scylla/test/pylib/scylla_cluster.py", line 1024, in stop
12:35:52 await self.clusters.put(self.cluster, is_dirty=True)
12:35:52 File "/jenkins/workspace/scylla-master/next/scylla/test/pylib/pool.py", line 104, in put
12:35:52 await self.destroy(obj)
12:35:52 File "/jenkins/workspace/scylla-master/next/scylla/./test.py", line 368, in recycle_cluster
12:35:52 await cluster.stop_gracefully()
12:35:52 File "/jenkins/workspace/scylla-master/next/scylla/test/pylib/scylla_cluster.py", line 689, in stop_gracefully
12:35:52 await asyncio.gather(*(server.stop_gracefully() for server in self.running.values()))
12:35:52 File "/jenkins/workspace/scylla-master/next/scylla/test/pylib/scylla_cluster.py", line 527, in stop_gracefully
12:35:52 raise RuntimeError(
12:35:52 RuntimeError: Stopping server ScyllaServer(470, 127.126.216.43, a577f4c7-d5e6-4bdb-8d37-727c11e2a8df) gracefully took longer than 60s
12:35:58 sys:1: RuntimeWarning: coroutine 'PythonTestSuite.get_cluster_factory.<locals>.create_cluster.<locals>.uninstall' was never awaited
12:35:58 sys:1: RuntimeWarning: coroutine 'PythonTestSuite.get_cluster_factory.<locals>.create_cluster.<locals>.stop' was never awaited
```
This reverts commit 2860d43309, reversing
changes made to a3621dbd3e.
Reverting because rest_api.test_compaction_task started failing after
this was merged.
Fixes: #16005
When creating an S3-backed keyspace, its storage dir shouldn't be made.
Also, it shouldn't be "resurrected" by the boot-time loader of existing
keyspaces.
For extra confidence, check that the system keyspace's directory does
exist where the test expects keyspaces' directories to appear.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Add a space after each colon and comma (if one isn't already there) in the values of table options that are JSON objects (`caching`, `tombstone_gc` and `cdc`).
This improves readability and matches client-side describe format.
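The effect on the output can be illustrated with a small Python sketch. It is not the C++ change itself: it assumes the option value is plain JSON (the sample value is made up) and relies on `json.dumps` using `", "` and `": "` as its default separators. `describe_json_option` is a hypothetical helper name.

```python
import json


def describe_json_option(compact: str) -> str:
    # Parse and re-serialize; json.dumps defaults to ", " and ": " separators.
    return json.dumps(json.loads(compact))


compact = '{"keys":"ALL","rows_per_partition":"ALL"}'
readable = describe_json_option(compact)
print(readable)  # {"keys": "ALL", "rows_per_partition": "ALL"}
```

Round-tripping through a parser, rather than inserting spaces with a regular expression, keeps colons and commas inside string values untouched.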
Fixes: #14895
Closes scylladb/scylladb#15900
* github.com:scylladb/scylladb:
cql-pytest:test_describe: add test for whitespaces in json objects
schema: add whitespace to description of table options
This PR implements the following new nodetool commands:
* snapshot
* drain
* flush
* disableautocompaction
* enableautocompaction
All commands come with tests and all tests pass with both the new and the current nodetool implementations.
Refs: https://github.com/scylladb/scylladb/issues/15588
Closes scylladb/scylladb#15939
* github.com:scylladb/scylladb:
test/nodetool: add README.md
tools/scylla-nodetool: implement enableautocompaction command
tools/scylla-nodetool: implement disableautocompaction command
tools/scylla-nodetool: implement the flush command
tools/scylla-nodetool: extract keyspace/table parsing
tools/scylla-nodetool: implement the drain command
tools/scylla-nodetool: implement the snapshot command
test/nodetool: add support for matching approximate query parameters
utils/http: make dns_connection_factory::initialize() static
This is a translation of Cassandra's CQL unit test source file
validation/operations/CreateTest.java into our cql-pytest framework.
The 15 tests did not reproduce any previously-unknown bug, but did provide
additional reproducers for several known issues:
Refs #6442: Always print all schema parameters (including default values)
Refs #8001: Documented unit "µs" not supported for assigning a duration
type.
Refs #8892: Add an option for default RF for new keyspaces.
Refs #8948: Cassandra 3.11.10 uses "class" instead of "sstable_compression"
for compression settings by default
Unfortunately, I also had to comment out - and not translate - several
tests which weren't real "CQL tests" (tests that use only the CQL driver),
and instead relied on Cassandra's Java implementation details:
1. Tests for CREATE TRIGGER were commented out because testing them
in Cassandra requires adding a Java class for the test. We're also
not likely to ever add this feature to Scylla (Refs #2205).
2. Similarly, tests for CEP-11 (Pluggable memtable implementations)
used internal Java APIs instead of CQL, and it is also unlikely that
we'll ever implement it in a way compatible with Cassandra because
of its Java reliance.
3. One test for data center names used internal Cassandra Java APIs, not
CQL to create mock data centers and snitches.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#15791
Compaction tasks which do not have a parent are abortable
through task manager. Their children are aborted recursively.
Compaction tasks of the lowest level are aborted using existing
compaction task executors stopping mechanism.
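The abort rule described above can be sketched as follows. This is a simplified Python model, not the task manager's C++ code; the `Task` class and `abort_via_task_manager` are hypothetical names, and the lowest-level abort is reduced to setting a flag.

```python
class Task:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        self.aborted = False
        if parent is not None:
            parent.children.append(self)

    def is_abortable(self):
        # Only tasks without a parent can be aborted through the task manager.
        return self.parent is None

    def abort(self):
        # Abort this task, then recurse into all of its children.
        self.aborted = True
        for child in self.children:
            child.abort()


def abort_via_task_manager(task):
    if not task.is_abortable():
        raise ValueError(f"task {task.name} has a parent and is not abortable")
    task.abort()
```

Aborting the top-level task marks the whole tree as aborted, while a direct request against a child is rejected.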
Closes scylladb/scylladb#15083
* github.com:scylladb/scylladb:
test: test abort of compaction task that isn't started yet
test: test running compaction task abort
tasks: fail if a task was aborted
compaction: abort task manager compaction tasks
When running on a particularly slow setup, for example on
an ARM machine in debug mode, the execution time of even
a small Lua UDF that we're using in tests may exceed our
default limits.
To avoid timeout errors, the limit in tests is now increased
to a value that won't be exceeded in any reasonable scenario
(for the current set of tested UDFs), while not making the
test take an excessive amount of time in case of an error in
the UDF execution.
Fixes #15977
Closes scylladb/scylladb#15983
Match parameters within some delta of the expected value. Useful when
nodetool generates a timestamp whose exact value cannot be
predicted.
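A minimal sketch of such matching, assuming query parameters are held in dictionaries of strings. `params_match` and the `deltas` argument are hypothetical names for illustration, not the test framework's actual interface.

```python
def params_match(expected, actual, deltas=None):
    """Compare query parameters; names listed in `deltas` match approximately."""
    deltas = deltas or {}
    if expected.keys() != actual.keys():
        return False
    for name, want in expected.items():
        got = actual[name]
        if name in deltas:
            # Numeric parameter: accept any value within +/- delta.
            if abs(int(got) - int(want)) > deltas[name]:
                return False
        elif got != want:
            return False
    return True
```

For example, a snapshot tag derived from the current time can be matched with `deltas={"tag": 5}`, while all other parameters still have to match exactly.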
These APIs may return stale or simply incorrect data on shards
other than 0. Newer versions of Scylla are better at maintaining
cross-shard consistency, but we need a simple fix that can be
backported to older versions easily and without risk; this is the fix.
Add a simple test to check that the `failure_detector/endpoints`
API returns nonzero generation.
Fixes: scylladb/scylladb#15816
Closes scylladb/scylladb#15970
* github.com:scylladb/scylladb:
test: rest_api: test that generation is nonzero in `failure_detector/endpoints`
api: failure_detector: fix indentation
api: failure_detector: invoke on shard 0
The sstable can currently move between the normal, staging and quarantine states at runtime. For S3-backed sstables the state change means maintaining the state itself in the ownership table and updating it accordingly.
There's also the upload facility that's implemented as state change too, but this PR doesn't support this part.
fixes: #13017
Closes scylladb/scylladb#15829
* github.com:scylladb/scylladb:
test: Make test_sstables_excluding_staging_correctness run over s3 too
sstables,s3: Support state change (without generation change)
system_keyspace: Add state field to system.sstables
sstable_directory: Tune up sstables entries processing comment
system_keyspace: Tune up status change trace message
sstables: Add state string to state enum class convert
The helper makes an sstable, writes mutations into it and loads it. Internally it uses the make_memtable() helper that prepares a memtable out of a vector of mutations. Many test cases don't use these facilities, which results in code duplication.
The make_sstable() wrapper around make_sstable_easy() is removed along the way.
Closes scylladb/scylladb#15930
* github.com:scylladb/scylladb:
tests: Use make_sstable_easy() where appropriate
sstable_conforms_to_mutation_source_test: Open-code the make_sstable() helper
sstable_mutation_test: Use make_sstable_easy() instead of make_sstable()
tests: Make use of make_memtable() helper
tests: Drop as_mutation_source helper
test/sstable_utils: Hide assertion-related manipulations into branch
This is a test for #14277. We do want to match Cassandra's behavior,
which means that a user who is granted ALTER ALL is able to change
the password of a superuser.
Closes scylladb/scylladb#15961
before this change, the tempdir is always nuked no matter if the
test succeeds. but sometimes, it would be important to check
scylla's sstables after the test finishes.
so, in this change, an option named `--keep-tmp` is added so
we can optionally preserve the temp directory. this option is off
by default.
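a minimal sketch of how such an option can gate the cleanup, using only the standard library. the real option lives in the test's own option handling; `make_parser` and `managed_tempdir` are hypothetical names.

```python
import argparse
import contextlib
import os
import shutil
import tempfile


def make_parser():
    parser = argparse.ArgumentParser()
    # Off by default: the temp directory is removed unless the flag is given.
    parser.add_argument("--keep-tmp", action="store_true",
                        help="preserve the temporary directory after the test")
    return parser


@contextlib.contextmanager
def managed_tempdir(keep):
    path = tempfile.mkdtemp(prefix="scylla-test-")
    try:
        yield path
    finally:
        if not keep:
            shutil.rmtree(path, ignore_errors=True)
```

a caller would write `with managed_tempdir(args.keep_tmp) as tmp:` and inspect the sstables left in `tmp` after the run when the flag is set.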
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#15949
this series applies fixes to make the test more PEP8 compliant. the goal is to improve the readability and maintainability.
Closes scylladb/scylladb#15946
* github.com:scylladb/scylladb:
test/object_store: wrap line which is too long
test/object_store: use pattern matching to capture variable in loop
test/object_store: remove space after and before '{' and '}'
test/object_store: add an empty line before nested function definition
test/object_store: use two empty lines in-between global functions
This series refactors the `dht/i_partitioner.hh` header file
and cleans up its usage so as to reduce the dependencies on it,
since it carries a lot of baggage that is rarely required in other header files.
Closes scylladb/scylladb#15954
* github.com:scylladb/scylladb:
everywhere: reduce dependencies on i_partitioner.hh
locator: resolve the dependency of token_metadata.hh on token_range_splitter.hh
cdc: cdc_partitioner: remove extraneous partition_key_view fwd declaration
dht: reduce dependency on i_partitioner.hh
dht: fold compatible_ring_position in ring_position.hh
dht: refactor i_partitioner.hh
dht: move token_comperator to token.{cc,hh}
dht/i_partitioner: include i_partitioner_fwd.hh
The purpose of this mini series is to make all tests that use the infrastructure for creating a Scylla cluster shut the cluster down gracefully
on teardown.
One benefit is that the cluster shutdown sequence will be tested better; however, that is not the main purpose of this change. The main purpose is to pave the way for coverage reporting on all tests, not only the ones that
have standalone executables.
Full test runs are only slightly impacted by this change (~2.4% increase in runtime):
Without graceful shutdown
```
time ./test.py --mode dev
Found 2966 tests.
================================================================================
[N/TOTAL] SUITE MODE RESULT TEST
------------------------------------------------------------------------------
[2966/2966] topology_experimental_raft dev [ PASS ] topology_experimental_raft.test_raft_cluster_features.1
------------------------------------------------------------------------------
CPU utilization: 13.1%
real 4m50.587s
user 13m58.358s
sys 6m55.975s
```
With graceful shutdown
```
time ./test.py --mode dev
Found 2966 tests.
================================================================================
[N/TOTAL] SUITE MODE RESULT TEST
------------------------------------------------------------------------------
[2966/2966] topology_experimental_raft dev [ PASS ] topology_experimental_raft.test_raft_cluster_features.1
------------------------------------------------------------------------------
CPU utilization: 12.6%
real 4m57.637s
user 13m56.864s
sys 6m46.657s
```
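The ~2.4% figure follows from the two `real` wall-clock times above; a quick check in Python (the helper name `to_seconds` is just for this illustration):

```python
def to_seconds(minutes, seconds):
    return minutes * 60 + seconds


without_graceful = to_seconds(4, 50.587)  # real 4m50.587s
with_graceful = to_seconds(4, 57.637)     # real 4m57.637s

increase_pct = (with_graceful - without_graceful) / without_graceful * 100
print(f"{increase_pct:.1f}%")  # 2.4%
```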
Closes scylladb/scylladb#15851
* github.com:scylladb/scylladb:
test.py: move to a graceful termination of nodes on teardown
test.py: Use stop lock also in the graceful version
instead of referencing the elements in tuple with their indexes, use
pattern matching to capture them. for better readability.
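a small illustration of the difference, with made-up data (the `pairs` list and its contents are hypothetical, not taken from the test):

```python
pairs = [("ks1", "tbl1"), ("ks2", "tbl2")]

# Index-based access: the reader has to remember what [0] and [1] mean.
by_index = [f"{p[0]}.{p[1]}" for p in pairs]

# Unpacking in the loop header gives each element a name.
by_name = [f"{keyspace}.{table}" for keyspace, table in pairs]
```

both forms produce the same result; only the second one documents what the tuple holds.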
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Values of `caching`, `tombstone_gc` and `cdc` are JSON objects, but they
were printed without any whitespace. This commit adds a space after
colons (:) and commas (,), so the values are more readable and match
the format of the old client-side describe.
After starting the associated node, ScyllaServer waits until the node
starts serving CQL requests. It does that by periodically trying to
establish a python driver session to the node.
During session establishment, the driver tries to fetch some metadata
from the system tables, and uses a pretty short timeout to do so (by
default it's 2 seconds). When running tests in debug mode, this timeout
can prove to be too short and may prevent the testing framework from
noticing that the node came up.
Fix the problem by increasing the timeout. Currently, after the session
is established, a query is sent in order to further verify that the
session works and it uses a very generous timeout of 1000 seconds to do
so - use the same timeout for internal queries in the python driver.
Fixes: scylladb/scylladb#15898
Closes scylladb/scylladb#15929
There are two test cases out there that make an sstable, write it and then
load it, but make_sstable_easy() exists for exactly that, so use it there.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This test case is pretty special in the sense that it uses custom path
for tempdir to create, write and load sstable to/from. It's better to
open-code the make_sstable() helper into the test case rather than
encourage callers to use custom tempdirs. "Good" test cases can use
make_sstable_easy() for the same purposes (in fact they already do).
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
There's one in the utils that creates lw_shared_ptr<memtable> and
applies provided vector of mutations into it. Lots of other test cases
do literally the same by hand.
The make_memtable() assumes that the caller is sitting in the seastar
thread, and all the test cases that can benefit from it already are.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
It does nothing but call the sstable method of the same name. Callers
can do it on their own, the method is public.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The make_sstable_containing() can validate that the applied mutations are
produced by the resulting sstable if the caller asks for it. To do so
the mutations are merged prior to checking and this merging should only
happen if validation is requested, otherwise it just makes no sense.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Some pytest-based tests start the scylla binary by hand
instead of relying on test.py's "clusters". In an automatic run (e.g. via
test.py itself) the correct scylla binary is the one pointed to by
the SCYLLA environment variable, but when run from the shell via pytest directly,
the test tries to be smart and looks at the build/*/scylla binaries, picking the one with
the greatest mtime.
That guess is not very nice, because if the developer switches between
build modes with configure.py and rebuilds binaries, binaries from
"older" or "previous" builds stay in the way and confuse the guessing
code. It's better to be explicit.
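The difference between guessing and being explicit can be sketched as below. Both functions are hypothetical illustrations, not the test's actual code; in particular, the explicit variant assumes that the SCYLLA environment variable is the only accepted source.

```python
import glob
import os


def guess_scylla_binary(build_dir="build"):
    """The heuristic described above: newest build/*/scylla by mtime."""
    candidates = glob.glob(os.path.join(build_dir, "*", "scylla"))
    return max(candidates, key=os.path.getmtime, default=None)


def find_scylla_binary(env=os.environ):
    """Explicit variant (an assumption): trust only the SCYLLA variable."""
    path = env.get("SCYLLA")
    if not path:
        raise RuntimeError("set SCYLLA to the scylla binary to test")
    return path
```

With the heuristic, the binary that gets picked depends on which build mode happened to be rebuilt last, which is exactly the confusion described above.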
refs: #15679
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closes scylladb/scylladb#15684
This patch series adds error handling for streaming failure during
topology operations instead of an infinite retry. If streaming fails the
operation is rolled back: bootstrap/replace nodes move to left and
decommissioned/remove nodes move back to normal state.
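The rollback rule can be summarised as a small lookup. This is a Python sketch of the rule as stated above, not the storage_service code; `ROLLBACK_STATE` and `state_after_streaming_failure` are hypothetical names.

```python
# Node state a topology operation rolls back to when streaming fails.
ROLLBACK_STATE = {
    "bootstrap": "left",
    "replace": "left",
    "decommission": "normal",
    "remove": "normal",
}


def state_after_streaming_failure(operation):
    return ROLLBACK_STATE[operation]
```

Nodes that were joining the cluster end up in the left state, while nodes that were leaving return to normal.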
* 'gleb/streaming-failure-rollback-v4' of github.com:scylladb/scylla-dev:
raft: make sure that all operation forwarded to a leader are completed before destroying raft server
storage_service: raft topology: remove code duplication from global_tablet_token_metadata_barrier
tests: add tests for streaming failure in bootstrap/replace/remove/decommission
test/pylib: do not stop node if decommission failed with an expected error
storage_service: raft topology: fix typo in "decommission" everywhere
storage_service: raft topology: add streaming error injection
storage_service: raft topology: do not increase topology version during CDC repair
storage_service: raft topology: rollback topology operation on streaming failure.
storage_service: raft topology: load request parameters in left_token_ring state as well
storage_service: raft topology: do not report term_changed_error during global_token_metadata_barrier as an error
storage_service: raft topology: change global_token_metadata_barrier error handling to try/catch
storage_service: raft topology: make global_token_metadata_barrier node independent
storage_service: raft topology: split get_excluded_nodes from exec_global_command
storage_service: raft topology: drop unused include_local and do_retake parameters from exec_global_command which are always true
storage_service: raft topology: simplify streaming RPC failure handling
Now this wrapper is unused, all (both) test cases that needed it were
patched to use make_table_for_tests().
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The max_ongoing_compaction_test test case constructs table object by
hand. For that it needs tracker, compaction manager and stats. Similarly
to previous patch, the test_env::make_table_for_tests() helper does
exactly that, so the test case can be simplified as well.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The compacted_sstable_reader() helper constructs table object and all
its "dependencies" by hand. The test_env::make_table_for_tests() helper
does the same, so the test code can be simplified.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>