scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-02 13:06:57 +00:00

Author	SHA1	Message	Date
Marcin Maliszkiewicz	66be0f4577	Merge 'test: cluster: audit test suite optimization' from Dario Mirovic Migrate audit tests from test/cluster/dtest to test/cluster. Optimize their execution time through cluster reuse. The audit test suite is heavy. There are more than 70 test executions. Environment preparation is a significant part of each test case execution time. This PR: 1. Copies audit tests from test/cluster/dtest to test/cluster, refactoring and enabling them 2. Groups tests functions by non-live cluster configuration variations to enable cluster reuse between them - Execution time reduced from 4m 29s to 2m 47s, which is ~38% execution time decrease 3. Removes the old audit tests from test/cluster/dtest Includes two supporting changes: - Allow specifying `AuthProvider` in `ManagerClient.get_cql_exclusive` - Fix server log file handling for clean clusters Refs [SCYLLADB-573](https://scylladb.atlassian.net/browse/SCYLLADB-573) This PR is an improvement and does not require a backport. [SCYLLADB-573]: https://scylladb.atlassian.net/browse/SCYLLADB-573?atlOrigin=eyJpIjoiNWRkNTljNzYxNjVmNDY3MDlhMDU5Y2ZhYzA5YTRkZjUiLCJwIjoiZ2l0aHViLWNvbS1KU1cifQ Closes scylladb/scylladb#28650 * github.com:scylladb/scylladb: test: cluster: fix log clear race condition in test_audit.py test: pylib: shut down exclusive cql connections in ManagerClient test: cluster: fix multinode audit entry comparison in test_audit.py test: cluster: dtest: remove old audit tests test: cluster: group migrated audit tests for cluster reuse test: cluster: enable migrated audit tests and make them work test: pylib: manager_client: specify AuthProvider in get_cql_exclusive test: pylib: scylla cluster after_test log fix test: audit: copy audit test from dtest	2026-03-24 09:29:52 +01:00
Piotr Smaron	32225797cd	dtest: fix flaky test_writes_schema_recreated_while_node_down `read_barrier(session2)` was supposed to ensure `node2` has caught up on schema before a CL=ALL write. But `patient_cql_connection(node2)` creates a cluster-aware driver session `(TokenAwarePolicy(DCAwareRoundRobinPolicy()))` that can route the barrier CQL statement to any node — not necessarily `node2`. If the barrier runs on `node1` or `node3` (which already have the new schema), it's a no-op, and `node2` remains stale, thus the observed `WriteFailure`. The fix is to switch to `patient_exclusive_cql_connection(node2)`, which uses `WhiteListRoundRobinPolicy([node2_ip])` to pin all CQL to `node2`. This is already the established pattern used by other tests in the same file. Fixes: SCYLLADB-1139 No need to backport yet, appeared only on master. Closes scylladb/scylladb#29151	2026-03-23 10:25:54 +02:00
Piotr Szymaniak	f511264831	alternator/test: fix test_ttl_with_load_and_decommission flaky Connection refused error The native Scylla nodetool reports ECONNREFUSED as 'Connection refused', not as 'ConnectException' (which is the Java nodetool format). Add 'Connection refused' to the valid_errors list so that transient connection failures during concurrent decommission/bootstrap topology changes are properly tolerated. Fixes SCYLLADB-1167 Closes scylladb/scylladb#29156	2026-03-22 11:01:45 +02:00
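The tolerance described in the commit above can be sketched as a substring check against a list of accepted messages. `VALID_ERRORS` and `is_tolerated` are hypothetical names chosen for illustration; the real test's helper may look different.

```python
# Hypothetical list of error fragments a test is willing to tolerate.
# 'ConnectException' is the Java nodetool wording; 'Connection refused'
# is how the native Scylla nodetool reports ECONNREFUSED.
VALID_ERRORS = ["ConnectException", "Connection refused"]


def is_tolerated(stderr: str, valid_errors=VALID_ERRORS) -> bool:
    """Return True if the stderr text contains any tolerated error fragment."""
    return any(fragment in stderr for fragment in valid_errors)
```

Any error text that matches none of the fragments is still treated as a real failure.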
Avi Kivity	062751fcec	Merge 'db/config: enable ms sstable format by default' from Łukasz Paszkowski Trie-based sstable indexes are supposed to be (hopefully) a better default than the old BIG indexes. Make the new format a new default for new clusters by naming ms in the default scylla.yaml. New functionality. No backport needed. This PR is basically Michał's one https://github.com/scylladb/scylladb/pull/26377, Jakub's https://github.com/scylladb/scylladb/pull/27332 fixing `sstables_manager::get_highest_supported_format()` and one test fix. Closes scylladb/scylladb#28960 * github.com:scylladb/scylladb: db/config: announce ms format as highest supported db/config: enable `ms` sstable format by default cluster/dtest/bypass_cache_test: switch from highest_supported_sstable_format to chosen_sstable_format api/system: add /system/chosen_sstable_version test/cluster/dtest: reduce num_tokens to 16	2026-03-19 18:19:01 +02:00
Dario Mirovic	249a6cec1b	test: cluster: dtest: remove old audit tests Since audit tests have been migrated to test/cluster/test_audit.py, old tests in test/cluster/dtest/audit_test.py have to be removed. Refs SCYLLADB-573	2026-03-19 16:12:13 +01:00
Andrei Chekun	c36df5ecf4	test.py: eliminate driver's exception There is a race condition in the driver that raises a RuntimeException. This pollutes the output, so this PR just silences the exception. Fixes: SCYLLADB-900 Closes scylladb/scylladb#28957	2026-03-10 14:31:36 +02:00
Michał Chojnowski	949fc85217	db/config: enable `ms` sstable format by default Trie-based sstable indexes are supposed to be (hopefully) a better default than the old BIG indexes. Make them the new default. If we change our mind, this change can be reverted later.	2026-03-09 17:12:09 +01:00
Michał Chojnowski	6b413e3959	cluster/dtest/bypass_cache_test: switch from highest_supported_sstable_format to chosen_sstable_format Trie-based indexes and older indexes have a difference in metrics, and the test uses the metrics to check for bypass cache. To choose the right metrics, it uses highest_supported_sstable_format, which is inappropriate, because the sstable format chosen for writes by Scylla might be different than highest_supported_sstable_format. Use chosen_sstable_format instead.	2026-03-09 17:12:09 +01:00
Michał Chojnowski	9280a039ee	test/cluster/dtest: reduce num_tokens to 16 cluster.dtest_alternator_tests.test_slow_query_logging performs a bootstrap with 768 token ranges. It works with `me` sstables, which have 2 open file descriptors per open sstable, but with `ms` sstables, which have 3 open file descriptors per open sstable, it fails with EMFILE. To avoid this problem, let's just decrease the number of vnodes in the test suite. It's appropriate anyway, because it avoids some unneeded work without weakening the tests. (Note: pylib-based tests have been setting `num_tokens` to 16 for a long time too). This breaks `bypass_cache_test`, which is written in a way that expects a certain number of token ranges. We adjust the relevant parameter accordingly.	2026-03-09 17:12:09 +01:00
Artsiom Mishuta	7b30a3981b	test.py: enable strict_config,xfail_strict,strict-markers This commit enables 3 strict pytest options: strict_config - any warnings encountered while parsing the pytest section of the configuration file raise errors. xfail_strict - tests marked with @pytest.mark.xfail that actually succeed fail the test suite by default. strict-markers - markers not registered in the markers section of the configuration file raise errors. It also fixes the errors that occur after enabling these options. Closes scylladb/scylladb#28859	2026-03-05 12:54:26 +02:00
Yauheni Khatsianevich	aa85f5a9c3	test: migrating alternator ttl tests to scylla repo migrating alternator_ttl_tests.py to scylla repo as part of deprecating dtest framework migrated tests: - test_ttl_with_load_and_decommission Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-869 Closes scylladb/scylladb#28858	2026-03-05 10:04:14 +02:00
Marcin Maliszkiewicz	30f18a91fd	Merge 'dtest: wait_for speedup' from Dario Mirovic Audit tests have been slow. They rely on wait_for function. This function first sleeps for the duration of the time step specified, and then calls the given function. The audit tests need 0.02-0.03 seconds for the given function, but the operation lasts around 1.02-1.03 seconds, since step is 1 second. This patch modifies wait_for dtest function so it first executes the given function, and afterwards calls time.sleep(step). This reduces time needed for the given function from 1.03 to 0.03 seconds. Total audit tests suite speedup is 3x. On the developer machine the time is reduced from 13+ minutes to 4 minutes. This patch also improves performance of some alternator tests that use the same wait_for dtest function. `wait_for` in dtest framework has default time step reduced to make the environment more responsive and test execution faster. Refs SCYLLADB-573 This is a performance improvement of testing framework. No need to backport. Closes scylladb/scylladb#28590 * github.com:scylladb/scylladb: dtest: shorten default sleep step in wait_for dtest: wait_for speedup	2026-02-26 09:33:38 +01:00
Dario Mirovic	3222a1a559	dtest: shorten default sleep step in wait_for Default sleep step of 1s is too long. Reduce it to make the test environment more responsive and faster. Refs SCYLLADB-573	2026-02-25 03:17:47 +01:00
Dario Mirovic	51e7c2f8d9	dtest: wait_for speedup Audit tests have been slow. They rely on wait_for function. This function first sleeps for the duration of the time step specified, and then calls the given function. The audit tests need 0.02-0.03 seconds for the given function, but the operation lasts around 1.02-1.03 seconds, since step is 1 second. This patch modifies wait_for dtest function so it first executes the given function, and afterwards calls time.sleep(step). This reduces time needed for the given function from 1.03 to 0.03 seconds. Total audit tests suite speedup is 3x. On the developer machine the time is reduced from 13+ minutes to 4 minutes. This patch also improves performance of some alternator tests that use the same wait_for dtest function. Refs SCYLLADB-573	2026-02-25 03:17:46 +01:00
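The reordering described in the commit above (call first, sleep afterwards) can be sketched as follows. The signature, default values, and `TimeoutError` behavior are assumptions for illustration and are not taken from the dtest source.

```python
import time


def wait_for(func, step=0.1, timeout=5.0):
    """Poll func until it returns a truthy value or the timeout expires.

    func is called before the first sleep, so a condition that already holds
    costs one call rather than one full step.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = func()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout}s")
        # Sleep only after an unsuccessful attempt.
        time.sleep(step)
```

With the sleep placed after the call, a check that succeeds on the first attempt returns immediately, even with a 1-second step.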
Andrei Chekun	4a7d8cd99d	test.py: add explicit default values to pytest options Add explicit default values to pytest command line options to prevent issues when running tests with pytest's parallel execution where options are not present on upper conftest, so they're just not set at all.	2026-02-24 09:48:38 +01:00
Andrzej Jackowski	eb5a564df2	test: move dtest/guardrails_test.py to test_guardrails.py This commit moves `guardrails_test.py`, prepared in the previous commit of this patch series, to `test/cluster/test_guardrails.py`. It also cleans up `suite.yaml`.	2026-02-20 11:39:52 +01:00
Andrzej Jackowski	9df426d2ae	test: prepare guardrails_test.py to be moved to test/cluster/ Disable `test/cluster/dtest/guardrails_test.py` in `suite.yaml` and make it compatible with the `test/cluster/` framework. This will allow moving this file from `test/cluster/dtest/` to `test/cluster/` in the next commit of this patch series. There are two motivations for moving the test: - Execution time reduction (from 12s to 9s in 'dev' in my env) - Facilitate adding new tests to the `guardrails_test.py` file	2026-02-20 11:39:43 +01:00
Piotr Smaron	0cf20fa15a	audit_test: fix incorrect config in `test_audit_type_none` Passing Python `None` to setup is incorrect, because config updates are sent as a dict and `None` is treated as "unset" - meaning: use Scylla's default. Using the explicit string "none" to guarantee that audit is disabled.	2026-02-18 15:12:26 +01:00
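A minimal sketch of why `None` and the string `"none"` behave differently when config updates are sent as a dict. `DEFAULTS` and `effective_config` are hypothetical, and the default value shown is only a placeholder, not a statement about Scylla's actual default.

```python
# Placeholder default, for illustration only.
DEFAULTS = {"audit": "table"}


def effective_config(updates: dict, defaults: dict = DEFAULTS) -> dict:
    """Merge updates over defaults, treating None as 'unset, use the default'."""
    merged = dict(defaults)
    for key, value in updates.items():
        if value is not None:
            merged[key] = value
    return merged
```

Under this model, `{"audit": None}` leaves the default in place, while `{"audit": "none"}` explicitly selects the disabled mode.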
Marcin Maliszkiewicz	88c4ca3697	Merge 'test: migrate guardrails_test.py from scylla-dtest' from Andrzej Jackowski This patch series copies `guardrails_test.py` from scylla-dtest, fixes it and enables it. The motivation is to unify the test execution of guardrails tests, as some tests (`cqlpy/test_guardrail_...`) were already in the scylladb repo, and some were in `scylla-dtest`. Fixes: SCYLLADB-255 No backport, just test migration Closes scylladb/scylladb#28454 * github.com:scylladb/scylladb: test: refactor test_all_rf_limits in guardrails_test.py test: specify exceptions being caught in guardrails_test.py test: enable guardrails_test.py test: add wait_other_notice to test_default_rf in guardrails_test.py test: copy guardrails_test.py from scylla-dtest	2026-02-02 16:54:13 +01:00
Andrzej Jackowski	298aca7da8	test: refactor test_all_rf_limits in guardrails_test.py Before this commit, `test_all_rf_limits` was implemented in a repetitive manner, making it harder to understand how the guardrails were tested. This commit refactors the test to reduce code redundancy and verify the guardrails more explicitly.	2026-02-02 10:49:12 +01:00
Andrzej Jackowski	136db260ca	test: specify exceptions being caught in guardrails_test.py Before this commit, the test caught a broad `Exception`. This change specifies the expected exceptions to avoid a situation where the product or test is broken and it goes undetected.	2026-02-02 10:48:07 +01:00
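The difference between catching a broad `Exception` and catching only the expected types can be illustrated as below. `ConfigurationException` is a local stand-in class and `is_rejected` is a hypothetical helper; neither is taken from `guardrails_test.py`.

```python
class ConfigurationException(Exception):
    """Stand-in for the driver's exception type (hypothetical here)."""


def is_rejected(action, expected=(ConfigurationException,)):
    """Return True if action raises one of the expected exceptions.

    Any other exception propagates, so an unrelated breakage fails the test
    instead of being mistaken for a guardrail rejection.
    """
    try:
        action()
    except expected:
        return True
    return False
```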
Gleb Natapov	08268eee3f	topology: disable force-gossip-topology-changes option The patch marks force-gossip-topology-changes as deprecated and removes tests that use it. There is one test (test_different_group0_ids) which is marked as xfail instead since it looks like gossiper mode was used there as a way to easily achieve a certain state, so more investigation is needed if the tests can be fixed to use raft mode instead. Closes scylladb/scylladb#28383	2026-02-02 09:56:32 +01:00
Andrzej Jackowski	576ad29ddb	test: add wait_other_notice to test_default_rf in guardrails_test.py This commit adds `wait_other_notice=True` to `cluster.populate` in `guardrails_test.py`. Without this, `test_default_rf` sometimes fails because `NetworkTopologyStrategy` setting fails before the node knows about all other DCs. Refs: SCYLLADB-255	2026-01-30 11:51:46 +01:00
Andrzej Jackowski	64c774c23a	test: copy guardrails_test.py from scylla-dtest This commit copies guardrails_test.py from dtest repository and (temporarily) disables it, as it requires improvement in following commits of this patch series before being enabled. Refs: SCYLLADB-255	2026-01-30 11:51:40 +01:00
Botond Dénes	21900c55eb	tools/scylla-sstable: remove --unsafe-accept-nonempty-output-dir This flag was added to operations which have an --output-dir command-line argument. These operations write sstables and need a directory to write them to. Back in the numeric-generation world this posed a problem: if the directory contained any sstable, a generation clash was almost guaranteed, because each scylla-sstable command invocation would start output generations from 1. To avoid this, an empty output directory was a requirement, with --unsafe-accept-nonempty-output-dir allowing for a force-override. Now in the timeuuid generation days, all this is not necessary anymore: generations are unique, so it is not a problem if the output directory already contains sstables: the probability of a generation clash is almost 0. Even if it happens, the tool will simply fail to write the new sstable with the clashing generation. Remove this historic relic of a flag and the related logic; it is just a pointless nuisance nowadays.	2026-01-22 13:55:59 +02:00
Botond Dénes	2e4d0e42f0	test/cluster/dtest: remove is_win() and users ScyllaDB and its tests never run on windows, this function is not needed, patch it out.	2026-01-19 12:56:57 +02:00
Botond Dénes	8953a143e5	test/cluster/dtest/scrub_test.py: add license blurb The original scrub test was done by the Cassandra project, hence there are two license notices: one for the original work by Cassandra (2015) and one for our modifications on top (2021).	2026-01-19 12:55:59 +02:00
Botond Dénes	d2c266eb47	test/cluster/dtest: import scrub_test.py Import the test verbatim. Requires adding is_win() to ccmlib/common.py, with a dummy implementation.	2026-01-19 12:52:44 +02:00
Botond Dénes	99e8a92aef	test/cluster/dtest/ccmlib: scylla_node.py: adapt run_scylla_sstable() et al. To work in the local test.py context.	2026-01-19 12:52:44 +02:00
Botond Dénes	807da53583	test/cluster/dtest/ccmlib: scylla_node.py: import run_scylla_sstable() And dependencies: get_sstables() and __gather_sstables(). Code is imported verbatim, but doesn't work yet (no users yet either). Will be patched to work in the next commit.	2026-01-19 12:52:44 +02:00
Botond Dénes	157fe2b80e	test/cluster/dtest/tools/misc.py: add type annotations to list_to_hashed_dict() To hopefully shut up CodeQL "Iterable can be either a string or a sequence". This change makes the code more readable anyway, so it is more than just a gratuitous change to make some code-scanner happy.	2026-01-13 08:33:17 +02:00
Evgeniy Naydanov	a9da14be19	test: dtest: reproducer for parallel rebuild failure 2-DC cluster parallel non-RBNO rebuild failure when expanding RF in DC2. Steps to reproduce: 1. Provision a cluster with 2 datacenters and at least 2 nodes in the second datacenter. 2. Let’s assume datacenter names are "dc1" and "dc2". 3. Create a keyspace ("keyspace1") with RF=0 in dc2. 4. Populate some data into dc1. 5. Change keyspace1 replication in dc2 to 2. 6. On 2 nodes in dc2 run the following command in parallel: nodetool rebuild --source-dc dc1 Parallel execution of rebuilds is not possible with RBNO enabled. This test is the repro for #27804 Closes scylladb/scylladb#27747	2026-01-08 21:55:18 +02:00
Botond Dénes	bf9640457e	Merge 'test: add crash detection during tests' from Cezar Moise After tests end, an extra check is performed, looking into node logs for crashes, aborts and similar issues. The test directory is also scanned for coredumps. If any of the above are found, the test will fail with an error. The following checks are made: - Any log line matching `Assertion.failed` or containing `AddressSanitizer` is marked as a critical error - Lines matching `Aborting on shard` will only be marked as a critical error if the patterns in `manager.ignore_cores_log_patterns` are not found in that log - If any critical error is found, the log is also scanned for backtraces - Any backtraces found are decoded and saved - If the test is marked with `@pytest.mark.check_nodes_for_errors`, the logs are checked for any `ERROR` lines - Any pattern in `manager.ignore_log_patterns` and `manager.ignore_cores_log_patterns` will cause the above check to ignore that line - The `expected_error` value that many methods, like `manager.decommission_node`, have will be automatically appended to `manager.ignore_log_patterns` refs: https://github.com/scylladb/qa-tasks/issues/1804 --- [Examples](https://jenkins.scylladb.com/job/scylla-staging/job/cezar/job/byo_build_tests_dtest/46/testReport/): Following examples are run on a separate branch where changes have been made to enable these failures. 
`test_unfinished_writes_during_shutdown` - Errors are found in logs and are not ignored ``` failed on teardown with "Failed: Server 2096: found 1 error(s) (log: scylla-2096.log) ERROR 2025-12-15 14:20:06,563 [shard 0: gms] raft_topology - raft_topology_cmd barrier_and_drain failed with: std::runtime_error (raft topology: command::barrier_and_drain, the version has changed, version 11, current_version 12, the topology change coordinator had probably migrated to another node) Server 2101: found 4 error(s) (log: scylla-2101.log) ERROR 2025-12-15 14:20:04,674 [shard 0:strm] repair - repair[c434c0c0-68da-472c-ba3e-ed80960ce0d5]: Repair 1 out of 4 ranges, keyspace=system_distributed, table=view_build_status, range=(minimum token,maximum token), peers=[27c027a6-603d-49d0-8766-1b085d8c7d29, b549cb36-fae8-490b-a19e-86d42e7aa07a, f7049967-81ff-4296-9be7-9d6a4d33a29e], live_peers=[b549cb36-fae8-490b-a19e-86d42e7aa07a, f7049967-81ff-4296-9be7-9d6a4d33a29e], status=failed: mandatory neighbor=27c027a6-603d-49d0-8766-1b085d8c7d29 is not alive ERROR 2025-12-15 14:20:04,674 [shard 1:strm] repair - repair[c434c0c0-68da-472c-ba3e-ed80960ce0d5]: Repair 1 out of 4 ranges, keyspace=system_distributed, table=view_build_status, range=(minimum token,maximum token), peers=[27c027a6-603d-49d0-8766-1b085d8c7d29, b549cb36-fae8-490b-a19e-86d42e7aa07a, f7049967-81ff-4296-9be7-9d6a4d33a29e], live_peers=[b549cb36-fae8-490b-a19e-86d42e7aa07a, f7049967-81ff-4296-9be7-9d6a4d33a29e], status=failed: mandatory neighbor=27c027a6-603d-49d0-8766-1b085d8c7d29 is not alive ERROR 2025-12-15 14:20:04,675 [shard 0: gms] raft_topology - raft_topology_cmd stream_ranges failed with: std::runtime_error (["shard 0: std::runtime_error (repair[c434c0c0-68da-472c-ba3e-ed80960ce0d5]: 1 out of 4 ranges failed, keyspace=system_distributed, tables=[\"view_build_status\", \"cdc_generation_timestamps\", \"service_levels\", \"cdc_streams_descriptions_v2\"], repair_reason=bootstrap, 
nodes_down_during_repair={27c027a6-603d-49d0-8766-1b085d8c7d29}, aborted_by_user=false, failed_because=std::runtime_error (Repair mandatory neighbor=27c027a6-603d-49d0-8766-1b085d8c7d29 is not alive, keyspace=system_distributed, mandatory_neighbors=[27c027a6-603d-49d0-8766-1b085d8c7d29, b549cb36-fae8-490b-a19e-86d42e7aa07a, f7049967-81ff-4296-9be7-9d6a4d33a29e]))", "shard 1: std::runtime_error (repair[c434c0c0-68da-472c-ba3e-ed80960ce0d5]: 1 out of 4 ranges failed, keyspace=system_distributed, tables=[\"view_build_status\", \"cdc_generation_timestamps\", \"service_levels\", \"cdc_streams_descriptions_v2\"], repair_reason=bootstrap, nodes_down_during_repair={27c027a6-603d-49d0-8766-1b085d8c7d29}, aborted_by_user=false, failed_because=std::runtime_error (Repair mandatory neighbor=27c027a6-603d-49d0-8766-1b085d8c7d29 is not alive, keyspace=system_distributed, mandatory_neighbors=[27c027a6-603d-49d0-8766-1b085d8c7d29, b549cb36-fae8-490b-a19e-86d42e7aa07a, f7049967-81ff-4296-9be7-9d6a4d33a29e]))"]) ERROR 2025-12-15 14:20:06,812 [shard 0:main] init - Startup failed: std::runtime_error (Bootstrap failed. See earlier errors (Rolled back: Failed stream ranges: std::runtime_error (failed status returned from 9dd942aa-acec-4105-9719-9bda403e8e94))) Server 2094: found 1 error(s) (log: scylla-2094.log) ERROR 2025-12-15 14:20:04,675 [shard 0: gms] raft_topology - send_raft_topology_cmd(stream_ranges) failed with exception (node state is bootstrapping): std::runtime_error (failed status returned from 9dd942aa-acec-4105-9719-9bda403e8e94)" ``` `test_kill_coordinator_during_op` - aborts caused by injection - `ignore_cores_log_patterns` is not set - while there are errors in logs and `ignore_log_patterns` is not set, they are ignored automatically due to the `expected_error` parameter, such as in `await manager.decommission_node(server_id=other_nodes[-1].server_id, expected_error="Decommission failed. 
See earlier errors")` ``` failed on teardown with "Failed: Server 1105: found 1 critical error(s), 1 backtrace(s) (log: scylla-1105.log) Aborting on shard 0, in scheduling group gossip. 1 backtrace(s) saved in scylla-1105-backtraces.txt Server 1106: found 1 critical error(s), 1 backtrace(s) (log: scylla-1106.log) Aborting on shard 0, in scheduling group gossip. 1 backtrace(s) saved in scylla-1106-backtraces.txt Server 1113: found 1 critical error(s), 1 backtrace(s) (log: scylla-1113.log) Aborting on shard 0, in scheduling group gossip. 1 backtrace(s) saved in scylla-1113-backtraces.txt Server 1148: found 1 critical error(s), 1 backtrace(s) (log: scylla-1148.log) Aborting on shard 0, in scheduling group gossip. 1 backtrace(s) saved in scylla-1148-backtraces.txt" ``` Decoded backtrace can be found in [failed_test_logs](https://jenkins.scylladb.com/job/scylla-staging/job/cezar/job/byo_build_tests_dtest/46/artifact/testlog/x86_64/dev/failed_test/test_kill_coordinator_during_op.dev.1) Closes scylladb/scylladb#26177 github.com:scylladb/scylladb: test: add logging to crash_coordinator_before_stream injection test: add crash detection during tests test.py: add pid to ServerInfo	2025-12-23 07:27:58 +02:00
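The critical-error rules listed in the merge message above can be sketched as a small log scanner. This is an illustration only: `find_critical_errors` is a hypothetical function, it covers just the critical-error rules (not backtrace decoding, coredump scanning, or the `ERROR` line check), and the real implementation in the test framework may differ.

```python
import re

# Patterns named in the commit message.
CRITICAL = [re.compile(r"Assertion.failed"), re.compile(r"AddressSanitizer")]
ABORT = re.compile(r"Aborting on shard")


def find_critical_errors(log_lines, ignore_cores_log_patterns=()):
    """Return the log lines that count as critical errors."""
    lines = list(log_lines)
    ignore = [re.compile(p) for p in ignore_cores_log_patterns]
    # Abort lines are skipped if any ignore pattern appears anywhere in the log.
    aborts_ignored = any(p.search(line) for p in ignore for line in lines)
    found = []
    for line in lines:
        if any(p.search(line) for p in CRITICAL):
            found.append(line)
        elif ABORT.search(line) and not aborts_ignored:
            found.append(line)
    return found
```

A test that deliberately crashes a node through an error injection would pass a pattern matching that injection's log message, so the resulting abort is not reported.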
Dario Mirovic	f1d63d014c	test: dtest: schema_management_test.py: speed up `TestLargePartitionAlterSchema` tests The tests in `TestLargePartitionAlterSchema` are `test_large_partition_with_add_column` and `test_large_partition_with_drop_column`. These tests need to replicate the following conditions that led to a bug before a fix from around 5 years ago. The scenario in which the problem could have happened has to involve: - a large partition with many rows, large enough for preemption (every 0.5ms) to happen during the scan of the partition. - appending writes to the partition (not overwrites) - scans of the partition - schema alter of that table. The issue is exposed only by adding or dropping a column, such that the added/dropped column lands in the middle (in alphabetical order) of the old column set. The way the test is set up is: - fixed number of writes per populate call - fixed number of reads This has the following implications: - if the machine executing the test is fast, all the writes are done before the 10 seconds sleep - there are too many reads - most of them get executed after the test logic is done This patch solves these issues in the following way: - populate lazily generates write data, and stops when instructed by `stop_populating` event - read, which is done sequentially, stops when instructed by `stop_reading` event - number of max operations is increased significantly, but the operations are stopped 1 second after node flush; this makes sure there are enough operations during the test, but also that the test does not take unnecessary time Test execution time has been reduced severalfold. On dev machine the time the tests take is reduced from 110 seconds to 34 seconds. 
The patch also introduces a few small improvements: - `cs_run` renamed to `run_stress` for clarity - Stopped checking if cluster is `ScyllaCluster`, since it is the only one we use - `case_map` removed from `test_alter_table_in_parallel_to_read_and_write`, used `mixed` param directly - Added explanation comment on why we do `data[i].append(None)` - Replaced `alter_table` inner function with its body, for simplicity - Removed unnecessary `ck_rows` variable in `populate` - Removed unnecessary `isinstance(self.cluster. ScyllaCluster)` - Adjusted `ThreadPoolExecutor` size in several places where 5 workers are not needed - Replaced functional programming style expressions for `new_versions` and `columns_list` with comprehension/generator statement python style code, improving readability Refs #26932 fix	2025-12-18 17:07:27 +01:00
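The event-driven stop described in the commit above can be sketched with `threading.Event`. The `populate` signature and the `write` callback are assumptions for illustration; only the `stop_populating` event name comes from the commit message.

```python
import threading


def populate(write, stop_populating: threading.Event, max_ops: int) -> int:
    """Write lazily generated rows until told to stop or max_ops is reached.

    Returns the number of writes actually performed.
    """
    done = 0
    for i in range(max_ops):  # range is lazy, no list of rows is materialized
        if stop_populating.is_set():
            break
        write(i)
        done += 1
    return done
```

Because the loop checks the event before each write, `max_ops` can be set very high without making the test slower: another thread sets the event shortly after the step under test completes.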
Cezar Moise	95d0782f89	test: add crash detection during tests After tests end, an extra check is performed, looking into node logs. By default, it only searches for critical errors and scans for coredumps. If the test has the fixture `check_nodes_for_errors`, it will search for all errors. Both checks can be ignored by setting `ignore_cores_log_patterns` and `ignore_log_patterns`. If any of the above are found, the test will fail with an error.	2025-12-18 16:28:13 +02:00
Dario Mirovic	f831ca5ab5	test: dtest: schema_management_test.py: fix large partition add column test `large_partition_with_add_column_test` and `large_partition_with_drop_column_test` were added on August 17th, 2020 in scylladb/scylla-dtest#1569. Only `large_partition_with_drop_column_test` was migrated to pytest, and renamed to `test_large_partition_with_drop_column` on March 31st, 2021 in scylladb/scylla-dtest#2051. Since then this test has not been running. This patch fixes it - the test is updated and renamed and the testing environment now properly picks it up. Refs #26932	2025-12-18 12:54:43 +01:00
Dario Mirovic	1fe0509a9b	test: dtest: schema_management_test.py: add `TestSchemaManagement.prepare` Extract repeated cluster initialization code in `TestSchemaManagement` into a separate `prepare` method. It holds all the common code for cluster preparation, with just the necessary parameters. Refs #26932	2025-12-18 12:54:43 +01:00
Dario Mirovic	e7d76fd8f3	test: dtest: schema_management_test.py: test enhancements Extract regex compilation from the stress functions to the module level, to avoid unnecessary regex compilation repetition. Add descriptions to the stress functions. Do not materialize list in `stress_object` for loop. Use a generator expression. Make `_set_stress_val` an object method. Refs #26932	2025-12-18 12:54:43 +01:00
Dario Mirovic	700853740d	test: dtest: schema_management_test.py: make the tests work Remove unused function markers. Add wait_other_notice=True to cluster start method in TestSchemaHistory.prepare function to make the test stable. Enable the test in suite.yaml for dev and debug modes. Fixes #26932	2025-12-18 12:54:43 +01:00
Dario Mirovic	3c5dd5e5ae	test: dtest: migrate setup and tools from dtest Migrate several functionalities from dtest. These will be used by the schema_management_test.py tests when they are enabled. Refs #26932	2025-12-18 12:54:43 +01:00
Dario Mirovic	5971b2ad97	test: dtest: copy unmodified schema_management_test.py Copy schema_management_test.py from scylla-dtest to test/cluster/dtest/schema_management_test.py. Add license header. Disable it for debug, dev, and release mode. Refs #26932	2025-12-18 12:54:42 +01:00
Botond Dénes	dace39fd6c	Merge 'Make commitlog replay handle files with corrupt file header (non-zero) as data loss, not startup failure' from Calle Wilund Fixes #26744 If a segment to replay is broken such that the main header is not zero, but still broken, we throw header_checksum_error. This was not handled in replayer, which grouped this into the "user error/fundamental problem" category. However, assuming we allow for "real" disk corruption, this should really be treated same as data corruption, i.e. reported data loss, not failure to start up. The `test_one_big_mutation_corrupted_on_startup` test accidentally sometimes provoked this issue, by doing random file wrecking, which on rare occasions provoked this, and thus failed test due to scylla not starting up, instead of losing data as expected. Closes scylladb/scylladb#27556 * github.com:scylladb/scylladb: test::cluster::dtest::tools::files: Remove file commitlog_replay: Handle fully corrupt files same as partial corruption. test::pylib::suite::base: Split options.name test specifier only once	2025-12-16 06:55:42 +02:00
Dario Mirovic	f545ed37bc	test: dtest: audit_test.py: fix audit error log detection `test_insert_failure_doesnt_report_success` test in `test/cluster/dtest/audit_test.py` has an insert statement that is expected to fail. Dtest environment uses `FlakyRetryPolicy`, which has `max_retries = 5`. 1 initial fail and 5 retry fails means we expect 6 error audit logs. The test failed because `create keyspace ks` failed once, then succeeded on retry. It allowed the test to proceed properly, but the last part of the test that expects exactly 6 failed queries actually had 7. The goal of this patch is to make sure there are exactly 6 = 1 + `max_retries` failed queries, counting only the query expected to fail. If other queries fail with successful retry, it's fine. If other queries fail without successful retry, the test will fail, as it should in such situations. They are not related to this expected failed insert statement. Fixes #27322 Closes scylladb/scylladb#27378	2025-12-11 10:17:07 +03:00
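The counting rule from the commit above (exactly 1 + `max_retries` failures, counting only the query expected to fail) can be sketched like this. The entry layout (`operation` and `error` keys) and `count_failed` are hypothetical; only `max_retries = 5` comes from the commit message.

```python
MAX_RETRIES = 5  # FlakyRetryPolicy.max_retries, per the commit message


def count_failed(entries, query):
    """Count failed audit entries for one specific query, ignoring all others."""
    return sum(1 for e in entries if e["error"] and e["operation"] == query)
```

A `create keyspace` statement that failed once and then succeeded on retry adds a failed entry to the log, but it no longer affects the count for the insert.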
Calle Wilund	8c4ac457af	test::cluster::dtest::tools::files: Remove file This contained only one routine; `corrupt_file`, which is highly problematic, and not used. If you want to "corrupt" a file, it should be done controlled, not at random.	2025-12-10 15:37:04 +01:00
Calle Wilund	e48170ca8e	commitlog_replay: Handle fully corrupt files same as partial corruption. Fixes #26744 If a segment to replay is broken such that the main header is not zero, but still broken, we throw header_checksum_error. This was not handled in replayer, which grouped this into the "user error/fundamental problem" category. However, assuming we allow for "real" disk corruption, this should really be treated same as data corruption, i.e. reported data loss, not failure to start up. The `test_one_big_mutation_corrupted_on_startup` test accidentally sometimes provoked this issue, by doing random file wrecking, which on rare occasions provoked this, and thus failed test due to scylla not starting up, instead of losing data as expected. Changed test to consistently cause this exact error instead.	2025-12-10 15:37:04 +01:00
Botond Dénes	357f91de52	Revert "Merge 'db/config: enable `ms` sstable format by default' from Michał Chojnowski" This reverts commit `b0643f8959`, reversing changes made to `e8b0f8faa9`. The change forgot to update sstables_manager::get_highest_supported_format(), which results in /system/highest_supported_sstable_version still returning me, confusing and breaking tests. Fixes: scylladb/scylla-dtest#6435 Closes scylladb/scylladb#27379	2025-12-02 14:38:56 +02:00
Michał Chojnowski	da51a30780	db/config: enable `ms` sstable format by default Trie-based sstable indexes are supposed to be (hopefully) a better default than the old BIG indexes. Make them the new default. If we change our mind, this change can be reverted later.	2025-11-21 12:39:46 +01:00
Michał Chojnowski	73090c0d27	cluster/dtest/bypass_cache_test: switch from highest_supported_sstable_format to chosen_sstable_format Trie-based indexes and older indexes have a difference in metrics, and the test uses the metrics to check for bypass cache. To choose the right metrics, it uses highest_supported_sstable_format, which is inappropriate, because the sstable format chosen for writes by Scylla might be different than highest_supported_sstable_format. Use chosen_sstable_format instead.	2025-11-21 12:39:46 +01:00
Michał Chojnowski	3f11a5ed8c	test/cluster/dtest: reduce num_tokens to 16 cluster.dtest_alternator_tests.test_slow_query_logging performs a bootstrap with 768 token ranges. It works with `me` sstables, which have 2 open file descriptors per open sstable, but with `ms` sstables, which have 3 open file descriptors per open sstable, it fails with EMFILE. To avoid this problem, let's just decrease the number of vnodes in the test suite. It's appropriate anyway, because it avoids some unneeded work without weakening the tests. (Note: pylib-based tests have been setting `num_tokens` to 16 for a long time too). This breaks `bypass_cache_test`, which is written in a way that expects a certain number of token ranges. We adjust the relevant parameter accordingly.	2025-11-21 00:38:50 +01:00
Botond Dénes	d54d409a52	Merge 'audit: write out to both table and syslog' from Dario Mirovic This patch adds support for multiple audit log outputs. If only one audit log output is enabled, the behavior does not change. If multiple audit log outputs are enabled, then the `audit_composite_storage_helper` class is used. It has a collection of `storage_helper` objects. Performance testing shows that read query throughput and auth request throughput are consistent even at high reactor utilization. It can also be observed that read query latency increases a bit. Read query ops = 60k/s AUTH ops = 200/s \| Audit Mode \| QUERY latency (p99) \| Δ% vs none \| \|------------\|---------------------\|------------\| \| none \| 777 \| 0 \| \|table\| 801 \| +3.09% \| \|syslog \| 803 \| +3.35% \| \|table,syslog \| 818 \| +5.28% \| Read query ops = 50k/s AUTH ops = 200/s \| Audit Mode \| QUERY latency (p99) \| Δ% vs none \| \|------------\|---------------------\|------------\| \| none \| 643 \| 0 \| \|table\| 647 \| +0.62% \| \|syslog \| 648 \| +0.78% \| \|table,syslog \| 656 \| +2.02% \| Detailed performance results are in the following Confluence document: [Audit performance impact test](https://scylladb.atlassian.net/wiki/spaces/RND/pages/148308005/Audit+performance+impact+test) Fixes #26022 Backport: The decision is to not backport for now. After making sure it works on the latest release, and if there is a need, we can do it. Closes scylladb/scylladb#26613 * github.com:scylladb/scylladb: test: dtest: audit_test.py: add AuditBackendComposite test: dtest: audit_test.py: group logs in dict per audit mode audit: write out to both table and syslog audit: move storage helper creation from `audit::start` to `audit::audit` audit: fix formatting in `audit::start_audit` audit: unify `create_audit` and `start_audit`	2025-11-17 15:04:15 +02:00
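The composite design described in the merge above, one helper holding a collection of backend helpers, can be illustrated in Python. The real `audit_composite_storage_helper` is a C++ class inside Scylla; the classes and the `write` method below are hypothetical stand-ins that show only the fan-out idea.

```python
class ListStorageHelper:
    """Stand-in backend that records entries in memory (hypothetical)."""

    def __init__(self):
        self.entries = []

    def write(self, entry):
        self.entries.append(entry)


class CompositeStorageHelper:
    """Forwards every audit entry to each configured backend."""

    def __init__(self, helpers):
        self._helpers = list(helpers)

    def write(self, entry):
        for helper in self._helpers:
            helper.write(entry)
```

Callers see a single helper with the same interface as a backend, which is consistent with the commit's note that behavior does not change when only one output is enabled.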


123 Commits