The handler in question, when called for a tablets-enabled keyspace, returns ranges that are inconsistent with those from system.tablets. Like this:
system.tablets:
```
TabletReplicas(last_token=-4611686018427387905, replicas=[('e43ce450-2834-4137-92b7-379bb37684d1', 0), ('67c82fc2-8ef9-4dd9-8cf6-c7f9372ce207', 0)])
TabletReplicas(last_token=-1, replicas=[('22c84cba-d8d0-4d20-8d46-eb90865bb612', 0), ('67c82fc2-8ef9-4dd9-8cf6-c7f9372ce207', 1)])
TabletReplicas(last_token=4611686018427387903, replicas=[('22c84cba-d8d0-4d20-8d46-eb90865bb612', 1), ('67c82fc2-8ef9-4dd9-8cf6-c7f9372ce207', 1)])
TabletReplicas(last_token=9223372036854775807, replicas=[('e43ce450-2834-4137-92b7-379bb37684d1', 1), ('22c84cba-d8d0-4d20-8d46-eb90865bb612', 0)])
```
range_to_endpoint_map:
```
{'key': ['-9069053676502949657', '-8925522303269734226'], 'value': ['127.110.40.2', '127.110.40.3']}
{'key': ['-8925522303269734226', '-8868737574445419305'], 'value': ['127.110.40.2', '127.110.40.3']}
...
{'key': ['-337928553869203886', '-288500562444694340'], 'value': ['127.110.40.1', '127.110.40.3']}
{'key': ['-288500562444694340', '105026475358661740'], 'value': ['127.110.40.1', '127.110.40.3']}
{'key': ['105026475358661740', '611365860935890281'], 'value': ['127.110.40.1', '127.110.40.3']}
...
{'key': ['8307064440200319556', '9117218379311179096'], 'value': ['127.110.40.2', '127.110.40.1']}
{'key': ['9117218379311179096', '9125431458286674075'], 'value': ['127.110.40.2', '127.110.40.1']}
```
Not only does the number of ranges differ, the separating tokens also fail to match (e.g. tokens -2 and 0 belong to different tablets according to system.tablets, but fall into the same "range" in the API result).
The source of confusion is that even though storage_service::get_range_to_address_map() is given the correct e.r.m. pointer from the table, it still works with token_metadata::sorted_tokens(). The fix: when the e.r.m. is per-table, the tokens should be taken from token_metadata's tablet_map (compare this to storage_service::effective_ownership(), which grabs tokens differently for the vnode/tablet cases).
This PR fixes the mentioned problem and adds a validation test. The test also checks the /storage_service/describe_ring endpoint, which happens to return a correct set of values.
The API is very ancient, so the bug is present in all versions with tablets.
Fixes #26331
Closes scylladb/scylladb#26231
* github.com:scylladb/scylladb:
test: Add validation of data returned by /storage_service endpoints
test,lib: Add range_to_endpoint_map() method to rest client
api: Indentation fix after previous patches
storage_service: Get tablet tokens if e.r.m. is per-table
storage_service,api: Get e.r.m. inside get_range_to_address_map()
storage_service: Calculate tokens on stack
Before this patch, the test `cluster/test_alternator::test_localnodes_joining_nodes` was one of the slowest tests in the test/cluster framework, taking over two minutes to run.
As comments in the test already acknowledged, there was no good reason why this test had to be so slow. The test needed to, intentionally, boot a server which took a long time (2 minutes) to fail its boot. But it didn't really need to wait for this failure - the right thing to do was to just kill the server at the end of the test. But we just didn't have the test-framework API to do it. So in this series, the first patch introduces the missing API, and the second patch uses it to fix test_localnodes_joining_nodes to kill the (unsuccessfully) booting server.
After this patch, the test takes just 7 seconds to run.
This is a test speedup only, so there is no real need to backport it - old releases get fewer test runs anyway, and the latency of those runs is less important.
Closes scylladb/scylladb#25312
* github.com:scylladb/scylladb:
test/cluster: greatly speed up test_localnodes_joining_nodes
test/pylib: add the ability to stop currently-starting servers
1. Remove dumping cluster logs and print only the link to the log.
2. Fail the test (to fail CI and not ignore the problem) and mark the cluster as dirty (to avoid affecting subsequent tests) in case setup/teardown fails.
3. Add two cqlpy tests that fail after applying step 2 to the dirties_cluster list, so the cluster is discarded afterward.
Closes scylladb/scylladb#26183
Some tests need the ability to abruptly stop a server in the test
cluster before it fully booted - e.g., because the test knows (and
perhaps even expects) that the boot is hung. But before this patch,
manager.server_stop() could only kill servers in "running" state.
This patch adds to pylib tracking of "starting" servers - servers which
we are starting but which haven't finished booting - whose list can be
returned by manager.starting_servers(). The manager.server_stop
function can now kill a server which is just starting, not only
"running" servers.
To avoid breaking existing tests, manager.all_servers() continues to
return just running and stopped servers - not "starting" servers.
Note that when a starting server is killed, it is not listed as stopped -
it behaves like a normal failure to add the server, not like a
server which successfully joined the cluster and was later stopped.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Instead of re-inventing empty-parameter handling, use the built-in
keep_blank_values=True parameter of urllib.parse.parse_qs().
It correctly handles the case where the `=` is present but no value
follows - the syntax used by the new query_params in
seastar::http::request.
Also add an exception to build_POST_response(); better than a cryptic
message about encode() not being callable on NoneType.
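For illustration, here is the difference keep_blank_values makes on a seastar-style query string (a standalone sketch, not the actual rest-client code):

```python
from urllib.parse import parse_qs

# A query string in the style produced by seastar::http::request:
# "flush=" carries the '=' but no value, "verbose" has no '=' at all.
qs = "name=foo&flush=&verbose"

# The default behaviour silently drops both blank parameters.
assert parse_qs(qs) == {"name": ["foo"]}

# keep_blank_values=True preserves them with an empty-string value.
assert parse_qs(qs, keep_blank_values=True) == {
    "name": ["foo"],
    "flush": [""],
    "verbose": [""],
}
```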
test_two_tablets_concurrent_repair_and_migration_repair_writer_level waits
for the first node that logs info about repair_writer, using asyncio.wait.
The done group is never awaited, so we never learn about the error.
The test itself is incorrect and the log about repair_writer is never
printed. We never learn about that either, and the test finishes "successfully"
after the 10-minute timeout.
Fix the test:
- disable hinted handoff;
- repair tablets of the whole table:
- new table is added so that concurrent migration is possible;
- use wait_for_first_completed that awaits done group;
- do some cleanups.
Remove nightly mark.
Fixes: #26148.
Closes scylladb/scylladb#26209
Sometimes it's convenient to use slashes in injection names,
for example my_component/my_method/my_condition. Without quote()
we get 'handler not found' error from Scylla.
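The effect of quoting can be shown with the standard library directly (the injection name below is the hypothetical example from above):

```python
from urllib.parse import quote

# Hypothetical injection name containing slashes.
name = "my_component/my_method/my_condition"

# quote() keeps '/' unescaped by default (safe='/'), so without an explicit
# safe="" the slashes survive and the path-based handler lookup fails with
# 'handler not found'.
assert quote(name) == name  # '/' left as-is by default
assert quote(name, safe="") == "my_component%2Fmy_method%2Fmy_condition"
```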
Add another level of verbosity: quiet.
Before this change it was used as the default, but it does not provide enough information.
These changes should be coupled with the pytest-sugar plugin to get the intended information at each level.
Invoke pytest as a module, instead of a separate process, to get access to the terminal and be able to use it interactively.
Framework change only, so backporting to 2025.3.
Fixes: #25403
Closes scylladb/scylladb#25698
* github.com:scylladb/scylladb:
test.py: add additional level of verbosity for output
test.py: start pytest as a module instead of subprocess
This patch introduces a new `incremental_mode` parameter to the tablet
repair REST API, providing more fine-grained control over the
incremental repair process.
Previously, incremental repair was on and could not be turned off. This
change allows users to select from three distinct modes:
- `regular`: This is the default mode. It performs a standard
incremental repair, processing only unrepaired sstables and skipping
those that are already repaired. The repair state (`repaired_at`,
`sstables_repaired_at`) is updated.
- `full`: This mode forces the repair to process all sstables, including
those that have been previously repaired. This is useful when a full
data validation is needed without disabling the incremental repair
feature. The repair state is updated.
- `disabled`: This mode completely disables the incremental repair logic
for the current repair operation. It behaves like a classic
(pre-incremental) repair, and it does not update any incremental
repair state (`repaired_at` in sstables or `sstables_repaired_at` in
the system.tablets table).
The implementation includes:
- Adding the `incremental_mode` parameter to the
`/storage_service/repair/tablet` API endpoint.
- Updating the internal repair logic to handle the different modes.
- Adding a new test case to verify the behavior of each mode.
- Updating the API documentation and developer documentation.
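The three modes described above can be summarized as a selection-and-state-update policy. The sketch below is illustrative only (the field names are hypothetical, not Scylla internals or the actual repair code):

```python
# Illustrative policy table for the three incremental_mode values
# (field names are hypothetical, not taken from the Scylla source).
POLICIES = {
    # Default: skip already-repaired sstables, maintain repair state.
    "regular":  {"skip_repaired_sstables": True,  "update_repair_state": True},
    # Process every sstable, but still maintain incremental-repair state.
    "full":     {"skip_repaired_sstables": False, "update_repair_state": True},
    # Classic pre-incremental repair: read everything, update no state.
    "disabled": {"skip_repaired_sstables": False, "update_repair_state": False},
}

def plan_repair(sstables, mode):
    """Return the sstables a repair in `mode` would process."""
    policy = POLICIES[mode]
    if policy["skip_repaired_sstables"]:
        return [s for s in sstables if not s["repaired"]]
    return list(sstables)
```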
Fixes #25605
Closes scylladb/scylladb#25693
Following up on 6129411a5e,
improve test_vnode_keyspace_describe_ring by verifying that the
endpoints listed by describe_ring match those returned by the
`natural_endpoints` API (for random tokens).
The latter are calculated using an independent code path
directly from the effective_replication_map.
* test exists currently only on master, no backport required
Closes scylladb/scylladb#25610
* github.com:scylladb/scylladb:
test/cluster/test_repair: test_vnode_keyspace_describe_ring: verify that describe_ring results agree with natural_endpoints
test/pylib/rest_client: add natural_endpoints function
The storage submodule contains tests that require mounted volumes
to be executed. The volumes are created automatically with the
`volumes_factory` fixture.
The tests in this suite are executed with the custom launcher
`unshare -mr pytest`.
Test scenarios (when one node reaches critical disk utilization):
1. Reject user table writes
2. Disable/Enabled compaction
3. Reject split compactions
4. New split compactions not triggered
5. Abort tablet repair
6. Disable/Enabled incoming tablet migrations
7. Restart a node while a tablet split is triggered
Currently, workdir is set in the ScyllaCluster constructor and does
not take into account that the value can be overridden via cmdline
arguments. When this happens, some data (logs, configs) is
stored under one path and the rest (data files) under a different one.
The patch allows overriding the value when passed via cmdline arguments,
leading to all files being stored under the same path.
After changing the type of the `recovery_leader` config option from
`sstring` to `UUID` in #25032, setting `recovery_leader` to an empty
string became an incorrect way to unset it. The following error started
to appear in the recovery procedure tests:
```
init - marshaling error: UUID string size mismatch: '' : recovery_leader
```
We unset `recovery_leader` properly in this PR. To do it, we introduce
a simple way to remove config options in tests.
Backport is unneeded. This error was harmless, and Scylla ignored
`recovery_leader` after logging the error as expected by the tests.
Closes scylladb/scylladb#25365
* github.com:scylladb/scylladb:
test: properly unset recovery_leader in the recovery procedure tests
test: manager_client: allow removing a config option
test: manager_client: add docstring to server_update_config
There is a stash item, REPEATED_FILES, for directory items, which is used to cut
recursion. But if multiple tests from one directory are added to the ./test.py
command line, this solution prevents handling the non-first tests correctly,
because the file was already collected for the first one. Change the behavior
to store in the stash not all repeated files but only the files which are in
the process of repetition. Rename the stash item to REPEATING_FILES to reflect
this change.
Closes scylladb/scylladb#25611
CI can run several test.py sessions on different machines (builders) for one build, and to avoid being overwritten, the .db file with metrics needs a unique name: add host_id, as we already do for the .xml report in `run_pytest()`.
Also add host_id columns to the metric tables in case we ever aggregate .db files.
Add a host_id suffix to `toxiproxy_server.log` for the same reason.
Fixes: https://github.com/scylladb/scylladb/issues/25462
Closes scylladb/scylladb#25542
* github.com:scylladb/scylladb:
test.py: add host_id suffix to toxiproxy_server.log
test.py: metrics: add host_id suffix to .db file
CI can run several test.py sessions on different machines (builders) for
one build, and to avoid being overwritten, the .db file with metrics
needs a unique name: add host_id as we already do for the .xml report in
run_pytest().
Also add host_id columns to the metric tables in case we ever
aggregate .db files.
The central idea of incremental repair is to allow repair participants
to select and repair only a portion of the dataset to speed up the
repair process. All repair participants must utilize an identical
selection method to repair and synchronize the same selected dataset.
There are two primary selection methods: time-based and file-based. The
time-based method selects data within a specified time frame. It is
versatile but it is less efficient because it requires reading all of
the dataset and omitting data beyond the time frame. The file-based
method selects data from unrepaired SSTables and is more efficient
because it allows the entire SSTable to be omitted. This patch
implements the file-based selection method.
Incremental repair will only be supported for tablet tables; it will not
be supported for vnode tables. On one hand, the legacy vnode is less
important to support. On the other hand, incremental repair for
vnodes is much harder to implement. With vnodes, an SSTable could contain
data for multiple vnode ranges. When a given vnode range is repaired,
only a portion of the SSTable is repaired. This complicates the
manipulation of SSTables significantly during both repair and
compaction. With tablets, an entire tablet is repaired, so an
sstable is either fully repaired or not repaired at all, which is a huge
simplification.
This patch uses the repaired_at field from the sstables::statistics component
to mark an sstable as repaired. It uses a virtual clock as the repair
timestamp, i.e., a monotonically increasing number for the
repaired_at field of an SSTable and the sstables_repaired_at column in the
system.tablets table. Note that when an sstable is not repaired, the
repaired_at field is set to the default value 0. The in-memory
being_repaired field of an SSTable is used to explicitly mark
that an SSTable has been selected. The following variables are used for
incremental repair:
- The on-disk repaired_at field of an SSTable: a 64-bit number that
  increases sequentially.
- The sstables_repaired_at column added to the system.tablets table:
  repaired_at <= sstables_repaired_at means the sstable is repaired.
- The in-memory being_repaired field added to an SSTable: a repair UUID
  that tells which sstables have participated in the repair.
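The selection rule above can be sketched as a predicate. This is a hedged illustration: the names come from the description in this message, not from the actual C++ code:

```python
def is_repaired(repaired_at: int, sstables_repaired_at: int) -> bool:
    # repaired_at == 0 is the default for an sstable that has never been
    # repaired; otherwise the sstable is repaired iff its virtual-clock
    # stamp is not newer than the tablet's sstables_repaired_at.
    return repaired_at != 0 and repaired_at <= sstables_repaired_at

def select_unrepaired(sstables, sstables_repaired_at):
    """File-based selection: pick only sstables not yet covered by the
    tablet's sstables_repaired_at virtual timestamp."""
    return [s for s in sstables
            if not is_repaired(s["repaired_at"], sstables_repaired_at)]
```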
Initial test results:
1) Medium dataset results
Node amount: 3
Instance type: i4i.2xlarge
Disk usage per node: ~500GB
Cluster pre-populated with ~500GB of data before starting repairs job.
Results for Repair Timings:
The regular repair run took 210 mins.
Incremental repair 1st run took 183 mins, 2nd and 3rd runs took around 48s
The speedup is: 183 mins / 48s = 228X
2) Small dataset results
Node amount: 3
Instance type: i4i.2xlarge
Disk usage per node: ~167GB
Cluster pre-populated with ~167GB of data before starting the repairs job.
Regular repair 1st run took 110s, 2nd and 3rd runs took 110s.
Incremental repair 1st run took 110 seconds, 2nd and 3rd run took 1.5 seconds.
The speedup is: 110s / 1.5s = 73X
3) Large dataset results
Node amount: 6
Instance type: i4i.2xlarge, 3 racks
50% of base load, 50% read/write
Dataset == Sum of data on each node
Dataset Non-incremental repair (minutes)
1.3 TiB 31:07
3.5 TiB 25:10
5.0 TiB 19:03
6.3 TiB 31:42
Dataset Incremental repair (minutes)
1.3 TiB 24:32
3.0 TiB 13:06
4.0 TiB 5:23
4.8 TiB 7:14
5.6 TiB 3:58
6.3 TiB 7:33
7.0 TiB 6:55
Fixes #22472
Closes scylladb/scylladb#24291
* github.com:scylladb/scylladb:
replica: Introduce get_compaction_reenablers_and_lock_holders_for_repair
compaction: Move compaction_reenabler to compaction_reenabler.hh
topology_coordinator: Make rpc::remote_verb_error to warning level
repair: Add metrics for sstable bytes read and skipped from sstables
test.py: Disable incremental for test_tombstone_gc_for_streaming_and_repair
test.py: Add tests for tablet incremental repair
repair: Add tablet incremental repair support
compaction: Add tablet incremental repair support
feature_service: Add TABLET_INCREMENTAL_REPAIR feature
tablet_allocator: Add tablet_force_tablet_count_increase and decrease
repair: Add incremental helpers
sstable: Add being_repaired to sstable
sstables: Add set_repaired_at to metadata_collector
mutation_compactor: Introduce add operator to compaction_stats
tablet: Add sstables_repaired_at to system.tablets table
test: Fix drain api in task_manager_client.py
The following tests are added for tablet incremental repair:
- Basic incremental repair
- Basic incremental repair with error
- Minor compaction and incremental repair
- Major compaction and incremental repair
- Scrub compaction and incremental repair
- Cleanup/Upgrade compaction and incremental repair
- Tablet split and incremental repair
- Tablet merge and incremental repair
Implement repetition of files using pytest_collect_file hook: run
file collection as many times as needed to cover all --mode/--repeat
combinations. Also move disabled test logic to this hook.
Store build mode and run_id in pytest item stashes.
Simplify support of C++ tests: remove redundant facade abstraction and put
all code into 3 files: base.py, boost.py, and unit.py
Add support for `run_first` option in test_config.yaml
To run tests with the bare pytest command we need almost the
same set of options as test.py, because we reuse code from test.py.
scylladb/scylladb#24573 added the `--pytest-arg` option to test.py but
not to test/conftest.py, which breaks running Python tests with the
bare pytest command.
Some test suites have their own test runners based on pytest, and they
don't need all the machinery we use for test.py. Move all code related to
the test.py framework to test/pylib/runner.py and use it as a plugin
conditionally (via the TEST_RUNNER variable).
In the ScyllaMetrics `get` function, when requesting the value for a
specific shard, it is expected to return the sum of all values of
metrics for that shard that match the labels.
However, it returned the value of the first matching line it found
instead of summing all matching lines.
For example, if we have two lines for one shard like:
some_metric{scheduling_group_name="compaction",shard="0"} 1
some_metric{scheduling_group_name="sl:default",shard="0"} 2
The result of this call would be 1 instead of 3:
get('some_metric', shard="0")
We fix this to sum all matching lines.
The filtering of lines by labels is fixed to allow specifying only some
of the labels. Previously, for a line to match the filter, either the
filter had to be empty, or all the labels in the metric line had to be
specified in the filter parameter and match its value - which is
unexpected and breaks when more labels are added.
We also simplify the function signature and the implementation - instead
of having the shard as a separate parameter, it can be specified as a
label, like any other label.
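A minimal sketch of the fixed matching-and-summing logic (the real parser in ScyllaMetrics handles the full Prometheus text exposition format; this toy only covers simple `name{labels} value` lines):

```python
import re

def metric_sum(lines, name, **labels):
    """Sum every sample of `name` whose label set contains the given
    labels as a subset (partial match). Returns None if nothing matched."""
    total, found = 0.0, False
    for line in lines:
        m = re.match(r'(\w+)\{([^}]*)\}\s+(\S+)$', line)
        if not m or m.group(1) != name:
            continue
        line_labels = dict(
            (k, v.strip('"'))
            for k, v in (kv.split("=", 1) for kv in m.group(2).split(","))
        )
        # Partial match: every label in the filter must be present and equal.
        if all(line_labels.get(k) == v for k, v in labels.items()):
            total += float(m.group(3))
            found = True
    return total if found else None

lines = [
    'some_metric{scheduling_group_name="compaction",shard="0"} 1',
    'some_metric{scheduling_group_name="sl:default",shard="0"} 2',
]
assert metric_sum(lines, "some_metric", shard="0") == 3.0
```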
In commit 44a1daf we added the ability to read Scylla system tables with Alternator. This feature is useful, among other things, in tests that want to read Scylla's configuration through the system table system.config. But tests often want to modify system.config, e.g., to temporarily reduce some threshold to make tests shorter. Until now, this was not possible.
This series adds support for writing to system tables through Alternator, along with examples of tests using this capability (and utility functions to make it easy).
Because the ability to write to system tables may have non-obvious security consequences, it is turned off by default and needs to be enabled with a new configuration option, "alternator_allow_system_table_write".
No backports are necessary - this feature is only intended for tests. We may later decide to backport if we want to backport new tests, but I think the probability we'll want to do this is low.
Fixes #12348
Closes scylladb/scylladb#19147
* github.com:scylladb/scylladb:
test/alternator: utility functions for changing configuration
alternator: add optional support for writing to system table
test/alternator: reduce duplicated code
Currently, there is no simple way to remove an option from the server's
config file in tests. One example when this is needed is removing the
`recovery_leader` option on all servers during the recovery procedure.
In this commit, we add a new method to `ManagerClient` that removes
an option from the given server's config file.
The previous way of executing repeats was to launch pytest for each repeat.
That was resource-consuming, since each time pytest re-did discovery
of the tests. Now all repeats are done inside one pytest process.
Backport to 2025.3 is needed, since this functionality is framework-only, and 2025.3 is affected by the slow repeats as well.
Closes scylladb/scylladb#25073
* github.com:scylladb/scylladb:
test.py: add repeats in pytest
test.py: add directories and filename to the log files
test.py: rename log sink file for boost tests
test.py: better error handling in boost facade
In commit 44a1daf we added the ability to read system tables through
the DynamoDB API (actually, the Scan and Query requests only).
This ability is useful for tests, and can also be useful to users who
want to read information that is only available through system tables.
This patch adds support also for *writing* into system tables. This will
be useful for Alternator tests, where we want to temporarily change
some live-updatable configuration option - and so far haven't been
able to do that like we did in some cql-pytest tests.
For reasons explained in issue #23218, only superuser roles are allowed to
write to system tables - it is not enough for the role to be granted
MODIFY permissions on the system table or on ALL KEYSPACES. Moreover,
the ability to modify system tables carries special risks, so this
patch only allows writes to the system tables if a new configuration
option "alternator_allow_system_table_write" is turned on. This option is
turned off by default.
This patch also includes a test for this new configuration-writing
capability. The test scripts test/alternator/run and test.py now
run Scylla with alternator_allow_system_table_write turned on, but
the new test can also run without this option, and will be skipped
in that case (to allow running the test suite against some manually-
run instance of Scylla).
Fixes: #12348
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
With the current implementation, if pytest is killed, it will not be
able to write the stdout from the boost test. With the new approach, the
output is updated while the test executes, instead of being written at the
end of the test.
Closes scylladb/scylladb#25260
This commit introduces a config option 'tablet_load_stats_refresh_interval_in_seconds'
that allows overriding the default value without using error injection.
Fixes scylladb/scylladb#24641
Closes scylladb/scylladb#24746
Tests sometimes fail in ScyllaCluster.add_server on the
'replaced_srv.host_id' line because host_id is not resolved yet. In
this commit we introduce functions try_get_host_id and get_host_id
that resolve it when needed.
Closes scylladb/scylladb#25177
This PR implements solution proposed in scylladb/scylladb#24481
Instead of terminating connections immediately, the shutdown now proceeds in two stages: first closing the receive (input) side to stop new requests, then waiting for all active requests to complete before fully closing the connections.
The updated shutdown process is as follows:
1. Initial Shutdown Phase
* Close the accept gate to block new incoming connections.
* Abort all accept() calls.
* For all active connections:
* Close only the input side of the connection to prevent new requests.
* Keep the output side open to allow responses to be sent.
2. Drain Phase
* Wait for all in-progress requests to either complete or fail.
3. Final Shutdown Phase
* Fully close all connections.
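The ordering of the three phases can be modeled with a toy asyncio server. This is a sketch of the sequencing only; the real implementation is Seastar C++ in generic_server, and all names below are illustrative:

```python
import asyncio

class ToyServer:
    """Toy model of the two-stage shutdown (illustrative, not the
    generic_server API)."""
    def __init__(self):
        self.accepting = True
        self.connections = []

    def accept(self, duration=0.01):
        assert self.accepting, "accept gate is closed"
        conn = {"input_open": True, "closed": False, "response": None}
        async def handle():
            await asyncio.sleep(duration)   # simulate an in-flight request
            conn["response"] = "sent"       # output side still open
        conn["task"] = asyncio.ensure_future(handle())
        self.connections.append(conn)
        return conn

    async def shutdown(self):
        # 1. Initial phase: close the accept gate, shut only the input side.
        self.accepting = False
        for conn in self.connections:
            conn["input_open"] = False      # no new requests on this conn
        # 2. Drain phase: wait for in-progress requests to complete or fail.
        await asyncio.gather(*(c["task"] for c in self.connections),
                             return_exceptions=True)
        # 3. Final phase: fully close all connections.
        for conn in self.connections:
            conn["closed"] = True

async def demo():
    srv = ToyServer()
    conn = srv.accept()
    await srv.shutdown()
    return conn

conn = asyncio.run(demo())
# The in-flight request got its response before the connection fully closed.
assert conn["response"] == "sent" and conn["closed"]
```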
Fixes scylladb/scylladb#24481
Closes scylladb/scylladb#24499
* https://github.com/scylladb/scylladb:
test: Set `request_timeout_on_shutdown_in_seconds` to `request_timeout_in_ms`, decrease request timeout.
generic_server: Two-step connection shutdown.
transport: cosmetic change, remove extra blanks
transport: Handle sleep aborted exception in sleep_until_timeout_passes
generic_server: replace empty destructor with `= default`
generic_server: refactor connection::shutdown to use `shutdown_input` and `shutdown_output`
generic_server: add `shutdown_input` and `shutdown_output` functions to `connection` class.
test: Add test for query execution during CQL server shutdown
The previous way of executing repeats was to launch pytest for each repeat.
That was resource-consuming, since each time pytest re-did discovery
of the tests. Now all repeats are done inside one pytest process.
Currently, only the test function name is used for output and log files. For better
clarity, add the relative path from the test directory and the file name
without extension to these file names.
Before:
test_aggregate_avg.1.log
test_aggregate_avg_stdout.1.log
After:
boost.aggregate_fcts_test.test_aggregate_avg.1.log
boost.aggregate_fcts_test.test_aggregate_avg_stdout.3.log
If a test was not executed for some reason - for example, an unknown parameter was passed to it - but the boost framework was able to finish correctly, the log file will have data, but it will be parsed into an empty list. This raised an exception during pytest execution rather than producing test output. This change handles that situation.
decrease request timeout.
In debug mode, queries may sometimes take longer than the default 30 seconds.
To address this, the timeout value `request_timeout_on_shutdown_in_seconds`
during tests is aligned with other request timeouts.
Change the request timeout for tests from 180s to 90s, since we must keep the request
timeout during shutdown significantly lower than the graceful shutdown timeout (2m);
otherwise a request timeout would cause a graceful shutdown timeout and fail a test.
When repairing a partition with many rows, we can store many fragments
in a repair_row_on_wire object which is sent as a rpc stream message.
This could cause reactor stalls when the rpc stream compression is
turned on, because the compression compresses the whole message without
any split and compression.
This patch solves the problem at the higher level by reducing the
message size that is sent to the rpc stream.
Tests are added to make sure the message split works.
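The idea of bounding the per-message size can be illustrated generically (a sketch only; the actual split happens on repair_row_on_wire fragments in C++, and the names below are hypothetical):

```python
def split_into_batches(rows, max_batch_bytes):
    """Greedily pack rows into batches whose total size stays at or under
    max_batch_bytes, so each rpc stream message - and the compression run
    over it - works on a bounded buffer instead of the whole partition."""
    batches, current, current_size = [], [], 0
    for row in rows:
        size = len(row)
        if current and current_size + size > max_batch_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(row)
        current_size += size
    if current:
        batches.append(current)
    return batches

# Five 4-byte rows with an 8-byte cap -> batches of 2, 2 and 1 rows.
assert split_into_batches([b"aaaa"] * 5, 8) == [
    [b"aaaa", b"aaaa"], [b"aaaa", b"aaaa"], [b"aaaa"],
]
```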
Fixes #24808
We're enabling the configuration option `rf_rack_valid_keyspaces`
in all Python test suites. All relevant tests have been adjusted
to work with it enabled.
That encompasses the following suites:
* alternator,
* broadcast_tables,
* cluster (already enabled in scylladb/scylladb@ee96f8dcfc),
* cql,
* cqlpy (already enabled in scylladb/scylladb@be0877ce69),
* nodetool,
* rest_api.
Two remaining suites that use tests written in Python, redis and scylla_gdb,
are not affected, at least not directly.
The redis suite requires creating an instance of Scylla manually, and the tests
don't do anything that could violate the restriction.
The scylla_gdb suite focuses on testing the capabilities of scylla-gdb.py, but
even then it reuses the `run` file from the cqlpy suite.
Fixes scylladb/scylladb#25126
Closes scylladb/scylladb#24617