scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-30 19:46:48 +00:00

Author	SHA1	Message	Date
Andrzej Jackowski	39bfad48cc	test: use ManagerClient in wait_until_driver_service_level_created Pass a ManagerClient instead of a `cql` session to `wait_until_driver_service_level_created`. This makes it easier to add additional functionality to the helper later (e.g. waiting for a Raft read barrier in a subsequent commit). Refs: scylladb/scylladb#27019	2025-11-17 14:55:14 +01:00
Andrzej Jackowski	8642629e8e	test: add test_anonymous_user to test_raft_service_levels The primary goal of this test is to reproduce scylladb/scylladb#26040 so the fix (`278019c328`) can be backported to older branches. Scenario: connect via CQL as an anonymous user and verify that the `sl:default` scheduling group is used. Before the fix for #26040 `main` scheduling group was incorrectly used instead of `sl:default`. Control connections may legitimately use `sl:driver`, so the test accepts those occurrences while still asserting that regular anonymous queries use `sl:default`. This adds explicit coverage on master. After scylladb#24411 was implemented, some other tests started to fail when scylladb#26040 was unfixed. However, none of the tests asserted this exact behavior. Refs: scylladb/scylladb#26040 Refs: scylladb/scylladb#26581 Closes scylladb/scylladb#26589	2025-10-24 12:23:34 +02:00
Andrzej Jackowski	f720ce0492	test: add test to verify use of `sl:driver` `sl:driver` is expected to be used for new and control connections, but other connections that run user load should not use it after the user is authenticated. Refs: scylladb/scylladb#24411	2025-10-08 08:25:33 +02:00
Andrzej Jackowski	0ddf46c7b4	test: service_levels: add tests for sl:driver creation and removal Refs: scylladb/scylladb#24411	2025-10-08 08:25:02 +02:00
Andrzej Jackowski	c59a7db1c9	service_level_controller: automatically create `sl:driver` This commit: - Increases the number of allowed scheduling groups to allow the creation of `sl:driver`. - Adds the `DRIVER_SERVICE_LEVEL` feature, which prevents creating `sl:driver` until all nodes have increased the number of scheduling groups. - Starts using `get_create_driver_service_level_mutations` to unconditionally create `sl:driver` on `raft_initialize_discovery_leader`. The purpose of this code path is to ensure the existence of `sl:driver` in new systems and tests. - Starts using `migrate_to_driver_service_level` to create `sl:driver` if it is not already present. The creation of `sl:driver` is managed by `topology_coordinator`, similar to other system keyspace updates, such as the `view_builder` migration. The purpose of this code path is to handle upgrades. - Modifies related tests to pass after `sl:driver` is added. Later in this patch series, `sl:driver` will be used by `transport/server` to handle selected traffic, such as the driver's schema and topology fetches. Refs: scylladb/scylladb#24411	2025-10-08 08:24:43 +02:00
Andrzej Jackowski	7d2db37831	test: add MAX_USER_SERVICE_LEVELS Previously, tests used the hardcoded value 7 for the maximum number of user service levels. This commit introduces a named variable that can be shared across tests to avoid cases where this magic number goes out of sync.	2025-10-08 08:24:17 +02:00
Andrzej Jackowski	c3dd383e9e	test: add reproduction of name reuse bug to service level tests This commit adds a reproduction test for scylladb/scylladb#26190 to the service levels test suite. Although the bug was fixed internally in Seastar, the corner-case service level name reuse scenario should be covered by tests to prevent regressions. Refs: https://github.com/scylladb/scylladb/issues/26190 Closes scylladb/scylladb#26379	2025-10-06 14:19:22 +02:00
Michał Hudobski	ae4d4908ba	configure: increase SCHEDULING_GROUPS_COUNT to 20 We would like to have an additional service level available for users of the Vector Store service, which would allow us to de/prioritize vector operations as needed. To allow that, we increase the number of scheduling groups from 19 to 20 and adjust the related test accordingly. Closes scylladb/scylladb#26316	2025-09-30 12:41:28 +03:00
Avi Kivity	1258e7c165	Revert "Merge 'transport: service_level_controller: create and use `driver` service level' from Andrzej Jackowski" This reverts commit `fe7e63f109`, reversing changes made to `b5f3f2f4c5`. It is causing test.py failures around cqlpy. Fixes #26163 Closes scylladb/scylladb#26174	2025-09-22 09:32:46 +03:00
Andrzej Jackowski	452313f5a5	test: add test to verify use of `sl:driver` `sl:driver` is expected to be used for new and control connections, but other connections that run user load should not use it after the user is authenticated. Refs: scylladb/scylladb#24411	2025-09-18 09:29:37 +02:00
Andrzej Jackowski	43a0eb7b0b	test: service_levels: add tests for sl:driver creation and removal Refs: scylladb/scylladb#24411	2025-09-18 09:28:32 +02:00
Andrzej Jackowski	6f678a2d1f	service_level_controller: automatically create `sl:driver` This commit: - Increases the number of allowed scheduling groups to allow the creation of `sl:driver`. - Adds the `DRIVER_SERVICE_LEVEL` feature, which prevents creating `sl:driver` until all nodes have increased the number of scheduling groups. - Starts using `get_create_driver_service_level_mutations` to unconditionally create `sl:driver` on `raft_initialize_discovery_leader`. The purpose of this code path is to ensure the existence of `sl:driver` in new systems and tests. - Starts using `migrate_to_driver_service_level` to create `sl:driver` if it is not already present. The creation of `sl:driver` is managed by `topology_coordinator`, similar to other system keyspace updates, such as the `view_builder` migration. The purpose of this code path is to handle upgrades. - Modifies related tests to pass after `sl:driver` is added. Later in this patch series, `sl:driver` will be used by `transport/server` to handle selected traffic, such as the driver's schema and topology fetches. Refs: scylladb/scylladb#24411	2025-09-18 09:28:32 +02:00
Andrzej Jackowski	d30590c1d0	test: add MAX_USER_SERVICE_LEVELS Previously, tests used the hardcoded value 7 for the maximum number of user service levels. This commit introduces a named variable that can be shared across tests to avoid cases where this magic number goes out of sync.	2025-09-18 09:28:32 +02:00
Dawid Mędrek	e929279d74	service/qos: Reload effective SL cache conditionally Since `service_level_controller` outlives `auth_integration`, it may happen that we try to access it when it has already been deinitialized. To prevent that, we only try to reload or clear the effective service level cache when the object is still alive. These changes solve an existing problem with an invalid memory access. For more context, see issue scylladb/scylladb#24792. We provide a reproducer test that consistently fails before these changes but passes after them. Fixes scylladb/scylladb#24792	2025-08-26 18:41:40 +02:00
Patryk Jędrzejczak	193a74576a	test/cluster/conftest: cluster_con: provide default values for port and use_ssl Some cluster tests use `cluster_con` when they need a different load balancing policy or auth provider. However, no test uses a port other than 9042 or enables SSL, but all tests must pass `9042, False` because these parameters don't have default values. This makes the code more verbose. Also, it's quite obvious that 9042 stands for port, but it's not obvious what `False` is related to, so there is a need to check the definition of `cluster_con` while reading any test that uses it. No reason to backport, it's only a minor refactoring. Closes scylladb/scylladb#25516	2025-08-22 09:51:24 +03:00
Patryk Jędrzejczak	03cc34e3a0	test: test_maintenance_socket: use cluster_con for driver sessions The test creates all driver sessions by itself. As a consequence, all sessions use the default request timeout of 10s. This can be too low for the debug mode, as observed in scylladb/scylla-enterprise#5601. In this commit, we change the test to use `cluster_con`, so that the sessions have the request timeout set to 200s from now on. Fixes scylladb/scylla-enterprise#5601 This commit changes only the test and is a CI stability improvement, so it should be backported all the way to 2024.2. 2024.1 doesn't have this test. Closes scylladb/scylladb#25510	2025-08-15 09:32:20 +03:00
Pavel Emelyanov	34608450c5	Merge 'qos: don't populate effective service level cache until auth is migrated to raft' from Piotr Dulikowski Right now, service levels are migrated in one group0 command and auth is migrated in the next one. This has a bad effect on the group0 state reload logic - modifying service levels in group0 causes the effective service levels cache to be recalculated, and to do so we need to fetch information about all roles. If the reload happens after SL upgrade and before auth upgrade, the query for roles will be directed to the legacy auth tables in system_auth - and the query, being a potentially remote query, has a timeout. If the query times out, it will throw an exception which will break the group0 apply fiber, and the node will need to be restarted to bring it back to work. In order to solve this issue, make sure that the service level module does not start populating and using the service level cache until both service levels and auth are migrated to raft. This is achieved by adding the check both to the cache population logic and the effective service level getter - they now call the service level accessor's new method, `can_use_effective_service_level_cache`, which checks the auth version. Fixes: scylladb/scylladb#24963 Should be backported to all versions which support upgrade to topology over raft - the issue described here may put the cluster into a state which is difficult to get out of (group0 apply fiber can break on multiple nodes, which necessitates their restart). Closes scylladb/scylladb#25188 * github.com:scylladb/scylladb: test: sl: verify that legacy auth is not queried in sl to raft upgrade qos: don't populate effective service level cache until auth is migrated to raft	2025-07-31 13:05:27 +03:00
Sergey Zolotukhin	4f63e1df58	test: Set `request_timeout_on_shutdown_in_seconds` to `request_timeout_in_ms`, decrease request timeout. In debug mode, queries may sometimes take longer than the default 30 seconds. To address this, the timeout value `request_timeout_on_shutdown_in_seconds` during tests is aligned with other request timeouts. Change the request timeout for tests from 180s to 90s, since we must keep the request timeout during shutdown significantly lower than the graceful shutdown timeout (2m), or else a request timeout would cause a graceful shutdown timeout and fail a test.	2025-07-29 15:37:47 +02:00
Piotr Dulikowski	3a082d314c	test: sl: verify that legacy auth is not queried in sl to raft upgrade Adjust `test_service_levels_upgrade`: right before upgrade to topology on raft, enable an error injection which triggers when the standard role manager is about to query the legacy auth tables in the system_auth keyspace. The preceding commit which fixes scylladb/scylladb#24963 makes sure that the legacy tables are not queried during upgrade to topology on raft, so the error injection does not trigger and does not cause a problem; without that commit, the test fails.	2025-07-29 11:39:17 +02:00
Marcin Maliszkiewicz	5e7ac34822	test: auth_cluster: add test for password reset procedure	2025-06-26 12:28:08 +02:00
Marcin Maliszkiewicz	67a4bfc152	test: auth_cluster: add test for replacing default superuser This test demonstrates the guide for creating a custom superuser: https://opensource.docs.scylladb.com/stable/operating-scylla/security/create-superuser.html	2025-06-26 12:28:08 +02:00
Andrzej Jackowski	555d897a15	test: wait for normal state propagation in test_auth_v2_migration By default, cluster tests have skip_wait_for_gossip_to_settle=0 and ring_delay_ms=0. In tests with gossip topology, it may lead to a race, where nodes see different state of each other. In case of test_auth_v2_migration, there are three nodes. If the first node already knows that the third node is NORMAL, and the second node does not, the system_auth tables can return incomplete results. To avoid such a race, this commit adds a check that all nodes see other nodes as NORMAL before any writes are done. Refs: #24163 Closes scylladb/scylladb#24185	2025-05-27 11:41:09 +03:00
Botond Dénes	fcdae20fd1	Merge 'Add tablet enforcing option' from Benny Halevy This series adds a new config option, `tablets_mode_for_new_keyspaces`, that replaces the existing `enable_tablets` option. It can be set to the following values: `disabled`: new keyspaces use vnodes by default, unless tablets are enabled with the `tablets = {'enabled': true}` option; `enabled`: new keyspaces use tablets by default, unless tablets are disabled with the `tablets = {'enabled': false}` option; `enforced`: new keyspaces must use tablets, and tablets cannot be disabled using the CREATE KEYSPACE option. `tablets_mode_for_new_keyspaces=disabled` or `tablets_mode_for_new_keyspaces=enabled` controls whether tablets are disabled or enabled by default for new keyspaces, respectively. In either case, tablets can be opted into or out of using the `tablets={'enabled':...}` keyspace option when the keyspace is created. `tablets_mode_for_new_keyspaces=enforced` enables tablets by default for new keyspaces, like `tablets_mode_for_new_keyspaces=enabled`. However, it does not allow opting out when creating new keyspaces by setting `tablets = {'enabled': false}`. Refs scylladb/scylla-enterprise#4355 * Requires backport to 2025.1 Closes scylladb/scylladb#22273 * github.com:scylladb/scylladb: boost/tablets_test: verify failure to create keyspace with tablets and non network replication strategy tablets: enforce tablets using tablets_mode_for_new_keyspaces=enforced config option db/config: add tablets_mode_for_new_keyspaces option	2025-04-03 16:32:19 +03:00
Benny Halevy	a4aa4d74c1	test/pylib: servers_add: add auto_rack_dc parameter To quickly populate nodes in a single dc, each node in its own rack. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2025-03-30 19:23:40 +03:00
Benny Halevy	c62865df90	db/config: add tablets_mode_for_new_keyspaces option The new option deprecates the existing `enable_tablets` option. It will be extended in the next patch with a third value, "enforced", which will enable tablets by default for new keyspaces but without the possibility to opt out using the `tablets = {'enabled': false}` keyspace schema option. Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2025-03-24 14:54:45 +02:00
Patryk Jędrzejczak	ca5c223505	test: mark tests with the gossip-based recovery procedure This patch makes it clear which Raft recovery procedure is used in each test. Tests with "This test uses the gossip-based recovery procedure." are the tests that use the gossip-based topology. These tests should be deleted once we make the Raft-based topology mandatory. Tests with the new FIXME are the tests that use the Raft-based topology. They should be changed to use the Raft-based recovery procedure or removed if they don't test anything important with the new procedure.	2025-03-14 13:53:05 +01:00
Artsiom Mishuta	a283b391c2	test.py: merge auth_cluster into cluster folder Now that we support suite subfolders, there is no need to create a separate suite for auth_cluster	2025-03-04 10:32:44 +01:00

27 Commits