Commit Graph

32 Commits

Author SHA1 Message Date
Andrei Chekun
f54b7f5427 test.py: Increase pool size
Increase pool size changes were recently reverted because of the flakiness for the test_gossip_boot test. Test started
to fail on adding the node to the cluster without any issues in the Scylla log file. In test logs it looked like the
installation process for the new node just hanged. After investigating the problem, I've found out that the issue is that
test.py was draining the io_executor pool for cleaning the directory during install that was set to eight workers. So
to fix the issue, io_executor pool should be increased to more or less the same ratio as it was: doubled cluster pool size.

Closes scylladb/scylladb#20276
2024-08-25 19:59:18 +03:00
Michał Jadwiszczak
b62a8b747a test/auth_cluster/test_raft_service_levels: add test for automatic
connection update
2024-08-08 10:42:09 +02:00
Michał Jadwiszczak
d643d5637c test/auth/test_auth_v2_migration: create sl1 in the test
Test `test_auth_v2_migration` creates auth data where role `users`
has assigned service level `sl:fefe` but the service level isn't
actually created.

In following patches, we are going to introduce effective service levels
cache which depends on auth and is refreshed when mutations are applied
to v2 auth tables.

Without this changes, this test will fail because the service level
doesn't exist.
Also the name `sl:fefe` is change to `sl1`.
2024-08-08 10:42:08 +02:00
Emil Maskovsky
9ab25e5cbf test: raft: replace the use of read_barrier work-around
Replaced the old `read_barrier` helper from "test/pylib/util.py"
by the new helper from "test/pylib/rest_client.py" that is calling
the newly introduced direct REST API.

Replaced in all relevant tests and decommissioned the old helper.

Introduced a new helper `get_host_api_address` to retrieve the host API
address - which in come cases can be different from the host address
(e.g. if the RPC address is changed).

Fixes: scylladb/scylladb#19662

Closes scylladb/scylladb#19739
2024-07-19 19:20:44 +02:00
Marcin Maliszkiewicz
b708c5701f test: auth: add random tag to resources in test_auth_v2_migration
Those tests are sometimes failing on CI and we have two hypothesis:
1. Something wrong with consistency of statements
2. Interruption from another test run (e.g. same queries performed
  concurrently or data remained after previous run)

To exclude or confirm 2. we add random marker to avoid potential collision,
in such case it will be clearly visible that wrong data comes from
a different run.

Related scylladb/scylladb#18931
Related scylladb/scylladb#18319
2024-06-27 09:28:27 +02:00
Marcin Maliszkiewicz
794440eb85 test: skip checking default role in test_auth_v2_migration
Default role creation in auth-v1 is asynchronous and all nodes race to
create it so we'd need to delay the test and wait. Checking this particular
role doesn't bring much value to the test as we check other roles
to demonstrate correctness.

Fixes scylladb/scylladb#19039

Closes scylladb/scylladb#19424
2024-06-23 19:50:55 +03:00
Piotr Dulikowski
fa142a9ce7 Merge 'qos/raft_service_level_distributed_data_accessor: print correct error message when trying to modify a service level in recovery mode' from Michał Jadwiszczak
Raft service levels are read-only in recovery mode. This patch adds check and proper error message when a user tries to modify service levels in recovery mode.

Fixes https://github.com/scylladb/scylladb/issues/18827

Closes scylladb/scylladb#18841

* github.com:scylladb/scylladb:
  test/auth_cluster/test_raft_service_levels: try to create sl in recovery
  service/qos/raft_sl_dda: reject changes to service levels in recovery mode
  service/qos/raft_sl_dda: extract raft_sl_dda steps to common function
2024-05-27 13:26:06 +02:00
Marcin Maliszkiewicz
2ab143fb40 db: auth: move auth tables to system keyspace
Separate keyspace which also behaves as system brings
little benefit while creating some compatibility problems
like schema digest mismatch during rollback. So we decided
to move auth tables into system keyspace.

Fixes https://github.com/scylladb/scylladb/issues/18098

Closes scylladb/scylladb#18769
2024-05-26 22:30:42 +03:00
Michał Jadwiszczak
af0b6bcc56 test/auth_cluster/test_raft_service_levels: try to create sl in recovery 2024-05-23 17:49:59 +02:00
Kamil Braun
4dcae66380 Merge 'test: {auth,topology}: use manager.rolling_restart' from Piotr Dulikowski
Instead of performing a rolling restart by calling `restart` in a loop over every node in the cluster, use the dedicated
`manager.rolling_restart` function. This method waits until all other nodes see the currently processed node as up or down before proceeding to the next step. Not doing so may lead to surprising behavior.

In particular, in scylladb/scylladb#18369, a test failed shortly after restarting three nodes. Because nodes were restarted one after another too fast, when the third node was restarted it didn't send a notification to the second node because it still didn't know that the second node was alive. This led the second node to notice that the third node restarted by observing that it incremented its generation in gossip (it restarted too fast to be marked as down by the failure detector). In turn, this caused the second node to send "third node down" and "third node up" notifications to the driver in a quick succession, causing it to drop and reestablish all connections to that node. However, this happened _after_ rolling upgrade finished and _after_ the test logic confirmed that all nodes were alive. When the notifications were sent to the driver, the test was executing some statements necessary for the test to pass - as they broke, the test failed.

Fixes: scylladb/scylladb#18369

Closes scylladb/scylladb#18379

* github.com:scylladb/scylladb:
  test: get rid of server-side server_restart
  test: util: get rid of the `restart` helper
  test: {auth,topology}: use manager.rolling_restart
2024-05-08 09:45:08 +02:00
Piotr Dulikowski
5459cfed6a Merge 'auth: don't run legacy migrations in auth-v2 mode' from Marcin Maliszkiewicz
We won't run:
- old pre auth-v1 migration code
- code creating auth-v1 tables

We will keep running:
- code creating default rows
- code creating auth-v1 keyspace (needed due to cqlsh legacy hack,
  it errors when executing `list roles` or `list users` if
  there is no system_auth keyspace, it does support case when
  there is no expected tables)

Fixes https://github.com/scylladb/scylladb/issues/17737

Closes scylladb/scylladb#17939

* github.com:scylladb/scylladb:
  auth: don't run legacy migrations on auth-v2 startup
  auth: fix indent in password_authenticator::start
  auth: remove unused service::has_existing_legacy_users func
2024-05-06 19:53:35 +02:00
Piotr Dulikowski
897e603bf0 test: {auth,topology}: use manager.rolling_restart
Instead of performing a rolling restart by calling `restart` in a loop
over every node in the cluster, use the dedicated
`manager.rolling_restart` function. This method waits until all other
nodes see the currently processed node as up or down before proceeding
to the next step. Not doing so may lead to surprising behavior.

In particular, in scylladb/scylladb#18369, a test failed shortly after
restarting three nodes. Because nodes were restarted one after another
too fast, when the third node was restarted it didn't send a
notification to the second node because it still didn't know that the
second node was alive. This led the second node to notice that the third
node restarted by observing that it incremented its generation in gossip
(it restarted too fast to be marked as down by the failure detector). In
turn, this caused the second node to send "third node down" and "third
node up" notifications to the driver in a quick succession, causing it
to drop and reestablish all connections to that node. However, this
happened _after_ rolling upgrade finished and _after_ the test logic
confirmed that all nodes were alive. When the notifications were sent to
the driver, the test was executing some statements necessary for the
test to pass - as they broke, the test failed.

Fixes: scylladb/scylladb#18369
2024-05-06 12:24:40 +02:00
Patryk Jędrzejczak
3a34bb18cd db: config: make consistent-topology-changes unused
We make the `consistent-topology-changes` experimental feature
unused and assumed to be true in 6.0. We remove code branches that
executed if `consistent-topology-changes` was disabled.
2024-04-25 14:33:21 +02:00
Patryk Jędrzejczak
213f2f6882 storage_service: join_cluster: replace force_gossip_based_join with force-gossip-topology-changes
The `force_gossip_based_join` error injection does exactly what we
expect from `force-gossip-topology-changes` so we can do a simple
replacement.

We prefer a flag over an error injection because we will use it
a lot in CI jobs' configurations, some tests, manual testing etc.
It's much more convenient.

Moreover, the flag can be used in the release mode, so we re-enable
all tests that were disabled in release mode only because of using
the `force_gossip_based_join` error injection.

The name of the `force-gossip-topology-changes` flag suggests that
using it should always succesfully force the gossip-based topology
or, if forcing is not possible, the booting should fail. We don't
want a node with `force-gossip-topology-changes=true` that silently
boots in the raft-topology mode. We achieve it by throwing a
runtime error from `join_cluster` in two cases:
- the node is restarting in the cluster that is using raft topology
- the node is joining the cluster that is using raft topology
2024-04-25 14:33:21 +02:00
Marcin Maliszkiewicz
7e749cd848 auth: don't run legacy migrations on auth-v2 startup
We won't run:
- old pre auth-v1 migration code
- code creating auth-v1 tables

We will keep running:
- code creating default rows
- code creating auth-v1 keyspace (needed due to cqlsh legacy hack,
  it errors when executing `list roles` or `list users` if
  there is no system_auth keyspace, it does support case when
  there is no expected tables)
2024-04-15 12:09:39 +02:00
Piotr Dulikowski
baae811142 Merge 'auth: keep auth version in scylla_local' from Marcin Maliszkiewicz
Before the patch selection of auth version depended
on consistent topology feature but during raft recovery
procedure this feature is disabled so we need to persist
the version somewhere to not switch back to v1 as this
is not supported.

During recovery auth works in read-only mode, writes
will fail.

Fixes https://github.com/scylladb/scylladb/issues/17736

Closes scylladb/scylladb#18039

* github.com:scylladb/scylladb:
  auth: keep auth version in scylla_local
  auth: coroutinize service::start
2024-04-03 12:25:56 +02:00
Marcin Maliszkiewicz
562caaf6c6 auth: keep auth version in scylla_local
Before the patch selection of auth version depended
on consistent topology feature but during raft recovery
procedure this feature is disabled so we need to persist
the version somewhere to not switch back to v1 as this
is not supported.

During recovery auth works in read-only mode, writes
will fail.
2024-04-02 19:04:21 +02:00
Marcin Maliszkiewicz
50e0032bca test: auth: remove if not exists from auth cql statement
They were added due to https://github.com/scylladb/python-driver/issues/296
but looks like it no longer reproduces.

Change was tested with ./test.py -vv --repeat=100 test_auth
to minimize chance of introducing flakiness.

Closes scylladb/scylladb#18043
2024-03-28 06:06:45 +01:00
Michał Jadwiszczak
c0853b461c test: test service levels v2 works in recovery mode 2024-03-21 23:14:57 +01:00
Michał Jadwiszczak
c551a85cda test: add test for service levels migration 2024-03-21 23:14:57 +01:00
Michał Jadwiszczak
5811f696be test: add test for service levels snapshot 2024-03-21 23:14:57 +01:00
Michał Jadwiszczak
bf3aed1ecb test:topology: extract trigger_snapshot to utils
The function was defined separately in a few tests.
2024-03-21 23:14:57 +01:00
Andrei Chekun
7de28729e7 test: change maintenance socket location to /tmp
Fixes #16912

By default, ScyllaDB stores the maintenance socket in the workdir. Test.py by default uses the location for the ScyllaDB workdir as testlog/{mode}/scylla-#. The Usual location for cloning the repo is the user's home folder. In some cases, it can lead the socket path being too long and the test will start to fail. The simple way is to move the maintenance socket to /tmp folder to eliminate such a possibility.

Closes scylladb/scylladb#17941
2024-03-21 18:22:21 +02:00
Marcin Maliszkiewicz
7b60752e47 test: fix cql connection problem in test_auth_raft_command_split
This is a speculative fix as the problem is observed only on CI.
When run_async is called right after driver_connect and get_cql
it fails with ConnectionException('Host has been marked down or
removed').

If the approach proves to be succesfull we can start to deprecate
base get_cql in favor of get_ready_cql. It's better to have robust
testing helper libraries than try to take care of it in every test
case separately.

Fixes #17713

Closes scylladb/scylladb#17772
2024-03-13 10:36:51 +01:00
Marcin Maliszkiewicz
eb56ae3bb9 test: extend auth-v2 migration test to catch stale static 2024-03-01 16:31:04 +01:00
Marcin Maliszkiewicz
6c30dc6351 test: add auth-v2 migration test 2024-03-01 16:25:14 +01:00
Marcin Maliszkiewicz
53996e2557 test: add auth-v2 snapshot transfer test 2024-03-01 16:25:14 +01:00
Marcin Maliszkiewicz
4f65e173cf test: auth: add tests for lost quorum and command splitting
With auth-v2 we can login even if quorum is lost. So test
which checks if error occurs in such situation is deleted
and the opposite test which checks if logging in works was
added.
2024-03-01 16:25:14 +01:00
Mikołaj Grzebieluch
c589793a9e test.py: test_maintenance_socket: remove pytest.xfail
Issue https://github.com/scylladb/python-driver/issues/278 was fixed in
https://github.com/scylladb/python-driver/pull/279.

Closes scylladb/scylladb#16873
2024-01-19 14:54:15 +01:00
Patryk Jędrzejczak
a8513bd41b test: replace multiple server_add calls with servers_add
ManagerClient.servers_add can be used in every test that uses
consistent topology changes. We replace all multiple server_add
calls in such tests with a single servers_add call to make these
tests faster and simplify their code. Additionally, these
servers_add calls will test concurrent bootstraps for free.
2024-01-02 12:19:33 +01:00
Mikołaj Grzebieluch
ef10b497e1 test.py: add maintenance socket test
Test that when connecting to the maintenance socket, the user has superuser permissions,
even if the authentication is enabled on the regular port.
2023-12-18 17:58:13 +01:00
Paweł Zakrzewski
a0dcc154c1 test: add the auth_cluster test suite
This commit adds the auth_cluster test suite to test a custom scenario
involving password authentication:
- create a cluster of 2 nodes with password authentication
- down one node
- the other node should refuse login stating that it couldn't reach
  QUORUM

References ScyllaDB OSS #2339
2023-11-13 14:04:28 +01:00