Separate keyspace which also behaves as system brings
little benefit while creating some compatibility problems
like schema digest mismatch during rollback. So we decided
to move auth tables into system keyspace.
Fixes https://github.com/scylladb/scylladb/issues/18098Closesscylladb/scylladb#18769
(cherry picked from commit 2ab143fb40)
[avi: adjust test/alternator/suite.yaml to reflect new keyspace]
Instead of performing a rolling restart by calling `restart` in a loop over every node in the cluster, use the dedicated
`manager.rolling_restart` function. This method waits until all other nodes see the currently processed node as up or down before proceeding to the next step. Not doing so may lead to surprising behavior.
In particular, in scylladb/scylladb#18369, a test failed shortly after restarting three nodes. Because nodes were restarted one after another too fast, when the third node was restarted it didn't send a notification to the second node because it still didn't know that the second node was alive. This led the second node to notice that the third node restarted by observing that it incremented its generation in gossip (it restarted too fast to be marked as down by the failure detector). In turn, this caused the second node to send "third node down" and "third node up" notifications to the driver in a quick succession, causing it to drop and reestablish all connections to that node. However, this happened _after_ rolling upgrade finished and _after_ the test logic confirmed that all nodes were alive. When the notifications were sent to the driver, the test was executing some statements necessary for the test to pass - as they broke, the test failed.
Fixes: scylladb/scylladb#18369Closesscylladb/scylladb#18379
* github.com:scylladb/scylladb:
test: get rid of server-side server_restart
test: util: get rid of the `restart` helper
test: {auth,topology}: use manager.rolling_restart
We won't run:
- old pre auth-v1 migration code
- code creating auth-v1 tables
We will keep running:
- code creating default rows
- code creating auth-v1 keyspace (needed due to cqlsh legacy hack,
it errors when executing `list roles` or `list users` if
there is no system_auth keyspace, it does support case when
there is no expected tables)
Fixes https://github.com/scylladb/scylladb/issues/17737Closesscylladb/scylladb#17939
* github.com:scylladb/scylladb:
auth: don't run legacy migrations on auth-v2 startup
auth: fix indent in password_authenticator::start
auth: remove unused service::has_existing_legacy_users func
Instead of performing a rolling restart by calling `restart` in a loop
over every node in the cluster, use the dedicated
`manager.rolling_restart` function. This method waits until all other
nodes see the currently processed node as up or down before proceeding
to the next step. Not doing so may lead to surprising behavior.
In particular, in scylladb/scylladb#18369, a test failed shortly after
restarting three nodes. Because nodes were restarted one after another
too fast, when the third node was restarted it didn't send a
notification to the second node because it still didn't know that the
second node was alive. This led the second node to notice that the third
node restarted by observing that it incremented its generation in gossip
(it restarted too fast to be marked as down by the failure detector). In
turn, this caused the second node to send "third node down" and "third
node up" notifications to the driver in a quick succession, causing it
to drop and reestablish all connections to that node. However, this
happened _after_ rolling upgrade finished and _after_ the test logic
confirmed that all nodes were alive. When the notifications were sent to
the driver, the test was executing some statements necessary for the
test to pass - as they broke, the test failed.
Fixes: scylladb/scylladb#18369
The `force_gossip_based_join` error injection does exactly what we
expect from `force-gossip-topology-changes` so we can do a simple
replacement.
We prefer a flag over an error injection because we will use it
a lot in CI jobs' configurations, some tests, manual testing etc.
It's much more convenient.
Moreover, the flag can be used in the release mode, so we re-enable
all tests that were disabled in release mode only because of using
the `force_gossip_based_join` error injection.
The name of the `force-gossip-topology-changes` flag suggests that
using it should always succesfully force the gossip-based topology
or, if forcing is not possible, the booting should fail. We don't
want a node with `force-gossip-topology-changes=true` that silently
boots in the raft-topology mode. We achieve it by throwing a
runtime error from `join_cluster` in two cases:
- the node is restarting in the cluster that is using raft topology
- the node is joining the cluster that is using raft topology
We won't run:
- old pre auth-v1 migration code
- code creating auth-v1 tables
We will keep running:
- code creating default rows
- code creating auth-v1 keyspace (needed due to cqlsh legacy hack,
it errors when executing `list roles` or `list users` if
there is no system_auth keyspace, it does support case when
there is no expected tables)
Before the patch selection of auth version depended
on consistent topology feature but during raft recovery
procedure this feature is disabled so we need to persist
the version somewhere to not switch back to v1 as this
is not supported.
During recovery auth works in read-only mode, writes
will fail.
Fixes https://github.com/scylladb/scylladb/issues/17736Closesscylladb/scylladb#18039
* github.com:scylladb/scylladb:
auth: keep auth version in scylla_local
auth: coroutinize service::start
Before the patch selection of auth version depended
on consistent topology feature but during raft recovery
procedure this feature is disabled so we need to persist
the version somewhere to not switch back to v1 as this
is not supported.
During recovery auth works in read-only mode, writes
will fail.