Add a dedicated auth::config struct that carries all configuration
options needed by auth modules. The config is created per-shard using
sharded_parameter to ensure updateable_value fields are shard-local.
The config is stored as a member in auth::service and passed by
const reference to factories so that each auth module can receive its
configuration when constructed. The modules themselves are not yet
converted — they still read from db::config via the query processor.
The stored config is also used in describe_roles() to read the
superuser name, eliminating the default_superuser() call that reached
into db::config via the query processor.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
GRANT/REVOKE fails on the maintenance socket connections,
because maintenance_auth_service uses allow_all_authorizer.
allow_all_authorizer allows all operations, but not GRANT/REVOKE,
because they make no sense in its context.
This has been observed during PGO run failure in operations from
./pgo/conf/auth.cql file.
This patch introduces maintenance_socket_authorizer that supports
the capabilities of default_authorizer ('CassandraAuthorizer')
without needing authorization.
Refs SCYLLADB-1070
Replace get_auth_ks_name(qp) with db::system_keyspace::NAME directly.
The function always returned the constant "system" and its qp
parameter was unused.
This series adds a global read barrier to raft_group0_client, ensuring that Raft group0 mutations are applied on all live nodes before returning to the caller.
Currently, after a group0_batch::commit, the mutations are only guaranteed to be applied on the leader. Other nodes may still be catching up, leading to stale reads. This patch introduces a broadcast read barrier mechanism. Calling send_group0_read_barrier_to_live_members after committing will cause the coordinator to send a read barrier RPC to all live nodes (discovered via gossiper) and waits for them to complete. This is best effort attempt to get cluster-wide visibility of the committed state before the response is returned to the user.
Auth and service levels write paths are switched to use this new mechanism.
Fixes https://scylladb.atlassian.net/browse/SCYLLADB-650
Backport: no, new feature
Closesscylladb/scylladb#28731
* https://github.com/scylladb/scylladb:
test: add tests for global group0_batch barrier feature
qos: switch service levels write paths to use global group0_batch barrier
auth: switch write paths to use global group0_batch barrier
raft: add function to broadcast read barrier request
raft: add gossiper dependency to raft_group0_client
raft: add read barrier RPC
Introduce maintenance_socket_authenticator and rework
maintenance_socket_role_manager to support role management operations.
Maintenance auth service uses allow_all_authenticator. To allow
role modification statements over the maintenance socket connections,
we need to treat the maintenance socket connections as superusers and
give them proper access rights.
Possible approaches are:
1. Modify allow_all_authenticator with conditional logic that
password_authenticator already does
2. Modify password_authenticator with conditional logic specific
for the maintenance socket connections
3. Extend password_authenticator, overriding the methods that differ
Option 3 is chosen: maintenance_socket_authenticator extends
password_authenticator with authentication disabled.
The maintenance_socket_role_manager is reworked to lazily create a
standard_role_manager once the node joins the cluster, delegating role
operations to it. In maintenance mode role operations remain disabled.
Refs SCYLLADB-409
This patch is part of preparations for dropping 'cassandra::cassandra'
default superuser. When that is implemented, maintenance_socket_role_manager
will have two modes of work:
1. in maintenance mode, where role operations are forbidden
2. in normal mode, where role operations are allowed
To execute the role operations, the node has to join a cluster.
In maintenance mode the node does not join a cluster.
This patch lets maintenance_socket_role_manager know if it works under
maintenance mode and returns appropriate error message when role
operations execution is requested.
Refs SCYLLADB-409
This patch removes class registrator usage in auth module.
It is not used after switching to factory functor initialization
of auth service.
Several role manager, authenticator, and authorizer name variables
are returned as well, and hardcoded inside qualified_java_name method,
since that is the only place they are ever used.
Refs SCYLLADB-409
Auth service can be initialized:
- [current] by passing instantiated authorizer, authenticator, role manager
- [current] by passing service_config, which then uses class registrator to instantiate authorizer, authenticator, role manager
- This approach is easy to use with sharded services
- [new] by passing factory functors which instantiate authorizer, authenticator, role manager
- This approach is also easy to use with sharded services
Refs SCYLLADB-409
In the following commit we'll need to add some
cache related logic (removing resource permissions).
This logic doesn't depend on authorizer so it should
be managed by the service itself.
The alien thread was a solution for reactor stalls caused by indivisible
password‑hashing tasks (scylladb/scylladb#24524). However, because
there is only one alien thread, overall hashing throughput was reduced
(see, e.g., scylladb/scylla-enterprise#5711). To address this,
the alien‑thread solution is reverted, and a hashing implementation
with yielding will be introduced later in this patch series.
This reverts commit 9574513ec1.
This commit adds a new version of command_desc struct
that contains a set of permissions instead of a singular
permission. When this struct is passed to ensure/check_has_permission,
we check if the user has any of the included permission on the resource.
Analysis of customer stalls showed that the `detail::hash_with_salt`
function, called from `passwords::check`, often blocks the reactor.
This function internally uses the `crypt_r` function from an external
library to compute password hashes, which is a CPU-intensive operation.
To prevent such reactor stalls, this commit moves the
`passwords::check` call to a dedicated alien thread. This thread is
created at system startup and is shared by all shards.
Within the alien thread, an `std::mutex` synchronizes access between
the thread and the shards. While this could theoretically cause
frequent lock contentions, in practice, even during connection storms,
the number of new connections per second per shard is limited
(typically hundreds per second). Additionally, the
`_conns_cpu_concurrency_semaphore` in `generic_server` ensures that not
too many connections are processed at once.
Fixesscylladb/scylladb#24524
It's possible to modify 'memtable_flush_period_in_ms' option only and as
single option, not with any other options together
Refs #20999Fixes#21223Closesscylladb/scylladb#22536
Cassandra 4.1 announced a new option to create a role with:
`HASHED PASSWORD`. Example:
```
CREATE ROLE bob WITH HASHED PASSWORD = 'hashed_password';
```
We've already introduced another option following the same
semantics: `SALTED HASH`; example:
```
CREATE ROLE bob WITH SALTED HASH = 'salted_hash';
```
The change hasn't made it to any release yet, so in this commit
we rename it to `HASHED PASSWORD` to be compatible with Cassandra.
Additionally, we adjust existing tests to work against Cassandra too.
Fixesscylladb/scylladb#21350Closesscylladb/scylladb#21352
This change reorganizes the way standard_role_manager startup is
handled: now the future returned by its start() function can be used to
determine when startup has finished. We use this future to ensure the
startup is finished prior to starting the CQL server.
Some clusters are created without auth, and auth is added later. The
first node to recognize that auth is needed must create the superuser.
Currently this is always on restart, but if we were to ever make it
LiveUpdate then it would not be on restart.
This suggests that we don't really need to wait during restart.
This is a preparatory commit, laying ground for implementation of a
start() function that waits for the superuser to be created. The default
implementation returns a ready future, which makes no change in the code
behavior.
We introduce a function `describe_auth()` in `auth::service`
responsible for producing a sequence of descriptions whose
corresponding CQL statement can be used to restore the state
of auth.
With big number of shards in the cluster (e.g. 500+) due to cache
periodic refresh we experience high load on role_permissions table
(e.g. 1k op/s). The load on roles table is amplified because to populate
single entry in the cache we do several selects on roles table. Some
of this can't be avoided because roles are arranged in a tree-like
structure where permissions can be inherited.
This patch tries to reuse queries which are simply duplicated. It should
reduce the load on roles table by up to 50%.
Fixesscylladb/scylladb#19299Closesscylladb/scylladb#19300
* github.com:scylladb/scylladb:
auth: reuse roles select query during cache population
auth: coroutinize service::get_uncached_permissions
auth: coroutinize service::has_superuser
With big number of shards in the cluster (e.g. 500+) due to cache
periodic refresh we experience high load on role_permissions table
(e.g. 1k op/s). The load on roles table is amplified because to populate
single entry in the cache we do several selects on roles table. Some
of this can't be avoided because roles are arranged in a tree-like
structure where permissions can be inherited.
This patch tries to reuse queries which are simply duplicated. It should
reduce the load on roles table by up to 50%.
Fixesscylladb/scylladb#19299
This description is readable from raft log table.
Previously single description was provided for the whole
announce call but since it can contain mutations from
various subsystems now description was moved to
add_mutation(s)/add_generator function calls.
The main theme of this commit is executing drop
keyspace/table/aggregate/function statements in a single
transaction together with auth auto-revoke logic.
This is the logic which cleans related permissions after
resource is deleted.
It contains serveral parts which couldn't easily be split
into separate commits mainly because mutation collector related
paths can't be mixed together. It would require holding multiple
guards which we don't support. Another reason is that with mutation
collector the changes are announced in a single place, at the end
of statement execution, if we'd announce something in the middle
then it'd lead to raft concurrent modification infinite loop as it'd
invalidate our guard taken at the begining of statement execution.
So this commit contains:
- moving auto-revoke code to statement execution from migration_listener
* only for auth-v2 flow, to not break the old one
* it's now executed during statement execution and not merging schemas,
which means it produces mutations once as it should and not on each
node separately
* on_before callback family wasn't used because I consider it much
less readable code. Long term we want to remove
auth_migration_listener.
- adding mutation collector to revoke_all
* auto-revoke uses this function so it had to be changed,
auth::revoke_all free function wrapper was added as cql3
layer should not use underlying_authorizer() directly.
- adding mutation collector to drop_role
* because it depends on revoke_all and we can't mix old and new flows
* we need to switch all functions auth::drop_role call uses
* gradual use of previously introduced modify_membership, otherwise
we would need to switch even more code in this commit
This is done to achieve single transaction semantics.
The change includes auto-grant feature. In particular
for schema related auto-grant we don't use normal
mutation collector announce path but follow migration manager,
this may be unified in the future.
This is done to achieve single transaction semantics.
grant_permissions_to_creator is logically part of create role
but its change will be included in following commits
as it spans multiple usages.
Additinally we disabled rollback during create role as
it won't work and is not needed with single transaction logic.
We need this later as we'll add condition
based on legacy_mode(qp) and free function
doesn't have access to qp.
Moreover long term we should get rid of this
weird free function pattern bloat.
Statements code have only access to client_state from
which it takes auth::service. It doesn't have abort_source
nor group0_client so we need to add them to auth::service.
Additionally since abort_source can't be const the whole
announce_mutations method needs non const auth::service
so we need to remove const from the getter function.
Auth interface is quite mixed-up but general rule is that cql
statements code calls auth::* free functions from auth/service.hh
to execute auth logic.
There are many exceptions where underlying_authorizer or
underlying_role_manager or auth::service method is used instead.
Service should not leak it's internal APIs to upper layers so
functions like underlying_role_manager should not exists.
In this commit we fix tiny fragment related to auth write path.
We won't run:
- old pre auth-v1 migration code
- code creating auth-v1 tables
We will keep running:
- code creating default rows
- code creating auth-v1 keyspace (needed due to cqlsh legacy hack,
it errors when executing `list roles` or `list users` if
there is no system_auth keyspace, it does support case when
there is no expected tables)
Before the patch selection of auth version depended
on consistent topology feature but during raft recovery
procedure this feature is disabled so we need to persist
the version somewhere to not switch back to v1 as this
is not supported.
During recovery auth works in read-only mode, writes
will fail.
During raft topology upgrade procedure data from
system_auth keyspace will be migrated to system_auth_v2.
Migration works mostly on top of CQL layer to minimize
amount of new code introduced, it mostly executes SELECTs
on old tables and then INSERTs on new tables. Writes are
not executed as usual but rather announced via raft.
Alternator doesn't do any writes to auth
tables so it's simply change of keyspace
name.
Docs will be updated later, when auth-v2
is enabled as default.
In a follow-up patch abort_source will be used
inside those methods. Current pattern is that abort_source
is passed everywhere as non const so it needs to be
executed in non const context.
Closesscylladb/scylladb#17312
The maintenance socket is created before joining the cluster. When maintenance auth service
is started it creates system_auth keyspace if it's missing. It is not synchronized
with other nodes, because this node hasn't joined the group0 yet. Thus a node has
a mismatched schema and is unable to join the cluster.
The maintenance socket doesn't use role management, thus the problem is solved
by not creating system_auth keyspace when maintenance auth service is created.
The logic of regular CQL port's auth service won't be changed. For the maintenance
socket will be created a new separate auth service.
Expose cql3::query_processor in auth::service
to get to the topology via storage_proxy.replica::database
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>