Commit Graph

179 Commits

Author SHA1 Message Date
Avi Kivity
eb74fe784d auth: convert sprint() to format()
sprint() recently became more strict, throwing on sprint("%s", 5). Replace
with the more modern format().

Mechanically converted with https://github.com/avikivity/unsprint.
2018-11-01 13:16:17 +00:00
Duarte Nunes
e46ef6723b Merge seastar upstream
* seastar d152f2d...c1e0e5d (6):
  > scripts: perftune.py: properly merge parameters from the command line and the configuration file
  > fmt: update to 5.2.1
  > io_queue: only increment statistics when request is admitted
  > Adds `read_first_line.cc` and `read_first_line.hh` to CMake.
  > fstream: remove default extent allocation hint
  > core/semaphore: Change the access of semaphore_units main ctor

Due to a compile-time fight between fmt and boost::multiprecision, a
lexical_cast was added to mediate.

sprint("%s", var) no longer accepts numeric values, so some sprint()s were
converted to format() calls. Since more may be lurking we'll need to remove
all sprint() calls.

Signed-off-by: Duarte Nunes <duarte@scylladb.com>
2018-10-25 12:53:30 +03:00
Jesse Haber-Kucharsky
9d27045c76 auth: Shorten random_device instance life-span
On Fedora 28, creating an instance of `std::random_device` opens a file
descriptor for `/dev/urandom` (observed via `strace`).

By declaring static thread-local instances of `std::random_device`,
these descriptors will be open (barring optimization by the compiler)
for the entire duration of the Scylla process's life.

However, the `std::random_device` instance is only necessary for
initializing the `RandomNumberEngine` for generating salts. With this
change, the file-descriptor is closed immediately after the engine is
initialized.

I considered generalizing this pattern of initialization into a
function, but with only two uses (and simple ones) I think this would
only obscure things.

Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com>
Tests: unit (release)
Message-Id: <f1b985d99f66e5e64d714fd0f087e235b71557d2.1536697368.git.jhaberku@scylladb.com>
2018-09-12 12:14:21 +01:00
Jesse Haber-Kucharsky
682805b22c auth: Use finite time-out for all QUORUM reads
Commit e664f9b0c6 transitioned internal
CQL queries in the auth. sub-system to be executed with finite time-outs
instead of infinite ones.

It should have also modified the functions in `auth/roles-metadata.cc`
to have finite time-outs.

This change fixes some previously failing dtests, particularly around
repair. Without this change, the QUORUM query fails to terminate when
the necessary consistency level cannot be achieved.

Fixes #3736.

Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com>
Message-Id: <e244dc3e731b4019f3be72c52a91f23ee4bb68d1.1536163859.git.jhaberku@scylladb.com>
2018-09-05 21:55:26 +03:00
Jesse Haber-Kucharsky
b95bbb2e72 auth: Clean up implementation comments 2018-08-13 13:24:45 -04:00
Jesse Haber-Kucharsky
9519a03351 auth: Remove unnecessary local variable
The variable could be declared `const`, but removing it outright seems
more clear and this way we don't have to come up with a name.
2018-08-13 13:24:45 -04:00
Jesse Haber-Kucharsky
52d3ff057a auth: Allow different random engines for salt
This makes the function useable in more contexts due to
flexibility (including in tests), since the state is not captured and
the characteristics of salt generation can be customized to the caller's
needs.
2018-08-13 13:24:45 -04:00
Jesse Haber-Kucharsky
836fd954e1 auth: Correct modulo bias in salt generation
Instead of reducing the large value via `%`, which can produce
non-uniformly distributed values when the range is small, we specify the
range in the distribution, which is uniform by construction.
2018-08-13 13:24:45 -04:00
Jesse Haber-Kucharsky
fe58a0b207 auth: Extract random byte generation for salt 2018-08-13 13:24:45 -04:00
Jesse Haber-Kucharsky
fd60d61ebf auth: Split out test for best supported scheme
The `generate_salt` function invokes this function internally now.

This change means that `generate_salt` is now thread-safe and therefore
does not have to be invoked by a single thread only when starting the
`password_authenticator`.

This further means that `generate_salt` does not need to be part of the
public interface of the module, and can be moved to the implementation
file.
2018-08-13 13:24:45 -04:00
Jesse Haber-Kucharsky
adf058bd1f auth: Rename function to use full words 2018-08-13 13:24:45 -04:00
Jesse Haber-Kucharsky
9b8cbb8542 auth: Add domain-specific exception for passwords 2018-08-13 13:24:45 -04:00
Jesse Haber-Kucharsky
dbea3f5a01 auth: Document passwords interface 2018-08-13 13:24:45 -04:00
Jesse Haber-Kucharsky
b272d622f8 auth: Move passsword stuff to its own namespace
For clarity and nicer function names.
2018-08-13 13:24:45 -04:00
Jesse Haber-Kucharsky
de01aaf181 auth: Identify password hashing errors correctly
See fce10f2c6e for reference.
2018-08-13 13:24:45 -04:00
Jesse Haber-Kucharsky
2a40bcb281 auth: Move password handling to its own files
While the `password_authenticator` is a complex component with lots of
dependencies, password hashing and checking itself is a process with
limited logical state and dependencies, which makes it easy to isolate
and test.
2018-08-13 13:24:45 -04:00
Jesse Haber-Kucharsky
03cf57db62 auth: Construct std::random_device instances once
`std::random_device` has a lot of implementation-specific behavior, and
as a result we cannot assume much about its performance characteristics.

We initialize thread-specific static instances of `std::random_device`
once so that we don't have the overhead of invoking the ctor during
every invocation of `gensalt`.
2018-08-13 13:24:45 -04:00
Jesse Haber-Kucharsky
fce10f2c6e auth: Don't use unsupported hashing algorithms
In previous versions of Fedora, the `crypt_r` function returned
`nullptr` when a requested hashing algorithm was not supported.

This is consistent with the documentation of the function in its man
page.

As of Fedora 28, the function's behavior changes so that the encrypted
text is not `nullptr` on error, but instead the string "*0".

The info pages for `crypt_r` clarify somewhat (and contradict the man
pages):

    Some implementations return `NULL` on failure, and others return an
    _invalid_ hashed passphrase, which will begin with a `*` and will
    not be the same as SALT.

Because of this change of behavior, users running Scylla on a Fedora 28
machine which was upgraded from a previous release would not be able to
authenticate: an unsupported hashing algorithm would be selected,
producing encrypted text that did not match the entry in the table.

With this change, unsupported algorithms are correctly detected and
users should be able to continue to authenticate themselves.

Fixes #3637.

Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com>
Message-Id: <bcd708f3ec195870fa2b0d147c8910fb63db7e0e.1533322594.git.jhaberku@scylladb.com>
2018-08-05 08:57:36 +03:00
Jesse Haber-Kucharsky
e664f9b0c6 Use finite time-outs for internal auth. queries 2018-07-31 11:38:16 -04:00
Nadav Har'El
25bd139508 cross-tree: clean up use of std::random_device()
std::random_device() uses the relatively slow /dev/urandom, and we rarely if
ever intend to use it directly - we normally want to use it to seed a faster
random_engine (a pseudo-random number generator).

In many places in the code, we first created a random_device variable, and then
using it created a random_engine variable. However, this practice created the
risk of a programmer accidentally using the random_device object, instead of the
random_engine object, because both have the same API; This hurts performance.

This risk materialized in just two places in the code, utils/uuid.cc and
gms/gossiper.cc. A patch for to uuid.cc was sent previously by Pawel and is
not included in this patch, and the fix for gossiper.{cc,hh} is included here.

To avoid risking the same mistake in the future, this patch switches across the
code to an idiom where the random_device object is not *named*, so cannot be
accidentally used. We use the following idiom:

   std::default_random_engine _engine{std::random_device{}()};

Here std::random_device{}() creates the random device (/dev/urandom) and pulls
a random integer from it. It then uses this seed to create the random_engine
(the pseudo-random number generator). The std::random_device{} object is
temporary and unnamed, and cannot be unintentionally used directly.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20180726154958.4405-1-nyh@scylladb.com>
2018-07-26 16:54:58 +01:00
Avi Kivity
187ebdbe46 auth: fix possible use of disengaged optional in has_salted_hash()
untyped_result_set_row's cell data type is bytes_opt, and the
get_block() accessor accesses the value assuming it's engaged
(relying on the caller to call has()).

has_unsalted_hash() calls get_blob() without calling has() beforehand,
potentially triggering undefined behavior.

Fix by using get_or() instead, which also simplifies the caller.

I observed failures in Jenkins in this area. It's hard to be sure
this is the root cause, since the failures triggered an internal
consistency assertion in asan rather than an asan report. However,
the error is hard to reproduce and the fix makes sense even if it
doesn't prevent the error.

See #3480 for the asan error.

Fixes #3480 (hopefully).
Message-Id: <20180602181919.29204-1-avi@scylladb.com>
2018-06-02 19:46:32 +01:00
Avi Kivity
a99e820bb9 query_processor: require clients to specify timeout configuration
Remove implicit timeouts and replace with caller-specified timeouts.
This allows removing the ambiguity about what timeout a statement is
executed with, and allows removing cql_statement::execute_internal(),
which mostly overrode timeouts and consistency levels.

Timeout selection is now as follows:

  query_processor::*_internal: infinite timeout, CL=ONE
  query_processor::process(), execute(): user-specified consisistency level and timeout

All callers were adjusted to specify an infinite timeout. This can be
further adjusted later to use the "other" timeout for DCL and the
read or write timeout (as needed) for authentication in the normal
query path.

Note that infinite timeouts don't mean that the query will hang; as
soon as the failure detector decides that the node is down, RPC
responses will termiante with a failure and the query will fail.
2018-05-14 09:41:06 +03:00
Jesse Haber-Kucharsky
cd0553ca6a auth: Query custom options from the authenticator
None of the `authenticator` implementations we have support custom
options, but we should support this operation to support the relevant
CQL statements.
2018-05-09 21:12:50 -04:00
Jesse Haber-Kucharsky
e149e48609 auth: Add type alias for custom auth. options 2018-05-09 21:12:47 -04:00
Tomasz Grabiec
52c61df930 Relax includes
To avoid unnecessary recompilations.
Message-Id: <1522168295-994-1-git-send-email-tgrabiec@scylladb.com>
2018-03-28 10:49:07 +03:00
Jesse Haber-Kucharsky
af24637565 auth: Increase delay before background tasks start
I've observed failures due to "missing" the peer nodes by about 1
second. Adding 5 second to the existing delay should cover most false
negative test results.

Fixes #3320.
2018-03-26 00:52:55 -04:00
Jesse Haber-Kucharsky
00f7bc676d auth: Remove ordering dependence
If `auth::password_authenticator` also creates `system_auth.roles` and
we fix the existence check for the default superuser in
`auth::standard_role_manager` to only search for the columns that it
owns (instead of the column itself), then both modules' initialization
are independent of one another.

Fixes #3319.
2018-03-25 22:38:11 -04:00
Jesse Haber-Kucharsky
968c61c296 auth: Don't warn on rescheduled task
Apache Cassandra also prints at the `info` level. This change prevents
tasks which we expect to be rescheduled from failing tests and scaring
users.

A good example of this importance of this change is when queries with a
quorum consistency level (for the default superuser) fail because a
quorum is not available. We will try again in this case, and this should
not cause integration tests to fail.
2018-03-25 22:38:11 -04:00
Jesse Haber-Kucharsky
881656cea4 auth: Wait for schema agreement
Some modules of `auth` create a default superuser if it does not already
exist.

The existence check is through a SELECT query with quorum consistency
level. If the schema for the applicable tables has not yet propagated to
a peer node at the time that it processes this query, then the
`storage_proxy` will print an error message to the log and the query
will be retried.

Eventually, the schema will propagate and the default superuser will be
created. However, the error message in the log causes integration tests
to fail (and is somewhat annoying).

Now, prior to querying for existing data, we wait for all gossip peers
to have the same schema version as we do.

Fixes #2852.
2018-03-25 22:38:08 -04:00
Jesse Haber-Kucharsky
6a360c2d17 auth: Grant all permissions to object creator
When a table, keyspace, or role is created, the creator now is
automatically granted all applicable permissions on the object.

This behavior is consistent with Apache Cassandra.

Fixes #3216.
2018-03-14 01:54:31 -04:00
Jesse Haber-Kucharsky
c502fe24ce auth: Unify handling for unsupported errors
Instead of some functions in `allow_all_authorizer` throwing exceptions
and others being silently pass-through, we consistently return exception
futures with `auth::unsupported_authorization_operation`. These errors
are converted to `invalid_request_exception` in the CQL error and
ignored where appropriate in the auth subsystem.
2018-03-14 01:54:28 -04:00
Jesse Haber-Kucharsky
97235445d3 auth: Fix life-time issue with parameter 2018-03-14 01:32:53 -04:00
Jesse Haber-Kucharsky
9117a689cf auth: Fix const correctness
This patch came about because of an important (and obvious, in
hindsight) realization: instances of the authorizer, role manager, and
authenticator are clients for access-control state and not the state
itself. This is reflected directly in Scylla: `auth::service` is
sharded across cores and this is possible because each instance queries
and modifies the same global state.

To give more examples, the value of an instance of `std::vector<int>` is
the structure of the container and its contents. The value of `int
file_descriptor` is an identifier for state maintained elsewhere.

Having watched an excellent talk by Herb Sutter [1] and having read an
informative blog post [2], it's clear that a member function marked
`const` communicates that the observable state of the instance is not
modified.

Thus, the member functions of the role-manager, authenticator, and
authorizer clients should not be marked `const` only if the state of the
client itself is observably changed. By this principle, member functions
which do not change the state of the client, but which mutate the global
state the client is associated with (for example, by creating a role)
are marked `const`.

The `start` (and `stop`) functions of the client have the dual role of
initializing (finalizing) both the local client state and the
external state; they are not marked `const`.

[1] https://herbsutter.com/2013/01/01/video-you-dont-know-const-and-mutable/

[2] http://talesofcpp.fusionfenix.com/post-2/episode-one-to-be-or-not-to-be-const
2018-03-14 01:32:43 -04:00
Avi Kivity
d973445a94 Merge "sstable/schema extensions" from Calle
"
Adds extension points to schema/sstables to enable hooking in
stuff, like, say, something that modifies how sstable disk io
works. (Cough, cough, *encryption*)

Extensions are processed as property keywords in CQL. To add
an extension, a "module" must register it into the extensions
object on boot time. To avoid globals (and yet don't),
extensions are reachable from config (and thus from db).

Table/view tables already contain an extension element, so
we utilize this to persist config.

schema_tables tables/views from mutations now require a "context"
object (currently only extensions, but abstracted for easier
further changes.

Because of how schemas currently operate, there is a super
lame workaround to allow "schema_registry" access to config
and by extension extensions. DB, upon instansiation, calls
a thread local global "init" in schema_registry and registers
the config. It, in turn, can then call table_from_mutations
as required.

Includes the (modified) patch to encapsulate compression
into objects, mainly because it is nice to encapsulate, and
isolate a little.
"

* 'calle/extensions-v5' of github.com:scylladb/seastar-dev:
  extensions: Small unit test
  sstables: Process extensions on file open
  sstables::types: Add optional extensions attribute to scylla metadata
  sstables::disk_types: Add hash and comparator(sstring) to disk_string
  schema_tables: Load/save extensions table
  cql: Add schema extensions processing to properties
  schema_tables: Require context object in schema load path
  schema_tables: Add opaque context object
  config_file_impl: Remove ostream operators
  main/init: Formalize configurables + add extensions to init call
  db::config: Add extensions as a config sub-object
  db::extensions: Configuration object to store various extensions
  cql3::statements::property_definitions: Use std::variant instead of any
  sstables: Add extension type for wrapping file io
  schema: Add opaque type to represent extensions
  sstables::compress/compress: Make compression a virtual object
2018-02-26 17:15:29 +02:00
Jesse Haber-Kucharsky
a83af20311 auth: Add alias for set of role names
This shortens some type names considerably.
2018-02-14 14:15:59 -05:00
Jesse Haber-Kucharsky
39a44e3494 auth: Revoke permissions on dropped role resources
Previously, when a table or keyspace was dropped, the
authorizer (through a `migration_listener`) automatically dropped all
permissions granted on that resource.

Likewise, when a role is granted permissions and the role is dropped,
all permissions granted to the role are dropped.

In this change, we now treat role resources just like table and keyspace
resources: if a permission is granted on a role (like "GRANT AUTHORIZE
ON ROLE qa TO phil") and the "qa" role is dropped, then all permissions
on the "qa" role resource are also dropped.
2018-02-14 14:15:59 -05:00
Jesse Haber-Kucharsky
e6d9d53eca auth: Move definition to corresponding .cc file 2018-02-14 14:15:59 -05:00
Jesse Haber-Kucharsky
fbc97626c4 auth: Migrate legacy data on boot
This change allows for seamless migration of the legacy users metadata
to the new role-based metadata tables. This process is summarized in
`docs/migrating-from-users-to-roles.md`.

In general, if any nondefault metadata exists in the new tables, then
no migration happens. If, in this case, legacy metadata still exists
then a warning is written to the log.

If no nondefault metadata exists in the new tables and the legacy tables
exist, then each node will copy the data from the legacy tables to the
new tables, performing transformations as necessary. An informational
message is written to the log when the migration process starts, and
when the process ends. During the process of copying, data is
overwritten so that multiple nodes racing to migrate data do not
conflict.

Since Apache Cassandra's auth. schema uses the same table for managing
roles and authentication information, some useful functions in
`roles-metadata.hh` have been added to avoid code duplication.

Because a superuser should be able to drop the legacy users tables from
`system_auth` once the cluster has migrated to roles and is functioning
correctly, we remove the restriction on altering anything in the
"system_auth" keyspace. Individual tables in `system_auth` are still
protected later in the function.

When a cluster is upgrading from one that does not support roles to one
that does, some nodes will be running old code which accesses old
metadata and some will be running new code which access new metadata.

With the help of the gossiper `feature` mechanism, clients connecting to
upgraded nodes will be notified (through code in the relevant CQL
statements) that modifications are not allowed until the entire cluster
has upgraded.
2018-02-14 14:15:59 -05:00
Jesse Haber-Kucharsky
8be0165713 auth: Check protected resources of the role-manager
A new function `auth::service::is_protected` checks the
protected-resource set of all access-control modules (including the
role-manager).
2018-02-14 14:15:59 -05:00
Jesse Haber-Kucharsky
f9f03bc2e1 cql3: Fix error handling for GRANT and REVOKE
This change gets rid of duplicated code for checking if the grantee or
revokee exist by moving this functionality to the auth. service.
2018-02-14 14:15:59 -05:00
Jesse Haber-Kucharsky
e18adbcb3e auth: Remove unnecessary sstring allocation
The authorizer now accepts parameters by `string_view`.
2018-02-14 14:15:59 -05:00
Jesse Haber-Kucharsky
5be16247cc auth: Decouple authorization and role management
auth: Decouple authorization and role management

Access control in Scylla consists of three main modules: authentication,
authorization, and role-management.

Each of these modules is intended to be interchangeable with alternative
implementations. The `auth::service` class composes these modules
together to perform all access-control functionality, including caching.

This architecture implies two main properties of the individual
access-control modules:

- Independence of modules. An implementation of authentication should
  have no dependence or knowledge of authorization or role-management,
  for example.

- Simplicity of implementing the interface. Functionality that is common
  to all implementations should not have to be duplicated in each
  implementation. The abstract interface for a module should capture
  only the differences between particular implementations.

Previously, the authorization interface depended on an instance of
`auth::service` for certain operations, since it required aggregation
over all the roles granted to a particular role or required checking if
a given role had superuser.

This change decouples authorization entirely from role-management: the
authorizer now manages only permissions granted directly to a role, and
not those inherited through other roles.

When a query needs to be authorized, `auth::service::get_permissions`
first uses the role manager to check if the role has superuser. Then, it
aggregates calls to `auth::authorizer::authorize` for each role granted
to the role (again, from the role-manager) to determine the sum-total
permission set. This information is cached for future queries.

This structure allows for easier error handling and
management (something I hope to improve in the future for both the
authorizer and authenticator interfaces), easier system testing, easier
implementation of the abstract interfaces, and clearer system
boundaries (so the code is easier to grok).

Some authorizers, like the "TransitionalAuthorizer", grant permissions
to anonymous users. Therefore, we could not unconditionally authorize an
empty permission set in `auth::service` for anonymous users. To account
for this, the interface of the authorizer has changed to accept an
optional name in `authorize`.

One additional notable change to the authorizer is the
`auth::authorizer::list`: previously, the filtering happened at the CQL
query layer and depended on the roles granted to the role in question.
I've changed the function to simply query for all roles and I do the
filtering in `auth::system` in-memory with the STL. This was necessary
to allow the authorizer to be decoupled from role-management. This
function is only called for LIST PERMISSIONS (so performance is not a
concern), and it significantly reduces demand on the implementation.

Finally, we unconditionally create a user in `cql_test_env` since
authorization requires its existence.
2018-02-14 14:15:59 -05:00
Jesse Haber-Kucharsky
0ac7d9922d auth: Add code to expand a resource family
This will be useful for the next change, where it is used for
refactoring LIST PERMISSIONS.
2018-02-14 14:15:59 -05:00
Jesse Haber-Kucharsky
13ba128967 auth: Change error messages to pass dtests
The fixed dtests which only failed due to differences in wording and
grammar for error messages are:

- altering_nonexistent_user_throws_exception_test
- cant_create_existing_user_test
- dropping_nonexistent_user_throws_exception_test
- users_cant_alter_their_superuser_status_test
2018-02-14 14:15:59 -05:00
Jesse Haber-Kucharsky
ce3be07556 auth: Move resource existence checks
Previously, a "data" auth. resource knew how to check it's own existence by
accessing a global variable.

This patch accomplishes two things: it adds existence checking to all
kinds of resources, and moves these checks outside of `auth::resource`
itself and into `auth::service` (so that global variables are no longer
accessed).
2018-02-14 14:15:59 -05:00
Jesse Haber-Kucharsky
cf5f6aa4c5 auth: Fix fragile variable life-times
According to the Seastar convention, a parameter passed to a function
taking a reference parameter must live for the duration of the execution
of the returned future.

When possible, variables are statically allocated. When this is not
possible, we use `do_with`.
2018-02-14 14:15:59 -05:00
Jesse Haber-Kucharsky
357f3afb60 auth: Remove outdated "TODO"
Authorization never happens at this level of the stack, though it
formally did.
2018-02-14 14:15:59 -05:00
Jesse Haber-Kucharsky
b1d9d0e4ff auth: Reorder authorizer args for consistency 2018-02-14 14:15:59 -05:00
Jesse Haber-Kucharsky
c1504cd4ff auth: Pass resource by const ref.
This has the dual benefit of not enforcing copying on implementations of
the abstract interface and also limiting unnecessary copies.

As usual with Seastar, we follow the convention that a reference
parameter to a function is assumed valid for the duration of the
`future` that is returned. `do_with` helps here.

By adding some constants for root resources, we can avoid using
`seastar::do_with` at some call-sites involving `resource` instances.
2018-02-14 14:15:59 -05:00
Jesse Haber-Kucharsky
45631604b0 auth: Use string_view for paramters 2018-02-14 14:15:59 -05:00