"
This series restructures the SASL code that was previously internal
to the `password_authenticator` so that it can be used in other contexts.
"
* 'jhk/restructure_sasl/v1' of https://github.com/hakuch/scylla:
auth: Rename SASL challenge class for "PLAIN"
auth: Make a ctor `explicit`
auth: Move `sasl_challenge` to its own file
auth: Decouple SASL code from its parent class
In case salted_hash was NULL, we'd access uninitialized memory when dereferencing
the optional in get_as<>().
Protect against that by using get_opt() and failing authentication if we see a NULL.
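A minimal sketch of the guarded access, assuming the nullable column is
modelled as a std::optional. The names `cell` and `check_password` are
hypothetical, and the string comparison only stands in for the real
hash verification:

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical stand-in for a row cell that may be NULL.
using cell = std::optional<std::string>;

// Returns true only when a stored hash exists and matches the candidate.
// The equality test stands in for real password-hash verification.
bool check_password(const cell& stored, const std::string& candidate_hash) {
    if (!stored) {
        // NULL salted_hash: fail authentication instead of dereferencing
        // a disengaged optional.
        return false;
    }
    return *stored == candidate_hash;
}
```

A disengaged optional takes the early-return path, so the dereference is
only reached when a value is present.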
Fixes #4168.
Tests: unit (release)
Branches: 3.0, 2.3
Message-Id: <20190211173820.8053-1-avi@scylladb.com>
Replace stdx::optional and stdx::string_view with the C++ std
counterparts.
Some instances of boost::variant were also replaced with std::variant,
namely those that called seastar::visit.
Scylla now requires GCC 8 to compile.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
Message-Id: <20190108111141.5369-1-duarte@scylladb.com>
query_processor uses storage_proxy to access data, and the local
database object to access replicated metadata. While it seems strange
that the database object is not used to access data, it is logical
when you consider that a sharded<database> only contains this node's
data, not the cluster data.
Take advantage of this to replace sharded<database> with a single database
shard.
db::config is a global class; changes in any module can cause changes
in db::config. Therefore, it is a cause of needless recompilation.
Remove some of these dependencies by having consumers of db::config
declare an intermediate config struct that contains only
configuration of interest to them, and have their caller fill it out
(in the case of auth, it already followed this scheme and the patchset
only moves the translation function).
In addition, some outright pointless inclusions of db/config.hh are
removed.
The result is somewhat shorter compile times, and fewer needless
recompiles.
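The intermediate-struct pattern might look roughly like the sketch below.
All names (`global_config`, `cache_config`, `make_cache_config`, `cache`)
and the field values are hypothetical and are not the actual Scylla
types:

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for the global configuration class.
struct global_config {
    std::uint32_t permissions_validity_in_ms = 10000;
    std::uint32_t permissions_cache_max_entries = 1000;
    std::string data_directory = "/var/lib/example"; // of no interest to the cache
};

// Consumer-owned struct: only the settings the consumer cares about.
struct cache_config {
    std::uint32_t validity_ms;
    std::uint32_t max_entries;
};

// The translation lives with the caller, so the consumer's header never
// needs to include the global configuration header.
cache_config make_cache_config(const global_config& gc) {
    return cache_config{gc.permissions_validity_in_ms,
                        gc.permissions_cache_max_entries};
}

// The consumer depends only on its own config struct.
class cache {
    cache_config _cfg;
public:
    explicit cache(cache_config cfg) : _cfg(cfg) {}
    std::uint32_t max_entries() const { return _cfg.max_entries; }
};
```

With this split, adding an unrelated option to the global class does not
force the consumer to be recompiled.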
* https://github.com/avikivity/scylla unconfig-1/v1:
config: remove inclusions of db/config.hh from header files
repair: remove unneeded config.hh inclusion
batchlog_manager: remove dependency on db::config
auth: remove permissions_cache dependency on db::config
auth: remove auth::service dependency on db::config
auth: remove unneeded db/config.hh includes
auth::service already has its own configuration and a function to create it
from db::config; just move it to the caller. This reduces dependencies on the
global db::config class.
permissions_cache already has its own configuration and a function to create it
from db::config; just move it to the caller. This reduces dependencies on the
global db::config class.
When the auth service is requested to stop during bootstrap,
it might have still not reached schema agreement.
Currently, waiting for this agreement is done in an infinite loop,
without taking abort_source into account.
This patch introduces checking if abort was requested
and breaking the loop in such case, so auth service can terminate.
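A sketch of an abort-aware wait loop in plain standard C++. This is not
Seastar code: the abort flag is modelled as a std::atomic<bool> rather
than an abort_source, and `wait_for_agreement` and `wait_result` are
hypothetical names:

```cpp
#include <atomic>
#include <cassert>
#include <functional>

enum class wait_result { agreed, aborted };

// Polls `have_agreement` until it returns true, but gives up as soon as
// an abort has been requested, so the caller can shut down.
wait_result wait_for_agreement(const std::function<bool()>& have_agreement,
                               const std::atomic<bool>& abort_requested) {
    while (!have_agreement()) {
        if (abort_requested.load()) {
            return wait_result::aborted;
        }
        // Real code would sleep or yield here between polls.
    }
    return wait_result::agreed;
}
```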
Tests:
unit (release)
dtest (bootstrap_test.py:TestBootstrap.shutdown_wiped_node_cannot_join_test)
Message-Id: <1b7ded14b7c42254f02b5d2e10791eb767aae7fc.1543914769.git.sarna@scylladb.com>
sprint() recently became more strict, throwing on sprint("%s", 5). Replace
with the more modern format().
Mechanically converted with https://github.com/avikivity/unsprint.
* seastar d152f2d...c1e0e5d (6):
> scripts: perftune.py: properly merge parameters from the command line and the configuration file
> fmt: update to 5.2.1
> io_queue: only increment statistics when request is admitted
> Adds `read_first_line.cc` and `read_first_line.hh` to CMake.
> fstream: remove default extent allocation hint
> core/semaphore: Change the access of semaphore_units main ctor
Due to a compile-time fight between fmt and boost::multiprecision, a
lexical_cast was added to mediate.
sprint("%s", var) no longer accepts numeric values, so some sprint()s were
converted to format() calls. Since more may be lurking we'll need to remove
all sprint() calls.
Signed-off-by: Duarte Nunes <duarte@scylladb.com>
On Fedora 28, creating an instance of `std::random_device` opens a file
descriptor for `/dev/urandom` (observed via `strace`).
By declaring static thread-local instances of `std::random_device`,
these descriptors will be open (barring optimization by the compiler)
for the entire duration of the Scylla process's life.
However, the `std::random_device` instance is only necessary for
initializing the `RandomNumberEngine` for generating salts. With this
change, the file-descriptor is closed immediately after the engine is
initialized.
I considered generalizing this pattern of initialization into a
function, but with only two uses (and simple ones) I think this would
only obscure things.
Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com>
Tests: unit (release)
Message-Id: <f1b985d99f66e5e64d714fd0f087e235b71557d2.1536697368.git.jhaberku@scylladb.com>
Commit e664f9b0c6 transitioned internal
CQL queries in the auth. sub-system to be executed with finite time-outs
instead of infinite ones.
It should have also modified the functions in `auth/roles-metadata.cc`
to have finite time-outs.
This change fixes some previously failing dtests, particularly around
repair. Without this change, the QUORUM query fails to terminate when
the necessary consistency level cannot be achieved.
Fixes #3736.
Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com>
Message-Id: <e244dc3e731b4019f3be72c52a91f23ee4bb68d1.1536163859.git.jhaberku@scylladb.com>
This makes the function usable in more contexts (including tests),
since the state is not captured and
the characteristics of salt generation can be customized to the caller's
needs.
Instead of reducing the large value via `%`, which can produce
non-uniformly distributed values when the range is small, we specify the
range in the distribution, which is uniform by construction.
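A sketch of drawing salt characters with the range given to the
distribution instead of reducing with `%`. The helper name
`random_salt_chars` and the alphabet are hypothetical. The engine is
passed in by the caller, so no state is captured:

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <string>

// Hypothetical salt alphabet (64 characters plus the terminating NUL).
constexpr char salt_alphabet[] =
    "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ./";

// Draws `length` characters uniformly from the alphabet using the
// caller's engine.
template <typename Engine>
std::string random_salt_chars(Engine& engine, std::size_t length) {
    // sizeof counts the NUL, so the last valid index is sizeof - 2.
    std::uniform_int_distribution<std::size_t> pick(0, sizeof(salt_alphabet) - 2);
    std::string out;
    out.reserve(length);
    for (std::size_t i = 0; i < length; ++i) {
        out.push_back(salt_alphabet[pick(engine)]);
    }
    return out;
}
```

Because the engine is a parameter, a test can pass a fixed-seed engine
and get reproducible output.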
The `generate_salt` function invokes this function internally now.
This change means that `generate_salt` is now thread-safe and therefore
does not have to be invoked by a single thread only when starting the
`password_authenticator`.
This further means that `generate_salt` does not need to be part of the
public interface of the module, and can be moved to the implementation
file.
While the `password_authenticator` is a complex component with lots of
dependencies, password hashing and checking itself is a process with
limited logical state and dependencies, which makes it easy to isolate
and test.
`std::random_device` has a lot of implementation-specific behavior, and
as a result we cannot assume much about its performance characteristics.
We initialize thread-specific static instances of `std::random_device`
once so that we don't have the overhead of invoking the ctor during
every invocation of `gensalt`.
In previous versions of Fedora, the `crypt_r` function returned
`nullptr` when a requested hashing algorithm was not supported.
This is consistent with the documentation of the function in its man
page.
As of Fedora 28, the function's behavior changes so that the encrypted
text is not `nullptr` on error, but instead the string "*0".
The info pages for `crypt_r` clarify somewhat (and contradict the man
pages):
    Some implementations return `NULL` on failure, and others return an
    _invalid_ hashed passphrase, which will begin with a `*` and will
    not be the same as SALT.
Because of this change of behavior, users running Scylla on a Fedora 28
machine which was upgraded from a previous release would not be able to
authenticate: an unsupported hashing algorithm would be selected,
producing encrypted text that did not match the entry in the table.
With this change, unsupported algorithms are correctly detected and
users should be able to continue to authenticate themselves.
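One way to detect both failure conventions is sketched below. The helper
name `crypt_failed` is hypothetical, and the leading-`*` test follows the
info page text quoted above:

```cpp
#include <cassert>

// Returns true when the result of a crypt-style call signals failure:
// either a null pointer (older behavior) or an invalid hash that begins
// with '*' (for example "*0").
bool crypt_failed(const char* result) {
    return result == nullptr || result[0] == '*';
}
```

A caller would try the next candidate algorithm whenever this returns
true, rather than storing or comparing the invalid text.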
Fixes #3637.
Signed-off-by: Jesse Haber-Kucharsky <jhaberku@scylladb.com>
Message-Id: <bcd708f3ec195870fa2b0d147c8910fb63db7e0e.1533322594.git.jhaberku@scylladb.com>
std::random_device() uses the relatively slow /dev/urandom, and we rarely if
ever intend to use it directly - we normally want to use it to seed a faster
random_engine (a pseudo-random number generator).
In many places in the code, we first created a random_device variable, and then
using it created a random_engine variable. However, this practice created the
risk of a programmer accidentally using the random_device object, instead of the
random_engine object, because both have the same API; this hurts performance.
This risk materialized in just two places in the code, utils/uuid.cc and
gms/gossiper.cc. A patch for uuid.cc was sent previously by Pawel and is
not included in this patch, and the fix for gossiper.{cc,hh} is included here.
To avoid risking the same mistake in the future, this patch switches across the
code to an idiom where the random_device object is not *named*, so cannot be
accidentally used. We use the following idiom:
    std::default_random_engine _engine{std::random_device{}()};
Here std::random_device{}() creates the random device (/dev/urandom) and pulls
a random integer from it. It then uses this seed to create the random_engine
(the pseudo-random number generator). The std::random_device{} object is
temporary and unnamed, and cannot be unintentionally used directly.
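The idiom in the context of a small class, as a sketch. The class
`digit_source` and its members are hypothetical and only illustrate
where the unnamed temporary sits:

```cpp
#include <cassert>
#include <random>

class digit_source {
    // The random_device is an unnamed temporary: it is used once to
    // produce a seed and cannot be referred to afterwards.
    std::default_random_engine _engine{std::random_device{}()};
    std::uniform_int_distribution<int> _dist{0, 9};
public:
    // Every draw goes through the pseudo-random engine.
    int next() { return _dist(_engine); }
};
```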
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Message-Id: <20180726154958.4405-1-nyh@scylladb.com>
untyped_result_set_row's cell data type is bytes_opt, and the
get_blob() accessor accesses the value assuming it's engaged
(relying on the caller to call has()).
has_unsalted_hash() calls get_blob() without calling has() beforehand,
potentially triggering undefined behavior.
Fix by using get_or() instead, which also simplifies the caller.
I observed failures in Jenkins in this area. It's hard to be sure
this is the root cause, since the failures triggered an internal
consistency assertion in asan rather than an asan report. However,
the error is hard to reproduce and the fix makes sense even if it
doesn't prevent the error.
See #3480 for the asan error.
Fixes #3480 (hopefully).
Message-Id: <20180602181919.29204-1-avi@scylladb.com>
Remove implicit timeouts and replace with caller-specified timeouts.
This allows removing the ambiguity about what timeout a statement is
executed with, and allows removing cql_statement::execute_internal(),
which mostly overrode timeouts and consistency levels.
Timeout selection is now as follows:
    query_processor::*_internal: infinite timeout, CL=ONE
    query_processor::process(), execute(): user-specified consistency level and timeout
All callers were adjusted to specify an infinite timeout. This can be
further adjusted later to use the "other" timeout for DCL and the
read or write timeout (as needed) for authentication in the normal
query path.
Note that infinite timeouts don't mean that the query will hang; as
soon as the failure detector decides that the node is down, RPC
responses will terminate with a failure and the query will fail.
None of the `authenticator` implementations we have support custom
options, but we should support this operation to support the relevant
CQL statements.
I've observed failures due to "missing" the peer nodes by about 1
second. Adding 5 seconds to the existing delay should cover most false
negative test results.
Fixes #3320.
If `auth::password_authenticator` also creates `system_auth.roles` and
we fix the existence check for the default superuser in
`auth::standard_role_manager` to only search for the columns that it
owns (instead of the column itself), then both modules' initialization
are independent of one another.
Fixes #3319.
Apache Cassandra also prints at the `info` level. This change prevents
tasks which we expect to be rescheduled from failing tests and scaring
users.
A good example of the importance of this change is when queries with a
quorum consistency level (for the default superuser) fail because a
quorum is not available. We will try again in this case, and this should
not cause integration tests to fail.
Some modules of `auth` create a default superuser if it does not already
exist.
The existence check is through a SELECT query with quorum consistency
level. If the schema for the applicable tables has not yet propagated to
a peer node at the time that it processes this query, then the
`storage_proxy` will print an error message to the log and the query
will be retried.
Eventually, the schema will propagate and the default superuser will be
created. However, the error message in the log causes integration tests
to fail (and is somewhat annoying).
Now, prior to querying for existing data, we wait for all gossip peers
to have the same schema version as we do.
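The agreement condition itself can be sketched as a simple predicate.
The function name `schema_agreed` and the use of strings for schema
versions are assumptions made for illustration only:

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// True when every known peer reports the same schema version as the
// local node. With no peers, agreement holds trivially.
bool schema_agreed(const std::string& local_version,
                   const std::vector<std::string>& peer_versions) {
    return std::all_of(peer_versions.begin(), peer_versions.end(),
                       [&](const std::string& v) { return v == local_version; });
}
```

The existence query would only be issued once such a predicate holds.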
Fixes #2852.
When a table, keyspace, or role is created, the creator now is
automatically granted all applicable permissions on the object.
This behavior is consistent with Apache Cassandra.
Fixes #3216.
Instead of some functions in `allow_all_authorizer` throwing exceptions
and others being silently pass-through, we consistently return exception
futures with `auth::unsupported_authorization_operation`. These errors
are converted to `invalid_request_exception` in the CQL layer and
ignored where appropriate in the auth subsystem.