now that we are allowed to use C++23. we now have the luxury of using
`std::views::values`.
in this change, we:
- replace `boost::adaptors::map_values` with `std::views::values`
- update affected code to work with `std::views::values`
- the places where we use `boost::join()` are not changed, because
we cannot use `std::views::concat` yet. this helper is only
available in C++26.
to reduce the dependency to boost for better maintainability, and
leverage standard library features for better long-term support.
this change is part of our ongoing effort to modernize our codebase
and reduce external dependencies where possible.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#21265
It's somewhat common to ask for the partition key and clustering key
columns, or for the static and regular columsn. Provide accessors for them
rather than requiring the user to glue them.
Some callers are converted.
Closesscylladb/scylladb#21191
the log.hh under the root of the tree was created keep the backward
compatibility when seastar was extracted into a separate library.
so log.hh should belong to `utils` directory, as it is based solely
on seastar, and can be used all subsystems.
in this change, we move log.hh into utils/log.hh to that it is more
modularized. and this also improves the readability, when one see
`#include "utils/log.hh"`, it is obvious that this source file
needs the logging system, instead of its own log facility -- please
note, we do have two other `log.hh` in the tree.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
now that we are allowed to use C++23. we now have the luxury of using
`std::views::keys`.
in this change, we:
- replace `boost::adaptors::map_keys` with `std::views::keys`
- update affected code to work with `std::views::keys`
to reduce the dependency to boost for better maintainability, and
leverage standard library features for better long-term support.
this change is part of our ongoing effort to modernize our codebase
and reduce external dependencies where possible.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#21198
This includes way too much, including <boost/regex.hpp>, which is huge.
Drop includes of adaptors.hpp and replace by what is needed.
Closesscylladb/scylladb#21187
To reduce dependency load, change uses of boost ranges to std::ranges.
The first patch is preparation, replacing a construct that isn't easy to support with std ranges with something simpler.
No backport as this is a code cleanup.
Closesscylladb/scylladb#21122
* github.com:scylladb/scylladb:
schema: replace boost ranges with std ranges
schema: precompute all_columns_in_select_order()
implement cassandra original schema option memtable_flush_period_in_ms:
Milliseconds before memtables associated with the table are flushed.
there are few things concerning this patch:
* milliseconds look strange and scary for this option. Unlike Cassandra
we use 60000ms (1min) minimum value for this option.
* This is limitation of Cassandra but it is impossible to set this option
for system tables. However sometimes it could be very useful to use
automatic flushing for such a tables: some system tables have small
traffic and as a result prevent tombstone garbage collection.
Fixes#20270Closesscylladb/scylladb#20999
To reduce dependency load, use std ranges instead of boost ranges.
The std::ranges::{lower,upper}_bound don't support heterogeneous lookup,
but a more natural solution is to use a projection to search for the name,
so we use that and the custom comparator is removed.
Many callers are converted as well due to poor interoperability between
boost ranges and std ranges.
all_columns_in_select_order() returns a complicated boost range type
that has no analog in std::ranges. To ease the transition to std::ranges,
precompute most of the work done in that function, and only convert
pointers to references in the function itself.
Since boost ranges and std::ranges don't fully interoperate, one of
the user has to be adjusted.
The schema module (everything in schema/) is supposed to be towards the
leafs in the ScyllaDB inter-module dependency graph. In other words, it
should not depend on many other modules. On the other hand, almost the
entire codebase depends on the schema module itself.
Currently there is a circular dependency between schema and
replica::database, as the latter is a required argument for
schema::describe(). This is bad, not just because of the dependency mess
it introduces, but also because now schema::describe() can only be used
by code which has a reference to the database handy.
This patch breaks this circular dependency, by introducing the
schema_describe_helper interface and providing an implementation for it
in database.hh.
There is another circular dependency: schema <-> replica::table. This is
not addressed by this patch.
Closesscylladb/scylladb#20893
Scylla doesn't allow for the types of arguments or the return type of a UDF
to be frozen. As a result, before these changes, create statements
produced to restore UDFs as part of `DESCRIBE` statements could not
be executed.
Fixesscylladb/scylladb#20256
Backport: necessary as the restore process may not work correctly without these changes. The affected versions span from 5.2 to the current master, but we only want to apply the fix to the live versions, so 6.0, 6.1, and 6.2.
Closesscylladb/scylladb#20816
* github.com:scylladb/scylladb:
cql3/functions/user_function: Print arguments and return type without frozen
cql3/functions/user_function: Use fmt to format create statement
Scylla doesn't allow for the types of arguments or the return type
to be frozen. As a result, before these changes, create statements
produced to restore UDFs as part of `DESCRIBE` statements could not
be executed.
We fix that and add a reproducer test and another one to verify that
the implementation is correct.
Tablets load balancer is unable to process more than a single pending
replica, thus ALTER tablets KS cannot accept an ALTER statement which
would result in creating 2+ pending replicas, hence it has to validate
if the sum of absoulte differences of RFs specified in the statement is
not greter than 1.
A bug has been discovered while trying to ALTER tablets KS and
specifying only 1 out of 2 DCs - the not specified DC's RF has been
zeroed. This is because ALTER tablets KS updated the KS only with the
RF-per-DC mapping specified in the ALTER tablets KS statement, so if a
DC was ommitted, it was assigned a value of RF=0.
This commit fixes that plus additionally passes all the KS options, not
only the replication options, to the topology coordinator, where the KS
update is performed.
`initial_tablets` is a special case, which requires a special handling
in the source code, as we cannot simply update old initial_tablet's
settings with the new ones, because if only ` and TABLETS = {'enabled':
true}` is specified in the ALTER tablets KS statement, we should not zero the `initial_tablets`, but
rather keep the old value - this is tested by the
`test_alter_preserves_tablets_if_initial_tablets_skipped` testcase.
Other than that, the above mentioned testcase started to fail with
these changes, and it appeared to be an issue with the test not waiting
until ALTER is completed, and thus reading the old value, hence the
test's body has been modified to wait for ALTER to complete before
performing validation.
The validation has been corrected with:
1. Checking if a DC specified in ALTER exists.
2. Removing `REPLICATION_STRATEGY_CLASS_KEY` key from a map of RFs that
needs their RFs to be validated.
This function assumed that strings passed as arguments will be of
integer types, but that wasn't the case, and we missed that because this
function didn't have any validation, so this change adds proper
validation and error logging.
Arguments passed to this function were forwarded from a call to
`ks_prop_defs::get_replication_options`, which, among rf-per-dc mapping, returns also
`class:replication_strategy` pair. Second pair's member has been casted
into an `int` type and somehow the code was still running fine, but only
extra testing added later discovered a bug in here.
ALTER tablets KS validated if RF is not changed by more than 1 for DCs
that already had replicas, but not for DCs that didn't have them yet, so
specifying an RF jump from 0 to 2 was possible when listing a new DC in
ALTER tablets KS statement, which violated internal invariants of
tablets load balancer.
This PR fixes that bug and adds a multi-dc testcases to check if adding
replicas to a new DC and removing replicas from a DC is honoring the RF
change constraints.
Refs: #20039
Before these changes, we could create a materialized
view specifying its ID, but the option was ignored.
This commit makes Scylla respect the option. Now specifying
the ID results in the MV being created with that specific ID.
This way, Scylla's behavior is consistent with Cassandra's.
Because Cassandra doesn't mention the option in its
user documentation, we don't update it either in case
the semantics of it changes in the future -- we want
to have an open door for any modifications.
Note that Cassandra returns a server error if the provided
ID is already in use, both in the case of regular tables
and MVs. That's most likely a bug. Instead of following that
behavior, we stay consistent with the current semantics of
creating a regular table in Scylla: if the provided ID is
already used, return an InvalidRequest.
The last thing worth pointing out is Cassandra handles
`WITH ID = null` as a special case; normally, specifying
an invalid ID results in a ConfigurationException, but a null
is treated as a syntax error. As in the previous paragraph,
we stay consistent with the semantics of regular tables and
all invalid IDs, null included, lead to a ConfigurationException.
We also add a few short tests verifying that the implementation
works as intended.
Fixesscylladb/scylladb#20616
Backport not needed: the semantics of the option was never documented in either Cassandra, or Scylla.
Closesscylladb/scylladb#20773
* github.com:scylladb/scylladb:
test/cql-pytest: Get rid of unnecessary processing describe statements
cql3: Make creating MV respect ID option
CI started reporting warnings about including `bytes.hh` in
several files. The reason is they actually only use code
introduced in `bytes_fwd.hh` (which is also included by `bytes.hh`).
Clang-include-cleaner suggests that we get rid of that indirection
and only include `bytes_fwd.hh`. That's what happens in this commit.
We include `bytes.hh` in `exceptions/exceptions.cc` because
it relies on the formatting utilities declared and defined
in `bytes.hh`.
Closesscylladb/scylladb#20842
Before these changes, we could create a materialized
view specifying its ID, but the option was ignored.
This commit makes Scylla respect the option. Now specifying
the ID results in the MV being created with that specific ID.
This way, Scylla's behavior is consistent with Cassandra's.
Because Cassandra doesn't mention the option in its
user documentation, we don't update it either in case
the semantics of it changes in the future -- we want
to have an open door for any modifications.
Note that Cassandra returns a server error if the provided
ID is already in use, both in the case of regular tables
and MVs. That's most likely a bug. Instead of following that
behavior, we stay consistent with the current semantics of
creating a regular table in Scylla: if the provided ID is
already used, return an InvalidRequest.
The last thing worth pointing out is Cassandra handles
`WITH ID = null` as a special case; normally, specifying
an invalid ID results in a ConfigurationException, but a null
is treated as a syntax error. As in the previous paragraph,
we stay consistent with the semantics of regular tables and
all invalid IDs, null included, lead to a ConfigurationException.
We also add a few short tests verifying that the implementation
works as intended.
Using the standard library is preffered over boost.
In cql3/expr/expression.cc to_sorted_vector got more of a
face-list and was modernized to use also std::unique
and while at it, to move its input range in the uniquely sorted
result vector.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
'static inline' is always wrong in headers - if the same header is
included multiple times, and the function happens not to be inlined,
then multiple copies of it will be generated.
Fix by mechanically changing '^static inline' to 'inline'.
There are two bits that control whenter replication strategy for a
keyspace will use tablets or not -- the configuration option and CQL
parameter. This patch tunes its parsing to implement the logic shown
below:
if (strategy.supports_tablets) {
if (cql.with_tablets) {
if (cfg.enable_tablets) {
return create_keyspace_with_tablets();
} else {
throw "tablets are not enabled";
}
} else if (cql.with_tablets = off) {
return create_keyspace_without_tablets();
} else { // cql.with_tablets is not specified
if (cfg.enable_tablets) {
return create_keyspace_with_tablets();
} else {
return create_keyspace_without_tablets();
}
}
} else { // strategy doesn't support tablets
if (cql.with_tablets == on) {
throw "invalid cql parameter";
} else if (cql.with_tablets == off) {
return create_keyspace_without_tablets();
} else { // cql.with_tablets is not specified
return create_keyspace_without_tablets();
}
}
closes: #20088
In order to enable tablets "by default" for NetworkTopologyStrategy
there's explicit check near ks_prop_defs::get_initial_tablets(), that's
not very nice. It needs more care to fix it, e.g. provide feature
service reference to abstract_replication_strategy constructor. But
since ks_prop_defs code already highjacks options specifically for that
strategy type (see prepare_options() helper), it's OK for now.
There's also #20768 misbehavior that's preserved in this patch, but
should be fixed eventually as well.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#20779
It doesn't need it apart from a forward declaration.
Files that lost necessary includes are adjusted, and some users
of auth_version_t are redirected to the definition outside system_keyspace.
A global index has a primary key of the form
(indexed_column, token, partition_key_column..., clustering_key_column...)
The primary key columns are used to point at the base table row, and
the token (computed as token(partition_key_column...) is used to maintain
sort order.
The query planner has an optimization: if the partition key is fully
constrained to a unique value, then we compute the token from the partition
key and use that to seek directly into the clustering row range for
that base table partition. If the clustering key is also partially
constrained, it is used to refine the index clustering key.
Currently, this optimization is implemented as a hack: the partition key
is extracted from the prepared statement + query options in
get_global_index_token_clustering_ranges(), then used to calculate
the token, which is then substituted in the expression passed to
get_single_column_clustering_bounds() (the expression is shared across
all running queries, so this is quite dangerous).
We simplify the whole thing:
- Let prepare_index_global() recognize that if the partition key is not
fully constrained, then there is no way that we'll be able to compute
the token (as it needs all partition key columns). Since the token
is the first clustering key column of the index table, we can truncate
it to length zero and bail out.
- Otherwise, the partition key is fully constrained. We refactor the
predicate (pk1 = :a AND pk2 = :b) to (pk1, pk2) := (:a, :b). We then
pass expressions representing the partition key to the token function,
ending up with token(:a, :b). We then substitute this expression into
(*_idx_tbl_ck_prefix)[0], which computes the first clustering key
column for the index table.
- Remove the runtime component in get_global_index_clustering_ranges().
Note this include the early return if the partition key wasn't fully
constrained (though the comment only mentions over-constraining), and
the token computation, which is now done by evaluate().
Closesscylladb/scylladb#20733
Auth has been managed via Raft since Scylla 6.0. Restoring data
following the usual procedure (1) is error-prone and so a safer
method must have been designed and implemented. That's what
happens in this PR.
We want to extend `DESC SCHEMA` by auth and service levels
to provide a safe way to backup and restore those two components.
To realize that, we change the meaning of `DESC SCHEMA WITH INTERNALS`
and add a new "tier": `DESC SCHEMA WITH INTERNALS AND PASSWORDS`.
* `DESC SCHEMA` -- no change, i.e. the statement describes the current
schema items such as keyspaces, tables, views, UDTs, etc.
* `DESC SCHEMA WITH INTERNALS` -- does the same as the previous tier
and also describes auth and service levels. No information about
passwords is returned.
* `DESC SCHEMA WITH INTERNALS AND PASSWORDS` -- does the same
as the previous tier and also includes information about the salted
hashes corresponding to the passwords of roles.
To restore existing roles, we extend the `CREATE ROLE` statement
by allowing to use the option `WITH SALTED HASH = '[...]'`.
---
Implementation strategy:
* Add missing things/adjust existing ones that will be used later.
* Implement creating a role with salted hash.
* Add tests for creating a role with salted hash.
* Prepare for implementing describe functionality of auth and service levels.
* Implement describe functionality for elements of auth and service levels.
* Extend the grammar.
* Add tests for describe auth and service levels.
* Add/update documentation.
---
(1): https://opensource.docs.scylladb.com/stable/operating-scylla/procedures/backup-restore/restore.html
In case the link stops working, restoring a schema was realised
by managing raw files on disk.
Fixesscylladb/scylladb#18750Fixesscylladb/scylladb#18751Fixesscylladb/scylladb#20711Closesscylladb/scylladb#20168
* github.com:scylladb/scylladb:
docs: Update user documentation for backup and restore
docs/dev: Add documentation for DESC SCHEMA
test: Add tests for describing auth and service levels
cql3/functions/user_function: Remove newline character before and after UDF body
cql3: Implement DESCRIBE SCHEMA WITH INTERNALS AND PASSWORDS
auth: Implement describing auth
auth/authenticator: Add member functions for querying password hash
service/qos/service_level_controller: Describe service levels
data_dictionary: Remove keyspace_element.hh
treewide: Start using new overloads of describe
treewide: Fix indentation in describe functions
treewide: Return create statement optionally in describe functions
treewide: Add new describe overloads to implementations of data_dictionary::keyspace_element
treewide: Start using schema::ks_name() instead of schema::keyspace_name()
cql3: Refactor `description`
cql3: Move description to dedicated files
test: Add tests for `CREATE ROLE WITH SALTED HASH`
cql3/statements: Restrict CREATE ROLE WITH SALTED HASH
auth: Allow for creating roles with SALTED HASH
types: Introduce a function `cql3_type_name_without_frozen()`
cql3/util: Accept std::string_view rather than const sstring&
We add documentation for developers addressing
`DESCRIBE SCHEMA`. It covers the following aspects
of it:
* motivation,
* synopsis of the solution,
* implementation of the solution,
as well as a few subsections explaining the details:
* restoring process and its side effects,
* restoring roles with passwords,
* list of statements generated by `DESC SCHEMA`
with examples,
* implementation details.
We remove newline characters that are printed before and after
a UDF's body. This way, we want to keep the create statement
as close to what was actually provided as possible. Although
there should be no semantic differences with or without the
newline characters, it's a lot more convenient in testing when
they're not present.
Fixesscylladb/scylladb#20711
When executing `DESC SCHEMA WITH INTERNALS`, Scylla now also returns
statements that can be used to recreate service levels and restore
the state of auth. That encompasses granting roles and permissions
as well as attaching service levels to roles.
If the additional parameter `WITH PASSWORDS` is provided,
the statements corresponding to recreating roles in the system
will also contain the stored salted hashes.
This change allows the user to fully set the page size for the query.
There's still an internal hard-limit of 1MB anyway, so there's no need
to limit it to our default value (because using a larger page size might
be a query optimization sometimes)
Fixes#20612Closesscylladb/scylladb#20692
Move all of the blatantly restriction-related expression utilities
to statement_restrictions.cc.
Some are so blatant as to include the word "restriction" in their name.
Others are just so specialized that they cannot be used for anything else.
The motivation is that further refactoring will be simplified if it can
happen within the same module, as there will not be a need to prove
it has no effect elsewhere.
Most of the declarations are made non-public (in .cc file) to limit
proliferation. A few are needed for tests or in select_statement.cc
and so are kept public.
Other than that, the only changes are namespace qualifications and
removal of a now-duplicate definition ("inclusive").
Closesscylladb/scylladb#20732
The interface is not used anywhere anymore, so we can
remove it safely. It has been replaced by custom
functions for each keyspace element and `cql3::description`.
We continue removing `data_dictionary::keyspace_element`.
In this commit, we start using the overloads returning
`cql3::description` in places where the methods specified
by `data_dictionary::keyspace_element` were used.
We add a new parameter in functions used to generate instances
of `cql3::description` for types related to situations where we
might not need a create statement. An example of such a scenario
could be `DESCRIBE TYPES`.
We're removing `data_dictionary::keyspace_element`.
Before we can do that, we need to substitute the existing
methods used for describing keyspace elements with their
new versions returning `cql3::description`.
That's what happens in this commit.
In these changes, we describe the purpose of the type
and make it reusable for other parts of the code.
That includes ditching the existing constructors,
leaving the formatting of its fields to the user
of the interface.
The removed constructors have been replaced by
free functions so that existing code can still
use them the way it did before.
We move the declaration of `description` to dedicated files
to be able to create instances of it from other parts of
the code.
`describe_statement.cc` has been functioning as an
intermediary between objects that can be described
and the end user. It will still perform that duty,
but we want to let other modules be able to generate
descriptions on their own, without having to share
an additional layer of abstraction in form of types
inheriting from `data_dictionary::keyspace_element`.
Those types may not perform any other function than
that and thus may be redundant.
Adjusting `description` to its new purpose will happen
in an upcoming commit.
We start requiring that the user issuing `CREATE ROLE
WITH SALTED HASH` be a superuser. The rationale for
that is the statement directly modifies a system
tables, circumventing the hashing algorithm.
Additionally, we correct a possible existing problem.
`_options.is_superuser` in `create_role_statement`
may be an empty optional, so dereferencing it
without a prior check could lead to undefined
behavior in the future.
We introduce a way to create a role with explictly
provided salted hash.
The algorithm for creating a role with a password works
like this:
1. The user issues a statement `CREATE ROLE <role> WITH
PASSWORD = '<password>' <...>`.
2. Scylla produces a hash based on the value of
`<password>`.
3. Scylla puts the produced hash in `system.roles`,
in the column `salted_hash`.
The newly introduced way to create a role is based
on a new form of the create statement:
`CREATE ROLE <role> WITH SALTED HASH = '<salted_hash>`
The difference in the algorithm used for processing
this statement is that we insert `<salted_hash>`
into `system.roles` directly, without hashing it.
The rationale for introducing this new statement is that
we want to be able to restore roles. The original password
isn't stored anywhere in the database (as intended),
so we need to rely on the column `salted_hash`.
Allow to specify service level used in select statement `SELECT ... USING SERVICE LEVEL sl_name`.
In OSS, this only affects statement's timeout.
In case both service level and timeout are specified `SELECT ... USING SERVICE LEVEL sl_name AND TIMEOUT 1h`, the timeout has higher priority as statement's timeout.
Fixesscylladb/scylladb#18471Closesscylladb/scylladb#20523
* github.com:scylladb/scylladb:
test/cql-pytest: add test for `SELECT ... USING SERVICE LEVEL`
cql3/Cql.g: extend grammar to allow `SELECT ... USING SERVICE LEVEL`
cql3/statements/select_statement: use service level timeout
cql3/attributes: add service level name field
qos/service_level_controller: add method to check if service level exists in cache
The statement_restrictions class started life in the object-oriented style - an
object that interacts with its environment via mutators and is observed via
observers.
This is however not suitable for its objective: to analyze the WHERE clause,
select a query plan, and partition the WHERE clause atoms to the various
parts demanded by the query plan (read_command and filters). Furthermore,
the object oriented style makes it hard to work with as you can only call some
observers after the related mutators were called.
Fix this by transforming the code info a more functional style: we call
a function that returns an immutable statement_restrictions object that
can only be observed. This makes it easier to further change in the future,
as changes will not have to consider interaction with the environment.
No backport as this is a refactoring
Closesscylladb/scylladb#20672
* github.com:scylladb/scylladb:
cql3: statement_restrictions: use functional style
cql3: statement_restrictions: calculate the index only once
cql3: statement_restrictions: make it a const object
During a query execution, the query can be re-bounced to another shard if the requested data is located there. Previous implementation assumed that the shard cannot be changed after first re-bounce, however with the introduction of Tablets, data could be migrated to another shard after the query was already re-bounced, causing a failure of the query execution. To avoid this issue, the query is re-bounced as needed until it is executed on the correct shard.
Fixes#15465Closesscylladb/scylladb#20493
* github.com:scylladb/scylladb:
cql_server: Add a test for multiple query msg rebounces.
cql_server::connection: process: rebounce msg if needed
cql_server::connection: process: co-routinize connection::process_on_shard
cql_server: connection: process: fixup indentation
cql_server: connection: process_on_shard: drop permit parameter
transport: server: pass bounce_to_shard as foreign shared ptr
cql_server: connection: process: add template concept for process_fn
cql_server: move process_fn_return_type to class definition
Instead of a constructor, use a new function
analyze_statement_restrictions() as the entry point. It returns an
immutable statement_restrictions object.
This opens the door to returning a variant, with each arm of the variant
corresponding to a different query plan.