Integrates audit functionality into CQL statement processing to enable tracking of database operations. Key changes:
- Add audit_info and statement_category to all CQL statements
- Implement audit categories for different statement types:
- DDL: Schema altering statements (CREATE/ALTER/DROP)
- DML: Data manipulation (INSERT/UPDATE/DELETE/TRUNCATE/USE)
- DCL: Access control (GRANT/REVOKE/CREATE ROLE)
- QUERY: SELECT statements
- ADMIN: Service level operations
- Add audit inspection points in query processing:
- Before statement execution
- After access checks
- After statement completion
- On execution failures
- Add password sanitization for role management statements
- Mask plaintext passwords in audit logs
- Handle both direct password parameters and options maps
- Preserve query structure while hiding sensitive data
- Modify prepared statement lifecycle to carry audit context
- Pass audit info during statement preparation
- Track audit info through statement execution
- Support batch statement auditing
This change enables comprehensive auditing of CQL operations while ensuring sensitive data is properly masked in audit logs.
This patch introduces the skip_when_empty flag to all CQL counters that
previously lacked this setting.
The skip_when_empty flag is a metric optimization that prevents
reporting on counters that have never been used. Once a counter has been
used (i.e., it holds a positive value), it will continue to be reported
consistently from that point onward.
Fixes#21046
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Closesscylladb/scylladb#21565
instead of making a copy of the warnings vector, make the warnings a non const in prepared_statement
and move the warnings vector to execute_maybe_with_guard
Closesscylladb/scylladb#20361Closesscylladb/scylladb#21083
Our "sstring_view" is an historic alias for the standard std::string_view.
The cql3/ directory used this old alias in a few of random places, let's
change them to use the standard type name.
Refs #4062.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
before this change, we rely on `using namespace seastar` to use
`seastar::format()` without qualifying the `format()` with its
namespace. this works fine until we changed the parameter type
of format string `seastar::format()` from `const char*` to
`fmt::format_string<...>`. this change practically invited
`seastar::format()` to the club of `std::format()` and `fmt::format()`,
where all members accept a templated parameter as its `fmt`
parameter. and `seastar::format()` is not the best candidate anymore.
despite that argument-dependent lookup (ADT for short) favors the
function which is in the same namespace as its parameter, but
`using namespace` makes `seastar::format()` more competitive,
so both `std::format()` and `seastar::format()` are considered
as the condidates.
that is what is happening scylladb in quite a few caller sites of
`format()`, hence ADT is not able to tell which function the winner
in the name lookup:
```
/__w/scylladb/scylladb/mutation/mutation_fragment_stream_validator.cc:265:12: error: call to 'format' is ambiguous
265 | return format("{} ({}.{} {})", _name_view, s.ks_name(), s.cf_name(), s.id());
| ^~~~~~
/usr/bin/../lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/format:4290:5: note: candidate function [with _Args = <const std::basic_string_view<char> &, const seastar::basic_sstring<char, unsigned int, 15> &, const seastar::basic_sstring<char, unsigned int, 15> &, const utils::tagged_uuid<table_id_tag> &>]
4290 | format(format_string<_Args...> __fmt, _Args&&... __args)
| ^
/__w/scylladb/scylladb/seastar/include/seastar/core/print.hh:143:1: note: candidate function [with A = <const std::basic_string_view<char> &, const seastar::basic_sstring<char, unsigned int, 15> &, const seastar::basic_sstring<char, unsigned int, 15> &, const utils::tagged_uuid<table_id_tag> &>]
143 | format(fmt::format_string<A...> fmt, A&&... a) {
| ^
```
in this change, we
change all `format()` to either `fmt::format()` or `seastar::format()`
with following rules:
- if the caller expects an `sstring` or `std::string_view`, change to
`seastar::format()`
- if the caller expects an `std::string`, change to `fmt::format()`.
because, `sstring::operator std::basic_string` would incur a deep
copy.
we will need another change to enable scylladb to compile with the
latest seastar. namely, to pass the format string as a templated
parameter down to helper functions which format their parameters.
to miminize the scope of this change, let's include that change when
bumping up the seastar submodule. as that change will depend on
the seastar change.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
A dialect is a different way to interpret the same CQL statement.
Examples:
- how duplicate bind variable names are handled (later in this series)
- whether `column = NULL` in LWT can return true (as is now) or
whether it always returns NULL (as in SQL)
Currently, dialect is an empty structure and will be filled in later.
It is passed to query_processor methods that also accept a CQL string,
and from there to the parser. It is part of the prepared statement cache
key, so that if the dialect is changed online, previous parses of the
statement are ignored and the statement is prepared again.
The patch is careful to pick up the dialect at the entry point (e.g.
CQL protocol server) so that the dialect doesn't change while a statement
is parsed, prepared, and cached.
The hint contains information related to what exactly changed, allowing
listeners to do partial updates, instead of reloading all metadata on
each notification.
forward_service is nondescriptive and misnamed, as it does more than
forward requests. It's a classic map/reduce algorithm (and in fact one
of its parameters is "reducer"), so name it accordingly.
The name "forward" leaked into the wire protocol for the messaging
service RPC isolation cookie, so it's kept there. It's also maintained
in the name of the logger (for "nodetool setlogginglevel") for
compatibility with tests.
Closesscylladb/scylladb#19444
Said method has a func parameter (called just f), which it receives as
rvalue ref and just uses as a reference. This means that if caller
doesn't keep the func alive, for_each_cql_result() will run into
use-after-free after the first suspention point. This is unexpected for
callers, who don't expect to have to keep something alive, which they
passed in with std::move().
Adjust the signature to take a value instead, value parameters are moved
to the coro frame and survive suspention points.
Adjust internal callers (query_internal()) the same way.
There are no known vulnerable external callers.
this change was created in the same spirit of ebff5f5d.
despite that we include Seastar as a submodule, Seastar is not a
part of scylla project. so we'd better include its headers using
brackets.
ebff5f5d addressed this cosmetic issue a while back. but probably
clangd's header-insertion helped some of contributor to insert
the missing headers with `"`. so this style of `include` returned
to the tree with these new changes.
unfortunately, clangd does not allow us to configure the style
of `include` at the time of writing.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#19406
thrift support was deprecated since ScyllaDB 5.2
> Thrift API - legacy ScyllaDB (and Apache Cassandra) API is
> deprecated and will be removed in followup release. Thrift has
> been disabled by default.
so let's drop it. in this change,
* thrift protocol support is dropped
* all references to thrift support in document are dropped
* the "thrift_version" column in system.local table is preserved for backward compatibility, as we could load from an existing system.local table which still contains this clolumn, so we need to write this column as well.
* "/storage_service/rpc_server" is only preserved for backward compatibility with java-based nodetool.
Fixes#3811Fixes#18416
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
- [x] not a fix, no need to backport
Closesscylladb/scylladb#18453
* github.com:scylladb/scylladb:
config: expand on rpc_keepalive's description
api: s/rpc/thrift/
db/system_keyspace: drop thrift_version from system.local table
transport: do not return client_type from cql_server::connection::make_client_key()
treewide: drop thrift support
cql3: always return created event in create ks/table/type/view statement
In case multiple clients issue concurrently CREATE KEYSPACE IF NOT EXISTS
and later USE KEYSPACE it can happen that schema in driver's session is
out of sync because it synces when it receives special message from
CREATE KEYSPACE response.
Similar situation occurs with other schema change statements.
In this patch we fix only create keyspace/table/type/view statements
by always sending created event. Behavior of any other schema altering
statements remains unchanged.
Fixes https://github.com/scylladb/scylladb/issues/16909
**backport: no, it's not a regression**
Closesscylladb/scylladb#18819
* github.com:scylladb/scylladb:
cql3: always return created event in create ks/table/type/view statement
cql3: auth: move auto-grant closer to resource creation code
cql3: extract create ks/table/type/view event code
And, while at it, rename local variable to refer to it to as "manager"
not "wasm". Query processor and database also have getters named
"wasm()", these are not renamed yet to keep patch smaller (and those
getters are going to be reworked further anyway).
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
In case multiple clients issue concurrently CREATE KEYSPACE IF NOT EXISTS
and later USE KEYSPACE it can happen that schema in driver's session is
out of sync because it synces when it receives special message from
CREATE KEYSPACE response.
Similar situation occurs with other schema change statements.
In this patch we fix only create keyspace/table/type/view statements
by always sending created event. Behavior of any other schema altering
statements remains unchanged.
This should reduce the risk of re-introducing issue similar to
the one fixed in ab6988c52f
When grant code is closer to actual creation code (announcing mutations)
there is lower chance of those two effects being triggered differently,
if we ever call grant_permissions_to_creator and not announce mutations
that's very likely a security vulnerability.
Additionally comment was rewritten to be more accurate.
thrift support was deprecated since ScyllaDB 5.2
> Thrift API - legacy ScyllaDB (and Apache Cassandra) API is
> deprecated and will be removed in followup release. Thrift has
> been disabled by default.
so let's drop it. in this change,
* thrift protocol support is dropped
* all references to thrift support in document are dropped
* the "thrift_version" column in system.local table is
preserved for backward compatibility, as we could load
from an existing system.local table which still contains
this clolumn, so we need to write this column as well.
* "/storage_service/rpc_server" is only preserved for
backward compatibility with java-based nodetool.
* `rpc_port` and `start_rpc` options are preserved, but
they are marked as "Unused". so that the new release
of scylladb can consume existing scylla.yaml configurations
which might contain these settings. by making them
deprecated, user will be able get warned, and update
their configurations before we actually remove them
in the next major release.
Fixes#3811Fixes#18416
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
The main theme of this commit is executing drop
keyspace/table/aggregate/function statements in a single
transaction together with auth auto-revoke logic.
This is the logic which cleans related permissions after
resource is deleted.
It contains serveral parts which couldn't easily be split
into separate commits mainly because mutation collector related
paths can't be mixed together. It would require holding multiple
guards which we don't support. Another reason is that with mutation
collector the changes are announced in a single place, at the end
of statement execution, if we'd announce something in the middle
then it'd lead to raft concurrent modification infinite loop as it'd
invalidate our guard taken at the begining of statement execution.
So this commit contains:
- moving auto-revoke code to statement execution from migration_listener
* only for auth-v2 flow, to not break the old one
* it's now executed during statement execution and not merging schemas,
which means it produces mutations once as it should and not on each
node separately
* on_before callback family wasn't used because I consider it much
less readable code. Long term we want to remove
auth_migration_listener.
- adding mutation collector to revoke_all
* auto-revoke uses this function so it had to be changed,
auth::revoke_all free function wrapper was added as cql3
layer should not use underlying_authorizer() directly.
- adding mutation collector to drop_role
* because it depends on revoke_all and we can't mix old and new flows
* we need to switch all functions auth::drop_role call uses
* gradual use of previously introduced modify_membership, otherwise
we would need to switch even more code in this commit
This is done to achieve single transaction semantics.
The change includes auto-grant feature. In particular
for schema related auto-grant we don't use normal
mutation collector announce path but follow migration manager,
this may be unified in the future.
This effectively removes "finally" block so if
authorized_prepared_cache.stop() resolves with exception, the
prepared_cache.stop() is skipped. But that's not a problem -- even if
.stop() throws the shole scylla stop aborts so we don't really care if
it was clean or not.
Also, authorized_prepared_cache.stop() closes the gate and cancels the
timer. None of those can resolve with exception.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#19001
Up until now we waited until mutations are in place and then returned
directly to the caller of the ALTER statement, but that doesn't imply
that tablets were deleted/created, so we must wait until the whole
processing is done and return only then.
This commit adds support for executing ALTER KS for keyspaces with
tablets and utilizes all the previous commits.
The ALTER KS is handled in alter_keyspace_statement, where a global
topology request in generated with data attached to system.topology
table. Then, once topology state machine is ready, it starts to handle
this global topology event, which results in producing mutations
required to change the schema of the keyspace, delete the
system.topology's global req, produce tablets mutations and additional
mutations for a table tracking the lifetime of the whole req. Tracking
the lifetime is necessary to not return the control to the user too
early, so the query processor only returns the response while the
mutations are sent.
With current implementation only 1 global topo req can be executed at a
time, so when ALTER KS is executed, we'll have to check if any other
global topo req is ongoing and fail the req if that's the case.
In order to correctly restore schema from `DESC SCHEMA WITH INTERNALS`, we need a way to drop a column with a timestamp in the past.
Example:
- table t(a int pk, b int)
- insert some data1
- drop column b
- add column b int
- insert some data2
If the sstables weren't compacted, after restoring the schema from description:
- we will loss column b in data2 if we simply do `ALTER TABLE t DROP b` and `ALTER TABLE t ADD b int`
- we will resurrect column b in data1 if we skip dropping and re-adding the column
Test for this: https://github.com/scylladb/scylla-dtest/pull/4122Fixes#16482Closesscylladb/scylladb#18115
* github.com:scylladb/scylladb:
docs/cql: update ALTER TABLE docs
test/cqlpytest: add test for prepared `ALTER TABLE ... DROP ... USING TIMESTAMP ?`
test/cql-pytest: remove `xfail` from alter table with timestamp tests
cql3/statements: extend `ALTER TABLE ... DROP` to allow specifying timestamp of column drop
cql3/statements: pass `query_options` to `prepare_schema_mutations()`
cql3/statements: add bound terms to alter table statement
cql3/statements: split alter_table_statement into raw and prepared
schema: allow to specify timestamp of dropped column
We have a concurrent modification conflict in tests and suspect
duplicated requests but since we don't log successful requests
we have no way to verify if that's the case. get_mutations_internal log
will help to tell wchich nodes are trying to push auth or
service levels mutations into raft.
Refs scylladb/scylladb#18319Closesscylladb/scylladb#18413
To migrate service levels to be raft managed, obtain `group0_guard` to
be able to pass it to service_level_controller's methods.
Using this mechanism also automatically provides retries in case of
concurrent group0 operation.
To make table modifications go via raft we need to publish
mutations. Currently many system tables (especially auth) use
CQL to generate table modifications. Added function is a missing
link which will allow to do a seamless transition of certain
system tables to raft.
Because we'll be doing group0 operations we need to run on shard 0. Additional benefit
is that with needs_guard set query_processor will also do automatic retries in case of
concurrent group0 operations.
When a base table changes and altered, so does the views that might
refer to the added column (which includes "SELECT *" views and also
views that might need to use this column for rows lifetime (virtual
columns).
However the query processor implementation for views change notification
was an empty function.
Since views are tables, the query processor needs to at least treat them
as such (and maybe in the future, do also some MV specific stuff).
This commit adds a call to `on_update_column_family` from within
`on_update_view`.
The side effect true to this date is that prepared statements for views
which changed due to a base table change will be invalidated.
Fixes#16392
Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>
Add overloads for execute_internal and friends
accepting a vector of optional<data_value>.
The caller can pass nullopt for any unset value.
The vector of optionals is translated internally to
`cql3::raw_value_vector_with_unset` by `make_internal_options`.
This path will be called by system_keyspace::update_peer_info
for updating a subset of the system.peers columns.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
data_value_list is a wrapper around std::initializer_list<data_value>.
Use it for passing values to `cql3::query_processor::execute_internal`
and friends.
A following path will add a std::variant for data_value_or_unset
and extend data_value_list to support unset values.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Replacing `restrict_replication_simplestrategy` config option with
2 config options: `replication_strategy_{warn,fail}_list`, which
allow us to impose soft limits (issue a warning) and hard limits (not
execute CQL) on replication strategy when creating/altering a keyspace.
The reason to rather replace than extend `restrict_replication_simplestrategy` config
option is that it was not used and we wanted to generalize it.
Only soft guardrail is enabled by default and it is set to SimpleStrategy,
which means that we'll generate a CQL warning whenever replication strategy
is set to SimpleStrategy. For new cloud deployments we'll move
SimpleStrategy from warn to the fail list.
Guardrails violations will be tracked by metrics.
Resolves#5224
Refs #8892 (the replication strategy part, not the RF part)
Closesscylladb/scylladb#15399
This series mark multiple high cardinality counters with skip_when_empty flag.
After this patch the following counters will not be reported if they were never used:
```
scylla_transport_cql_errors_total
scylla_storage_proxy_coordinator_reads_local_node
scylla_storage_proxy_coordinator_completed_reads_local_node
scylla_transport_cql_errors_total
```
Also marked, the CAS related CQL operations.
Fixes#12751Closesscylladb/scylladb#13558
* github.com:scylladb/scylladb:
service/storage_proxy.cc: mark counters with skip_when_empty
cql3/query_processor.cc: mark cas related metrics with skip_when_empty
transport/server.cc: mark metric counter with skip_when_empty
Replacing `minimum_keyspace_rf` config option with 4 config options:
`{minimum,maximum}_replication_factor_{warn,fail}_threshold`, which
allow us to impose soft limits (issue a warning) and hard limits (not
execute CQL) on RF when creating/altering a keyspace.
The reason to rather replace than extend `minimum_keyspace_rf` config
option is to be aligned with Cassandra, which did the same, and has the
same parameters' names.
Only min soft limit is enabled by default and it is set to 3, which means
that we'll generate a CQL warning whenever RF is set to either 1 or 2.
RF's value of 0 is always allowed and means that there will not be any
replicas on a given DC. This was agreed with PM.
Because we don't allow to change guardrails' values when scylla is
running (per PM), there're no tests provided with this PR, and dtests will be
provided separately.
Exceeding guardrails' thresholds will be tracked by metrics.
Resolves#8619
Refs #8892 (the RF part, not the replication-strategy part)
Closes#14262
This patch mark the conditional metrics counters with skip_when_empty
flag, to reduce metrics reporting when cas is not in used.
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Currently we hold group0_guard only during DDL statement's execute()
function, but unfortunately some statements access underlying schema
state also during check_access() and validate() calls which are called
by the query_processor before it calls execute. We need to cover those
calls with group0_guard as well and also move retry loop up. This patch
does it by introducing new function to cql_statement class take_guard().
Schema altering statements return group0 guard while others do not
return any guard. Query processor takes this guard at the beginning of a
statement execution and retries if service::group0_concurrent_modification
is thrown. The guard is passed to the execute in query_state structure.
Fixes: #13942
Message-ID: <ZNsynXayKim2XAFr@scylladb.com>
This reverts commit 70b5360a73. It generates
a failure in group0_test .test_concurrent_group0_modifications in debug
mode with about 4% probability.
Fixes#15050
Currently we hold group0_guard only during DDL statement's execute()
function, but unfortunately some statements access underlying schema
state also during check_access() and validate() calls which are called
by the query_processor before it calls execute. We need to cover those
calls with group0_guard as well and also move retry loop up. This patch
does it by introducing new function to cql_statement class take_guard().
Schema altering statements return group0 guard while others do not
return any guard. Query processor takes this guard at the beginning of a
statement execution and retries if service::group0_concurrent_modification
is thrown. The guard is passed to the execute in query_state structure.
Fixes: #13942
Message-ID: <ZNSWF/cHuvcd+g1t@scylladb.com>
The system.group0_history table provides useful descriptions
for each command committed to Raft group 0. One way of applying
a command to group 0 is by calling migration_manager::announce.
This function has the description parameter set to empty string
by default. Some calls to announce use this default value which
causes null values in system.group0_history. We want
system.group0_history to have an actual description for every
command, so we change all default descriptions to reasonable ones.
We can't provide a reasonable description to announce in
query_processor::execute_thrift_schema_command because this
function is called in multiple situations. To solve this issue,
we add the description parameter to this function and to
handler::execute_schema_command that calls it.
When the q.p. stops it also "stops" the wasm manager. Move this call
into main. The cql test env doesn't need this change, it stops the whole
sharded service which stops instances on its own
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The wasm::manager is just cql3::wasm_context renamed. It now sits in
lang/wasm* and is started as a sharded service in main (and cql test
env). This move also needs some headers shuffling, but it's not severe
This change is required to make it possible for the wasm::manager to be
shared (by reference) between q.p. and replica::database further
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
There are three wasm-only fields on q.p. -- engine, cache and runner.
This patch groups them on a single wasm_context structure to make it
earier to manipulate them in the next patches
The 'friend' declaration it temporary, will go away soon
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
After changing the prepare_ methods of migration_manager to
functions, the migration_manager& parameter of
query_processor::execute_thrift_schema_command and
thrift::handler::execute_schema_command (that calls
query_processor::execute_thrift_schema_command) has been unused.