these unused includes were identified by clang-include-cleaner. after
auditing these source files, all of the reports have been confirmed.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
In the current scenario, topology_change_kind variable, was been handled using
_manage_topology_change_kind_from_group0 variable. This method was brittle
and had some bugs(e.g. for restart case, it led to a time gap between group0
server start and topology_change_kind being managed via group0)
Post _manage_topology_change_kind_from_group0 removal, careful management of
topology_change_kind variable was needed for maintaining correct
topology_change_kind in all scenarios. So this PR also performs a refactoring
to populate all init data to system tables even before group0 creation(via
raft_initialize_discovery_leader function). Now because raft_initialize_discovery_leader
happens before the group 0 creation, we write mutations directly to system
tables instead of a group 0 command. Hence, post group0 creation, the node
can read the correct values from system tables and correct values are
maintained throughout.
Added a new function initialize_done_topology_upgrade_state which takes
care of updating the correct upgrade state to system tables before starting
group0 server. This ensures that the node can read the correct values from
system tables and correct values are maintained throughout.
By moving raft_initialize_discovery_leader logic to happen before starting
group0 server, and not as group0 command post server start, we also get rid
of the potential problem of init group0 command not being the 1st command on
the server. Hence ensuring full integrity as expected by programmer.
Fixes: scylladb/scylladb#21114
Replace remaining uses of boost::adaptors::transformed with std::views::transform
to reduce Boost dependencies, following the migration pattern established in
bab12e3a. This change addresses recently merged code that reintroduced Boost
header dependencies through boost::adaptors::transformed usage.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#22365
Introduces a comprehensive audit system to track database operations for security
and compliance purposes. This change includes:
Core Components:
- New audit subsystem for logging database operations
- Service level integration for proper resource management
- CQL statement tracking with operation categories
- Login process integration for tenant management
Key Features:
- Configurable audit logging (syslog/table)
- Operation categorization (QUERY/DML/DDL/DCL/AUTH/ADMIN)
- Selective auditing by keyspace/table
- Password sanitization in audit logs
- Service level shares support (1-1000) for workload prioritization
- Proper lifecycle management and cleanup
I ran the dtests for audit (manually enabled) and they pass.
The in-repo tests pass.
Notably, there should be no non-whitespace changes between this and scylla-enterprise
Fixesscylladb/scylla-enterprise#4999Closesscylladb/scylladb#22147
* github.com:scylladb/scylladb:
audit: Add shares support to service level management
audit: Add service level support to CQL login process
audit: Add support to CQL statements
audit: Integrate audit subsystem into Scylla main process
audit: Add documentation for the audit subsystem
audit: Add the audit subsystem
Introduces shares-based workload prioritization for service levels, allowing
fine-grained control over resource allocation between tenants. Key changes:
- Add shares option to service level configuration:
- Valid range: 1-1000 shares
- Default value: 1000 shares
- Enterprise-only feature gated by WORKLOAD_PRIORITIZATION feature flag
- Extend CQL interface:
- Add shares parameter to CREATE/ALTER SERVICE_LEVEL
- Add shares column to system_distributed.service_levels
- Add percentage calculation to LIST SERVICE_LEVELS
- Add shares to DESCRIBE EFFECTIVE SERVICE_LEVEL output
- Add validation:
- Enforce shares range (1-1000)
- Validate enterprise feature flag
- Handle unset/delete markers properly
- Update service level statements:
- Add shares validation to CREATE/ALTER operations
- Preserve shares through default value replacement
- Add proper decomposition for shares values in result sets
This change enables operators to control relative resource allocation between
tenants using proportional share scheduling, while maintaining backward
compatibility with existing service level configurations.
This change integrates service level functionality into the CQL authentication and connection handling:
- Add scheduling_group_name to client_data to track service level assignments
- Extend SASL challenge interface to expose authenticated username
- Modify connection processing to support tenant switching:
- Add switch_tenant() method to handle scheduling group changes
- Add process_until_tenant_switch() to handle request processing boundaries
- Implement no_tenant() default executor
- Add execute_under_tenant_type for scheduling group management
- Update connection lifecycle to properly handle service level changes:
- Initialize connections with default scheduling group
- Support dynamic scheduling group updates when service levels change
- Ensure proper cleanup of scheduling group assignments
The changes enable proper scheduling group assignment and management based on
authenticated users' service levels, while maintaining backward compatibility
for connections without service level assignments.
these unused includes were identifier by clang-include-cleaner. after
auditing these source files, all of the reports have been confirmed.
please note, because quite a few source files relied on
`utils/to_string.hh` to pull in the specialization of
`fmt::formatter<std::optional<T>>`, after removing
`#include <fmt/std.h>` from `utils/to_string.hh`, we have to
include `fmt/std.h` directly.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
now that we are allowed to use C++23. we now have the luxury of using
`std::views::reverse`.
- replace `boost::adaptors::transformed` with `std::views::transform`
- remove unused `#include <boost/range/adaptor/reversed.hpp>`
this change is part of our ongoing effort to modernize our codebase
and reduce external dependencies where possible.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Replace usages of `boost::algorithm::join()` with `fmt::join()` to improve
performance and reduce dependency on Boost. `fmt::join()` allows direct
formatting of ranges and tuples with custom separators without creating
intermediate strings.
When formatting comma-separated values into another string, fmt::join()
avoids the overhead of temporary string creation that
`boost::algorithm::join()` requires. This change also helps streamline
our dependencies by leveraging the existing fmt library instead of
Boost.Algorithm.
To avoid the ambiguity, some caller sites were updated to call
`seastar::format()` explicitly.
See also
- boost::algorithm::join():
https://www.boost.org/doc/libs/1_87_0/doc/html/string_algo/reference.html#doxygen.join_8hpp
- fmt::join():
https://fmt.dev/11.0/api/#ranges-api
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#22082
Now, the CREATE statements generated for each service level by the
DESCRIBE SCHEMA WITH INTERNALS statement will account for the service
level's shares.
Now, when the user logs in and the connection becomes authenticated, the
processing loop of the connection is switched to the scheduling group
that corresponds to the service level assigned to the logged in user.
The scheduling group is also updated when the service level assigned to
this user changes.
Starting from this commit, the scheduling groups managed by the service
level controller are actually being used by user workload.
In order to make sure that the scheduling group carries over RPC, and
also to prevent priority inversion issues between different service
levels, modify the messaging service to use separate RPC connections for
each service level in order to serve user traffic.
The above is achieved by reusing the existing concept of "tenants" in
messaging service: when a new service level (or, more accurately,
service-level specific scheduling group) is first used in an RPC, a
new tenant is created.
In addition, extend the service level controller to be able to quickly
look up the service level name of the currently active scheduling group
in order to speed up the logic for choosing the tenant.
Replace the reader concurrency semaphores for user reads and view
updates with the newly introduced reader concurrency semaphore group,
which assigns a semaphore for each service level.
Each group is statically assigned to some pool of memory on startup and
dynamically distribute this memory between the semaphores, relative to
the number of shares of the corresponding scheduling group.
The intent of having a separate reader concurrency semaphore for each
scheduling group is to prevent priority inversion issues due to reads
with different priorities waiting on the same semaphore, as well as make
memory allocation more fair between service levels due to the adjusted
number of shares.
Introduce the core logic of workload prioritization, responsible for
assigning scheduling groups to service levels.
The service level controller maintains a pool of scheduling groups for
the currently present service levels, as well as a pool of unused
scheduling groups which were previously used by some service level that
was deleted during node's lifetime.
When a new service level is created, the SL controller either assigns a
scheduling group from the unused SG pool, or creates a new one if the
pool is empty. The scheduling group is renamed to "sl:<scheduling group
name>".
When updating shares of a service level (and also when creating a new
service level), the shares of the corresponding scheduling group are
synchronized with those of the service level.
When a service level is deleted, its group is released to the
aforementioned pool of unused scheduling groups and the prefix of its
name is changed from "sl:" to "sl_deleted:".
For now, these scheduling groups are not used by any user operations.
This will be changed in subsequent commits.
Add service level shares related fields to service_level_options and
slo_effective_names structs, and adjust the existing methods of the
former (merge_with, init_effective_names) to account for them.
The service levels table is queried with a `SELECT * ...` query, by
using the `execute_internal` method which prepares and caches the query
in an special cache for internal queries, separate from the user query
cache.
During rolling upgrade from a version which does not support service
level shares to the one that does, the `shares` column is added. The
aforementioned internal query cache is _not_ invalidated on schema
change, so the cache might still contain the prepared query from the
time before the column was added, and that prepared query will fetch the
old set of column without the new `shares` column.
In order to solve this, explicitly specify the columns in the query
string, using the full set of column names from the time when the query
is executed.
Note that this is a problem only for the legacy, non-raft service
levels. Raft-based service levels use a local table for which the schema
is determined on startup.
Also note that this code only fetches values from the `shares` column
but does not make any use of it otherwise. It will be handled by later
commits in this series.
The `nonexistant_service_level_exception` can be thrown by service
levels code and propagated up to the CQL server layer, where it is
converted into a CQL protocol error. The aforementioned exception
inherits from `service_level_argument_exception`, which in turn inherits
from `std::invalid_argument` - which doesn't mean much to the CQL layer
and is converted to a generic SERVER_ERROR.
We can do better and return a more meaningful error code for this
exception. Change the base class of service_level_argument_exception to
exceptions::invalid_request_exception which gets converted to an INVALID
error.
The INVALID error code was already being used by the enterprise version,
so this commit just synchronizes error handling with enterprise.
The function get_service_levels is used to retrieve all service levels
and it is called from multiple different contexts.
Importantly, it is called internally from the context of group0 state reload,
where it should be executed with a long timeout, similarly to other
internal queries, because a failure of this function affects the entire
group0 client, and a longer timeout can be tolerated.
The function is also called in the context of the user command LIST
SERVICE LEVELS, and perhaps other contexts, where a shorter timeout is
preferred.
The commit introduces a function parameter to indicate whether the
context is internal or not. For internal context, a long timeout is
chosen for the query. Otherwise, the timeout is shorter, the same as
before. When the distinction is not important, a default value is
chosen which maintains the same behavior.
The main purpose is to fix the case where the timeout is too short and causes
a failure that propagates and fails the group0 client.
Fixesscylladb/scylladb#20483Closesscylladb/scylladb#21748
This patch reverts 324b3c43c0 and adds synchronous versions of `service_level_controller::find_effective_service_level()` and `client_state::maybe_update_per_service_level_params()`.
It isn't safe to do asynchronous calls in `for_each_gently`, as the
connection may be disconnected while a call in callback preempts.
Fixesscylladb/scylladb#21801Closesscylladb/scylladb#21761
* github.com:scylladb/scylladb:
Revert "generic_server: use async function in `for_each_gently()`"
transport/server: use synchronous calls in `for_each_gently` callback
service/client_state: add synchronous method to update service level params
qos/service_level_controller: add `find_cached_effective_service_level`
The method is a synchronous equivalent of
`find_effective_service_level`.
It uses recently introduced effective service level cache, so retrieve
user's effective service level is done by quick lookup to the cache.
now that we are allowed to use C++23. we now have the luxury of using
`std::views::transform`.
in this change, we:
- replace `boost::adaptors::transformed` with `std::views::transform`
- use `fmt::join()` when appropriate where `boost::algorithm::join()`
is not applicable to a range view returned by `std::view::transform`.
- use `std::ranges::fold_left()` to accumulate the range returned by
`std::view::transform`
- use `std::ranges::fold_left()` to get the maximum element in the
range returned by `std::view::transform`
- use `std::ranges::min()` to get the minimal element in the range
returned by `std::view::transform`
- use `std::ranges::equal()` to compare the range views returned
by `std::view::transform`
- remove unused `#include <boost/range/adaptor/transformed.hpp>`
- use `std::ranges::subrange()` instead of `boost::make_iterator_range()`,
to feed `std::views::transform()` a view range.
to reduce the dependency to boost for better maintainability, and
leverage standard library features for better long-term support.
this change is part of our ongoing effort to modernize our codebase
and reduce external dependencies where possible.
limitations:
there are still a couple places where we are still using
`boost::adaptors::transformed` due to the lack of a C++23 alternative
for `boost::join()` and `boost::adaptors::uniqued`.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#21700
`coroutine::parallel_for_each` accepts both a range and a pair of
iterators. let's use the former when appropriate. it is simpler this way.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#21684
The later includes the former and in addition to `seastar::format()`,
`print.hh` also provides helpers like `seastar::fprint()` and
`seastar::print()`, which are deprecated and not used by scylladb.
Previously, we include `seastar/core/print.hh` for using
`seastar::format()`. and in seastar 5b04939e, we extracted
`seastar::format()` into `seastar/core/format.hh`. this allows us
to include a much smaller header.
In this change, we just include `seastar/core/format.hh` in place of
`seastar/core/print.hh`.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#21574
This includes way too much, including <boost/regex.hpp>, which is huge.
Drop includes of adaptors.hpp and replace by what is needed.
Closesscylladb/scylladb#21187
It doesn't need it apart from a forward declaration.
Files that lost necessary includes are adjusted, and some users
of auth_version_t are redirected to the definition outside system_keyspace.
There is `service_level_controller::get_service_level()` method,
which searches for service level in the controller cache and returns
default service level if SL with given name doesn't exist.
Added method allows to check whether a service level exists in the
controller cache.
before this change, we rely on `using namespace seastar` to use
`seastar::format()` without qualifying the `format()` with its
namespace. this works fine until we changed the parameter type
of format string `seastar::format()` from `const char*` to
`fmt::format_string<...>`. this change practically invited
`seastar::format()` to the club of `std::format()` and `fmt::format()`,
where all members accept a templated parameter as its `fmt`
parameter. and `seastar::format()` is not the best candidate anymore.
despite that argument-dependent lookup (ADT for short) favors the
function which is in the same namespace as its parameter, but
`using namespace` makes `seastar::format()` more competitive,
so both `std::format()` and `seastar::format()` are considered
as the condidates.
that is what is happening scylladb in quite a few caller sites of
`format()`, hence ADT is not able to tell which function the winner
in the name lookup:
```
/__w/scylladb/scylladb/mutation/mutation_fragment_stream_validator.cc:265:12: error: call to 'format' is ambiguous
265 | return format("{} ({}.{} {})", _name_view, s.ks_name(), s.cf_name(), s.id());
| ^~~~~~
/usr/bin/../lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/format:4290:5: note: candidate function [with _Args = <const std::basic_string_view<char> &, const seastar::basic_sstring<char, unsigned int, 15> &, const seastar::basic_sstring<char, unsigned int, 15> &, const utils::tagged_uuid<table_id_tag> &>]
4290 | format(format_string<_Args...> __fmt, _Args&&... __args)
| ^
/__w/scylladb/scylladb/seastar/include/seastar/core/print.hh:143:1: note: candidate function [with A = <const std::basic_string_view<char> &, const seastar::basic_sstring<char, unsigned int, 15> &, const seastar::basic_sstring<char, unsigned int, 15> &, const utils::tagged_uuid<table_id_tag> &>]
143 | format(fmt::format_string<A...> fmt, A&&... a) {
| ^
```
in this change, we
change all `format()` to either `fmt::format()` or `seastar::format()`
with following rules:
- if the caller expects an `sstring` or `std::string_view`, change to
`seastar::format()`
- if the caller expects an `std::string`, change to `fmt::format()`.
because, `sstring::operator std::basic_string` would incur a deep
copy.
we will need another change to enable scylladb to compile with the
latest seastar. namely, to pass the format string as a templated
parameter down to helper functions which format their parameters.
to miminize the scope of this change, let's include that change when
bumping up the seastar submodule. as that change will depend on
the seastar change.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Add a default constructor and a constructor which explicitly
initializes all fields of the service_level structure.
This is done in order to make sure that removal of the
marked_for_deletion field can be done safely - otherwise, for example,
service_level could be aggregate-initialized with an incomplete list of
values for the fields, and removing marked_for_deletion which is in the
middle of the struct would cause the is_static field to be initialized
with the value that was designated for marked_for_deletion.
As a bonus, make sure that marked_for_deletion and is_static bool fields
are initialized in the default constructor to false in order to avoid
potential undefined behavior.
Tenant names starting with `$` are reserved for internal ones.
Forbid creating new service level which name starts with `$`
and log a warning for existing service levels with `$` prefix.
Closesscylladb/scylladb#20122
Updates to `system.role_members` and `system.role_attributes` affect
effective service levels cache, so applying mutations to those tables
should reload the effective SL cache.
Add a second layer of service_level_controller cache which contains
role name -> effective service level mapping.
To build the mapping, controller uses first cache layer (service level
name -> service level) and 2 queries to auth tables (one to `roles` and
one to `role_members`).
Write down definitions of `service level` and `effective service level`
in service/qos/service_level_controller.hh.
Until now, effective service level was only used as result of
`LIST EFFECTIVE SERVICE LEVEL OF <role>`.
Now we want to have quick access to effective service level of
each role and introduce cache of effective sl to do it.
New definitions clarify things.
The commit also renames:
- `update_service_levels_from_distributed_data` -> `update_service_levels_cache`
Later we will introduce effective_service_level_cache, so this change
standarizes the names.
- `find_service_level` -> `find_effective_service_level`
The function actualy returns effective service level.
assert() is traditionally disabled in release builds, but not in
scylladb. This hasn't caused problems so far, but the latest abseil
release includes a commit [1] that causes a 1000 insn/op regression when
NDEBUG is not defined.
Clearly, we must move towards a build system where NDEBUG is defined in
release builds. But we can't just define it blindly without vetting
all the assert() calls, as some were written with the expectation that
they are enabled in release mode.
To solve the conundrum, change all assert() calls to a new SCYLLA_ASSERT()
macro in utils/assert.hh. This macro is always defined and is not conditional
on NDEBUG, so we can later (after vetting Seastar) enable NDEBUG in release
mode.
[1] 66ef711d68Closesscylladb/scylladb#20006
Most callers of the raft group0 client interface are passing a real
source instance, so we can use the abort source reference in the client
interface. This change makes the code simpler and more consistent.
We want to call `service_level_controller::do_abort()` in all cases.
The current code (introduced in
535e5f4ae7)
calls do_abort if abort was not requested, however, since
it does so by checking the subscription bool operator,
it would miss the case where abort was already requested
before the subscription took place (in service_level_controller
ctor).
With scylladb/seastar@470b539b1c and
scylladb/seastar@8ecce18c51
we can just unconditionally call the subscription `on_abort`
method, that ensures only-once semantics, even if abort
was already requested at subscription time.
Fixesscylladb/scylladb#19075
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closesscylladb/scylladb#19929
Before this, the notification semaphore was broken() in do_abort(),
which was triggered by early abort source.
However we are going to reload sl cache on topology state reload
and it can happen after the early abort source is triggered, so
it may throw broken_semaphore exception.
We can move semaphore breaking to stop() method. Legacy update loop
is still stopped in do_abort(), so it doesn't change the order of
service level controller shutdown.