can_use_effective_service_level_cache() always returns true now, so the
function can be dropped entirely and all the code that assumes it may
return false can be dropped as well.
This patch changes the layout of user-facing scheduling groups from
/
`- statement
`- sl:default
`- sl:*
`- other groups (compaction, streaming, etc.)
into
/
`- user (supergroup)
`- statement
`- sl:default
`- sl:*
`- other groups (compaction, streaming, etc.)
The new supergroup has 1000 static shares and is name-less, in a sense
that it only have a variable in the code to refer to and is not exported
via metrics (should be fixed in seastar if we want to).
The moved groups don't change their names or shares, only move inside
the scheduling hierarchy.
The goal of the change is to improve resource consumption of sl:*
groups. Right now activities in low-shares service levels are scheduled
on-par with e.g. streaming activity, which is considered to be low-prio
one. By moving all sl:* groups into their own supergroup with 1000
shares changes the meaning of sl:* shares. From now on these shares
values describe preirities of service level between each-other, and the
user activities compete with the rest of the system with 1000 shares,
regardless of how many service levels are there.
Unit tests keep their user groups under root supergroup (for simplicity)
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#28235
Right now, service levels are migrated in one group0 command and auth
is migrated in the next one. This has a bad effect on the group0 state
reload logic - modifying service levels in group0 causes the effective
service levels cache to be recalculated, and to do so we need to fetch
information about all roles. If the reload happens after SL upgrade and
before auth upgrade, the query for roles will be directed to the legacy
auth tables in system_auth - and the query, being a potentially remote
query, has a timeout. If the query times out, it will throw
an exception which will break the group0 apply fiber and the node will
need to be restarted to bring it back to work.
In order to solve this issue, make sure that the service level module
does not start populating and using the service level cache until both
service levels and auth are migrated to raft. This is achieved by adding
the check both to the cache population logic and the effective service
level getter - they now look at service level's accessor new method,
`can_use_effective_service_level_cache` which takes a look at the auth
version.
Fixes: scylladb/scylladb#24963
We have a lot of places in the code where
a token_metadata_ptr is kept in an automatic
variable and destroyed when it leaves the scope.
since it's a referenced counted lw_shared_ptr,
the token_metadata object is rarely destroyed in
those cases, but when it is, it doesn't go through
clear_gently, and in particular its tablet_metadata
is not cleared gently, leading to inefficient destruction
of potentially many foreign_ptr:s.
This patch calls clear_and_destroy_impl that gently
clears and destroys the impl object in the background
using the shared_token_metadata.
Fixes#13381
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Adjust some of the existing tests in service_level_controller_test.cc
and add some more in order to test the workload prioritization features,
i.e. the service level shares.
Introduce the core logic of workload prioritization, responsible for
assigning scheduling groups to service levels.
The service level controller maintains a pool of scheduling groups for
the currently present service levels, as well as a pool of unused
scheduling groups which were previously used by some service level that
was deleted during node's lifetime.
When a new service level is created, the SL controller either assigns a
scheduling group from the unused SG pool, or creates a new one if the
pool is empty. The scheduling group is renamed to "sl:<scheduling group
name>".
When updating shares of a service level (and also when creating a new
service level), the shares of the corresponding scheduling group are
synchronized with those of the service level.
When a service level is deleted, its group is released to the
aforementioned pool of unused scheduling groups and the prefix of its
name is changed from "sl:" to "sl_deleted:".
For now, these scheduling groups are not used by any user operations.
This will be changed in subsequent commits.
Currently the locator::topology object, when created, does not contain
local node, but it is started to be used to access local database. It
sort of work now because there are explicit checks in the code to handle
this special case like in topology::get_location for instance. We do not
want to hack around it and instead rely on an invariant that the local
node is always there. To do that we add local node during
locator::topology creation. There is a catch though. Unlike with IP host
ID is not known during startup. We actually need to read from the
database to know it, so the topology starts with host ID zero and then
it changes once to the real one. This is not a problem though. As long as
the (one node) topology is consistent (_cfg.this_host_id is equal to the
node's id) local access will work.
since we are moving away for operator<< based formatter, more and more
types now only have {fmt} based formatters. the same will apply to the
STL container types after ditching the generic homebrew formatter in
to_string.hh, so to be prepared for the change, let's add the
fmt::formatter for tests as well.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
this change was created in the same spirit of 505900f18f. because
we are deprecating the operator<< for vector and unorderd_map in
Seastar, some tests do not compile anymore if we disable these
operators. so to be prepared for the change disabling them, let's
include test/lib/test_utils.hh for accessing the printer dedicated
for Boost.test. and also '#include <fmt/ranges.h>' when necessary,
because, in order to format the ranges using {fmt}, we need to
use fmt/ranges.h.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
There's stop-signal in main that fires an abort source on stop. Lots of
other services are subscribed in it, add the sl-controller too. For now
it's a no-op, but next patches will make use of it.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
in C++20, compiler generate operator!=() if the corresponding
operator==() is already defined, the language now understands
that the comparison is symmetric in the new standard.
fortunately, our operator!=() is always equivalent to
`! operator==()`, this matches the behavior of the default
generated operator!=(). so, in this change, all `operator!=`
are removed.
in addition to the defaulted operator!=, C++20 also brings to us
the defaulted operator==() -- it is able to generated the
operator==() if the member-wise lexicographical comparison.
under some circumstances, this is exactly what we need. so,
in this change, if the operator==() is also implemented as
a lexicographical comparison of all memeber variables of the
class/struct in question, it is implemented using the default
generated one by removing its body and mark the function as
`default`. moreover, if the class happen to have other comparison
operators which are implemented using lexicographical comparison,
the default generated `operator<=>` is used in place of
the defaulted `operator==`.
sometimes, we fail to mark the operator== with the `const`
specifier, in this change, to fulfil the need of C++ standard,
and to be more correct, the `const` specifier is added.
also, to generate the defaulted operator==, the operand should
be `const class_name&`, but it is not always the case, in the
class of `version`, we use `version` as the parameter type, to
fulfill the need of the C++ standard, the parameter type is
changed to `const version&` instead. this does not change
the semantic of the comparison operator. and is a more idiomatic
way to pass non-trivial struct as function parameters.
please note, because in C++20, both operator= and operator<=> are
symmetric, some of the operators in `multiprecision` are removed.
they are the symmetric form of the another variant. if they were
not removed, compiler would, for instance, find ambiguous
overloaded operator '=='.
this change is a cleanup to modernize the code base with C++20
features.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes#13687
We have enabled the command line options without changing a
single line of code, we only had to replace old include
with scylla_test_case.hh.
Next step is to add x-log-compaction-groups options, which will
determine the number of compaction groups to be used by all
instantiations of replica::table.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Instead of lengthy blurbs, switch to single-line, machine-readable
standardized (https://spdx.dev) license identifiers. The Linux kernel
switched long ago, so there is strong precedent.
Three cases are handled: AGPL-only, Apache-only, and dual licensed.
For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0),
reasoning that our changes are extensive enough to apply our license.
The changes we applied mechanically with a script, except to
licenses/README.md.
Closes#9937
In order to ease future extensions to the information being sent
by the service level configuration change API, we pack the additional
parameters (other the the service level options) to the interface in a
structure. This will allow an easy expansion in the future if more
parameters needs to be sent to the observer.i
Signed-off-by: Eliran Sinvani <eliransin@scylladb.com>