the log.hh under the root of the tree was created keep the backward
compatibility when seastar was extracted into a separate library.
so log.hh should belong to `utils` directory, as it is based solely
on seastar, and can be used all subsystems.
in this change, we move log.hh into utils/log.hh to that it is more
modularized. and this also improves the readability, when one see
`#include "utils/log.hh"`, it is obvious that this source file
needs the logging system, instead of its own log facility -- please
note, we do have two other `log.hh` in the tree.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Skip the advertisement of the group0 state id in case the gossiper is
not active (ready).
Sending the application state when the gossiper is not active caused
a warning being shown in the log about the local endpoint not being
found in the gossiper endpoint state map on a (graceful) node restart.
The local endpoint is initialized on the gossiper startup, so we skip
the state id advertisement until the startup is finished.
Fixes: scylladb/scylladb#21117
No backport: Fixes an issue that is currently only present in master
Closesscylladb/scylladb#21119
* github.com:scylladb/scylladb:
raft: consider the gossiper state then sending the group0 state id
raft: add the test for GROUP0_STATE_ID gossip application state
The stop assertion check in the group0 state id handler was triggering
under some circumstances (stopping server during restart). In that case
it might be that the stop is initiated before the server is fully
initialized, and then the handler destructor is being called without
calling to the `stop()` method first. This is a valid scenario.
The whole `stop()` in the group0 state id handler is not necessary,
as the only operation being done is cancelling the timer which is done
by the timer destructor automatically anyway.
There is the concern of a currently running timer callback, but it
doesn't preempt (not async) so the timer shouldn't be destroyed before
the callback finishes.
Fixes: scylladb/scylladb#21074Closesscylladb/scylladb#21127
Skip the advertisement of the group0 state id in case the gossiper is
not active (ready).
Sending the application state when the gossiper is not active caused
a warning being shown in the log about the local endpoint not being
found in the gossiper endpoint state map on a (graceful) node restart.
The local endpoint is initialized on the gossiper startup, so we skip
the state id advertisement until the startup is finished.
Fixes: scylladb/scylladb#21117
Having tablet metadata with more than 1 pending replica will prevent this metadata from being (re)loaded due to sanity check on load. This patch fails the operation which tries to save the wrong metadata with a similar sanity check. For that, changes submitted to raft are validated, and if it's topology_change that affects system.tablets, the new "replicas" and "new_replicas" values are checked similarly to how they will be on (re)load.
fixes#20043Closesscylladb/scylladb#21020
* github.com:scylladb/scylladb:
tablets: Validate system.tablets update
group0_client: Introduce change validation
group0_client: Add shared_token_metadata dependency
Implement change validation for raft topology_change command. For now
the only check is that the "pending replicas" contains at most one
entry. The check mirrors similar one in `process_one_row` function.
If not passed, this prevents system.tablets from being updated with the
mutation(s) that will not be loaded later.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Add validate_change() methods (well, a template and an overload) that
are called by prepare_command() and are supposed to validate the
proposed change before it hits persistent storage
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
It will be needed later to get tablet_metadata from.
The dependency is "OK", shared_token_metadata is low-level sharded
service. Client already references db::system_keyspace, which in turn
references replica::database which, finally, references token_metadata
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Set the tombstone GC time for group0-managed tables to the minimal state
id of the group0 nodes.
The check is being done based on a timer, iterating through each node
(according to the group0 topology configuration) and taking the minimum
across all nodes.
This miminum timestamp is then be used to set the tombstone GC time
for the tombstone GC of all the group0-managed tables.
Fixes: scylladb/scylla#15607
Implemented the group0 state_id handler (based on the gossip) that will
broadcast the group0 state id of each node.
This will be used to set the tombstone GC time for the group0 tables.
Group0 server is often used in asynchronous context, but we do not wait
for them to complete before destroying the server. We already have
shutdown gate for it, so lets use it in those asynch functions.
Also make sure to signal group0 abort source if initialization fails.
Fixesscylladb/scylladb#20701
Backport to 6.2 since it contains af83c5e53e and it made the race easier to hit, so tests became flaky.
Closesscylladb/scylladb#20891
* github.com:scylladb/scylladb:
group: hold group0 shutdown gate during async operations
group0: Stop group0 if node initialization fails
It doesn't need it apart from a forward declaration.
Files that lost necessary includes are adjusted, and some users
of auth_version_t are redirected to the definition outside system_keyspace.
For each new node added to the raft config populate its ID to IP mapping
in raft address map from the gossiper. The mapping may have expired if a
node is added to the raft configuration long after it first appears in
the gossiper.
Fixesscylladb/scylladb#20600
Backport to all supported versions since the bug may cause bootstrapping failure.
Closesscylladb/scylladb#20601
* github.com:scylladb/scylladb:
test: extend existing test to check that a joining node can map addresses of all pre-existing nodes during join
group0: make sure that address map has an entry for each new node in the raft configuration
What it called "leader" is actually the destination of the RPC.
Trivial fix, should be backported to all affected versions.
Closesscylladb/scylladb#20789
ID->IP mapping is added to the raft address map when the mapping first
appears in the gossiper, but it is added as expiring entry. It becomes
non expiring when a node is added to raft configuration. But when a node
joins those two events may be distant in time (since the node's request
may sit in the topology coordinator queue for a while) and mappings may
expire already from the map. This patch makes sure to transfer the
mapping from the gossiper for a node that is added to the raft
configuration instead of assuming that the mapping is already there.
before this change, we rely on `using namespace seastar` to use
`seastar::format()` without qualifying the `format()` with its
namespace. this works fine until we changed the parameter type
of format string `seastar::format()` from `const char*` to
`fmt::format_string<...>`. this change practically invited
`seastar::format()` to the club of `std::format()` and `fmt::format()`,
where all members accept a templated parameter as its `fmt`
parameter. and `seastar::format()` is not the best candidate anymore.
despite that argument-dependent lookup (ADT for short) favors the
function which is in the same namespace as its parameter, but
`using namespace` makes `seastar::format()` more competitive,
so both `std::format()` and `seastar::format()` are considered
as the condidates.
that is what is happening scylladb in quite a few caller sites of
`format()`, hence ADT is not able to tell which function the winner
in the name lookup:
```
/__w/scylladb/scylladb/mutation/mutation_fragment_stream_validator.cc:265:12: error: call to 'format' is ambiguous
265 | return format("{} ({}.{} {})", _name_view, s.ks_name(), s.cf_name(), s.id());
| ^~~~~~
/usr/bin/../lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/format:4290:5: note: candidate function [with _Args = <const std::basic_string_view<char> &, const seastar::basic_sstring<char, unsigned int, 15> &, const seastar::basic_sstring<char, unsigned int, 15> &, const utils::tagged_uuid<table_id_tag> &>]
4290 | format(format_string<_Args...> __fmt, _Args&&... __args)
| ^
/__w/scylladb/scylladb/seastar/include/seastar/core/print.hh:143:1: note: candidate function [with A = <const std::basic_string_view<char> &, const seastar::basic_sstring<char, unsigned int, 15> &, const seastar::basic_sstring<char, unsigned int, 15> &, const utils::tagged_uuid<table_id_tag> &>]
143 | format(fmt::format_string<A...> fmt, A&&... a) {
| ^
```
in this change, we
change all `format()` to either `fmt::format()` or `seastar::format()`
with following rules:
- if the caller expects an `sstring` or `std::string_view`, change to
`seastar::format()`
- if the caller expects an `std::string`, change to `fmt::format()`.
because, `sstring::operator std::basic_string` would incur a deep
copy.
we will need another change to enable scylladb to compile with the
latest seastar. namely, to pass the format string as a templated
parameter down to helper functions which format their parameters.
to miminize the scope of this change, let's include that change when
bumping up the seastar submodule. as that change will depend on
the seastar change.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
- raft_sys_table_storage::store_snapshot_descriptor now receives a number of
preserved items in the log, rather than _config.snapshot_trailing value;
- Incorrect check for truncated number of items in store_snapshot_descriptor
was removed.
Fixesscylladb/scylladb#16817Fixesscylladb/scylladb#20080
In raft_sys_table_storage::store_snapshot_descriptor(), the condition
preserve_log_entries > snap.idx
*both* relies on implicit conversion of the tagged integer type "index_t"
to the underlying "uint64_t", *and* is a logic bug, as reported at
<https://github.com/scylladb/scylladb/issues/20080>.
Ticket#20080 explains that this condition always evaluates to false in
practice, and that the "else" branch handles all cases correctly anyway.
For now, wean the buggy expression off the disappearing
tagged-to-untaggged conversion.
Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
With conversion of tagged integers to untagged ones going away, replace
static_cast<uint64_t>(snap.idx)
with
snap.idx.value()
Furthermore, casting "preserve_log_entries" (of type "size_t") to
"uint64_t" is redundant (both "snap.idx" and "preserve_log_entries" carry
nonnegative values, and the mathematical difference is expected to be
nonnegative); remove the cast.
Finally, simplify the initialization syntax.
Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
With implicit conversion of tagged integers to untagged ones going away,
explicitly untag index_t and term_t values in the following two contexts:
- when they are passed to CQL queries as int64_t,
- when they are default-constructed as fallbacks for int64_t fields
missing from CQL result sets.
Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
Updates to `system.role_members` and `system.role_attributes` affect
effective service levels cache, so applying mutations to those tables
should reload the effective SL cache.
assert() is traditionally disabled in release builds, but not in
scylladb. This hasn't caused problems so far, but the latest abseil
release includes a commit [1] that causes a 1000 insn/op regression when
NDEBUG is not defined.
Clearly, we must move towards a build system where NDEBUG is defined in
release builds. But we can't just define it blindly without vetting
all the assert() calls, as some were written with the expectation that
they are enabled in release mode.
To solve the conundrum, change all assert() calls to a new SCYLLA_ASSERT()
macro in utils/assert.hh. This macro is always defined and is not conditional
on NDEBUG, so we can later (after vetting Seastar) enable NDEBUG in release
mode.
[1] 66ef711d68Closesscylladb/scylladb#20006
Some of the calls inside the `raft_group0_client::start_operation()`
method were missing the abort source parameter. This caused the repair
test to be stuck in the shutdown phase - the abort source has been
triggered, but the operations were not checking it.
This was in particular the case of operations that try to take the
ownership of the raft group semaphore (`get_units(semaphore)`) - these
waits should be cancelled when the abort source is triggered.
This should fix the following tests that were failing in some percentage
of dtest runs (about 1-3 of 100):
* TestRepairAdditional::test_repair_kill_1
* TestRepairAdditional::test_repair_kill_3
Fixesscylladb/scylladb#19223
Most callers of the raft group0 client interface are passing a real
source instance, so we can use the abort source reference in the client
interface. This change makes the code simpler and more consistent.
This patch is a follow-up to scylladb/scylladb#16585.
Once we have service levels on raft, we can get rid of update loop, which updates the configuration in a configured interval (default is 10s).
Instead, this PR introduces methods to `group0_state_machine` which look through table ids in mutations in `write_mutation` and update submodules based on that ids.
Fixes: scylladb/scylladb#18060Closesscylladb/scylladb#18758
* github.com:scylladb/scylladb:
test: remove `sleep()`s which were required to reload service levels configuration
test/cql_test_env: remove unit test service levels data accessors
service/storage_service: reload SL cache on topology_state_load()
service/qos/service_level_controller: move semaphore breaking to stop
service/qos/service_level_controller: maybe start and stop legacy update loop
service/qos/service_level_controller: make update loop legacy
raft/group0_state_machine: update submodules based on table_id
service/storage_service: add a proxy method to reload sl cache
We want to update service levels cache when any new mutations are
applied to service levels table.
To not create new raft command type, this commit changes design of
`write_mutations` to updated in-memory structures based on
mutations' table_id.
cql3: always return created event in create ks/table/type/view statement
In case multiple clients issue concurrently CREATE KEYSPACE IF NOT EXISTS
and later USE KEYSPACE it can happen that schema in driver's session is
out of sync because it synces when it receives special message from
CREATE KEYSPACE response.
Similar situation occurs with other schema change statements.
In this patch we fix only create keyspace/table/type/view statements
by always sending created event. Behavior of any other schema altering
statements remains unchanged.
Fixes https://github.com/scylladb/scylladb/issues/16909
**backport: no, it's not a regression**
Closesscylladb/scylladb#18819
* github.com:scylladb/scylladb:
cql3: always return created event in create ks/table/type/view statement
cql3: auth: move auto-grant closer to resource creation code
cql3: extract create ks/table/type/view event code
Currently they both run in streaming group and it may become busy during
repair/mv building and affect group0 functionality. Move it to the
gossiper group where it should have more time to run.
Fixesscylladb/scylladb#18863Closesscylladb/scylladb#19138
This should reduce the risk of re-introducing issue similar to
the one fixed in ab6988c52f
When grant code is closer to actual creation code (announcing mutations)
there is lower chance of those two effects being triggered differently,
if we ever call grant_permissions_to_creator and not announce mutations
that's very likely a security vulnerability.
Additionally comment was rewritten to be more accurate.
This description is readable from raft log table.
Previously single description was provided for the whole
announce call but since it can contain mutations from
various subsystems now description was moved to
add_mutation(s)/add_generator function calls.
This is done to achieve single transaction semantics.
The change includes auto-grant feature. In particular
for schema related auto-grant we don't use normal
mutation collector announce path but follow migration manager,
this may be unified in the future.
To achieve write atomicity across different tables we need to announce
mutations in a single transaction. So instead of each function doing
a separate announce we need to collect mutations and announce them once
at the end.