Commit Graph

422 Commits

Author SHA1 Message Date
Kefu Chai
6ead5a4696 treewide: move log.hh into utils/log.hh
the log.hh under the root of the tree was created keep the backward
compatibility when seastar was extracted into a separate library.
so log.hh should belong to `utils` directory, as it is based solely
on seastar, and can be used all subsystems.

in this change, we move log.hh into utils/log.hh to that it is more
modularized. and this also improves the readability, when one see
`#include "utils/log.hh"`, it is obvious that this source file
needs the logging system, instead of its own log facility -- please
note, we do have two other `log.hh` in the tree.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-10-22 06:54:46 +03:00
Kamil Braun
f02afefd34 Merge 'raft: consider the gossiper state then sending the group0 state id' from Emil Maskovsky
Skip the advertisement of the group0 state id in case the gossiper is
not active (ready).

Sending the application state when the gossiper is not active caused
a warning being shown in the log about the local endpoint not being
found in the gossiper endpoint state map on a (graceful) node restart.

The local endpoint is initialized on the gossiper startup, so we skip
the state id advertisement until the startup is finished.

Fixes: scylladb/scylladb#21117

No backport: Fixes an issue that is currently only present in master

Closes scylladb/scylladb#21119

* github.com:scylladb/scylladb:
  raft: consider the gossiper state then sending the group0 state id
  raft: add the test for GROUP0_STATE_ID gossip application state
2024-10-17 13:41:15 +03:00
Emil Maskovsky
e082fef32c raft: remove the group0 state id handler stop check
The stop assertion check in the group0 state id handler was triggering
under some circumstances (stopping server during restart). In that case
it might be that the stop is initiated before the server is fully
initialized, and then the handler destructor is being called without
calling to the `stop()` method first. This is a valid scenario.

The whole `stop()` in the group0 state id handler is not necessary,
as the only operation being done is cancelling the timer which is done
by the timer destructor automatically anyway.

There is the concern of a currently running timer callback, but it
doesn't preempt (not async) so the timer shouldn't be destroyed before
the callback finishes.

Fixes: scylladb/scylladb#21074

Closes scylladb/scylladb#21127
2024-10-17 13:41:15 +03:00
Emil Maskovsky
3f1af268c2 raft: consider the gossiper state then sending the group0 state id
Skip the advertisement of the group0 state id in case the gossiper is
not active (ready).

Sending the application state when the gossiper is not active caused
a warning being shown in the log about the local endpoint not being
found in the gossiper endpoint state map on a (graceful) node restart.

The local endpoint is initialized on the gossiper startup, so we skip
the state id advertisement until the startup is finished.

Fixes: scylladb/scylladb#21117
2024-10-16 19:26:25 +02:00
Kefu Chai
32f508d450 raft: fix typo in logging message
s/miminum/minimum/

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#21073
2024-10-16 06:33:43 +03:00
Tomasz Grabiec
3e438d23e1 Merge 'Check system.tablets update before putting it into the table' from Pavel Emelyanov
Having tablet metadata with more than 1 pending replica will prevent this metadata from being (re)loaded due to sanity check on load. This patch fails the operation which tries to save the wrong metadata with a similar sanity check. For that, changes submitted to raft are validated, and if it's topology_change that affects system.tablets, the new "replicas" and "new_replicas" values are checked similarly to how they will be on (re)load.

fixes #20043

Closes scylladb/scylladb#21020

* github.com:scylladb/scylladb:
  tablets: Validate system.tablets update
  group0_client: Introduce change validation
  group0_client: Add shared_token_metadata dependency
2024-10-15 00:38:59 +02:00
Pavel Emelyanov
1863ccd900 tablets: Validate system.tablets update
Implement change validation for raft topology_change command. For now
the only check is that the "pending replicas" contains at most one
entry. The check mirrors similar one in `process_one_row` function.

If not passed, this prevents system.tablets from being updated with the
mutation(s) that will not be loaded later.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-10-10 12:39:58 +03:00
Pavel Emelyanov
e5bf376cbc group0_client: Introduce change validation
Add validate_change() methods (well, a template and an overload) that
are called by prepare_command() and are supposed to validate the
proposed change before it hits persistent storage

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-10-10 12:31:52 +03:00
Pavel Emelyanov
f09fe4f351 group0_client: Add shared_token_metadata dependency
It will be needed later to get tablet_metadata from.
The dependency is "OK", shared_token_metadata is low-level sharded
service. Client already references db::system_keyspace, which in turn
references replica::database which, finally, references token_metadata

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-10-10 12:27:46 +03:00
Emil Maskovsky
0c9308cf48 raft: add the check for the group0 tables
Added the runtime check to ensure that all the tables that are used with
the group0 commands are marked as group0 tables.
2024-10-08 21:08:11 +02:00
Emil Maskovsky
a03e98d6e8 raft: fast tombstone GC for group0-managed tables
Set the tombstone GC time for group0-managed tables to the minimal state
id of the group0 nodes.

The check is being done based on a timer, iterating through each node
(according to the group0 topology configuration) and taking the minimum
across all nodes.

This miminum timestamp is then be used to set the tombstone GC time
for the tombstone GC of all the group0-managed tables.

Fixes: scylladb/scylla#15607
2024-10-08 21:07:30 +02:00
Emil Maskovsky
baea9cfa67 gossip: broadcast the group0 state id
Implemented the group0 state_id handler (based on the gossip) that will
broadcast the group0 state id of each node.

This will be used to set the tombstone GC time for the group0 tables.
2024-10-08 20:53:54 +02:00
Emil Maskovsky
a840949ea0 treewide: code cleanup and refactoring
Fix the clang-tidy warnings, code cleanup and improvements.

Applied the clang format to the updated places.
2024-10-08 20:53:54 +02:00
Kamil Braun
1b9337bf99 Merge 'Wait for all users of group0 server to complete before destroying it' from Gleb Natapov
Group0 server is often used in asynchronous context, but we do not wait
for them to complete before destroying the server. We already have
shutdown gate for it, so lets use it in those asynch functions.

Also make sure to signal group0 abort source if initialization fails.

Fixes scylladb/scylladb#20701

Backport to 6.2 since it contains af83c5e53e and it made the race easier to hit, so tests became flaky.

Closes scylladb/scylladb#20891

* github.com:scylladb/scylladb:
  group: hold group0 shutdown gate during async operations
  group0: Stop group0 if node initialization fails
2024-10-08 13:46:54 +02:00
Gleb Natapov
e642f0a86d group: hold group0 shutdown gate during async operations
Wait for all outstanding async work that uses group0 to complete before
destroying group0 server.

Fixes scylladb/scylladb#20701
2024-10-06 17:20:52 +03:00
Avi Kivity
884297ae2e raft_group0_client: uninclude "raft_group0_registry.hh"
Reduce unnecessary recompilations.
2024-09-28 17:25:11 +03:00
Avi Kivity
67cdd0d389 raft_group_registry: extract raft_timeout
It is a vocabulary term that shouldn't need the registry to be visible.
Extract it to a new header.
2024-09-28 17:25:03 +03:00
Avi Kivity
93afc77307 raft_group0_client: uninclude "mutation/mutation.hh"
Lighten the dependency load. Some constructors and destructors
are uninlined to avoid the header depending on the mutation class.
2024-09-28 16:31:53 +03:00
Avi Kivity
5d68efe0bd raft_group0_client: uninclude "db/system_keyspace.hh"
It doesn't need it apart from a forward declaration.

Files that lost necessary includes are adjusted, and some users
of auth_version_t are redirected to the definition outside system_keyspace.
2024-09-28 16:31:53 +03:00
Kamil Braun
9224e48d6b Merge 'Populate raft address map from gossiper on raft configuration change' from Gleb Natapov
For each new node added to the raft config populate its ID to IP mapping
in raft address map from the gossiper. The mapping may have expired if a
node is added to the raft configuration long after it first appears in
the gossiper.

Fixes scylladb/scylladb#20600

Backport to all supported versions since the bug may cause bootstrapping failure.

Closes scylladb/scylladb#20601

* github.com:scylladb/scylladb:
  test: extend existing test to check that a joining node can map addresses of all pre-existing nodes during join
  group0: make sure that address map has an entry for each new node in the raft configuration
2024-09-26 12:41:25 +02:00
Kamil Braun
09c68c0731 service: raft: fix rpc error message
What it called "leader" is actually the destination of the RPC.

Trivial fix, should be backported to all affected versions.

Closes scylladb/scylladb#20789
2024-09-25 15:46:37 +03:00
Gleb Natapov
bddaf498df group0: make sure that address map has an entry for each new node in the raft configuration
ID->IP mapping is added to the raft address map when the mapping first
appears in the gossiper, but it is added as expiring entry. It becomes
non expiring when a node is added to raft configuration. But when a node
joins those two events may be distant in time (since the node's request
may sit in the topology coordinator queue for a while) and mappings may
expire already from the map. This patch makes sure to transfer the
mapping from the gossiper for a node that is added to the raft
configuration instead of assuming that the mapping is already there.
2024-09-18 13:42:38 +03:00
Kefu Chai
3e84d43f93 treewide: use seastar::format() or fmt::format() explicitly
before this change, we rely on `using namespace seastar` to use
`seastar::format()` without qualifying the `format()` with its
namespace. this works fine until we changed the parameter type
of format string `seastar::format()` from `const char*` to
`fmt::format_string<...>`. this change practically invited
`seastar::format()` to the club of `std::format()` and `fmt::format()`,
where all members accept a templated parameter as its `fmt`
parameter. and `seastar::format()` is not the best candidate anymore.
despite that argument-dependent lookup (ADT for short) favors the
function which is in the same namespace as its parameter, but
`using namespace` makes `seastar::format()` more competitive,
so both `std::format()` and `seastar::format()` are considered
as the condidates.

that is what is happening scylladb in quite a few caller sites of
`format()`, hence ADT is not able to tell which function the winner
in the name lookup:

```
/__w/scylladb/scylladb/mutation/mutation_fragment_stream_validator.cc:265:12: error: call to 'format' is ambiguous
  265 |     return format("{} ({}.{} {})", _name_view, s.ks_name(), s.cf_name(), s.id());
      |            ^~~~~~
/usr/bin/../lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/format:4290:5: note: candidate function [with _Args = <const std::basic_string_view<char> &, const seastar::basic_sstring<char, unsigned int, 15> &, const seastar::basic_sstring<char, unsigned int, 15> &, const utils::tagged_uuid<table_id_tag> &>]
 4290 |     format(format_string<_Args...> __fmt, _Args&&... __args)
      |     ^
/__w/scylladb/scylladb/seastar/include/seastar/core/print.hh:143:1: note: candidate function [with A = <const std::basic_string_view<char> &, const seastar::basic_sstring<char, unsigned int, 15> &, const seastar::basic_sstring<char, unsigned int, 15> &, const utils::tagged_uuid<table_id_tag> &>]
  143 | format(fmt::format_string<A...> fmt, A&&... a) {
      | ^
```

in this change, we

change all `format()` to either `fmt::format()` or `seastar::format()`
with following rules:
- if the caller expects an `sstring` or `std::string_view`, change to
  `seastar::format()`
- if the caller expects an `std::string`, change to `fmt::format()`.
  because, `sstring::operator std::basic_string` would incur a deep
  copy.

we will need another change to enable scylladb to compile with the
latest seastar. namely, to pass the format string as a templated
parameter down to helper functions which format their parameters.
to miminize the scope of this change, let's include that change when
bumping up the seastar submodule. as that change will depend on
the seastar change.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-09-11 23:21:40 +03:00
Abhi
9b09439065 raft: Add descriptions for requested abort errors
Fixes: scylladb/scylladb#18902

Closes scylladb/scylladb#20291
2024-09-10 17:56:29 +02:00
Evgeniy Naydanov
769424723b test: error injections for Raft-based topology
Add following error injections:
 - stop_after_init_of_system_ks
 - stop_after_init_of_schema_commitlog
 - stop_after_starting_gossiper
 - stop_after_starting_raft_address_map
 - stop_after_starting_migration_manager
 - stop_after_starting_commitlog
 - stop_after_starting_repair
 - stop_after_starting_cdc_generation_service
 - stop_after_starting_group0_service
 - stop_after_starting_auth_service
 - stop_during_gossip_shadow_round
 - stop_after_saving_tokens
 - stop_after_starting_gossiping
 - stop_after_sending_join_node_request
 - stop_after_setting_mode_to_normal_raft_topology
 - stop_before_becoming_raft_voter
 - topology_coordinator_pause_after_updating_cdc_generation
 - stop_before_streaming
 - stop_after_streaming
 - stop_after_bootstrapping_initial_raft_configuration
2024-09-05 22:11:31 +00:00
Kamil Braun
504bf68ebb raft_group0_client: remove hold_read_apply_mutex overload without abort_source&
Ensure that every caller passes `abort_source&`.
2024-09-03 15:52:05 +02:00
Kamil Braun
a7097fb985 group0_state_machine: pass _abort_source to hold_read_apply_mutex
`transfer_snapshot` was already passing `_abort_source` when trying to
take the lock but other member functions didn't.
2024-09-03 15:52:05 +02:00
Sergey Zolotukhin
c3e52ab942 raft: Invoke store_snapshot_descriptor with actually preserved items.
- raft_sys_table_storage::store_snapshot_descriptor now receives a number of
preserved items in the log, rather than _config.snapshot_trailing value;
- Incorrect check for truncated number of items in store_snapshot_descriptor
was removed.

Fixes scylladb/scylladb#16817
Fixes scylladb/scylladb#20080
2024-08-20 15:22:49 +02:00
Laszlo Ersek
5dcc627465 service/raft_sys_table_storage: tweak dead code
In raft_sys_table_storage::store_snapshot_descriptor(), the condition

  preserve_log_entries > snap.idx

*both* relies on implicit conversion of the tagged integer type "index_t"
to the underlying "uint64_t", *and* is a logic bug, as reported at
<https://github.com/scylladb/scylladb/issues/20080>.

Ticket#20080 explains that this condition always evaluates to false in
practice, and that the "else" branch handles all cases correctly anyway.
For now, wean the buggy expression off the disappearing
tagged-to-untaggged conversion.

Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
2024-08-14 21:35:34 +02:00
Laszlo Ersek
d87d1ae29d service/raft_sys_table_storage: simplify (snap.idx - preserve_log_entries)
With conversion of tagged integers to untagged ones going away, replace

  static_cast<uint64_t>(snap.idx)

with

  snap.idx.value()

Furthermore, casting "preserve_log_entries" (of type "size_t") to
"uint64_t" is redundant (both "snap.idx" and "preserve_log_entries" carry
nonnegative values, and the mathematical difference is expected to be
nonnegative); remove the cast.

Finally, simplify the initialization syntax.

Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
2024-08-14 13:35:08 +02:00
Laszlo Ersek
e781046739 service/raft_sys_table_storage: untag index_t and term_t for queries
With implicit conversion of tagged integers to untagged ones going away,
explicitly untag index_t and term_t values in the following two contexts:

- when they are passed to CQL queries as int64_t,

- when they are default-constructed as fallbacks for int64_t fields
  missing from CQL result sets.

Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
2024-08-14 13:35:08 +02:00
Botond Dénes
ca302d9e28 service/raft/group0_state_machine: provide tablet change hint on topology change
So that when reloading tablet state metadata from the disk, only the
changed parts are reloaded.
2024-08-11 09:53:19 -04:00
Michał Jadwiszczak
5f8132c13c service/raft/group0_state_machine: update effective service levels cache
Updates to `system.role_members` and `system.role_attributes` affect
effective service levels cache, so applying mutations to those tables
should reload the effective SL cache.
2024-08-08 10:42:09 +02:00
Avi Kivity
aa1270a00c treewide: change assert() to SCYLLA_ASSERT()
assert() is traditionally disabled in release builds, but not in
scylladb. This hasn't caused problems so far, but the latest abseil
release includes a commit [1] that causes a 1000 insn/op regression when
NDEBUG is not defined.

Clearly, we must move towards a build system where NDEBUG is defined in
release builds. But we can't just define it blindly without vetting
all the assert() calls, as some were written with the expectation that
they are enabled in release mode.

To solve the conundrum, change all assert() calls to a new SCYLLA_ASSERT()
macro in utils/assert.hh. This macro is always defined and is not conditional
on NDEBUG, so we can later (after vetting Seastar) enable NDEBUG in release
mode.

[1] 66ef711d68

Closes scylladb/scylladb#20006
2024-08-05 08:23:35 +03:00
Emil Maskovsky
5dfc50d354 raft: fix the shutdown phase being stuck
Some of the calls inside the `raft_group0_client::start_operation()`
method were missing the abort source parameter. This caused the repair
test to be stuck in the shutdown phase - the abort source has been
triggered, but the operations were not checking it.

This was in particular the case of operations that try to take the
ownership of the raft group semaphore (`get_units(semaphore)`) - these
waits should be cancelled when the abort source is triggered.

This should fix the following tests that were failing in some percentage
of dtest runs (about 1-3 of 100):
* TestRepairAdditional::test_repair_kill_1
* TestRepairAdditional::test_repair_kill_3

Fixes scylladb/scylladb#19223
2024-07-31 09:18:54 +02:00
Emil Maskovsky
2dbe9ef2f2 raft: use the abort source reference in raft group0 client interface
Most callers of the raft group0 client interface are passing a real
source instance, so we can use the abort source reference in the client
interface. This change makes the code simpler and more consistent.
2024-07-31 09:18:54 +02:00
Piotr Dulikowski
188b4ac0fc Merge 'service_level_controller: update configuration on raft change' from Michał Jadwiszczak
This patch is a follow-up to scylladb/scylladb#16585.

Once we have service levels on raft, we can get rid of update loop, which updates the configuration in a configured interval (default is 10s).
Instead, this PR introduces methods to `group0_state_machine` which look through table ids in mutations in `write_mutation` and update submodules based on that ids.

Fixes: scylladb/scylladb#18060

Closes scylladb/scylladb#18758

* github.com:scylladb/scylladb:
  test: remove `sleep()`s which were required to reload service levels configuration
  test/cql_test_env: remove unit test service levels data accessors
  service/storage_service: reload SL cache on topology_state_load()
  service/qos/service_level_controller: move semaphore breaking to stop
  service/qos/service_level_controller: maybe start and stop legacy update loop
  service/qos/service_level_controller: make update loop legacy
  raft/group0_state_machine: update submodules based on table_id
  service/storage_service: add a proxy method to reload sl cache
2024-07-11 16:18:48 +02:00
Michał Jadwiszczak
5ddf5e3d7d raft/group0_state_machine: update submodules based on table_id
We want to update service levels cache when any new mutations are
applied to service levels table.
To not create new raft command type, this commit changes design of
`write_mutations` to updated in-memory structures based on
mutations' table_id.
2024-07-10 10:23:04 +02:00
Emil Maskovsky
7a69d9070f raft: use raft_timeout in trigger_snapshot
Migrate the "trigger_snapshot" to use the standardized `raft_timeout` approach.
2024-07-08 18:13:31 +02:00
Emil Maskovsky
80986c17c3 test: raft: verify schema updated after read barrier
Regression test for #19213.
2024-07-08 10:50:32 +02:00
Piotr Dulikowski
85128c5b10 Merge 'cql3: always return created event in create keyspace statement' from Marcin Maliszkiewicz
cql3: always return created event in create ks/table/type/view statement

In case multiple clients issue concurrently CREATE KEYSPACE IF NOT EXISTS
and later USE KEYSPACE it can happen that schema in driver's session is
out of sync because it synces when it receives special message from
CREATE KEYSPACE response.

Similar situation occurs with other schema change statements.

In this patch we fix only create keyspace/table/type/view statements
by always sending created event. Behavior of any other schema altering
statements remains unchanged.

Fixes https://github.com/scylladb/scylladb/issues/16909

**backport: no, it's not a regression**

Closes scylladb/scylladb#18819

* github.com:scylladb/scylladb:
  cql3: always return created event in create ks/table/type/view statement
  cql3: auth: move auto-grant closer to resource creation code
  cql3: extract create ks/table/type/view event code
2024-06-17 19:58:38 +02:00
Kefu Chai
a5a5ca0785 auth: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#19312
2024-06-17 17:33:55 +03:00
Gleb Natapov
34cf5c81f6 group0, topology coordinator: run group0 and the topology coordinator in gossiper scheduling group
Currently they both run in streaming group and it may become busy during
repair/mv building and affect group0 functionality. Move it to the
gossiper group where it should have more time to run.

Fixes scylladb/scylladb#18863

Closes scylladb/scylladb#19138
2024-06-07 15:31:44 +02:00
Marcin Maliszkiewicz
f6108a72d3 cql3: auth: move auto-grant closer to resource creation code
This should reduce the risk of re-introducing issue similar to
the one fixed in ab6988c52f

When grant code is closer to actual creation code (announcing mutations)
there is lower chance of those two effects being triggered differently,
if we ever call grant_permissions_to_creator and not announce mutations
that's very likely a security vulnerability.

Additionally comment was rewritten to be more accurate.
2024-06-07 10:26:32 +02:00
Marcin Maliszkiewicz
63e6334a64 raft: rename mutations_collector to group0_batch 2024-06-06 13:26:34 +02:00
Marcin Maliszkiewicz
ac0e164a6b raft: rename announce to commit
Old wording was derived from existing code which
originated from schema code. Name commit better
describes what we do here.
2024-06-04 15:43:04 +02:00
Marcin Maliszkiewicz
370a5b547e cql3: raft: attach description to each mutations collector group
This description is readable from raft log table.
Previously single description was provided for the whole
announce call but since it can contain mutations from
various subsystems now description was moved to
add_mutation(s)/add_generator function calls.
2024-06-04 15:43:04 +02:00
Marcin Maliszkiewicz
0573fee2a9 cql3: auth: use mutation collector for grant and revoke permissions
This is done to achieve single transaction semantics.

The change includes auto-grant feature. In particular
for schema related auto-grant we don't use normal
mutation collector announce path but follow migration manager,
this may be unified in the future.
2024-06-04 15:43:04 +02:00
Marcin Maliszkiewicz
7e0a801f53 auth: add class for gathering mutations without immediate announce
To achieve write atomicity across different tables we need to announce
mutations in a single transaction. So instead of each function doing
a separate announce we need to collect mutations and announce them once
at the end.
2024-06-04 15:43:04 +02:00
Piotr Smaron
06008970fb New raft cmd for both schema & topo changes
Allows executing combined topology & schema mutations under a single RAFT command
2024-05-27 12:48:44 +02:00