Commit Graph

814 Commits

Author SHA1 Message Date
Tomasz Grabiec
d6346e68c1 Merge 'prevent gossiper from marking nodes as down in tests unexpectedly' from Patryk Jędrzejczak
This PR includes two changes that make gossiper much less likely to mark
nodes as down in tests unexpectedly, and cause test flakiness in issues
like SCYLLADB-864:
- fixing false node conviction when echo succeeds,
- increasing the failure_detector_timeout fixture.

Fixes: SCYLLADB-864

No need for backport: related CI failures are rare, and merging #29522
made them even more unlikely (I haven't seen one since then, but it's
still possible to reproduce locally on dev machines).

Closes scylladb/scylladb#29755

* github.com:scylladb/scylladb:
  test/cluster: increase failure_detector_timeout
  gossiper: fix false node conviction when echo succeeds
2026-05-06 14:01:15 +02:00
Patryk Jędrzejczak
efe0e39d85 gossiper: fix false node conviction when echo succeeds
failure_detector_loop_for_node() could falsely convict a healthy node
even when the echo succeeded. The code computed diff = now - last
(time since last successful echo) and checked diff > max_duration
unconditionally, regardless of whether the current echo failed or
succeeded.

This caused flakiness in tests that decrease the failure detector
timeout. We currently run #CPUs tests concurrently, and since cluster
tests start multiple nodes with 2 shards, multiple shards contend for
one CPU. As a result, some tasks can become abnormally slow and block
the failure detector loop execution for a few seconds.

Fix by only checking diff > max_duration when the echo actually
failed.

Note that we send echo with the timeout equal to `max_duration` anyway,
so the receiver will be marked as down if it really doesn't respond.
2026-05-05 15:12:32 +02:00
Yaniv Michael Kaul
93722f2c89 gms/gossiper: fix use-after-move in do_send_ack2_msg
The second logger.debug() call accesses ack2_msg after it was moved
via std::move() in the co_await send_gossip_digest_ack2 call.
This is undefined behavior.

Fix by formatting ack2_msg to a string before the move, then using
that cached string in both debug log calls.

FIXES: https://scylladb.atlassian.net/browse/SCYLLADB-1778

Closes scylladb/scylladb#29227
2026-04-30 07:07:39 +03:00
Andrzej Jackowski
78926d9c96 test/random_failures: remove gossip shadow round injection
Commit c17c4806a1 removed check_for_endpoint_collision() from
the fresh bootstrap path, which was the only code path that
called do_shadow_round() for new nodes. Since the gossip shadow
round is no longer executed during bootstrap, remove the
stop_during_gossip_shadow_round error injection from the test.

The entry is marked as REMOVED_ rather than deleted to preserve
the shuffle order for seed-based test reproducibility.

The injection point in gms/gossiper.cc is also removed since it
is no longer used by any test.

Fixes: SCYLLADB-1466

Closes scylladb/scylladb#29460
2026-04-15 16:30:55 +02:00
Avi Kivity
0ae22a09d4 LICENSE: Update to version 1.1
Updated terms of non-commercial use (must be a never-customer).
2026-04-12 19:46:33 +03:00
Gleb Natapov
d0576c109f gossiper: send shutdown notifications in parallel 2026-04-09 13:31:40 +03:00
Gleb Natapov
e17fc180a0 gossiper: print node state from raft topology in the logs
Raft topology has real node's state now. gossiper sate are now set to
NORMAL and SHUTDOWN only.
2026-04-09 13:31:40 +03:00
Gleb Natapov
8439154851 gossiper: use is_shutdown instead of code it manually 2026-04-09 13:31:39 +03:00
Gleb Natapov
7d700d0377 gossiper: mark endpoint_state(inet_address ip) constructor as explicit
get_live_members function called is_shutdown which inet_address
argument, which caused temporary endpoint_state to be created. Fix
it by prohibiting implicit conversion and calling the correct
is_shutdown function instead.
2026-04-09 13:31:39 +03:00
Gleb Natapov
6df4f572d5 gossiper: remove unused code 2026-04-09 13:31:39 +03:00
Gleb Natapov
67102496c8 gossiper: drop last use of LEFT state and drop the state
The decommission sets left gossiper state only to prevent shutdown
notification be issued by the node during shutdown. Since the
notification code now checks the state in raft topology this is no
longer needed.
2026-04-09 13:31:39 +03:00
Gleb Natapov
7c895ced19 gossiper: rename is_dead_state to is_left since this is all that the function checks now. 2026-04-09 13:31:38 +03:00
Gleb Natapov
7dfb0577b8 gossiper: use raft topology state instead of gossiper one when checking node's state
Raft topology state is a truth source for the nodes state, so use it
instead of a gossiper one.
2026-04-09 13:31:38 +03:00
Gleb Natapov
c17c4806a1 storage_service: drop check_for_endpoint_collision function
All the checks that it does are also done by join coordinator and the
join coordinator uses more reliable raft state instead of gossiper one.
2026-04-09 13:31:37 +03:00
Gleb Natapov
681aa9ebe1 gossiper: remove unused REMOVED_TOKEN state 2026-04-09 13:31:37 +03:00
Gleb Natapov
5af17aa578 gossiper: remove unused advertise_token_removed function 2026-04-09 13:31:36 +03:00
Gleb Natapov
4e56ca3c76 gossiper: do not gossip TOKENS and CDC_GENERATION_ID any more
They were used by legacy topology and cdc code only.
2026-03-10 10:39:58 +02:00
Gleb Natapov
77f8f952b2 gossiper: drop tokens from loaded_endpoint_state 2026-03-10 10:39:58 +02:00
Gleb Natapov
706754dc24 gossiper: remove unused functions 2026-03-10 10:39:58 +02:00
Gleb Natapov
2d8722d204 gossiper: drop is_safe_for_restart() function and its use
The function checks that the node's state is not left or removed in
gossiper during restart, but with raft topology a removed node will
not be able to contact the cluster to get this information since it will
be banned.
2026-03-10 10:39:58 +02:00
Gleb Natapov
d35b83bec8 gossiper: remove the code that was only used in gossiper topology
The topology state machine is always present now and can be passed to
the gossiper during creation.
2026-03-10 10:39:58 +02:00
Gleb Natapov
0b508c5f96 test: remove unused injection points
Also remove test_auth_raft_command_split test which is irrelevant since 5ba7d1b116
because it does not use the function that injects max sized command after the
commit.
2026-03-10 10:09:39 +02:00
Patryk Jędrzejczak
9a9202c909 Merge 'Remove gossiper topology code' from Gleb Natapov
The PR removes most of the code that assumes that group0 and raft topology is not enabled. It also makes sure that joining a cluster in no raft mode or upgrading a node in a cluster that not yet uses raft topology to this version will fail.

Refs #15422

No backport needed since this removes functionality.

Closes scylladb/scylladb#28514

* https://github.com/scylladb/scylladb:
  group0: fix indentation after previous patch
  raft_group0: simplify get_group0_upgrade_state function since no upgrade can happen any more
  raft_group0: move service::group0_upgrade_state to use fmt::formatter instead of iostream
  raft_group0: remove unused code from raft_group0
  node_ops: remove topology over node ops code
  topology: fix indentation after the previous patch
  topology: drop topology_change_enabled parameter from raft_group0 code
  storage_service: remove unused handle_state_* functions
  gossiper: drop wait_for_gossip_to_settle and deprecate correspondent option
  storage_service: fix indentation after the last patch
  storage_service: remove gossiper bootstrapping code
  storage_service: drop get_group_server_if_raft_topolgy_enabled
  storage_service: drop is_topology_coordinator_enabled and its uses
  storage_service: drop run_with_api_lock_in_gossiper_mode_only
  topology: remove code that assumes raft_topology_change_enabled() may return false
  test: schema_change_test: make test_schema_digest_does_not_change_with_disabled_features tests run in raft mode
  test: schema_change_test: drop schema tests relevant for no raft mode only
  topology: remove upgrade to raft topology code
  group0: remove upgrade to group0 code
  group0: refuse to boot if a cluster is still is not in a raft topology mode
  storage_service: refuse to join a cluster in legacy mode
2026-02-27 14:43:41 +01:00
Avi Kivity
511fab1f28 gossiper: exit failure detector sleep faster
When running unit tests, there's a visible ~1-second sleep
when gossip exits the failure detector loop.

Improve this by adding a condition variable for exiting the loop
and signaling it when any of the exit conditions are satisfied:
the abort_source is pulled, the gossiper is shut down, or the sleep
is complete. We can't just use the abort_source because gossip can be
shut down independently of the rest of the system.

To see the improvement, I ran cql_query_test in dev mode:

Before:

$ time ./build/dev/test/boost/combined_tests -t cql_query_test -- --smp 2  > /dev/null 2>&1

real	2m26.904s
user	0m24.307s
sys	0m13.402s

After:

$ time ./build/dev/test/boost/combined_tests -t cql_query_test -- --smp 2  > /dev/null 2>&1

real	0m26.579s
user	0m24.671s
sys	0m13.636s

Two minutes of real-time saved.

Real-life improvement in test.py will be lower, because of the overhead
of launching pytest for each test case.

Closes scylladb/scylladb#28649
2026-02-25 11:41:02 +02:00
Gleb Natapov
1a57f2b22d gossiper: drop wait_for_gossip_to_settle and deprecate correspondent option
The function is unused now and the option that allows to skip the wait
is no longer needed as well.
2026-02-25 10:08:31 +02:00
Gleb Natapov
a8a167623a topology: remove code that assumes raft_topology_change_enabled() may return false
The path removes the code protected by !raft_topology_change_enabled()
since it is no longer reachable. Drop test_lwt_for_tablets_is_not_supported_without_raft
since not raft mode is no longer supported.
2026-02-25 10:08:30 +02:00
Benny Halevy
84caa94340 gossiper: add_expire_time_for_endpoint: replace fmt::localtime with gmtime in log printout
1. fmt::localtime is deprecated.
2. We should really print times in UTC, especially on the cloud.
3. The current log message does not print the timezone so it'd unclear
   to anyone reading the lof message if the expiration time is in the
   local timezone or in GMT/UTC.

Fixes the following warning:
```
gms/gossiper.cc:2428:28: warning: 'localtime' is deprecated [-Wdeprecated-declarations]
 2428 |             endpoint, fmt::localtime(clk::to_time_t(expire_time)), expire_time.time_since_epoch().count(),
      |                            ^
/usr/include/fmt/chrono.h:538:1: note: 'localtime' has been explicitly marked deprecated here
  538 | FMT_DEPRECATED inline auto localtime(std::time_t time) -> std::tm {
      | ^
/usr/include/fmt/base.h:207:28: note: expanded from macro 'FMT_DEPRECATED'
  207 | #  define FMT_DEPRECATED [[deprecated]]
      |                            ^
```

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#28434
2026-02-03 06:36:53 +02:00
Patryk Jędrzejczak
0fed9f94f8 gossiper: add_saved_endpoint: make generations of excluded nodes negative
The explanation is in the new comment in `gossiper::add_saved_endpoint`.

We add a test for this change. It's "extremely white-box", but it's better
than nothing.
2025-12-29 19:13:55 +01:00
Pavel Emelyanov
54edb44b20 code: Stop using seastar::compat::source_location
And switch to std::source_location.
Upcoming seastar update will deprecate its compatibility layer.

The patch is

  for f in $(git grep -l 'seastar::compat::source_location'); do
    sed -e 's/seastar::compat::source_location/std::source_location/g' -i $f;
  done

and removal of few header includes.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#27309
2025-11-27 19:10:11 +02:00
Yaniv Michael Kaul
765a7e9868 gms/gossiper.cc: fix gossip log to show host-id/ip instead of host-id/host-id
Probably a copy-paste error, fixes the log to print host-id/ip.

Backport: no need, benign log issue.

Fixes: https://github.com/scylladb/scylladb/issues/27113
Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>

Closes scylladb/scylladb#27225
2025-11-25 20:56:20 +01:00
Patryk Jędrzejczak
9efe250a8f Merge 'gossiper: ensure gossiper operations are executed in gossiper scheduling group' from Sergey Zolotukhin
Sometimes gossiper operations invoked from storage_service and other components run under a non-gossiper scheduling group. If these operations acquire gossiper locks, priority inversion can occur: higher-priority gossiper tasks may wait behind lower-priority tasks (e.g. streaming), which can cause gossiper slowness or even failures.
This patch ensures that gossiper operations requiring locks on gossiper structures are explicitly executed in the gossiper scheduling group. To help detect similar issues in the future, a warning is logged whenever a gossiper lock is acquired under a non-gossiper scheduling group.

Fixes scylladb/scylladb#25907
Refs: scylladb/scylladb#25702

Backport: this patch fixes an issue with gossiper operations scheduling group, that might affect topology operations, therefore backport is needed to 2025.1, 2025.2, 2025.3

Closes scylladb/scylladb#25981

* https://github.com/scylladb/scylladb:
  gossiper: ensure gossiper operations are executed in gossiper scheduling group
  gossiper: fix wrong gossiper instance used in `force_remove_endpoint`
2025-09-16 10:14:15 +02:00
Sergey Zolotukhin
6c2a145f6c gossiper: ensure gossiper operations are executed in gossiper scheduling group
Sometimes gossiper operations invoked from storage_service and other components
run under a non-gossiper scheduling group. If these operations acquire gossiper
locks, priority inversion can occur: higher-priority gossiper tasks may wait
behind lower-priority tasks (e.g. streaming), which can cause gossiper slowness
or even failures.
This patch ensures that gossiper operations requiring locks on gossiper
structures are explicitly executed in the gossiper scheduling group.
To help detect similar issues in the future, a warning is logged whenever
a gossiper lock is acquired under a non-gossiper scheduling group.

Fixes scylladb/scylladb#25907
2025-09-15 12:49:07 +02:00
Sergey Zolotukhin
340413e797 gossiper: fix wrong gossiper instance used in force_remove_endpoint
`gossiper::force_remove_endpoint` is always executed on shard 0 using
`invoke_on`. Since each shard has its own `gossiper` instance, if
`force_remove_endpoint` is called from a shard other than shard 0,
`my_host_id()` may be invoked on the wrong `gossiper` object. This
results in undefined behavior due to unsynchronized access to resources
on another shard.
2025-09-15 08:54:59 +02:00
Emil Maskovsky
99db980899 gossiper: eliminate duplicate code in do_shadow_round
Remove a redundant code block inadvertently introduced in commit 4b3d160f34.
While the duplicate did not affect functionality, its presence could
cause confusion and maintenance issues.

This change does not alter behavior and is purely a cleanup.

Fixes: scylladb/scylladb#25999

Backport: The issue exists in all 2025 branches, so it should be
backported accordingly.

Closes scylladb/scylladb#26001
2025-09-15 08:35:04 +03:00
Sergey Zolotukhin
b34d543f30 gossiper: fix empty initial local node state
This change removes the addition of an empty state to `_endpoint_state_map`.
Instead, a new state is created locally and then published via replicate,
avoiding the issue of an empty state existing in `_endpoint_state_map`
before the preemption point. Since this resolves the issue tested in
`test_gossiper_empty_self_id_on_shadow_round`, the `xfail` mark has been removed.

Fixes: scylladb/scylladb#25831
2025-09-08 11:38:31 +02:00
Sergey Zolotukhin
775642ea23 gossiper: add test for a race condition in start_gossiping
This change adds a test for a race condition in `start_gossiping` that
can lead to an empty self state sent in `gossip_get_endpoint_states_response`.

Test for scylladb/scylladb#25831
2025-09-08 11:38:30 +02:00
Sergey Zolotukhin
f08df7c9d7 gossiper: check for a race condition in do_apply_state_locally
In do_apply_state_locally, a race condition can occur if a task is
suspended at a preemption point while the node entry is not locked.
During this time, the host may be removed from _endpoint_state_map.
When the task resumes, this can lead to inserting an entry with an
empty host ID into the map, causing various errors, including a node
crash.

This change
1. adds a check after locking the map entry: if a gossip ACK update
   does not contain a host ID, we verify that an entry with that host ID
   still exists in the gossiper’s _endpoint_state_map.
2. Removes xfail from the test_gossiper_race test since the issue is now
   fixed.
3. Adds exception handling in `do_shadow_round` to skip responses from
   nodes that sent an empty host ID.

This re-applies the commit 13392a40d4 that
was reverted in 46aa59fe49, after fixing
the issues that caused the CI to fail.

Fixes: scylladb/scylladb#25702
Fixes: scylladb/scylladb#25621

Ref: scylladb/scylla-enterprise#5613
2025-09-08 11:38:30 +02:00
Emil Maskovsky
28e0f42a83 test/gossiper: add reproducible test for race condition during node decommission
This change introduces a targeted test that simulates the gossiper race
condition observed during node decommissioning. The test delays gossip
state application and host ID lookup to reliably reproduce the scenario
where `gossiper::get_host_id()` is called on a removed endpoint,
potentially triggering an abort in `apply_new_states`.

There is a specific error injection added to widen the race window, in
order to increase the likelihood of hitting the race condition. The
error injection is designed to delay the application of gossip state
updates, for the specific node that is being decommissioned. This should
then result in the server abort in the gossiper.

This re-applies the commit 5dac4b38fb that
was reverted in dc44fca67c, but modified
to relax the check from "on_internal_error" to a just warning log. The
more strict can be re-introduced later once we are sure that all
remaining problems are resolved and it will not break the CI.

Refs: scylladb/scylladb#25621
Fixes: scylladb/scylladb#25721
2025-09-08 11:38:30 +02:00
Pavel Emelyanov
dc44fca67c Revert "test/gossiper: add reproducible test for race condition during node decommission"
This reverts commit 5dac4b38fb as per
request from #25803
2025-09-05 09:56:46 +03:00
Pavel Emelyanov
46aa59fe49 Revert "gossiper: check for a race condition in do_apply_state_locally"
This reverts commit 13392a40d4 as per
request from #25803
2025-09-05 09:56:21 +03:00
Sergey Zolotukhin
13392a40d4 gossiper: check for a race condition in do_apply_state_locally
In do_apply_state_locally, a race condition can occur if a task is
suspended at a preemption point while the node entry is not locked.
During this time, the host may be removed from _endpoint_state_map.
When the task resumes, this can lead to inserting an entry with an
empty host ID into the map, causing various errors, including a node
crash.

This change adds a check after locking the map entry: if a gossip ACK update
does not contain a host ID, we verify that an entry with that host ID
still exists in the gossiper’s _endpoint_state_map.

Fixes scylladb/scylladb#25702
Fixes scylladb/scylladb#25621
Ref scylladb/scylla-enterprise#5613

Closes scylladb/scylladb#25727
2025-09-02 20:44:21 +02:00
Emil Maskovsky
5dac4b38fb test/gossiper: add reproducible test for race condition during node decommission
This change introduces a targeted test that simulates the gossiper race
condition observed during node decommissioning. The test delays gossip
state application and host ID lookup to reliably reproduce the scenario
where `gossiper::get_host_id()` is called on a removed endpoint,
potentially triggering an abort in `apply_new_states`.

There is a specific error injection added to widen the race window, in
order to increase the likelihood of hitting the race condition. The
error injection is designed to delay the application of gossip state
updates, for the specific node that is being decommissioned. This should
then result in the server abort in the gossiper.

Refs: scylladb/scylladb#25621
Fixes: scylladb/scylladb#25721

Backport: The test is primarily for an issue found in 2025.1, so it
needs to be backported to all the 2025.x branches.

Closes scylladb/scylladb#25685
2025-09-01 13:59:47 +02:00
Patryk Jędrzejczak
ba5b5c7d2f gossip: add recovery_leader to gossip_digest_syn
In the new Raft-based recovery procedure, live nodes join the new
group 0 one by one during a rolling restart. There is a time window when
some of them are in the old group 0, while others are in the new group
0. This causes a group 0 mismatch in `gossiper::handle_syn_msg`. The
current solution for this problem is to ignore group 0 mismatches if
`recovery_leader` is set on the local node and to ask the administrator
to perform the rolling restart in the following way:
- set `recovery_leader` in `scylla.yaml` on all live nodes,
- send the `SIGHUP` signal to all Scylla processes to reload the config,
- proceed with the rolling restart.

This commit makes `gossiper::handle_syn_msg` ignore group 0 mismatches
when exactly one of the two gossiping nodes has `recovery_leader` set.
We achieve this by adding `recovery_leader` to `gossip_digest_syn`.
This change makes setting `recovery_leader` earlier on all nodes and
reloading the config unnecessary. From now on, the administrator can
simply restart each node with `recovery_leader` set.

However, note that nodes that join group 0 must have `recovery_leader`
set until all nodes join the new group 0. For example, assume that we
are in the middle of the rolling restart and one of the nodes in the new
group 0 crashes. It must be restarted with `recovery_leader` set, or
else it would reject `gossip_digest_syn` messages from nodes in the old
group 0. To avoid problems in such cases, we will continue to recommend
setting `recovery_leader` in `scylla.yaml` instead of passing it as
a command line argument.
2025-07-23 15:36:57 +02:00
Patryk Jędrzejczak
445a15ff45 db/config, gms/gossiper: change recovery_leader to UUID
We change the type of the `recovery_leader` config parameter and
`gossip_config::recovery_leader` from sstring to UUID. `recovery_leader`
is supposed to store host ID, so UUID is a natural choice.

After changing the type to UUID, if the user provides an incorrect UUID,
parsing `recovery_leader` will fail early, but the start-up will
continue. Outside the recovery procedure, `recovery_leader` will then be
ignored. In the recovery procedure, the start-up will fail on:

```
throw std::runtime_error(
        "Cannot start - Raft-based topology has been enabled but persistent group 0 ID is not present. "
        "If you are trying to run the Raft-based recovery procedure, you must set recovery_leader.");
```
2025-07-23 15:36:56 +02:00
Gleb Natapov
a221b2bfde gossiper: do not assume that id->ip mapping is available in failure_detector_loop_for_node
failure_detector_loop_for_node may be started on a shard before id->ip
mapping is available there. Currently the code treats missing mapping
as an internal error, but it uses its result for debug output only, so
lets relax the code to not assume the mapping is available.

Fixes #23407

Closes scylladb/scylladb#24614
2025-07-01 11:33:20 +03:00
Benny Halevy
4bd0845fce gossiper: make send_gossip_echo cancellable
Currently send_gossip_echo has a 22 seconds timeout
during which _abort_source is ignored.

Mark the verb as cancellable so it can be canceled
on shutdown / abort.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-04-30 11:46:10 +03:00
Benny Halevy
fa1c3e86a9 gossiper: add send_echo helper
CAll send_gossip_echo using a centralized helper.
A following patch will make it abortable.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-04-30 11:45:51 +03:00
Benny Halevy
e06d226d08 gossiper: failure_detector_loop_for_node: ignore abort_requested_exception
Aborting the failure detector happens normally
when the node shuts down.

There's no need to log anything about it,
as long as we abort the function cleanly.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-04-30 11:05:24 +03:00
Benny Halevy
83c69642f7 gossiper: failure_detector_loop_for_node: check if abort_requested in loop condition
The same as the loop condition in the direct_failure_detector.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-04-30 11:05:24 +03:00
Benny Halevy
cecfb6dfd7 gms: gossiper: use named gate
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-04-12 11:28:48 +03:00