scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-23 00:02:37 +00:00

Files

Kamil Braun cbdcc944b5 service/raft: specialized verb for failure detector pinger

We used the GOSSIP_ECHO verb to perform failure detection. Now we use
a dedicated verb, DIRECT_FD_PING, introduced for this purpose.

There are multiple reasons to do so.

One minor reason: we want pings to use the same connection as other
Raft verbs. If we can't deliver Raft append_entries or vote messages
somewhere, that endpoint should be marked dead; if we can, the
endpoint should be marked alive. So putting pings on the same
connection as the other Raft verbs is important when dealing with
weird situations where some connections are available but others are
not. Observe that in `do_get_rpc_client_idx`, we put the new verb in
the right place.
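
As a rough illustration of the connection-sharing idea above, the
sketch below maps verbs to connection groups so that the ping verb
lands in the same group as the Raft verbs. The `verb` and
`connection_group` enums and the `connection_for` function are
hypothetical names made up for this example; they are not the real
`messaging_verb` enum or the actual body of `do_get_rpc_client_idx`.

```cpp
// Hypothetical verbs; the real messaging service has many more.
enum class verb {
    gossip_echo,
    raft_append_entries,
    raft_vote_request,
    direct_fd_ping,
};

// Hypothetical connection groups; the indices are illustrative only.
enum class connection_group : unsigned {
    gossip = 0,
    raft = 1,
};

// Pick the connection group for a verb. The failure detector ping is
// grouped with the Raft verbs, so a ping succeeds or fails together
// with append_entries and vote traffic to the same endpoint.
constexpr connection_group connection_for(verb v) {
    switch (v) {
    case verb::gossip_echo:
        return connection_group::gossip;
    case verb::raft_append_entries:
    case verb::raft_vote_request:
    case verb::direct_fd_ping:
        return connection_group::raft;
    }
    return connection_group::gossip; // not reached for valid enumerators
}
```

With this grouping, a broken Raft connection also breaks the pings,
even if the gossip connection to the same endpoint still works.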

Another minor reason: we remove the awkward gossiper `echo_pinger`
abstraction, which required storing and updating gossiper generation
numbers. This also removes one dependency of the Raft service code on
the gossiper.

Major reason 1: the gossip echo handler has a weird mechanism where a
replacing node returns errors to some of the nodes during the replace
operation. In Raft, however, we want to mark servers as alive when
they are alive, including a server running on a node that's replacing
another node.

Major reason 2, related to the previous one: when server B is
replacing server A with the same IP, the failure detector will try to
ping both servers. Both servers are mapped to the same IP by the
address map, so pings to both servers will reach server B. We want
server B to respond to the pings destined for server B, but not to
pings destined for server A, so the sender can mark B alive but keep A
marked dead.

To do this, we include the destination's Raft ID in our RPCs. The
destination compares the received ID with its own. If it's different,
it returns a `wrong_destination` response, and the failure detector
knows that the ping did not reach the destination (it reached someone
else).
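
The destination check can be sketched as follows, using the replace
scenario described above (servers A and B mapped to the same IP, only
B running). Everything here is a simplified assumption for
illustration: `raft_id`, `ip_address`, `ping_result`, `server` and
`ping` are hypothetical stand-ins, and plain maps play the role of the
address map and the network. It is not the actual handler code.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-ins for the real ID and address types.
using raft_id = std::uint64_t;
using ip_address = std::string;

enum class ping_result { ok, wrong_destination };

// A running server that knows its own Raft ID.
struct server {
    raft_id id;

    // Compare the ID carried in the ping with our own ID.
    ping_result handle_ping(raft_id dst) const {
        return dst == id ? ping_result::ok : ping_result::wrong_destination;
    }
};

// Sender side: resolve the destination ID to an address, deliver the
// ping to whoever listens there, and report whether `dst` is alive.
bool ping(const std::map<raft_id, ip_address>& address_map,
          const std::map<ip_address, server>& network,
          raft_id dst) {
    auto addr = address_map.find(dst);
    if (addr == address_map.end()) {
        return false; // no known address for this server
    }
    auto node = network.find(addr->second);
    if (node == network.end()) {
        return false; // nothing is listening at that address
    }
    // A wrong_destination reply means the ping reached someone else.
    return node->second.handle_ping(dst) == ping_result::ok;
}
```

For example, if IDs 1 (server A) and 2 (server B) both map to
`10.0.0.5` and only `server{2}` is registered at that address, then
`ping(address_map, network, 2)` reports alive while
`ping(address_map, network, 1)` reports dead.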

Yet another reason: this removes the "Not ready to respond gossip echo
message" log spam during replace.

2022-12-01 20:54:18 +01:00

messaging_service_fwd.hh

treewide: use Software Package Data Exchange (SPDX) license identifiers

2022-01-18 12:15:18 +01:00

messaging_service.cc

service/raft: specialized verb for failure detector pinger

2022-12-01 20:54:18 +01:00

messaging_service.hh

service/raft: specialized verb for failure detector pinger

2022-12-01 20:54:18 +01:00

msg_addr.hh

treewide: use Software Package Data Exchange (SPDX) license identifiers

2022-01-18 12:15:18 +01:00

rpc_protocol_impl.hh

idl: make idl headers self-sufficient

2022-08-08 08:02:27 +03:00