mirror of
https://github.com/scylladb/scylladb.git
synced 2026-04-26 11:30:36 +00:00
Implement an RPC to forward add_entry calls from the follower to leader. Bounce & retry in case of not_a_leader. Do not retry in case of uncertainty - this can lead to adding duplicate entries. The feature is added to core Raft since it's needed by all current clients - both topology and schema changes. When forwarding an entry to a remote leader we may get back a term/index pair that conflicts (has the same index, but is with a higher term) with a local entry we're still waiting on. This can happen, e.g. because there was a leader change and the log was truncated, but we still haven't got the append_entries RPC from the new leader, still haven't truncated the log locally, still haven't aborted all the local waits for truncated entries. Only remove the offending entry from the wait list and abort it. There may be entries labeled with an older term to the right (with higher commit index) of the conflicting entry. However, finding them, would require a linear scan. If we allow it, we may end up doing this linear scan for *every* conflicting entry during the transition period, which brings us to N^2 complexity of this step. At the same time, as soon as append_entries that commits a higher-term entry with the same index reaches the follower, the waits for the respective truncated entry will be aborted anyway (see notify_waiters() which sets dropped_entry exception), so the scan is unnecessary. Similarly to being able to add entries, allow to modify Raft group configuration on a follower. The implementation works the same way as adding entries - forwards the command to the leader. Now that add_entry() or modify_config never throws not_a_leader, it's more likely to throw timed_out_error, e.g. in case the network is partitioned. Previously it was only possible due to a semaphore wait timeout, and this scenario was not tested. Handle timed_out_error on RPC level to let the existing tests (specifically the randomized nemesis test) pass.