Mirror of https://github.com/scylladb/scylladb.git (synced 2026-04-26 19:35:12 +00:00)
This reverts commit 45f5efb9ba.

The load_and_repair_paxos_state function was introduced in
scylladb/scylladb#24478, but it has never been tested or proven useful.

One set of problems stems from its use of local data structures
from a remote shard. In particular, system_keyspace and schema_ptr
cannot be directly accessed from another shard; doing so is a bug.

More importantly, load_paxos_state on different shards can never
return different values. The actual shard from which data is read is
determined by sharder.shard_for_reads, and storage_proxy will jump
back to the appropriate shard if the current one doesn't match. This
means load_and_repair_paxos_state can't observe paxos state from
a write-but-not-read shard, and therefore will never be able to
repair anything.
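
The redirection argument above can be illustrated with a toy model. All names here (toy_sharder, toy_node, load_state) are hypothetical and do not mirror the real sharder or storage_proxy interfaces; the sketch only assumes that a load issued from any shard ends up reading the shard selected for reads.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for the sharder: during a tablet migration one shard
// serves reads while one or two shards receive writes.
struct toy_sharder {
    std::size_t read_shard = 0;
    std::vector<std::size_t> write_shards;
};

// Hypothetical per-node model: one optional paxos state slot per shard.
struct toy_node {
    toy_sharder sharder;
    std::vector<std::optional<std::string>> state_per_shard;

    // Whatever shard the caller runs on, the load is served by the read
    // shard, so the calling shard has no influence on the result.
    std::optional<std::string> load_state(std::size_t calling_shard) const {
        (void)calling_shard;
        return state_per_shard[sharder.read_shard];
    }
};
```

In this model, state that exists only on a write-but-not-read shard is never returned by load_state, which is why a repair step built on comparing per-shard loads would have nothing to compare.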
We believe this explicit Paxos state read-repair is not needed at all.
Any paxos state read which drives some paxos round forward is already
accompanied by a paxos state write. Suppose we wrote the state to the
old shard but not to the new shard (because of some error) while
streaming is already finished. The RPC call (prepare or accept) will
return an error to the coordinator, so this replica's response won't
affect the current round. This write won't affect any subsequent paxos
rounds either, unless in those rounds the write actually succeeds on
both shards, effectively 'auto-repairing' the paxos state.

The same holds if we managed to write to the new shard but not to the
old shard. Any subsequent reads will observe either the old state or
the new state (if the tablet has already switched reads to the new
shard). In any case, we'll have to write the state to all relevant
shards from sharder.shard_for_writes (one or two) before sending the
RPC response, making this state visible to all subsequent reads.

Thus, the monotonicity property ("once observed, the state must always
be observed") appears to hold without requiring explicit read-repair,
and load_and_repair_paxos_state is not needed.
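
The reasoning above can be sketched as a small model. The type toy_replica and its members are hypothetical and not Scylla code; the sketch assumes only what the message states: a write must reach every shard in the write set before the RPC reports success, and reads are served by a single read shard.

```cpp
#include <cstddef>
#include <optional>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of one replica during a tablet migration.
struct toy_replica {
    std::size_t read_shard = 0;
    std::vector<std::size_t> write_shards;           // one or two shards
    std::vector<std::optional<std::string>> state;   // one slot per shard
    std::set<std::size_t> failing_shards;            // injected write failures

    // Returns true (RPC success) only if the state reached every write shard.
    bool save_state(const std::string& value) {
        bool ok = true;
        for (std::size_t shard : write_shards) {
            if (failing_shards.count(shard) != 0) {
                ok = false;  // the coordinator will see an error
                continue;
            }
            state[shard] = value;
        }
        return ok;
    }

    // Reads are always served by the shard currently selected for reads.
    std::optional<std::string> load_state() const {
        return state[read_shard];
    }
};
```

In this model, a state whose write was acknowledged is present on every write shard, so it stays visible when reads switch from the old shard to the new one. A partially applied write is reported as an error and is overwritten by the next write that succeeds on both shards.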
Closes scylladb/scylladb#24926
56 lines
1.9 KiB
C++
/*
 * Copyright (C) 2019-present ScyllaDB
 *
 * Modified by ScyllaDB
 */

/*
 * SPDX-License-Identifier: (LicenseRef-ScyllaDB-Source-Available-1.0 and Apache-2.0)
 */

#pragma once

#include "mutation/frozen_mutation.hh"
#include <fmt/core.h>

namespace service {

namespace paxos {

// Proposal represents replica's value associated with a given ballot. The origin uses the term
// "commit" for this object, however, Scylla follows the terminology as set by Paxos Made Simple
// paper.
// Each replica persists the proposals it receives in the system.paxos table. A proposal may be
// new, accepted by a replica, or accepted by a majority. When a proposal is accepted by majority it
// is considered "chosen" by Paxos, and we call such a proposal "decision". A decision is
// saved in the paxos table in an own column and applied to the base table during "learn" phase of
// the protocol. After a decision is applied it is considered "committed".
class proposal {
public:
    // The ballot for the update.
    utils::UUID ballot;
    // The mutation representing the update that is being applied.
    frozen_mutation update;

    proposal(utils::UUID ballot_arg, frozen_mutation update_arg)
        : ballot(ballot_arg)
        , update(std::move(update_arg)) {}
};

// Proposals are ordered by their ballot's timestamp.
// A proposer uses it to find the newest proposal accepted
// by some replica among the responses to its own one.
inline bool operator<(const proposal& lhs, const proposal& rhs) {
    return lhs.ballot.timestamp() < rhs.ballot.timestamp();
}

inline bool operator>(const proposal& lhs, const proposal& rhs) {
    return lhs.ballot.timestamp() > rhs.ballot.timestamp();
}

} // end of namespace "paxos"
} // end of namespace "service"

// Used for logging and debugging.
template <> struct fmt::formatter<service::paxos::proposal> : fmt::formatter<string_view> {
    auto format(const service::paxos::proposal&, fmt::format_context& ctx) const -> decltype(ctx.out());
};
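
The comment on operator< says a proposer uses the ordering to find the newest proposal among the responses it receives. A minimal sketch of that use follows, with hypothetical stand-ins (toy_ballot, toy_proposal, newest) in place of utils::UUID, frozen_mutation and the real proposal class, so that it compiles on its own. It is not how the proposer code in the repository is written, only an illustration of ordering by ballot timestamp.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a ballot: only the timestamp matters for ordering.
struct toy_ballot {
    std::int64_t ts = 0;
    std::int64_t timestamp() const { return ts; }
};

// Hypothetical stand-in for service::paxos::proposal.
struct toy_proposal {
    toy_ballot ballot;
    std::string update;
};

// Same ordering rule as in the header: compare ballot timestamps.
inline bool operator<(const toy_proposal& lhs, const toy_proposal& rhs) {
    return lhs.ballot.timestamp() < rhs.ballot.timestamp();
}

// Returns the proposal with the greatest ballot timestamp,
// or nullptr if there are no responses.
inline const toy_proposal* newest(const std::vector<toy_proposal>& responses) {
    auto it = std::max_element(responses.begin(), responses.end());
    return it == responses.end() ? nullptr : &*it;
}
```

std::max_element relies only on operator<, so the single comparison defined on the proposal type is enough to pick the most recent response.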