This commit adds fencing support to all Paxos verbs:
* Pass an optional (for backward compatibility) fencing_token as a
parameter to the prepare, accept, learn, and prune verbs.
* Call apply_fence twice — before and after accessing local data. This
ensures that if the coordinator is fenced out mid-request, the replica
does not return success, which would otherwise incorrectly contribute
to achieving the target CL. Without this, a user might observe
successful writes that become unreadable after the topology
operation completes.
* For prune, call apply_fence only once because it does not return a
response to the LWT coordinator.
Fixesscylladb/scylladb#22332
A future user of position_in_partition.idl doesn't need full_position
and so doesn't want to include full_position.hh to fix compile errors
when including position_in_partition.idl.hh.
Extract it to a separate idl file: it has a single user in a
storage_proxy VERB.
This change introduces a new truncate_with_tablets RPC with a parameter
of type service::frozen_topology_guard. This is materialized on replica
nodes into a topology_guard which guarantees that truncate is performed
under a global session, which, in turn, makes sure that we don't execute
truncate as a result of stale RPCs.
Also, this RPC does not have a timeout. Timeout will be handled on the
coordinator side, and the truncate operation will not be allowed to time
out.
RPCs from old nodes will still use old format so translation will be
used in this case. The change is backwards compatible thanks to RPC
extensibility.
As for regular mutations, we do the check
twice in handle_counter_mutation, before
and after applying the mutations. The last
is important in case fence was moved while
we were handling the request - some post-fence
actions might have already happened at this
time, so we can't treat the request as successful.
For example, if topology change coordinator was
switching to write_both_read_new, streaming
might have already started and missed this update.
In mutate_counters we can use a single fencing_token
for all leaders, since all the erms are processed
without yields and should underneath share the
same token_metadata.
We don't pass fencing token for replication explicitly in
replicate_counter_from_leader since
mutate_counter_on_leader_and_replicate doesn't capture erm
and if the drain on the coordinator timed out the erm for
replication might be different and we should use the
corresponding (maybe the new one) topology version for
outgoing write replication requests. This delayed
replication is similar to any other background activity
(e.g. writing hints) - it takes the current erm and
the current token_metadata version for outgoing requests.
We are going to add fencing for counter mutations,
this means handle_counter_mutation will sometimes throw
stale_topology_exception. RPC doesn't marshall exceptions
transparently, exceptions thrown by server are delivered
to the client as a general remote_verb_error, which is not
very helpful.
The common practice is to embed exceptions into handler
result type. In this commit we use already existing
exception_variant as an exception container. We mark
exception_variant with [[version]] attribute in the idl
file, this should handle the case when the old replica
(without exception_variant in the signature) is replying
to the new one.
In this commit we just pass a fencing_token
through hint_mutation RPC verb.
The hints manager uses either
storage_proxy::send_hint_to_all_replicas or
storage_proxy::send_hint_to_endpoint to send a hint.
Both methods capture the current erm and use the
corresponding fencing token from it in the
mutation or hint_mutation RPC verb. If these
verbs are fenced out, the server stale_topology_exception
is translated to a mutation_write_failure_exception
on the client with an appropriate error message.
The hint manager will attempt to resend the failed
hint from the commitlog segment after a delay.
However, if delivery is unsuccessful, the hint will
be discarded after gc_grace_seconds.
Closes#14580
On the call site we use the version captured in
read_executor/erm/token_metadata. In the handlers
we use apply_fence twice just like in mutation RPC.
Fencing was also added to local query calls, such as
query_result_local in make_data_request. This is for
the case when query coordinator was isolated from
topology change coordinator and didn't receive
barrier_and_drain.
At the call site, we use the version, captured
in erm/token_metadata. In the handler, we use
double checking, apply_fence after the local
write guarantees that no mutations
succeed on coordinators if the fence version
has been updated on the replica during the write.
Fencing was also added to mutate_locally calls
on request coordinator, for the case
if this coordinator was isolated from the
topology change coordinator and missed the
barrier_and_drain command.
read_command, partition_key and paxos::proposal
are marked with [[ref]]. partition_key contains
dynamic allocations and can be big. proposal
contains frozen_mutation, so it's also
contains dynamic allocations.
The call sites are fine, the already passed
by reference.
We had a redundant copies at the call sites of
these methods. Class read_command does not
contain dynamic allocations, but it's quite
but by itself (368 bytes).
We had a redundant copy in receive_mutation_handler
forward_fn callback. This frozen_mutation is
dynamically allocated and can be arbitrary large.
Fixes: #12504
We want to transmit the last position as determined by the replica on
both result and digest reads. Result reads already do that via the
query::result, but digest reads don't yet as they don't return the full
query::result structure, just the digest field from it. Add the last
position to the digest read's return value and collect these in the
digest resolver, along with the returned digests.
Define table_schema_version as a distinct tagged_uuid class,
So it can be differentiated from other uuid-class types,
in particular table_id.
Added reversed(table_schema_version) for convenience
and uniformity since the same logic is currently open coded
in several places.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Add include statements to satisfy dependencies.
Delete, now unneeded, include directives from the upper level
source files.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
This commit modifies the read RPC and the storage_proxy logic so that
the coordinator knows whether a read operation failed due to rate limit
being exceeded, and returns `exceptions::rate_limit_exception` if that
happens.
This commit modifies the storage_proxy logic so that the coordinator
knows whether a write operation failed due to rate limit being exceeded,
and returns `exceptions::rate_limit_exception` when that happens.
Instead of lengthy blurbs, switch to single-line, machine-readable
standardized (https://spdx.dev) license identifiers. The Linux kernel
switched long ago, so there is strong precedent.
Three cases are handled: AGPL-only, Apache-only, and dual licensed.
For the latter case, I chose (AGPL-3.0-or-later and Apache-2.0),
reasoning that our changes are extensive enough to apply our license.
The changes we applied mechanically with a script, except to
licenses/README.md.
Closes#9937