In the next commit we want to add an optimization that relies on
precise control over the lifetime of cas_request. In particular, we
want the implementation of this interface in Alternator to operate on
raw references that are guaranteed to remain valid only until the
cas() future is resolved. We already depend on the same lifetime
assumptions in cas_request when used by modification_statement.
However, these assumptions are not clearly expressed in the current
interface: cas_request is taken by shared_ptr, and nothing prevents
cas() from storing that pointer inside paxos_response_handler, which
may outlive the cas() future.
This commit fixes that by taking cas_request by raw reference. This
makes it explicit that cas() does not assume ownership of the object.
Callers must ensure that the referenced object remains valid until
the returned future is resolved.
In #26408 a write_handler_destroy_promise class was introduced to
wait for abstract_write_response_handler instances destruction. We
strived to minimize the memory footprint of
abstract_write_response_handler, with write_handler_destroy_promise-es
we required only a single additional int. It turned our that in some
cases a lot of write handlers can be scheduled for deletion
at the same time, in such cases the
vector<write_handler_destroy_promise> can become big and cause
'oversized allocation' seastar warnings.
Another concern with write_handler_destroy_promise-es was that they
were more complicated than it was worth.
In this commit we replace write_handler_destroy_promise with simple
gates. One or more gates can be attached to an
abstract_write_response_handler to wait for its destruction. We use
utils::small_vector<gate::holder, 2> to store the attached gates.
The limit 2 was chosen because we expect two gates at the same time
in most cases. One is storage_proxy::_write_handlers_gate,
which is used to wait for all handlers in
cancel_all_write_response_handlers. Another one can be attached by
a caller of cancel_write_handlers. Nothing stops several
cancel_write_handlers to be called at the same time, but it should be
rare.
The sizeof(utils::small_vector<gate::holder, 2>) == 40, this is
40.0 / 488 * 100 ~ 8% increase in
sizeof(abstract_write_response_handler), which seems acceptable.
Fixesscylladb/scylladb#26788
Refactor the counter update to split the functions and have them called
by the storage proxy to prepare for a later change.
Previously in mutate_counter the storage proxy calls the replica
function apply_counter_update that does a few things:
1. checks that the operation can be done: check timeout, disk utilization
2. acquire counter locks
3. do read-modify-write and transform the counter mutation
4. apply the mutation in the replica
In this commit we change it so that these functions are split and called
from the storage proxy, so that we have better control from the storage
proxy when we change it later to work across multiple shards. For
example, we will want to acquire locks on multiple shards, transform it
on one shard, and then apply the mutation on multiple shards.
After the change it works as follows in storage proxy:
1. acquire counter locks
2. call replica prepare to check the operation and transform the mutation
3. call replica apply to apply the transformed mutation
Slightly reorganize the mutate counter function to prepare it for a
later change.
Move the code that finds the read shard and invokes the rest of the
function on the read shard to the caller function. This simplifies the
function mutate_counter_on_leader_and_replicate which now runs on the
read shard and will make it easier to extend.
This function is intended to replace start_write() in subsequent
commits. It provides the following benefits:
* Remove duplication: All start_write() call sites must run the fence
check after the operation completes. run_fenceable_write() encapsulates
this pattern.
* Fix a race: To ensure no new stale write operations occur during
cleanup, a fence check before start_write() was previously used.
However, yields in several code paths between the check and
start_write() made it non-atomic, allowing a stale operation to slip in
if the fence_version was updated in between.
* Optimize waiting: We do not need to wait for all operations—only for
vnode-based, non-local tables with versions smaller than the current
fence_version.
Future commits will extend update_fence_version, and it is simpler to do
so if the function resides in storage_proxy. Additionally, fence_version
is the only field this function accesses, and it is used solely within
storage_proxy, making this change natural on its own.
As noted in the code comments, start_write() does not need to be held
during counter replication; it is required only while performing local
storage modifications. Move the start_write() call and the fence
check down to mutate_counter_on_leader_and_replicate().
Additionally, mutate_counters_on_leader() is updated to check for
possible stale_topology_exception() and properly package them
in the resulting exception_variant structure.
shared_ptr<abstract_write_response_handler> instances are captured in
the lmutate/rmutate lambdas of send_to_live_endpoints(). As a result,
an abstract_write_response_handler object may outlive its removal from
the _response_handlers map. We use write_handler_destroy_promise to
wait for such pending instances in cancel_write_handlers() and
cancel_all_write_response_handlers() to prevent use-after-free.
A better long-term solution might be to replace shared_ptr with
unique_ptr for abstract_write_response_handler and use a separate gate
to track the lmutate/rmutate lambdas. We do not actually need to wait
for these lambdas to finish before sending a timeout or error response
to the client, as we currently do in ~abstract_write_response_handler.
Fixesscylladb/scylladb#26355
The cancel_write_handlers() method was assumed to be called in a thread
context, likely because it was first used from gossiper events, where a
thread context already existed. Later, this method was reused in
abort_view_writes() and abort_batch_writes(), where threads are created
on the fly and appear redundant.
The drain_on_shutdown() method also used a thread, justified by some
"delicate lifetime issues", but it is unclear what that actually means.
It seems that a straightforward co_await should work just fine.
This will allow us to communicate with CDC from higher layers. We plan
to use it to reduce the number of read-before-writes with preimages by
passing the row selected in upper layers.
When a node is being replaced, it enters a "left" state while still
owning tokens. Before this patch, this is also the time when we start
draining hints targeted to this node, so the hints may get sent before
the token ownership gets migrated to another replica, and these hints
may get lost.
In this patch we postpone the hint draining for the "left" nodes to
the time when we know that the target nodes no longer hold ownership
of any tokens - so they're no longer referenced in topology. I'm
calling such nodes "released".
Before this change, when we were starting draining hints, we knew the
IP addresses of the target nodes. We lose this information after entering
"left" stage, so when draining hints after a node is "released", we
can't drain the hints targeted to a specific IP instead of host_id.
We may have hints targeted to IPs if the migration rom IP-based to
host_ID-based hints didn't happen yet. The migration happens
when enabling a cluster feature since 2024.2.0, so such hints can
only exist if we perform a direct upgrade from a version before
2024.2.0 to a version that has this change (2025.4.0+).
To avoid losing hints completely when such an upgrade is combined
with a node removal/replacement, we still drain hints when the node
enters a "left" state and the migration of hints to host_id wasn't
performed yet. For these drains, the problematic scenario can't
occur because it only affects tablets, and when upgrading from
a version before 2024.2.0, no tablets can exist yet.
If we perform such a drain, we no longer need to drain hints when
entering the "released" state, so we only drain when entering that
state if the migration was already completed.
With this setup, we'll always drain hints at least once when a node
is leaving. However, if the migration to host_ids finishes between
entering the "left" state and the "released" state, we'll attempt
to drain the hints twice. This shouldn't be problem though because
each `drain_for()` is performed with the `_drain_lock` and after
a `hint_endpoint_manger` is drained, it's removed, so we won't try
to drain it twice.
This patch also includes a test for verifying that hints are properly
replayed after a node replace.
Fixes https://github.com/scylladb/scylladb/issues/24980Closesscylladb/scylladb#24981
The latter is recommended in seastar, and the former was left as
compatibility alias. Latest seastar explicitly marks it as deprecated so
once the submodule is updated, compilation logs will explode.
Most of the patch is generated with
for f in $(git grep -l '\<distributed<[A-Za-z0-9:_]*>') ; do sed -e 's/\<distributed<\([A-Za-z0-9:_]*\)>/sharded<\1>/g' -i $f; done
for f in $(git grep -l distributed.hh); do sed -e 's/distributed.hh/sharded.hh/' -i $f ; done
and a small manual change in test/perf/perf.hh
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#26136
This PR consists of three parts:
* Small refactoring of the fencing APIs in storage_proxy (renames + comments + some functions were extracted)
* Implement the fencing for LWT verbs itself. This includes checking the fencing token before and after local replica data accesses.
* Two new `test.py` tests in `test_fencing.py`, which check the fencing in some real-world scenarios.
Backport: no need -- fencing for LWT requests is needed primarily for LWT over tablets, which is not released yet.
Fixesscylladb/scylladb#22332Closesscylladb/scylladb#25550
* https://github.com/scylladb/scylladb:
test_tablets_lwt: eliminate redundant disable_tablet_balancing
test_fencing: add test_lwt_fencing_upgrade
pylib: extract upgrade helpers from test_sstable_compression_dictionaries_upgrade.py
test_fencing: add test_fenced_out_on_tablet_migration_while_handling_paxos_verb
test_fencing: test_fence_lwt_during_bootstap
pylib/rest_client.py: encode injection name
storage_proxy_stats: add fenced_out_requests metric
storage_proxy: add fencing to Paxos verbs
storage_proxy::apply_fence: add overload that throws on failure
storage_proxy: extract apply_fence_result
sp::apply_fence: rename to apply_fence_on_ready
sp::apply_fence: rename to check_fence
sp::apply_fence: make non-generic
As requested in #22120, moved the files and fixed other includes and build system.
Moved files:
- query.cc
- query-request.hh
- query-result.hh
- query-result-reader.hh
- query-result-set.cc
- query-result-set.hh
- query-result-writer.hh
- query_id.hh
- query_result_merger.hh
Fixes: #22120
This is a cleanup, no need to backport
Closesscylladb/scylladb#25105
This commit refactors a repeated pattern that applies the fence
and embeds the exception into the exception_variant class by
extracting it into a separate method.
This overload performs the fence check only when the future is ready.
In this commit, we give it a more descriptive name to better reflect
its behavior. Additionally, we add extensive comments explaining the
overall fencing scheme and the motivation behind this specific overload.
We plan to introduce several additional apply_fence overloads in
upcoming commits. To avoid ambiguity, this change renames the existing
base function to check_fence.
Consider the following scenario:
- Current replica set is [A, B, C]
- write succeeds on [A, B], and a hint is logged for node C
- before the hint is replayed, D bootstraps and the token migrates from C to D
- hint is replayed to node C while D is pending, but it's too late, since streaming for that token is already done
- C is cleaned up, replayed data is lost, and D has a stale copy until next repair.
In the scenario we effectively fail to send the hint. This scenario is also more likely to happen with tablets,
as it can happen for every tablet migration.
This issue is particularly detrimental to materialized views. View updates use hints by default and a specific
view update may be sent to just one view replica (when a single base replica has a different row state due to
reordering or missed writes). When we lose a hint for such a view update, we can generate a persistent inconsistency
between the base and view - ghost rows can appear due to a lost tombstone and rows may be missing in the view due
to a lost row update. Such inconsistencies can't be fixed neither by repairing the view or the base table.
To handle this, in this patch we add the pending replicas to the list of targets of each hint, even if the original
target is still alive.
This will cause some updates to be redundant. These updates are probably unavoidable for now, but they shouldn't
be too common either. The scenarios for them are:
1. managing to send the hint to the source of a migrating replica before streaming that its token - the write will
arrive on the pending replica anyway in streaming
2. the hint target not being the source of the migration - if we managed to apply the original write of the hint to
the actual source of the migration, the pending replica will get it during streaming
3. sending the same hint to many targets at a similar time - while sending to each target, we'll see the same pending
replica for the hint so we'll send it multiple times
4. possible retries where even though the hint was successfully sent to the main target, we failed to send it to the
pending replica, so we need to retry the entire write
This patch handles both tablet migrations and tablet rebuilds. In the future, for tablet migrations, we can avoid
sending the hint to pending replias if the hint target is not the source fo the migration, which would allow us to
avoid the redundant writes 2 and 3. For rack-aware RF, this will be as simple as checking whether the replicas are
in the same rack.
We also add a test case reproducing the issue.
Co-Authored-By: Raphael S. Carvalho <raphaelsc@scylladb.com>
Fixes https://github.com/scylladb/scylladb/issues/19835Closesscylladb/scylladb#25590
The previous implementation did not handle topology changes well:
* In node_local_only mode with CL=1, if the current node is pending,
the CL is raised to 2, causing unavailable_exception.
* If the current tablet is in write_both_read_old and we read with
node_local_only on the new node, the replica list is empty.
This patch changes node_local_only mode to always use my_host_id as
the replica list. An explicit check ensures the current node is a
replica for the operation; otherwise on_internal_error is called.
We add the remove_non_local_host_ids() helper, which
will be used in the next commit to support the read
path. HostIdVector concept is introduced to be able
to handle both host_id_vector_replica_set and
host_id_vector_topology_change uniformly.
The storage_proxy_coordinator_mutate_options class
is declared outside of storage_proxy to avoid C++
compiler complaints about default field initializers.
In particular, some storage_proxy methods use this
class for optional parameters with default values,
which is not allowed when the class is defined inside
storage_proxy.
Add a per-request flag that restricts query execution
to the local node by filtering out all non-local replicas.
Standard consistency level (CL) rules still apply:
if the local node alone cannot satisfy the
requested CL, an exception is thrown.
This flag is required for Paxos state access, where
reads and writes must target only the local node.
As a side effect, this also enables the implementation
of scylladb/scylladb#16478, which proposes a CQL
extension to expose 'local mode' query execution to users.
Support for this flag in storage_proxy's read and write
code paths will be added in follow-up commits.
In upcoming commits, we want to add a node_local_only flag to both read
and write paths in storage_proxy. This requires passing the flag from
query_processor to the part of storage_proxy where replica selection
decisions are made.
For reads, it's sufficient to add the flag to the existing
coordinator_query_options class. For writes, there is no such options
container, so we introduce coordinator_mutate_options in this commit.
In the future, we may move some of the many mutate() method arguments
into this container to simplify the code.
Most of the create_write_response_handler overloads follow the same
signature pattern to satisfy the sp::mutate_prepare call. The one which
doesn't follow it is invoked by others and is responsible for creating
a concrete handler instance. In this refactoring commit we rename
it to make_write_response_handler to reduce confusion.
This is a refactoring commit. We remove extra lambda parameters from
mutate_prepare since the CreateWriteHandler lambda can simply
capture them.
We can't std::move(permit) in another mutate_prepare overload,
because each handler wants its own copy of this pemit.
Introduce paxos_store abstraction to isolate Paxos state access.
Prepares for supporting either system.paxos or a co-located
table as the storage backend.
When a node shuts down, in storage service, after storage_proxy RPCs are stopped, some write handlers within storage_proxy may still be waiting for background writes to complete. These handlers hold appropriate ERMs to block schema changes before the write finishes. After the RPCs are stopped, these writes cannot receive the replies anymore.
If, at the same time, there are RPC commands executing `barrier_and_drain`, they may get stuck waiting for these ERM holders to finish, potentially blocking node shutdown until the writes time out.
This change introduces cancellation of all outstanding write handlers from storage_service after the storage proxy RPCs were stopped.
Fixesscylladb/scylladb#23665
Backport: since this fixes an issue that frequently causes issues in CI, backport to 2025.1, 2025.2, and 2025.3.
Closesscylladb/scylladb#24714
* https://github.com/scylladb/scylladb:
storage_service: Cancel all write requests on storage_proxy shutdown
test: Add test for unfinished writes during shutdown and topology change
During a graceful node shutdown, RPC listeners are stopped in `storage_service::drain_on_shutdown`
as one of the first steps. However, even after RPCs are shut down, some write handlers in
`storage_proxy` may still be waiting for background writes to complete. These handlers retain the ERM.
Since the RPC subsystem is no longer active, replies cannot be received, and if any RPC commands are
concurrently executing `barrier_and_drain`, they may get stuck waiting for those writes. This can block
the messaging server shutdown and delay the entire shutdown process until the write timeout occurs.
This change introduces the cancellation of all outstanding write handlers in `storage_proxy`
during shutdown to prevent unnecessary delays.
Fixesscylladb/scylladb#23665
On shutdown of batchlog manager, abort all writes of replayed batches
by the batchlog manager.
To achieve this we set the appropriate write_type to BATCH, and on
shutdown cancel all write handlers with this type.
When replaying a batch mutation from the batchlog manager and sending it
to all replicas, create the write response handler as cancellable.
To achieve this we define a new wrapper type for batchlog mutations -
batchlog_replay_mutation, and this allows us to overload
create_write_response_handler for this type. This is similar to how it's
done with hint_wrapper and read_repair_mutation.
Currently mutate_internal has a boolean parameter `counter_write` that
indicates whether the write is of counter type or not.
We replace it with a more general parameter that allows to indicate the
write type.
It is compatible with the previous behavior - for a counter write, the
type COUNTER is passed, and otherwise a default value will be used
as before.
This PR is a step towards enabling LWT for tablet-based tables.
It pursues several goals:
* Make it explicit that the tablet can't migrate after the `cas_shard` check in `selec_statement/modification_statement`. Currently, `storage_proxy::cas` expects that the client calls it on a correct shard -- the one which owns the partition key the LWT is running on. There reasons for that are explained in [this commit](f16e3b0491 (diff-1073ea9ce4c5e00bb6eb614154f523ba7962403a4fe6c8cd877d1c8b73b3f649)) message. The statements check the current shard and invokes `bounce_to_shard` if it's not the right one. However , the erm strong pointer is only captured in `storage_proxy::cas` and until that moment there is no explicit structure in the code which would prevent the ongoing migrations. In this PR we introduce such stucture -- `erm_handle`. We create it before the `cas_check` and pass it down to `storage_proxy::cas` and `paxos_response_handler`.
* Another goal of this PR is an optimization -- we don't want to hold erm for the duration of entire LWT, unless it directly affects the current tablet. The is a `tablet_metadata_guard` class which is used for long running tablet operations. It automatically switches to a new erm if the topology change represented by the new erm doesn't affect the current tablet. We use this class in `erm_handle` if the table uses tablets. Otherwise, `erm_handle` just stores erm directly.
* Fixes [shard bouncing issue in alternator](https://github.com/scylladb/scylladb/issues/17399)
Backport: not needed (new feature).
Closesscylladb/scylladb#24495
* github.com:scylladb/scylladb:
LWT: make cas_shard non-optional in sp::cas
LWT: create cas_shard in select_statement
LWT: create cas_shard in modification and batch statements
LWT: create cas_shard in alternator
LWT: use cas_shard in storage_proxy::cas
do_query_with_paxos: remove redundant cas_shard check
storage_proxy: add cas_shard class
sp::cas_shard: rename to get_cas_shard
token_metadata_guard: a topology guard for a token
tablet_metadata_guard: mark as noncopyable and nonmoveable
Take cas_shard parameter in sp::cas and pass token_metadata_guard down to paxos_response_handler.
We make cas_shard parameter optional in storage_proxy methods
to make the refactoring easier. The sp::cas method constructs a new
token_metadata_guard if it's not set. All call sites pass null
in this commit, we will add the proper implementation in the next
commits.
The sp::cas method must be called on the correct shard,
as determined by sp::cas_shard. Additionally, there must
be no asynchronous yields between the shard check and
capturing the erm strong pointer in sp::cas. While
this condition currently holds, it's fragile and
easy to break.
To address this, future commits will move the capture of
token_metadata_guard to the call sites of sp::cas, before
performing the shard check.
As a first step, this commit introduces a cas_shard class
that wraps both the target shard and a token_metadata_guard
instance. This ensures the returned shard remains valid for
the given tablet as long as the guard is held.
In the next commits, we’ll pass a cas_shard instance
to sp::cas as a separate parameter.
The primary motivation for this change is to reduce the time during which the Effective Replication Map (ERM) is retained by the mapreduce service. This ensures that long aggregate queries do not block topology operations. As ScyllaDB is generally transitioning towards tablets, and using tablets simplifies work dispatching, the decision was made to design the new algorithm specifically for tablets. The goal of the algorithm is to divide the work in such a way that each `tablet_replica` (that is <host, shard> pair) processes two tablets at a time.
The new algorithm can be summarized as follows:
1. Prepare a tablet_replica -> partition_range mapping where the values cover the entire space.
2. For each tablet_replica, in parallel, take two partition ranges and dispatch them to the node hosting the replica. The ERM is released and re-acquired in each iteration, allowing the destination (i.e., tablet_replica) to change for each
artition range (in such cases, the partition range is assigned to the appropriate tablet_replica).
In step 1, the main difference compared to the old algorithm (dispatch_to_vnodes) is that partition ranges are assigned to a tablet_replica rather than just the host.
In step 2, the main difference is that the work is divided into smaller batches, and the ERM is released and re-acquired for each batch.
In the current implementation, each node can correctly handle every partition range, even if the mapreduce supercoordinator does not retain the ERM and the range is absent locally. This is because mapreduce_service::execute_on_this_shard creates a new pager that coordinates the partition range read, including obtaining its own ERM. However, every partition range that is absent locally is handled by shard 0. Therefore, proper routing of partition ranges is necessary to avoid shard 0 overload. This is why, in step 2, the ERM is retained during each batch processing, and the tablet_replica is refreshed for each processed range.
Additionally, shard_id is added to mapreduce request. When shard_id is set, the entire partition range is handled by the specified shard. As the new tablet-aware mapreduce algorithm balances the workload across shards, shard_id ensure that the balance is preserved, even during events such as tablet splits.
This patch series:
- Refactors a bit mapreduce service, to facilitate having two algorithm versions (one for vnodes and one for tablets).
- Implements tablet-aware dispatching algorithm.
- Adds shard_id to mapreduce request and uses the information to handle requests entirely by selected shard.
- Adds test_long_query_timeout_erm to verify the new functionality.
Fixes: scylladb#21831
No backport, as it is rather new feature than a bugfix.
Closesscylladb/scylladb#24383
* github.com:scylladb/scylladb:
mapreduce: add missing comma and space in mapreduce_request operator<<
mapreduce: add shard_id_hint to mapreduce request
test: add test_long_query_timeout_erm
mapreduce: add tablet-aware dispatching algorithm
storage_proxy: make storage_proxy::is_alive public
mapreduce: remove _shared_token_metadata from mapreduce_service
mapreduce: move dispatching logic to dispatch_to_vnodes
mapreduce: remove underscores from variable names
mapreduce: move req_with_modified_pr handling to a new function
mapreduce: change next_vnode lambda to get_next_partition_range function
The following was seen:
```
!WARNING | scylla[6057]: [shard 12:strm] seastar_memory - oversized allocation: 212992 bytes. This is non-fatal, but could lead to latency and/or fragmentation issues. Please report: at
[Backtrace #0]
void seastar::backtrace<seastar::current_backtrace_tasklocal()::$_0>(seastar::current_backtrace_tasklocal()::$_0&&, bool) at ./build/release/seastar/./seastar/include/seastar/util/backtrace.hh:89
(inlined by) seastar::current_backtrace_tasklocal() at ./build/release/seastar/./build/release/seastar/./seastar/src/util/backtrace.cc:99
seastar::current_tasktrace() at ./build/release/seastar/./build/release/seastar/./seastar/src/util/backtrace.cc:136
seastar::current_backtrace() at ./build/release/seastar/./build/release/seastar/./seastar/src/util/backtrace.cc:169
seastar::memory::cpu_pages::warn_large_allocation(unsigned long) at ./build/release/seastar/./build/release/seastar/./seastar/src/core/memory.cc:848
seastar::memory::allocate_slowpath(unsigned long) at ./build/release/seastar/./build/release/seastar/./seastar/src/core/memory.cc:911
operator new(unsigned long) at ./build/release/seastar/./build/release/seastar/./seastar/src/core/memory.cc:1706
std::allocator<dht::token_range_endpoints>::allocate(unsigned long) at /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/bits/allocator.h:196
(inlined by) std::allocator_traits<std::allocator<dht::token_range_endpoints> >::allocate(std::allocator<dht::token_range_endpoints>&, unsigned long) at /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/bits/alloc_traits.h:515
(inlined by) std::_Vector_base<dht::token_range_endpoints, std::allocator<dht::token_range_endpoints> >::_M_allocate(unsigned long) at /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/bits/stl_vector.h:380
(inlined by) void std::vector<dht::token_range_endpoints, std::allocator<dht::token_range_endpoints> >::_M_realloc_append<dht::token_range_endpoints const&>(dht::token_range_endpoints const&) at /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/bits/vector.tcc:596
locator::describe_ring(replica::database const&, gms::gossiper const&, seastar::basic_sstring<char, unsigned int, 15u, true> const&, bool) at /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/bits/stl_vector.h:1294
std::__n4861::coroutine_handle<seastar::internal::coroutine_traits_base<std::vector<dht::token_range_endpoints, std::allocator<dht::token_range_endpoints> > >::promise_type>::resume() const at /usr/lib/gcc/x86_64-redhat-linux/14/../../../../include/c++/14/coroutine:242
(inlined by) seastar::internal::coroutine_traits_base<std::vector<dht::token_range_endpoints, std::allocator<dht::token_range_endpoints> > >::promise_type::run_and_dispose() at ././seastar/include/seastar/core/coroutine.hh:80
seastar::reactor::do_run() at ./build/release/seastar/./build/release/seastar/./seastar/src/core/reactor.cc:2635
std::_Function_handler<void (), seastar::smp::configure(seastar::smp_options const&, seastar::reactor_options const&)::$_0>::_M_invoke(std::_Any_data const&) at ./build/release/seastar/./build/release/seastar/./seastar/src/core/reactor.cc:4684
```
Fix by using chunked_vector.
Fixes#24158Closesscylladb/scylladb#24561
these unused includes were identified by clang-include-cleaner. after
auditing these source files, all of the reports have been confirmed.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
"
This is series starts conversion of the gossiper to use host ids to
index nodes. It does not touch the main map yet, but converts a lot of
internal code to host id. There are also some unrelated cleanups that
were done while working on the series. On of which is dropping code
related to old shadow round. We replaced shadow round with explicit
GOSSIP_GET_ENDPOINT_STATES verb in cd7d64f588
which is in scylla-4.3.0, so there should be no compatibility problem.
We already dropped a lot of old shadow round code in previous patches
anyway.
I tested manually that old and new node can co-exist in the same
cluster,
"
* 'gleb/gossiper-host-id-v2' of github.com:scylladb/scylla-dev: (33 commits)
gossiper: drop unneeded code
gossiper: move _expire_time_endpoint_map to host_id
gossiper: move _just_removed_endpoints to host id
gossiper: drop unused get_msg_addr function
messaging_service: change connection dropping notification to pass host id only
messaging_service: pass host id to remove_rpc_client in down notification
treewide: pass host id to endpoint_lifecycle_subscriber
treewide: drop endpoint life cycle subscribers that do nothing
load_meter: move to host id
treewide: use host id directly in endpoint state change subscribers
treewide: pass host id to endpoint state change subscribers
gossiper: drop deprecated unsafe_assassinate_endpoint operation
storage_service: drop unused code in handle_state_removed
treewide: drop endpoint state change subscribers that do nothing
gossiper: drop ip address from handle_echo_msg and simplify code since host_id is now mandatory
gossiper: start using host ids to send messages earlier
messaging_service: add temporary address map entry on incoming connection
topology_coordinator: notify about IP change from sync_raft_topology_nodes as well
treewide: move everyone to use host id based gossiper::is_alive and drop ip based one
storage_proxy: drop unused template
...