The status is no longer used. The function that referenced it was
removed by 5a96751534, and it had already been unused
for a while before that.
Message-Id: <ZS92mcGE9Ke5DfXB@scylladb.com>
So it would be waited on in shutdown().
Although gossiper::run holds the `_callback_running` semaphore
which is acquired in `do_stop_gossiping`, the gossip messages
it initiates in the background are never waited on.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closes scylladb/scylladb#15493
Currently, the endpoint address is set as the RPC_ADDRESS
of the new endpoint_state. This is wrong, since RPC_ADDRESS
should be set to the `broadcast_rpc_address`
rather than the `broadcast_address`.
This was introduced in b82c77ed9c.
Instead just create an empty endpoint_state.
The RPC_ADDRESS (as well as HOST_ID) application states
are set later.
Fixes scylladb/scylladb#15458
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closes scylladb/scylladb#15475
As promised in earlier commits:
Fixes: #7620
Fixes: #13957
Also modify two test cases in `schema_change_test` which depend on
the digest calculation method in their checks. Details are explained in
the comments.
This feature, when enabled, will modify how schema versions
are calculated and stored.
- In group 0 mode, schema versions are persisted by the group 0 command
that performs the schema change, then reused by each node instead of
being calculated as a digest (hash) by each node independently.
- In RECOVERY mode or before Raft upgrade procedure finishes, when we
perform a schema change, we revert to the old digest-based way, taking
into account the possibility of having performed group0-mode schema
changes (that used persistent versions). As we will see in future
commits, this will be done by storing additional flags and tombstones
in system tables.
By "schema versions" we mean both the UUIDs returned from
`schema::version()` and the "global" schema version (the one we gossip
as `application_state::SCHEMA`).
For now, in this commit, the feature is always disabled. Once all
necessary code is set up in the following commits, we will enable it together
with Raft.
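The two ways of obtaining a schema version can be contrasted in a small sketch. It is hypothetical: versions are modelled as `std::uint64_t` rather than UUIDs, the digest is FNV-1a over placeholder strings rather than Scylla's schema digest, and `schema_version` / `schema_digest` are invented names. It only illustrates that in digest mode every node must hash identical input to agree, whereas in group 0 mode the version committed together with the schema change is reused as-is.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Stand-in digest (hypothetical): 64-bit FNV-1a over placeholder schema
// entries, with a separator byte after each entry. Not Scylla's digest.
std::uint64_t schema_digest(const std::vector<std::string>& schema_entries) {
    std::uint64_t h = 14695981039346656037ull;  // FNV-1a offset basis
    auto mix = [&h](unsigned char byte) {
        h ^= byte;
        h *= 1099511628211ull;                  // FNV-1a prime
    };
    for (const auto& entry : schema_entries) {
        for (unsigned char c : entry) {
            mix(c);
        }
        mix(0xff);                              // entry separator
    }
    return h;
}

// Hypothetical selection between the two regimes described above.
std::uint64_t schema_version(const std::vector<std::string>& schema_entries,
                             std::optional<std::uint64_t> persisted_version) {
    if (persisted_version) {
        // Group 0 mode: reuse the version persisted by the schema change command.
        return *persisted_version;
    }
    // Digest mode (RECOVERY / before the Raft upgrade): each node hashes locally.
    return schema_digest(schema_entries);
}
```

In digest mode two nodes agree only if their local schema data is byte-for-byte identical; with a persisted version that requirement disappears, which is the point of the feature.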
We remove the flush from set_scylla_local_param_as
since it's now redundant. We add it to
save_local_enabled_features, as features need to
be available before the schema commitlog is replayed.
We skip the flush if save_local_enabled_features
is called from topology_state_load, when the features
are migrated to system.topology and we don't need
strict durability.
We want to switch the system.local table to the
schema commitlog, but this table is used
in host_id initialization (initialize_local_info),
so we need to replay the schema commitlog before that.
In this commit we gather all the actions
related to early system_keyspace initialization
in one place, before initialize_local_info_thread.
The calls to save_system_schema and recalculate_schema_version
are tied to legacy_schema_migrator::migrate and
initialize_virtual_tables calls, so they are done
separately after legacy_schema_migrator::migrate.
The `gossiper::reset_endpoint_state_map` function is supposed to acquire
a lock in order to serialize with `replicate_live_endpoints_on_change`.
The `lock_endpoint_update_semaphore` is called, but its result is a
future - and it is not co_awaited. Therefore, the lock has no effect.
This commit fixes the issue by adding missing co_await.
Fixes: #15361
Closes #15362
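The bug is a discarded lock holder. In Seastar the semaphore units arrive wrapped in a future, so without `co_await` the caller neither waits for the semaphore nor holds it. The standard C++ analogy below (no Seastar; `simple_semaphore` and both functions are hypothetical) shows the same effect synchronously: if the returned RAII holder is not kept alive, the semaphore is released before the critical section runs.

```cpp
#include <cassert>
#include <utility>

// Hypothetical counting semaphore standing in for the one behind
// lock_endpoint_update_semaphore(). This is not the Seastar API.
class simple_semaphore {
public:
    explicit simple_semaphore(int count) : count_(count) {}
    int available() const { return count_; }

    // RAII units: the semaphore is held only while this object is alive.
    class units {
    public:
        explicit units(simple_semaphore* s) : sem_(s) { --sem_->count_; }
        units(units&& o) noexcept : sem_(std::exchange(o.sem_, nullptr)) {}
        units(const units&) = delete;
        units& operator=(const units&) = delete;
        units& operator=(units&&) = delete;
        ~units() {
            if (sem_) {
                ++sem_->count_;
            }
        }
    private:
        simple_semaphore* sem_;
    };

    [[nodiscard]] units acquire() { return units(this); }

private:
    int count_;
};

// Buggy pattern: the returned units are destroyed at the end of the
// full-expression, so the "critical section" runs without the lock.
int buggy_critical_section(simple_semaphore& sem) {
    (void)sem.acquire();
    return sem.available();  // observed while supposedly holding the lock
}

// Fixed pattern: bind the units so they live until the end of the scope.
int fixed_critical_section(simple_semaphore& sem) {
    auto held = sem.acquire();
    return sem.available();
}
```

With a one-unit semaphore, the buggy version observes the unit as still available inside its critical section, while the fixed version observes zero.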
Many of the gossiper internal functions currently use seastar threads for historical reasons,
but since they are short-lived, the cost of spawning a seastar thread for them is excessive,
and they can be simplified and made more efficient using coroutines.
Closes #15364
* github.com:scylladb/scylladb:
gossiper: reindent do_stop_gossiping
gossiper: coroutinize do_stop_gossiping
gossiper: reindent assassinate_endpoint
gossiper: coroutinize assassinate_endpoint
gossiper: coroutinize handle_ack2_msg
gossiper: handle_ack_msg: always log warning on exception
gossiper: reindent handle_ack_msg
gossiper: coroutinize handle_ack_msg
gossiper: reindent handle_syn_msg
gossiper: coroutinize handle_syn_msg
gossiper: message handlers: no need to capture shared_from_this
gossiper: add_local_application_state: throw internal error if endpoint state is not found
gossiper: coroutinize add_local_application_state
Simplify the function. It does not need to spawn
a seastar thread.
While at it, declare it as private since it's called
only internally by the gossiper (and on shard 0).
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Unlike in handle_syn_msg, the warning is currently printed only
`if (_ack_handlers.contains(from.addr))`.
It is unclear why; the warning is interesting in any case.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
The handler's future is waited on under `background_msg`,
which is closed in gossiper::stop, so the instance is
already guaranteed to remain valid.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
If the function is called too early, the first get_endpoint_state_ptr
would throw an exception that is later caught and degraded
into a warning.
But that endpoint_state should never disappear after yielding,
so call on_internal_error in that case.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
This function has practically returned true since its inception.
In d38deef499
it started using messaging_service().knows_version(endpoint),
which also returns `true` unconditionally to this day.
So there's no point in calling it, since we can assume
that `uses_host_id` is true for all versions.
Closes #15343
* github.com:scylladb/scylladb:
storage_service: fixup indentation after last patch
gossiper: get rid of uses_host_id
This function has practically returned true since its inception.
In d38deef499
it started using messaging_service().knows_version(endpoint),
which also returns `true` unconditionally to this day.
So there's no point in calling it, since we can assume
that `uses_host_id` is true for all versions.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
This code was supposed to be moved into
`mutate_live_and_unreachable_endpoints`
in 2c27297dbd
but it looks like the original statements were left
in place outside the mutate function.
This patch just removes the stale code since the required
logic is already done inside `mutate_live_and_unreachable_endpoints`.
Fixes scylladb/scylladb#15296
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closes #15304
We need to const_cast `this`, since the const
container() has no const invoke_on overload.
Trying to fix this in seastar sharded.hh breaks
many other call sites in scylla.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
This series ensures that endpoint state changes (for each single endpoint) are applied to the gossiper endpoint_state_map as a whole and on all shards.
Any failure in the process will keep the existing endpoint state intact.
Note that verbs that modify the endpoint states of multiple endpoints may still succeed in modifying some of them before hitting an error, and those changes are committed to the endpoint_state_map, so we don't ensure atomicity when updating the states of multiple endpoints.
Fixes scylladb/scylladb#14794
Fixes scylladb/scylladb#14799
Closes #15073
* github.com:scylladb/scylladb:
gossiper: move endpoint_state by value to apply it
gossiper: replicate: make exception safe
gms: pass endpoint_state_ptr to endpoint_state change subscribers
gossiper: modify endpoint state only via replicate
gossiper: keep and serve shared endpoint_state_ptr in map
gossiper: get_max_endpoint_state_version: get state by reference
api/failure_detector: get_all_endpoint_states: reduce allocations
cdc/generation: get_generation_id_for: get endpoint_state&
gossiper: add for_each_endpoint_state helpers
gossiper: add num_endpoints
gossiper: add my_endpoint_state
This PR collects followups described in #14972:
- The `system.topology` table is now flushed every time feature-related
columns are modified. This is done because of the feature check that
happens before the schema commitlog is replayed.
- The implementation now guarantees that, if all nodes support some
feature as described by the `supported_features` column, then support
for that feature will not be revoked by any node. Previously, in an
edge case where a node is the last one to add support for some feature
`X` in `supported_features` column, crashes before applying/persisting
it and then restarts without supporting `X`, it would be allowed to boot
anyway and would revoke support for the `X` in `system.topology`.
The existing behavior, although counterintuitive, was safe - the
topology coordinator is responsible for explicitly marking features as
enabled, and in order to enable a feature it needs to perform a special
kind of a global barrier (`barrier_after_feature_update`) which only
succeeds after the node has updated its features column - so there is no
risk of enabling an unsupported feature. In order to make the behavior
less confusing, the node will now perform a second check when it tries
to update its `supported_features` column in `system.topology`.
- The `barrier_after_feature_update` is removed and the regular global
`barrier` topology command is used instead. The `barrier` handler now
performs a feature check if the node did not have a chance to verify and
update its cluster features for the second time.
The JOIN_NODE RPC will be submitted separately, as it is a big item on its own.
Fixes: #14972
Closes #15168
* github.com:scylladb/scylladb:
test: topology{_experimental_raft}: don't stop gracefully in feature tests
storage_service: remove _topology_updated_with_local_metadata
topology_coordinator: remove barrier_after_feature_update
topology_coordinator: perform feature check during barrier
storage_service: repeat the feature check after read barrier
feature_service: introduce unsupported_feature_exception
feature_service: move startup feature check to a separate function
topology_coordinator: account for features to enable in should_preempt_balancing
group0_state_machine: flush system.topology when updating features columns
The logic responsible for checking supported features against the
currently enabled features (and features that are unsafe to disable) is
moved to a separate function, `check_features`. Currently, it is only
used from `enable_features_on_startup`, but more checks against features
in Raft will be added in the commits that follow.
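The extracted check can be pictured as a set comparison. The sketch below is hypothetical: the real `check_features` signature, the feature names, and this `unsupported_feature_exception` definition are stand-ins. It assumes only what the text states, namely that the local node must support every enabled feature and every feature that is unsafe to disable.

```cpp
#include <cassert>
#include <set>
#include <stdexcept>
#include <string>

using feature_set = std::set<std::string>;

// Stand-in exception type; Scylla's unsupported_feature_exception may differ.
struct unsupported_feature_exception : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Hypothetical check: throw if any feature that is already enabled, or that
// is unsafe to disable, is missing from the locally supported set.
void check_features(const feature_set& locally_supported,
                    const feature_set& enabled,
                    const feature_set& unsafe_to_disable) {
    feature_set required = enabled;
    required.insert(unsafe_to_disable.begin(), unsafe_to_disable.end());

    std::string missing;
    for (const auto& f : required) {
        if (locally_supported.count(f) == 0) {
            missing += missing.empty() ? f : ", " + f;
        }
    }
    if (!missing.empty()) {
        throw unsupported_feature_exception("unsupported features: " + missing);
    }
}
```

Keeping the check in one function makes it straightforward to run it again at other points, as the later commits in the series do after a read barrier and during the `barrier` command.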
First replicate the new endpoint_state on all shards
before applying the replicated endpoint_state objects
to _endpoint_state_map.
Fixes scylladb/scylladb#14794
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
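A minimal single-threaded sketch of this two-phase update, assuming each shard is modelled as a `std::map` in one process (real shards live on separate cores and are reached via cross-shard calls). Phase 1 builds every per-shard copy and may throw. Phase 2 only relinks pre-built map nodes, which does not allocate, so a failure leaves every shard with the old state. `replicate`, `fail_at`, and the types here are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

struct endpoint_state {
    std::map<std::string, std::string> app_states;
};
using endpoint_state_ptr = std::shared_ptr<const endpoint_state>;
using endpoint_state_map = std::map<std::string, endpoint_state_ptr>;

// Hypothetical two-phase replicate: either every shard's map gets the new
// state for `ep`, or (if phase 1 throws) none of the maps is touched.
// `fail_at` simulates an allocation failure while preparing that shard's copy.
void replicate(std::vector<endpoint_state_map>& shards, const std::string& ep,
               const endpoint_state& new_state, int fail_at = -1) {
    // Phase 1 (may throw): prepare one detached map node per shard.
    std::vector<endpoint_state_map::node_type> prepared;
    prepared.reserve(shards.size());
    for (std::size_t i = 0; i < shards.size(); ++i) {
        if (static_cast<int>(i) == fail_at) {
            throw std::runtime_error("simulated failure on shard " + std::to_string(i));
        }
        endpoint_state_map scratch;
        scratch.emplace(ep, std::make_shared<endpoint_state>(new_state));
        prepared.push_back(scratch.extract(scratch.begin()));
    }
    // Phase 2 (no allocation; std::string comparison does not throw):
    // drop the old entry and link in the prepared node.
    for (std::size_t i = 0; i < shards.size(); ++i) {
        shards[i].erase(ep);
        shards[i].insert(std::move(prepared[i]));
    }
}
```

The guarantee is per endpoint: a caller that invokes this for several endpoints in turn can still leave some of them updated, matching the non-atomicity noted for multi-endpoint verbs.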
Now that the endpoint_state isn't changed in place,
we do not need to copy it for each subscriber.
Instead, we can just pass the lw_shared_ptr holding
a snapshot of it.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
And restrict the accessor methods to return const pointers
or references.
With that, the endpoint_state_ptr:s held in the _endpoint_state_map
point to immutable endpoint_state objects - with one exception:
the endpoint_state update_timestamp may be updated in place,
but the endpoint_state_map is immutable.
replicate() replaces the endpoint_state_ptr in the map
with a new one to maintain immutability.
A later change will also make replicate() exception safe,
guaranteeing strong exception safety: either all shards
are updated or none of them are.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
This commit changes the interface to
`using endpoint_state_ptr = lw_shared_ptr<const endpoint_state>`,
so that users get a snapshot of the endpoint_state
that they must not modify in place.
Internally, the gossiper still has the legacy helpers
to manage the endpoint_state.
Fixes scylladb/scylladb#14799
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
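The snapshot semantics can be illustrated with `std::shared_ptr<const T>` standing in for Seastar's `lw_shared_ptr<const endpoint_state>`. The `state_map` class, its `update` method, and the fields are hypothetical. A reader that obtained a pointer keeps seeing the old state after the map entry is replaced, and the `const` turns any attempt to modify the state in place into a compile error.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

struct endpoint_state {
    int version = 0;
    std::string status;
};
// std stand-in for lw_shared_ptr<const endpoint_state>.
using endpoint_state_ptr = std::shared_ptr<const endpoint_state>;

class state_map {
public:
    // Returns a snapshot; callers cannot modify it through the const pointer.
    endpoint_state_ptr get(const std::string& ep) const {
        auto it = map_.find(ep);
        if (it == map_.end()) {
            return nullptr;
        }
        return it->second;
    }
    // Copy-on-write update: replace the pointer, never the pointed-to object.
    void update(const std::string& ep, endpoint_state new_state) {
        map_[ep] = std::make_shared<endpoint_state>(std::move(new_state));
    }
private:
    std::map<std::string, endpoint_state_ptr> map_;
};
```

Handing subscribers such a pointer costs a reference-count increment instead of a deep copy per subscriber, which is what makes the change to the subscriber interface cheap.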