In a longevity test reported in scylladb/scylladb#16668 we observed that
NORMAL state is not being properly handled for a node that replaced
another node. Either handle_state_normal is not being called, or it is
but getting stuck in the middle. Which is the case couldn't be
determined from the logs, and attempts at creating a local reproducer
failed.
Thus the plan is to continue debugging using the longevity test, but we need
more logs. To check whether `handle_state_normal` was called and which branches
were taken, include some INFO level logs there. Also, detect deadlocks inside
`gossiper::lock_endpoint` by reporting an error message if `lock_endpoint`
waits for the lock for too long.
Ref: scylladb/scylladb#16668Closesscylladb/scylladb#16733
* github.com:scylladb/scylladb:
gossiper: report error when waiting too long for endpoint lock
gossiper: store source_location instead of string in endpoint_permit
storage_service: more verbose logging in handle_state_normal
In a longevity test reported in scylladb/scylladb#16668 we observed that
NORMAL state is not being properly handled for a node that replaced
another node. Either handle_state_normal is not being called, or it is
but getting stuck in the middle. Which is the case couldn't be
determined from the logs, and attempts at creating a local reproducer
failed.
One hypothesis is that `gossiper` is stuck on `lock_endpoint`. We dealt
with gossiper deadlocks in the past (e.g. scylladb/scylladb#7127).
Modify the code so it reports an error if `lock_endpoint` waits for the
lock for more than a minute. When the issue reproduces again in
longevity, we will see if `lock_endpoint` got stuck.
"me" sstable format includes an important feature of storing the
`host_id` of the local node when writing sstables. The is crucial
for validating the sstable's `replay_position` in stats metadata as
it is valid only on the originating node and shard (#10080), therefor
we would like to make the `me` format mandatory.
before making `me` mandatory, we need to stop handling `sstable_format`
option if it is "md".
in this change
- gms/feature_service: do not disable `ME_SSTABLE_FORMAT` even if
`sstable_format` is configured with "md". and in that case, instead,
a warning is printed in the logging message to note that
this setting is not valid anymore.
- docs/architecture/sstable: note that "me" is used by default now.
after this change, "sstable_format" will only accept "me" if it's
explicitly configured. and when a server with this change joins a
cluster, it uses "md" if the any of the node in the cluster still has
`sstable_format`. practically, this change makes "me" mandatory
in a 6.x cluster, assuming this change will be included in 6.x
releases.
Fixes#16551
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
sstable_format comes from scylla.yaml or from the command line
arguments, and we gate scylla from unallowed sstable formats lower
than `md` when parsing the configuration, and scylla bails out
at seeing the unallowed sstable format like:
```
terminate called after throwing an instance of 'std::invalid_argument'
what(): Invalid value for sstable_format: got ka which is not inside the set of allowed values md, me
Aborted (core dumped)
```
scylla errors out way before `feature_config_from_db_config()`
gets called -- it throws in `bpo::notify(configuration)`,
way before `func` is evaluated in `app_template::run_deprecated()`.
so, in this change, we do not handle these values anymore, and
consider it a bug if we run into any of them.
Refs #16551
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
The original code extracted only the function_name from the
source_location for logging. We'll use more information from the
source_location in later commits.
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define a formatter for gms::endpoint_state, and
change update the callers of `operator<<` to use `fmt::print()`.
but we cannot drop `operator<<` yet, as we are still using the
templated operator<< and templated fmt::formatter to print containers
in scylla and in seastar -- they are still using `operator<<`
under the hood.
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#16705
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.
in this change, we define a formatter for gms::heart_beat_state, and
remove its operator<<(). the only caller site of its operator<< is
updated to use `fmt::print()`
Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#16652
Currently, `add_saved_endpoint` is called from two paths: One, is when
loading states from system.peers in the join path (join_cluster,
join_token_ring), when `_raft_topology_change_enabled` is false, and the
other is from `storage_service::topology_state_load` when raft topology
changes are enabled.
In the later path, from `topology_state_load`, `add_saved_endpoint` is
called only if the endpoint_state does not exist yet. However, this is
checked without acquiring the endpoint_lock and so it races with the
gossiper, and once `add_saved_endpoint` acquires the lock, the endpoint
state may already be populated.
Since `add_saved_endpoint` applies local information about the endpoint
state (e.g. tokens, dc, rack), it uses the local heart_beat_version,
with generation=0 to update the endpoint states, and that is
incompatible with changes applies via gossip that will carry the
endpoint's generation and version, determining the state's update order.
This change makes sure that the endpoint state is never update in
`add_saved_endpoint` if it has non-zero generation. An internal error
exception is thrown if non-zero generation is found, and in the only
call site that might reach that state, in
`storage_service::topology_state_load`, the caller acquires the
endpoint_lock for checking for the existence of the endpoint_state,
calling `add_saved_endpoint` under the lock only if the endpoint_state
does not exist.
Fixes#16429Closesscylladb/scylladb#16432
* github.com:scylladb/scylladb:
gossiper: add_saved_endpoint: keep heart_beat_state if ep_state is found
storage_service: topology_state_load: lock endpoint for add_saved_endpoint
raft_group_registry: move on_alive error injection to gossiper
Change the mutate_live_and_unreachable_endpoints procedure
so that the called `func` would mutate a cloned
`live_and_unreachable_endpoints` object in place.
Those are replicated to temporary copies on all shards
using `foreign<unique_ptr<>>` so that the would be
automatically freed on exception.
Only after all copies are made, they are applied
on all gossiper shards in a noexcept loop
and finally, a `on_success` function is called
to apply further side effects if everything else
was replicated successfully.
The latter is still susceptible to exceptions,
but we can live with those as long as `_live_endpoints`
and `_unreachable_endpoints` are synchronized on all shards.
With that, the read-only methods:
`get_live_members_synchronized` and
`get_unreachable_members_synchronized`
become trivial and they just return the required data
from shard 0.
Fixes#15089
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closesscylladb/scylladb#16597
State changes are processed as a batch and
there is no reason to maintain them as an ordered map.
Instead, use a std::unordered_map that is more efficient.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Rather than calling on_change for each particular
application_state, pass an endpoint_state::map_type
with all changed states, to be processed as a batch.
In particular, thise allows storage_service::on_change
to update_peer_info once for all changed states.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
None of the subscribers is doing anything before_change.
This is done before changing `on_change` in the following patch.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Have a central definition for the map held
in the endpoint_state (before changing it to
std::unordered_map).
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Currently, when loading peers' endpoint state from system.peers,
add_saved_endpoint is called.
The first instance of the endpoint state is created with the default
heart_beat_state, with both generation and version set to zero.
However, if add_saved_endpoint finds an existing instance of the
endpoint state, it reuses it, but it updates its heart_beat_state
with the local heart_beat_state() rather than keeping the existing
heart_beat_state, as it should.
This is a problem since it may confuse updates over gossip
later on via do_apply_state_locally that compares the remote
generation vs. the local generation, so they must stem from
the same root that is the endpoint itself.
Fixes#16429
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
`topology_state_load` currently calls `add_saved_endpoint`
only if it finds no endpoint_state_ptr for the endpoint.
However, this is done before locking the endpoint
and the endpoint state could be inserted concurrently.
To prevent that, a permit_id parameter was added to
`add_saved_endpoint` allowing the caller to call it
while the endpoint is locked. With that, `topology_state_load`
locks the endpoint and checks the existence of the endpoint state
under the lock, before calling `add_saved_endpoint`.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Move the `raft_group_registry::on_alive` error injection point
to `gossiper::real_mark_alive` so it can delay marking the endpoint as
alive, and calling the `on_alive` callback, but without holding
the endpoint_lock.
Note that the entry for this endpoint in `_pending_mark_alive_endpoints`
still blocks marking it as alive until real_mark_alive completes.
Fixes#16506
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
The option is kepd in DDL, but is _not_ stored in
system_schema.keyspaces. Instead, it's removed from the provided options
and kept in scylla_keyspaces table in its own column. All the places
that had optional initial_tablets disengaged now set this value up the
way the find appropriate.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
We make `consistent_cluster_management` mandatory in 5.5. This
option will be always unused and assumed to be true.
Additionally, we make `override_decommission` deprecated, as this option
has been supported only with `consistent_cluster_management=false`.
Making `consistent_cluster_management` mandatory also simplifies
the code. Branches that execute only with
`consistent_cluster_management` disabled are removed.
We also update documentation by removing information irrelevant in 5.5.
Fixesscylladb/scylladb#15854
Note about upgrades: this PR does not introduce any more limitations
to the upgrade procedure than there are already. As in
scylladb/scylladb#16254, we can upgrade from the first version of Scylla
that supports the schema commitlog feature, i.e. from 5.1 (or
corresponding Enterprise release) or later. Assuming this PR ends up in
5.5, the documented upgrade support is from 5.4. For corresponding
Enterprise release, it's from 2023.x (based on 5.2), so all requirements
are met.
Closesscylladb/scylladb#16334
* github.com:scylladb/scylladb:
docs: update after making consistent_cluster_management mandatory
system_keyspace, main, cql_test_env: fix indendations
db: config: make consistent_cluster_management mandatory
test: boost: schema_change_test: replace disable_raft_schema_config
db: config: make override_decommission deprecated
db: config: make force_schema_commit_log deprecated
Code that executed only when consistent_cluster_management=false is
removed. In particular, after this patch:
- raft_group0 and raft_group_registry are always enabled,
- raft_group0::status_for_monitoring::disabled becomes unused,
- topology tests can only run with consistent_cluster_management.
In this PR we refactor `token_metadata` to use `locator::host_id` instead of `gms::inet_address` for node identification in its internal data structures. Main motivation for these changes is to make raft state machine deterministic. The use of IPs is a problem since they are distributed through gossiper and can't be used reliably. One specific scenario is outlined [in this comment](https://github.com/scylladb/scylladb/pull/13655#issuecomment-1521389804) - `storage_service::topology_state_load` can't resolve host_id to IP when we are applying old raft log entries, containing host_id-s of the long-gone nodes.
The refactoring is structured as follows:
* Turn `token_metadata` into a template so that it can be used with host_id or inet_address as the node key. The version with inet_address (the current one) provides a `get_new()` method, which can be used to access the new version.
* Go over all places which write to the old version and make the corresponding writes to the new version through `get_new()`. When this stage is finished we can use any version of the `token_metadata` for reading.
* Go over all the places which read `token_metadata` and switch them to the new version.
* Make `host_id`-based `token_metadata` default, drop `inet_address`-based version, change `token_metadata` back to non-template.
These series [depends](1745a1551a) on RPC sender `host_id` being present in RPC `clent_info` for `bootstrap` and `replace` node_ops commands. This feature was added in [this commit](95c726a8df) and released in `5.4`. It is generally recommended not to skip versions when upgrading, so users who upgrade sequentially first to `5.4` (or the corresponding Enterprise version) then to the version with these changes (`5.5` or `6.0`) should be fine. If for some reason they upgrade from a version without `host_id` in RPC `clent_info` to the version with these changes and they run bootstrap or replace commands during the upgrade procedure itself, these commands may fail with an error `Coordinator host_id not found` if some nodes are already upgraded and the node which started the node_ops command is not yet upgraded. In this case the user can finish the upgrade first to version 5.4 or later, or start bootstrap/replace with an upgraded node. Note that removenode and decommission do not depend on coordinator host_id so they can be started in the middle of upgrade from any node.
Closesscylladb/scylladb#15903
* github.com:scylladb/scylladb:
topology: remove_endpoint: remove inet_address overload
token_metadata: topology: cleanup add_or_update_endpoint
token_metadata: add_replacing_endpoint: forbid replacing node with itself
topology: drop key_kind, host_id is now the primary key
dc_rack_fn: make it non-template
token_metadata: drop the template
shared_token_metadata: switch to the new token_metadata
gossiper: use new token_metadata
database: get_token_metadata -> new token_metadata
erm: switch to the new token_metadata
storage_service: get_token_metadata -> token_metadata2
storage_service: get_token_to_endpoint_map: use new token_metadata
api/token_metadata: switch to new version
storage_service::on_change: switch to new token_metadata
cdc: switch to token_metadata2
calculate_natural_endpoints: fix indentation
calculate_natural_endpoints: switch to token_metadata2
storage_service: get_changed_ranges_for_leaving: use new token_metadata
decommission_with_repair, removenode_with_repair -> new token_metadata
rebuild_with_repair, replace_with_repair: use new token_metadata
bootstrap: use new token_metadata
tablets: switch to token_metadata2
calculate_effective_replication_map: use new token_metadata
calculate_natural_endpoints: fix formatting
abstract_replication_strategy: calculate_natural_endpoints: make it work with both versions of token_metadata
network_topology_strategy_test: update new token_metadata
storage_service: on_alive: update new token_metadata
storage_service: handle_state_bootstrap: update new token_metadata
storage_service: snitch_reconfigured: update new token_metadata
storage_service: leave_ring: update new token_metadata
storage_service: node_ops_cmd_handler: update new token_metadata
storage_service: node_ops_cmd_handler: add coordinator_host_id
storage_service: bootstrap: update new token_metadata
storage_service: join_token_ring: update new token_metadata
storage_service: excise: update new token_metadata
storage_service: join_cluster: update new token_metadata
storage_service: on_remove: update new token_metadata
storage_service: handle_state_normal: fill new token_metadata
storage_service: topology_state_load: fill new token_metadata
storage_service: adjust update_topology_change_info to update new token_metadata
topology: set self host_id on the new topology
locator::topology: allow being_replaced and replacing nodes to have the same IP
token_metadata: get_endpoint_for_host_id -> get_endpoint_for_host_id_if_known
token_metadata: get_host_id: exception -> on_internal_error
token_metadata: add get_all_ips method
token_metadata: support host_id-based version
token_metadata: make it a template with NodeId=inet_address/host_id NodeId is used in all internal token_metadata data structures, that previously used inet_address. We choose topology::key_kind based on the value of the template parameter.
locator: make dc_rack_fn a template
locator/topology: add key_kind parameter
token_metadata: topology_change_info: change field types to token_metadata_ptr
token_metadata: drop unused method get_endpoint_to_token_map_for_reading
When performing a schema change through group 0, extend the schema mutations with a version that's persisted and then used by the nodes in the cluster in place of the old schema digest, which becomes horribly slow as we perform more and more schema changes (#7620).
If the change is a table create or alter, also extend the mutations with a version for this table to be used for `schema::version()`s instead of having each node calculate a hash which is susceptible to bugs (#13957).
When performing a schema change in Raft RECOVERY mode we also extend schema mutations which forces nodes to revert to the old way of calculating schema versions when necessary.
We can only introduce these extensions if all of the cluster understands them, so protect this code by a new cluster/schema feature, `GROUP0_SCHEMA_VERSIONING`.
Fixes: #7620Fixes: #13957
---
This is a reincarnation of PR scylladb/scylladb#15331. The previous PR was reverted due to a bug it unmasked; the bug has now been fixed (scylladb/scylladb#16139). Some refactors from the previous PR were already merged separately, so this one is a bit smaller.
I have checked with @Lorak-mmk's reproducer (https://github.com/Lorak-mmk/udt_schema_change_reproducer -- many thanks for it!) that the originally exposed bug is no longer reproducing on this PR, and that it can still be reproduced if I revert the aforementioned fix on top of this PR.
Closesscylladb/scylladb#16242
* github.com:scylladb/scylladb:
docs: describe group 0 schema versioning in raft docs
test: add test for group 0 schema versioning
feature_service: enable `GROUP0_SCHEMA_VERSIONING` in Raft mode
schema_tables: don't delete `version` cell from `scylla_tables` mutations from group 0
migration_manager: add `committed_by_group0` flag to `system.scylla_tables` mutations
schema_tables: use schema version from group 0 if present
migration_manager: store `group0_schema_version` in `scylla_local` during schema changes
system_keyspace: make `get/set_scylla_local_param` public
feature_service: add `GROUP0_SCHEMA_VERSIONING` feature
As promised in earlier commits:
Fixes: #7620Fixes: #13957
Also modify two test cases in `schema_change_test` which depend on
the digest calculation method in their checks. Details are explained in
the comments.
Fixes some more typos as found by codespell run on the code. In this commit, there are more user-visible errors.
Refs: https://github.com/scylladb/scylladb/issues/16255Closesscylladb/scylladb#16289
* github.com:scylladb/scylladb:
Update unified/build_unified.sh
Update main.cc
Update dist/common/scripts/scylla-housekeeping
Typos: fix typos in code
utils::fb_utilities is a global in-memory registry for storing and retrieving broadcast_address and broadcat_rpc_address.
As part of the effort to get rid of all global state, this series gets rid of fb_utilities.
This will eventually allow e.g. cql_test_env to instantiate multiple scylla server nodes, each serving on its own address.
Closesscylladb/scylladb#16250
* github.com:scylladb/scylladb:
treewide: get rid of now unused fb_utilities
tracing: use locator::topology rather than fb_utilities
streaming: use locator::topology rather than fb_utilities
raft: use locator::topology/messaging rather than fb_utilities
storage_service: use locator::topology rather than fb_utilities
storage_proxy: use locator::topology rather than fb_utilities
service_level_controller: use locator::topology rather than fb_utilities
misc_services: use locator::topology rather than fb_utilities
migration_manager: use messaging rather than fb_utilities
forward_service: use messaging rather than fb_utilities
messaging_service: accept broadcast_addr in config rather than via fb_utilities
messaging_service: move listen_address and port getters inline
test: manual: modernize message test
table: use gossiper rather than fb_utilities
repair: use locator::topology rather than fb_utilities
dht/range_streamer: use locator::topology rather than fb_utilities
db/view: use locator::topology rather than fb_utilities
database: use locator::topology rather than fb_utilities
db/system_keyspace: use topology via db rather than fb_utilities
db/system_keyspace: save_local_info: get broadcast addresses from caller
db/hints/manager: use locator::topology rather than fb_utilities
db/consistency_level: use locator::topology rather than fb_utilities
api: use locator::topology rather than fb_utilities
alternator: ttl: use locator::topology rather than fb_utilities
gossiper: use locator::topology rather than fb_utilities
gossiper: add get_this_endpoint_state_ptr
test: lib: cql_test_env: pass broadcast_address in cql_test_config
init: get_seeds_from_db_config: accept broadcast_address
locator: replication strategies: use locator::topology rather than fb_utilities
locator: topology: add helpers to retrieve this host_id and address
snitch: pass broadcast_address in snitch_config
snitch: add optional get_broadcast_address method
locator: ec2_multi_region_snitch: keep local public address as member
ec2_multi_region_snitch: reindent load_config
ec2_multi_region_snitch: coroutinize load_config
ec2_snitch: reindent load_config
ec2_snitch: coroutinize load_config
thrift: thrift_validation: use std::numeric_limits rather than fb_utilities
This feature, when enabled, will modify how schema versions
are calculated and stored.
- In group 0 mode, schema versions are persisted by the group 0 command
that performs the schema change, then reused by each node instead of
being calculated as a digest (hash) by each node independently.
- In RECOVERY mode or before Raft upgrade procedure finishes, when we
perform a schema change, we revert to the old digest-based way, taking
into account the possibility of having performed group0-mode schema
changes (that used persistent versions). As we will see in future
commits, this will be done by storing additional flags and tombstones
in system tables.
By "schema versions" we mean both the UUIDs returned from
`schema::version()` and the "global" schema version (the one we gossip
as `application_state::SCHEMA`).
For now, in this commit, the feature is always disabled. Once all
necessary code is setup in following commits, we will enable it together
with Raft.
less overhead this way. the caller of lookup() always passes
a rvalue reference. and seastar::dns::get_host_by_name() actually
moves away from the parameter, so let's pass by std::move() for
slightly better performance, and to match the expectation of
the underlying seastar API.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#16280
Returns this node's endpoint_state_ptr.
With this entry point, the caller doesn't need to
get_broadcast_address.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
neither <iomanip> nor "utils/to_string.hh" is used in
`gms/inet_address.cc`, so let's remove their "#include"s.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#16281
Using consistent cluster management and not using schema commitlog
ends with a bad configuration throw during bootstrap. Soon, we
will make consistent cluster management mandatory. This forces us
to also make schema commitlog mandatory, which we do in this patch.
A booting node decides to use schema commitlog if at least one of
the two statements below is true:
- the node has `force_schema_commitlog=true` config,
- the node knows that the cluster supports the `SCHEMA_COMMITLOG`
cluster feature.
The `SCHEMA_COMMITLOG` cluster feature has been added in version
5.1. This patch is supposed to be a part of version 6.0. We don't
support a direct upgrade from 5.1 to 6.0 because it skips two
versions - 5.2 and 5.4. So, in a supported upgrade we can assume
that the version which we upgrade from has schema commitlog. This
means that we don't need to check the `SCHEMA_COMMITLOG` feature
during an upgrade.
The reasoning above also applies to Scylla Enterprise. Version
2024.2 will be based on 6.0. Probably, we will only support
an upgrade to 2024.2 from 2024.1, which is based on 5.4. But even
if we support an upgrade from 2023.x, this patch won't break
anything because 2023.1 is based on 5.2, which has schema
commitlog. Upgrades from 2022.x definitely won't be supported.
When we populate a new cluster, we can use the
`force_schema_commitlog=true` config to use schema commitlog
unconditionally. Then, the cluster feature check is irrelevant.
This check could fail because we initiate schema commitlog before
we learn about the features. The `force_schema_commitlog=true`
config is especially useful when we want to use consistent cluster
management. Failing feature checks would lead to crashes during
initial bootstraps. Moreover, there is no point in creating a new
cluster with `consistent_cluster_management=true` and
`force_schema_commitlog=false`. It would just cause some initial
bootstraps to fail, and after successful restarts, the result would
be the same as if we used `force_schema_commitlog=true` from the
start.
In conclusion, we can unconditionally use schema commitlog without
any checks in 6.0 because we can always safely upgrade a cluster
and start a new cluster.
Apart from making schema commitlog mandatory, this patch adds two
changes that are its consequences:
- making the unneeded `force_schema_commitlog` config unused,
- deprecating the `SCHEMA_COMMITLOG` feature, which is always
assumed to be true.
Closesscylladb/scylladb#16254
in 4ea6e06c, we specialized fmt::formatter<gms::inet_address> using
the formatter of bytes if the underlying address is an IPv6 address.
this breaks the tests with JMX which expected the shortened form of
the text representation of the IPv6 address.
in this change, instead of reinventing the wheel, let's reuse the
existing formatter of net::inet_address, which is able to handle
both IPv4 and IPv6 addresses, also it follows
https://datatracker.ietf.org/doc/html/rfc5952 by compressing the
consecutive zeros.
since this new formatter is a thin wrapper of seastar::net::inet_addresss,
the corresponding unit test will be added to Seastar.
Refs #16039
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#16267
Fixes some typos as found by codespell run on the code.
In this commit, I was hoping to fix only comments, not user-visible alerts, output, etc.
Follow-up commits will take care of them.
Refs: https://github.com/scylladb/scylladb/issues/16255
Signed-off-by: Yaniv Kaul <yaniv.kaul@scylladb.com>
generation_number's type is `generation_type`, which in turn is a
`utils::tagged_integer<struct generation_type_tag, int32_t>`,
which formats using either fmtlib which uses ostream_formatter backed by
operator<< . but `ostream_formatter` does not provide the specifier
support. so {:d} does apply to this type, when compiling with fmtlib
v10, it rejects the format specifier (the error is attached at the end
of the commit message).
so in this change, we just drop the format specifier. as fmtlib prints
`int32_t` as a decimal integer, so even if {:d} applied, it does not
change the behavior.
```
/home/kefu/dev/scylladb/gms/gossiper.cc:1798:35: error: call to consteval function 'fmt::basic_format_string<char, utils::tagged_tagged_integer<utils::final, gms::generation_type_tag, int> &, utils::tagged_tagged_integer<utils::final, gms::generation_type_tag, int> &>::basic_format_string<char[48], 0>' is not a constant expression
1798 | auto err = format("Remote generation {:d} != local generation {:d}", remote_gen, local_gen);
| ^
/usr/include/fmt/core.h:2322:31: note: non-constexpr function 'throw_format_error' cannot be used in a constant expression
2322 | if (!in(arg_type, set)) throw_format_error("invalid format specifier");
| ^
/usr/include/fmt/core.h:2395:14: note: in call to 'parse_presentation_type.operator()(1, 510)'
2395 | return parse_presentation_type(pres::dec, integral_set);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/include/fmt/core.h:2706:9: note: in call to 'parse_format_specs<char>(&"Remote generation {:d} != local generation {:d}"[20], &"Remote generation {:d} != local generation {:d}"[47], formatter<mapped_type, char_type>().formatter::specs_, checker(s).context_, 13)'
2706 | detail::parse_format_specs(ctx.begin(), ctx.end(), specs_, ctx, type);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/include/fmt/core.h:2561:10: note: in call to 'formatter<mapped_type, char_type>().parse<fmt::detail::compile_parse_context<char>>(checker(s).context_)'
2561 | return formatter<mapped_type, char_type>().parse(ctx);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/include/fmt/core.h:2647:39: note: in call to 'parse_format_specs<utils::tagged_tagged_integer<utils::final, gms::generation_type_tag, int>, fmt::detail::compile_parse_context<char>>(checker(s).context_)'
2647 | return id >= 0 && id < num_args ? parse_funcs_[id](context_) : begin;
| ^~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/include/fmt/core.h:2485:15: note: in call to 'handler.on_format_specs(0, &"Remote generation {:d} != local generation {:d}"[20], &"Remote generation {:d} != local generation {:d}"[47])'
2485 | begin = handler.on_format_specs(adapter.arg_id, begin + 1, end);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/include/fmt/core.h:2541:13: note: in call to 'parse_replacement_field<char, fmt::detail::format_string_checker<char, utils::tagged_tagged_integer<utils::final, gms::generation_type_tag, int>, utils::tagged_tagged_integer<utils::final, gms::generation_type_tag, int>> &>(&"Remote generation {:d} != local generation {:d}"[19], &"Remote generation {:d} != local generation {:d}"[47], checker(s))'
2541 | begin = parse_replacement_field(p, end, handler);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/include/fmt/core.h:2769:7: note: in call to 'parse_format_string<true, char, fmt::detail::format_string_checker<char, utils::tagged_tagged_integer<utils::final, gms::generation_type_tag, int>, utils::tagged_tagged_integer<utils::final, gms::generation_type_tag, int>>>({&"Remote generation {:d} != local generation {:d}"[0], 47}, checker(s))'
2769 | detail::parse_format_string<true>(str_, checker(s));
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/kefu/dev/scylladb/gms/gossiper.cc:1798:35: note: in call to 'basic_format_string<char[48], 0>("Remote generation {:d} != local generation {:d}")'
1798 | auto err = format("Remote generation {:d} != local generation {:d}", remote_gen, local_gen);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
```
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#16126
allow_mutation_read_page_without_live_row is a new option in the
partition_slice::option option set. In a mixed clusters, old nodes
possibly don't know this new option, so its usage must be protected by a
cluster feature. This patch does just that.
Fixes: #15795Closesscylladb/scylladb#15890
They cause connection drops, which is a significant disruptive
event. We should log it so that we can know that this is the cause of
the problems it may cause, like requests timing out. Connection drop
will cause coordinator-side requests to time out in the absence of
speculation.
Refs #14746Closesscylladb/scylladb#16018
Remove `fall_back_to_syn_msg` which is not necessary in newer Scylla versions.
Fix the calculation of `nodes_down` which could count a single node multiple times.
Make shadow round mandatory during bootstrap and replace -- these operations are unsafe to do without checking features first, which are obtained during the shadow round (outside raft-topology mode).
Finally, during node restart, allow the shadow round to be skipped when getting `timeout_error`s from contact points, not only when getting `closed_error`s (during restart it's best-effort anyway, and in general it's impossible to distinguish between a dead node and a partitioned node).
More details in commit messages.
Ref: https://github.com/scylladb/scylladb/issues/15675Closesscylladb/scylladb#15941
* github.com:scylladb/scylladb:
gossiper: do_shadow_round: increment `nodes_down` in case of timeout
gossiper: do_shadow_round: fix `nodes_down` calculation
storage_service: make shadow round mandatory during bootstrap/replace
gossiper: do_shadow_round: remove default value for nodes param
gossiper: do_shadow_round: remove `fall_back_to_syn_msg`
to have feature parity with `configure.py`. we won't need this
once we migrate to C++20 modules. but before that day comes, we
need to stick with C++ headers.
we generate a rule for each .hh files to create a corresponding
.cc and then compile it, in order to verify the self-containness of
that header. so the number of rule is quite large, to avoid the
unnecessary overhead. the check-header target is enabled only if
`Scylla_CHECK_HEADERS` option is enabled.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#15913
Remove no longer used gossiper states that are not needed even for
compatibility any longer.
* 'remove_unused_states' of github.com:scylladb/scylla-dev:
gossip: remove unused HIBERNATE gossiper status
gossip: remove unused STATUS_MOVING state
Previously we would only increment `nodes_down` when getting
`rpc::closed_error`. Distinguishing between that and timeout is
unreliable. Consider:
1. if a node is dead but we can reach the IP, we'd get `closed_error`
2. if we cannot reach the IP (there's a network partition), the RPC
would hang so we'd get `timeout_error`
3. if the node is both dead and the IP is unreachable, we'd get
`timeout_error`
And there are probably other more complex scenarios as well. In general,
it is impossible to distinguish a dead node from a partitioned node in
asynchronous networks, and whether we end up with `closed_error` or
`timeout_error` is an implementation detail of the underlying protocol
that we use.
The fact that `nodes_down` was not incremented for timeouts would
prevent a node from starting if it cannot reach isolated IPs (whether or
not there were dead or alive nodes behind those IPs). This was observed
in a Jepsen test: https://github.com/scylladb/scylladb/issues/15675.
Note that `nodes_down` is only used to skip shadow round outside
bootstrap/replace, i.e. during restarts, where the shadow round was
"best effort" anyway (not mandatory). During bootstrap/replace it is now
mandatory.
Also fix grammar in the error message.
During shadow round we would calculate the number of nodes from which we
got `rpc::closed_error` using `nodes_counter`, and if the counter
reached the size of all contact points passed to shadow round, we would
skip the shadow round (and after the previous commit, we do it only in
the case of restart, not during bootstrap/replace which is unsafe).
However, shadow round might have multiple loops, and `nodes_down` was
initialized to `0` before the loop, then reused. So the same node might
be counted multiple times in `nodes_down`, and we might incorrectly
enter the skipping branch. Or we might go over `nodes.size()` and never
finish the loop.
Fix this by initializing `nodes_down = 0` inside the loop.
It is unsafe to bootstrap or perform replace without performing the
shadow round, which is used to obtain features from the existing cluster
and verify that we support all enabled features.
Before this patch, I could easily produce the following scenario:
1. bootstrap first node in the cluster
2. shut it down
3. start bootstrapping second node, pointing to the first as seed
4. the second node skips shadow round because it gets
`rpc::closed_error` when trying to connect to first node.
5. the node then passes the feature check (!) and proceeds to the next
step, where it waits for nodes to show up in gossiper
6. we now restart the first node, and the second node finishes bootstrap
The shadow round must be mandatory during bootstrap/replace, which is
what this patch does.
On restart it can remain optional as it was until now. In fact it should
be completely unnecessary during restart, but since we did it until now
(as best-effort), we can keep doing it.