Commit Graph

1069 Commits

Author SHA1 Message Date
Kefu Chai
cb781c0ff7 gms: add add formatter for gms::versioned_value
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `gms::versioned_value`. its
operator<< is preserved, as it's still being used by the homebrew
generic formatter for std::unordered_map<gms::application_state,
gms::versioned_value>, which is in turn used in gms/gossiper.cc.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17366
2024-02-18 19:21:54 +02:00
Kefu Chai
4812a57f71 gms: add add formatter for gms::gossip_*
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for

- gms::gossip_digest
- gms::gossip_digest_ack
- gms::gossip_digest_syn

and drop their operator<<:s

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17379
2024-02-18 19:21:53 +02:00
Kamil Braun
2e81f045cc Merge 'transport: controller: do_start_server: do not set_cql_read for maintenance port' from Benny Halevy
RPC is not ready yet at this point, so we should not set this application state yet.

Also, simplify add_local_application_state as it contains dead code
that will never generate an internal error after 1d07a596bf.

Fixes #16932

Closes scylladb/scylladb#17263

* github.com:scylladb/scylladb:
  gossiper: add_local_application_state: drop internae error
  transport: controller: do_start_server: do not set_cql_read for maintenance port
2024-02-12 13:26:45 +01:00
Benny Halevy
2ed29e31db gms: inet_address: make constructors explicit
In particular, `inet_address(const sstring& addr)` is
dangerous, since a function like
`topology::get_datacenter(inet_address ep)`
might accidentally convert a `sstring` argument
into an `inet_address` (which would most likely
throw an obscure std::invalid_argument if the datacenter
name does not look like an inet_address).

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#17260
2024-02-11 15:44:13 +02:00
Benny Halevy
f86a5072d6 gossiper: add_local_application_state: drop internae error
After 1d07a596bf that
dropped before_change notifications there is no sense
in getting the local endpoint_state_ptr twice: before
and after the notifications and call on_internal_error
if the state isn't found after the notifications.

Just throw the runtime_error if the endpoint state is not
found, otherwise, use it.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-02-11 13:33:26 +02:00
Kamil Braun
e9e24f47ec Merge 'raft topology: implement upgrade and recovery procedure' from Piotr Dulikowski
This PR implements a procedure that upgrades existing clusters to use
raft-based topology operations. The procedure does not start
automatically, it must be triggered manually by the administrator after
making sure that no topology operations are currently running.

Upgrade is triggered by sending `POST
/storage_service/raft_topology/upgrade` request. This causes the
topology coordinator to start who drives the rest of the process: it
builds the `system.topology` state based on information observed in
gossip and tells all nodes to switch to raft mode. Then, topology
coordinator runs normally.

Upgrade progress is tracked in a new static column `upgrade_state` in
`system.topology`.

The procedure also serves as an extension to the current recovery
procedure on raft. The current recovery procedure requires restarting
nodes in a special mode which disables raft, perform `nodetool
removenode` on the dead nodes, clean up some state on the nodes and
restart them so that they automatically rebuild the group 0. Raft
topology fits into existing procedure by falling back to legacy topology
operations after disabling raft. After rebuilding the group 0, upgrade
needs to be triggered again.

Because upgrade is manual and it might not be convenient for
administrators to run it right after upgrading the cluster, we allow the
cluster to operate in legacy topology operations mode until upgrade,
which includes allowing new nodes to join. In order to allow it, nodes
now ask the cluster about the mode they should use to join before
proceeding by using a new `JOIN_NODE_QUERY` RPC.

The procedure is explained in more detail in `topology-over-raft.md`.

Fixes: https://github.com/scylladb/scylladb/issues/15008

Closes scylladb/scylladb#17077

* github.com:scylladb/scylladb:
  test/topology_custom: upgrade/recovery tests for topology on raft
  cdc/generation_service: in legacy mode, fall back to raft tables
  system_keyspace: add read_cdc_generation_opt
  cdc/generation_service: turn off gossip notifications in raft topo mode
  cql_test_env: move raft_topology_change_enabled var earlier
  group0_state_machine: pull snapshot after raft topology feature enabled
  storage_service: disable persistent feature enabler on upgrade
  storage_service: replicate raft features to system.peers
  storage_service: gossip tokens and cdc generation in raft topology mode
  API: add api for triggering and monitoring topology-on-raft upgrade
  storage_service: infer which topology operations to use on startup
  storage_service: set the topology kind value based on group 0 state
  raft_group0: expose link to the upgrade doc in the header
  feature_service: fall back to checking legacy features on startup
  storage_service: add fiber for tracking the topology upgrade progress
  gms: feature_service: add SUPPORTS_CONSISTENT_TOPOLOGY_CHANGES
  topology_coordinator: implement core upgrade logic
  topology_coordinator: extract top-level error handling logic
  storage_service: initialize discovery leader's state earlier
  topology_coordinator: allow for custom sharding info in prepare_and_broadcast_cdc_generation_data
  topology_coordinator: allow for custom sharding info in prepare_new_cdc_generation_data
  topology_coordinator: remove outdated fixme in prepare_new_cdc_generation_data
  topology_state_machine: introduce upgrade_state
  storage_service: disallow topology ops when upgrade is in progress
  raft_group0_client: add in_recovery method
  storage_service: introduce join_node_query verb
  raft_group0: make discover_group0 public
  raft_group0: filter current node's IP in discover_group0
  raft_group0: remove my_id arg from discover_group0
  storage_service: make _raft_topology_change_enabled more advanced
  docs: document raft topology upgrade and recovery
2024-02-09 11:54:53 +01:00
Piotr Dulikowski
53932420f8 storage_service: disable persistent feature enabler on upgrade
When starting in legacy mode, a gossip event listener called persistent
feature enabler is registered. This listener marks a feature as enabled
when it notices, in gossip, that all nodes declare support for the
feature.

With raft-based topology, features are managed in group 0 instead and do
not rely on the persistent feature enabler at all. Make the listener
look at the raft_topology_change_enabled() method and prevent it from
enabling more features after that method starts returning true.
2024-02-08 19:12:28 +01:00
Piotr Dulikowski
3513a07d8a feature_service: fall back to checking legacy features on startup
When checking features on startup (i.e. whether support for any feature
was revoked in an unsafe way), it might happen that upgrade to raft
topology didn't finish yet. In that case, instead of loading an empty
set of features - which supposedly represents the set of features that
were enabled until last boot - we should fall back to loading the set
from the legacy `enabled_features` key in `system.scylla_local`.
2024-02-08 19:12:28 +01:00
Piotr Dulikowski
2ecb8641b1 gms: feature_service: add SUPPORTS_CONSISTENT_TOPOLOGY_CHANGES
All nodes being capable of support for raft topology is a prerequisite
for starting upgrade to raft topology. The newly introduced feature will
track this prerequisite.
2024-02-08 19:12:28 +01:00
Kefu Chai
6eae678eb3 db: add formatter for gms::gossip_digest_ack2
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `gms::gossip_digest_ack2`,
and drop its operator<<.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17153
2024-02-08 11:49:37 +02:00
Pavel Emelyanov
66b859a29f gms: Remove unused operator<< for feature object
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#17109
2024-02-01 19:00:46 +02:00
Kefu Chai
005d231f96 db: add formatter for gms::application_state
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for `gms::application_state`,
but its operator<< is preserved, as it is still used by the generic
homebrew formatter for `std::unordered_map<>`.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17096
2024-02-01 10:02:25 +02:00
Botond Dénes
5f44ae8371 Merge 'Add more logging for gossiper::lock_endpoint and storage_service::handle_state_normal' from Kamil Braun
In a longevity test reported in scylladb/scylladb#16668 we observed that
NORMAL state is not being properly handled for a node that replaced
another node. Either handle_state_normal is not being called, or it is
but getting stuck in the middle. Which is the case couldn't be
determined from the logs, and attempts at creating a local reproducer
failed.

Thus the plan is to continue debugging using the longevity test, but we need
more logs. To check whether `handle_state_normal` was called and which branches
were taken, include some INFO level logs there. Also, detect deadlocks inside
`gossiper::lock_endpoint` by reporting an error message if `lock_endpoint`
waits for the lock for too long.

Ref: scylladb/scylladb#16668

Closes scylladb/scylladb#16733

* github.com:scylladb/scylladb:
  gossiper: report error when waiting too long for endpoint lock
  gossiper: store source_location instead of string in endpoint_permit
  storage_service: more verbose logging in handle_state_normal
2024-01-12 10:51:21 +02:00
Kamil Braun
cf646022cb gossiper: report error when waiting too long for endpoint lock
In a longevity test reported in scylladb/scylladb#16668 we observed that
NORMAL state is not being properly handled for a node that replaced
another node. Either handle_state_normal is not being called, or it is
but getting stuck in the middle. Which is the case couldn't be
determined from the logs, and attempts at creating a local reproducer
failed.

One hypothesis is that `gossiper` is stuck on `lock_endpoint`. We dealt
with gossiper deadlocks in the past (e.g. scylladb/scylladb#7127).

Modify the code so it reports an error if `lock_endpoint` waits for the
lock for more than a minute. When the issue reproduces again in
longevity, we will see if `lock_endpoint` got stuck.
2024-01-11 17:29:25 +01:00
Kefu Chai
7abd263ee6 db/config.cc: do not respect sstable_format option
"me" sstable format includes an important feature of storing the
`host_id` of the local node when writing sstables. The is crucial
for validating the sstable's `replay_position` in stats metadata as
it is valid only on the originating node and shard (#10080), therefor
we would like to make the `me` format mandatory.

before making `me` mandatory, we need to stop handling `sstable_format`
option if it is "md".

in this change

- gms/feature_service: do not disable `ME_SSTABLE_FORMAT` even if
  `sstable_format` is configured with "md". and in that case, instead,
  a warning is printed in the logging message to note that
  this setting is not valid anymore.
- docs/architecture/sstable: note that "me" is used by default now.

after this change, "sstable_format" will only accept "me" if it's
explicitly configured. and when a server with this change joins a
cluster, it uses "md" if the any of the node in the cluster still has
`sstable_format`. practically, this change makes "me" mandatory
in a 6.x cluster, assuming this change will be included in 6.x
releases.

Fixes #16551
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-01-11 22:43:05 +08:00
Kefu Chai
bece3eff0c feature_service: abort if sstable_format < md
sstable_format comes from scylla.yaml or from the command line
arguments, and we gate scylla from unallowed sstable formats lower
than `md` when parsing the configuration, and scylla bails out
at seeing the unallowed sstable format like:

```
terminate called after throwing an instance of 'std::invalid_argument'
  what():  Invalid value for sstable_format: got ka which is not inside the set of allowed values md, me
Aborted (core dumped)
```

scylla errors out way before `feature_config_from_db_config()`
gets called -- it throws in `bpo::notify(configuration)`,
way before `func` is evaluated in `app_template::run_deprecated()`.

so, in this change, we do not handle these values anymore, and
consider it a bug if we run into any of them.

Refs #16551
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-01-11 22:43:05 +08:00
Kamil Braun
6e39c2ffde gossiper: store source_location instead of string in endpoint_permit
The original code extracted only the function_name from the
source_location for logging. We'll use more information from the
source_location in later commits.
2024-01-10 17:02:52 +01:00
Kefu Chai
f61f6c27e3 gms: add formatter for gms::endpoint_state
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define a formatter for gms::endpoint_state, and
change update the callers of `operator<<` to use `fmt::print()`.
but we cannot drop `operator<<` yet, as we are still using the
templated operator<< and templated fmt::formatter to print containers
in scylla and in seastar -- they are still using `operator<<`
under the hood.

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16705
2024-01-10 09:16:23 +02:00
Kefu Chai
b91eb89ffa gms: heart_beat_state: add formatter for gms::heart_beat_state
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define a formatter for gms::heart_beat_state, and
remove its operator<<(). the only caller site of its operator<< is
updated to use `fmt::print()`

Refs #13245
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16652
2024-01-09 11:52:40 +02:00
Kefu Chai
cca786e847 gms: endpoint_state: fix a typo in comment
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16653
2024-01-09 11:51:49 +02:00
Kefu Chai
7e84e03f52 gms: do not include unused headers
these unused includes were identified by clangd. see
https://clangd.llvm.org/guides/include-cleaner#unused-include-warning
for more details on the "Unused include" warning.

because the removal of `#include "unimplemented.hh"`,
`service/migration_manager.cc` misses the definition of
`unimplemented::cause::VALIDATION`, so include the header where it is
used.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16654
2024-01-05 13:37:08 +02:00
Kamil Braun
f942bf4a1f Merge 'Do not update endpoint state via gossiper::add_saved_endpoint once it was updated via gossip' from Benny Halevy
Currently, `add_saved_endpoint` is called from two paths:  One, is when
loading states from system.peers in the join path (join_cluster,
join_token_ring), when `_raft_topology_change_enabled` is false, and the
other is from `storage_service::topology_state_load` when raft topology
changes are enabled.

In the later path, from `topology_state_load`, `add_saved_endpoint` is
called only if the endpoint_state does not exist yet.  However, this is
checked without acquiring the endpoint_lock and so it races with the
gossiper, and once `add_saved_endpoint` acquires the lock, the endpoint
state may already be populated.

Since `add_saved_endpoint` applies local information about the endpoint
state (e.g. tokens, dc, rack), it uses the local heart_beat_version,
with generation=0 to update the endpoint states, and that is
incompatible with changes applies via gossip that will carry the
endpoint's generation and version, determining the state's update order.

This change makes sure that the endpoint state is never update in
`add_saved_endpoint` if it has non-zero generation.  An internal error
exception is thrown if non-zero generation is found, and in the only
call site that might reach that state, in
`storage_service::topology_state_load`, the caller acquires the
endpoint_lock for checking for the existence of the endpoint_state,
calling `add_saved_endpoint` under the lock only if the endpoint_state
does not exist.

Fixes #16429

Closes scylladb/scylladb#16432

* github.com:scylladb/scylladb:
  gossiper: add_saved_endpoint: keep heart_beat_state if ep_state is found
  storage_service: topology_state_load: lock endpoint for add_saved_endpoint
  raft_group_registry: move on_alive error injection to gossiper
2024-01-04 14:47:10 +01:00
Benny Halevy
9e8998109f gossiper: get_*_members_synchronized: acquire endpoint update semaphore
To ensure that the value they return is synchronized on all shards.

This got broken recently by 147f30caff.

Refs https://github.com/scylladb/scylladb/pull/16597#discussion_r1440445432

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#16629
2024-01-03 17:41:46 +01:00
Benny Halevy
147f30caff gossiper: mutate_live_and_unreachable_endpoints: make exception safe
Change the mutate_live_and_unreachable_endpoints procedure
so that the called `func` would mutate a cloned
`live_and_unreachable_endpoints` object in place.

Those are replicated to temporary copies on all shards
using `foreign<unique_ptr<>>` so that the would be
automatically freed on exception.

Only after all copies are made, they are applied
on all gossiper shards in a noexcept loop
and finally, a `on_success` function is called
to apply further side effects if everything else
was replicated successfully.

The latter is still susceptible to exceptions,
but we can live with those as long as `_live_endpoints`
and `_unreachable_endpoints` are synchronized on all shards.

With that, the read-only methods:
`get_live_members_synchronized` and
`get_unreachable_members_synchronized`
become trivial and they just return the required data
from shard 0.

Fixes #15089

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#16597
2024-01-03 14:46:10 +02:00
Benny Halevy
cdd5605d81 gms: endpoint_state: change application_state_map to std::unordered_map
State changes are processed as a batch and
there is no reason to maintain them as an ordered map.
Instead, use a std::unordered_map that is more efficient.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-31 18:37:34 +02:00
Benny Halevy
ad8a9104d8 endpoint_state subscriptions: batch on_change notification
Rather than calling on_change for each particular
application_state, pass an endpoint_state::map_type
with all changed states, to be processed as a batch.

In particular, thise allows storage_service::on_change
to update_peer_info once for all changed states.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-31 18:37:34 +02:00
Benny Halevy
1d07a596bf everywhere: drop before_change subscription
None of the subscribers is doing anything before_change.
This is done before changing `on_change` in the following patch.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-31 18:37:34 +02:00
Benny Halevy
5abf556399 gms: endpoint_state: define application_state_map
Have a central definition for the map held
in the endpoint_state (before changing it to
std::unordered_map).

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-31 18:37:34 +02:00
Benny Halevy
3cba079b26 gossiper: add_saved_endpoint: keep heart_beat_state if ep_state is found
Currently, when loading peers' endpoint state from system.peers,
add_saved_endpoint is called.
The first instance of the endpoint state is created with the default
heart_beat_state, with both generation and version set to zero.
However, if add_saved_endpoint finds an existing instance of the
endpoint state, it reuses it, but it updates its heart_beat_state
with the local heart_beat_state() rather than keeping the existing
heart_beat_state, as it should.

This is a problem since it may confuse updates over gossip
later on via do_apply_state_locally that compares the remote
generation vs. the local generation, so they must stem from
the same root that is the endpoint itself.

Fixes #16429

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-31 16:48:57 +02:00
Benny Halevy
3099c5b8ab storage_service: topology_state_load: lock endpoint for add_saved_endpoint
`topology_state_load` currently calls `add_saved_endpoint`
only if it finds no endpoint_state_ptr for the endpoint.
However, this is done before locking the endpoint
and the endpoint state could be inserted concurrently.

To prevent that, a permit_id parameter was added to
`add_saved_endpoint` allowing the caller to call it
while the endpoint is locked.  With that, `topology_state_load`
locks the endpoint and checks the existence of the endpoint state
under the lock, before calling `add_saved_endpoint`.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-31 16:48:57 +02:00
Benny Halevy
db434e8cb5 raft_group_registry: move on_alive error injection to gossiper
Move the `raft_group_registry::on_alive` error injection point
to `gossiper::real_mark_alive` so it can delay marking the endpoint as
alive, and calling the `on_alive` callback, but without holding
the endpoint_lock.

Note that the entry for this endpoint in `_pending_mark_alive_endpoints`
still blocks marking it as alive until real_mark_alive completes.

Fixes #16506

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-31 15:28:54 +02:00
Pavel Emelyanov
c43501d973 locator,schema: Move initial tablets from r.s. options to params
The option is kepd in DDL, but is _not_ stored in
system_schema.keyspaces. Instead, it's removed from the provided options
and kept in scylla_keyspaces table in its own column. All the places
that had optional initial_tablets disengaged now set this value up the
way the find appropriate.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2023-12-25 16:07:10 +03:00
Kamil Braun
3b108f2e31 Merge 'db: config: make consistent_cluster_management mandatory' from Patryk Jędrzejczak
We make `consistent_cluster_management` mandatory in 5.5. This
option will be always unused and assumed to be true.

Additionally, we make `override_decommission` deprecated, as this option
has been supported only with `consistent_cluster_management=false`.

Making `consistent_cluster_management` mandatory also simplifies
the code. Branches that execute only with
`consistent_cluster_management` disabled are removed.

We also update documentation by removing information irrelevant in 5.5.

Fixes scylladb/scylladb#15854

Note about upgrades: this PR does not introduce any more limitations
to the upgrade procedure than there are already. As in
scylladb/scylladb#16254, we can upgrade from the first version of Scylla
that supports the schema commitlog feature, i.e. from 5.1 (or
corresponding Enterprise release) or later. Assuming this PR ends up in
5.5, the documented upgrade support is from 5.4. For corresponding
Enterprise release, it's from 2023.x (based on 5.2), so all requirements
are met.

Closes scylladb/scylladb#16334

* github.com:scylladb/scylladb:
  docs: update after making consistent_cluster_management mandatory
  system_keyspace, main, cql_test_env: fix indendations
  db: config: make consistent_cluster_management mandatory
  test: boost: schema_change_test: replace disable_raft_schema_config
  db: config: make override_decommission deprecated
  db: config: make force_schema_commit_log deprecated
2023-12-18 09:44:52 +01:00
Patryk Jędrzejczak
5ebfbf42bc db: config: make consistent_cluster_management mandatory
Code that executed only when consistent_cluster_management=false is
removed. In particular, after this patch:
- raft_group0 and raft_group_registry are always enabled,
- raft_group0::status_for_monitoring::disabled becomes unused,
- topology tests can only run with consistent_cluster_management.
2023-12-14 16:54:04 +01:00
Kamil Braun
26cbd28883 Merge 'token_metadata: switch to host_id' from Petr Gusev
In this PR we refactor `token_metadata` to use `locator::host_id` instead of `gms::inet_address` for node identification in its internal data structures. Main motivation for these changes is to make raft state machine deterministic. The use of IPs is a problem since they are distributed through gossiper and can't be used reliably. One specific scenario is outlined [in this comment](https://github.com/scylladb/scylladb/pull/13655#issuecomment-1521389804) - `storage_service::topology_state_load` can't resolve host_id to IP when we are applying old raft log entries, containing host_id-s of the long-gone nodes.

The refactoring is structured as follows:
  * Turn `token_metadata` into a template so that it can be used with host_id or inet_address as the node key. The version with inet_address (the current one) provides a `get_new()` method, which can be used to access the new version.
  * Go over all places which write to the old version and make the corresponding writes to the new version through `get_new()`. When this stage is finished we can use any version of the `token_metadata` for reading.
  * Go over all the places which read `token_metadata` and switch them to the new version.
  * Make `host_id`-based `token_metadata` default, drop `inet_address`-based version, change `token_metadata` back to non-template.

These series [depends](1745a1551a) on RPC sender `host_id` being present in RPC `clent_info` for `bootstrap` and `replace` node_ops commands. This feature was added in [this commit](95c726a8df) and released in `5.4`. It is generally recommended not to skip versions when upgrading, so users who upgrade sequentially first to `5.4` (or the corresponding Enterprise version) then to the version with these changes (`5.5` or `6.0`) should be fine. If for some reason they upgrade from a version without `host_id` in RPC `clent_info` to the version with these changes and they run bootstrap or replace commands during the upgrade procedure itself, these commands may fail with an error `Coordinator host_id not found` if some nodes are already upgraded and the node which started the node_ops command is not yet upgraded. In this case the user can finish the upgrade first to version 5.4 or later, or start bootstrap/replace with an upgraded node. Note that removenode and decommission do not depend on coordinator host_id so they can be started in the middle of upgrade from any node.

Closes scylladb/scylladb#15903

* github.com:scylladb/scylladb:
  topology: remove_endpoint: remove inet_address overload
  token_metadata: topology: cleanup add_or_update_endpoint
  token_metadata: add_replacing_endpoint: forbid replacing node with itself
  topology: drop key_kind, host_id is now the primary key
  dc_rack_fn: make it non-template
  token_metadata: drop the template
  shared_token_metadata: switch to the new token_metadata
  gossiper: use new token_metadata
  database: get_token_metadata -> new token_metadata
  erm: switch to the new token_metadata
  storage_service: get_token_metadata -> token_metadata2
  storage_service: get_token_to_endpoint_map: use new token_metadata
  api/token_metadata: switch to new version
  storage_service::on_change: switch to new token_metadata
  cdc: switch to token_metadata2
  calculate_natural_endpoints: fix indentation
  calculate_natural_endpoints: switch to token_metadata2
  storage_service: get_changed_ranges_for_leaving: use new token_metadata
  decommission_with_repair, removenode_with_repair -> new token_metadata
  rebuild_with_repair, replace_with_repair: use new token_metadata
  bootstrap: use new token_metadata
  tablets: switch to token_metadata2
  calculate_effective_replication_map: use new token_metadata
  calculate_natural_endpoints: fix formatting
  abstract_replication_strategy: calculate_natural_endpoints: make it work with both versions of token_metadata
  network_topology_strategy_test: update new token_metadata
  storage_service: on_alive: update new token_metadata
  storage_service: handle_state_bootstrap: update new token_metadata
  storage_service: snitch_reconfigured: update new token_metadata
  storage_service: leave_ring: update new token_metadata
  storage_service: node_ops_cmd_handler: update new token_metadata
  storage_service: node_ops_cmd_handler: add coordinator_host_id
  storage_service: bootstrap: update new token_metadata
  storage_service: join_token_ring: update new token_metadata
  storage_service: excise: update new token_metadata
  storage_service: join_cluster: update new token_metadata
  storage_service: on_remove: update new token_metadata
  storage_service: handle_state_normal: fill new token_metadata
  storage_service: topology_state_load: fill new token_metadata
  storage_service: adjust update_topology_change_info to update new token_metadata
  topology: set self host_id on the new topology
  locator::topology: allow being_replaced and replacing nodes to have the same IP
  token_metadata: get_endpoint_for_host_id -> get_endpoint_for_host_id_if_known
  token_metadata: get_host_id: exception -> on_internal_error
  token_metadata: add get_all_ips method
  token_metadata: support host_id-based version
  token_metadata: make it a template with NodeId=inet_address/host_id NodeId is used in all internal token_metadata data structures, that previously used inet_address. We choose topology::key_kind based on the value of the template parameter.
  locator: make dc_rack_fn a template
  locator/topology: add key_kind parameter
  token_metadata: topology_change_info: change field types to token_metadata_ptr
  token_metadata: drop unused method get_endpoint_to_token_map_for_reading
2023-12-13 16:35:52 +01:00
Petr Gusev
7b55ccbd8e token_metadata: drop the template
Replace token_metadata2 ->token_metadata,
make token_metadata back non-template.

No behavior changes, just compilation fixes.
2023-12-12 23:19:54 +04:00
Petr Gusev
799f747c8f shared_token_metadata: switch to the new token_metadata 2023-12-12 23:19:54 +04:00
Petr Gusev
c7314aa8e2 gossiper: use new token_metadata 2023-12-12 23:19:53 +04:00
Tomasz Grabiec
effb9fb3cb Merge 'Don't calculate hashes for schema versions in Raft mode' from Kamil Braun
When performing a schema change through group 0, extend the schema mutations with a version that's persisted and then used by the nodes in the cluster in place of the old schema digest, which becomes horribly slow as we perform more and more schema changes (#7620).

If the change is a table create or alter, also extend the mutations with a version for this table to be used for `schema::version()`s instead of having each node calculate a hash which is susceptible to bugs (#13957).

When performing a schema change in Raft RECOVERY mode we also extend schema mutations which forces nodes to revert to the old way of calculating schema versions when necessary.

We can only introduce these extensions if all of the cluster understands them, so protect this code by a new cluster/schema feature, `GROUP0_SCHEMA_VERSIONING`.

Fixes: #7620
Fixes: #13957

---

This is a reincarnation of PR scylladb/scylladb#15331. The previous PR was reverted due to a bug it unmasked; the bug has now been fixed (scylladb/scylladb#16139). Some refactors from the previous PR were already merged separately, so this one is a bit smaller.

I have checked with @Lorak-mmk's reproducer (https://github.com/Lorak-mmk/udt_schema_change_reproducer -- many thanks for it!) that the originally exposed bug is no longer reproducing on this PR, and that it can still be reproduced if I revert the aforementioned fix on top of this PR.

Closes scylladb/scylladb#16242

* github.com:scylladb/scylladb:
  docs: describe group 0 schema versioning in raft docs
  test: add test for group 0 schema versioning
  feature_service: enable `GROUP0_SCHEMA_VERSIONING` in Raft mode
  schema_tables: don't delete `version` cell from `scylla_tables` mutations from group 0
  migration_manager: add `committed_by_group0` flag to `system.scylla_tables` mutations
  schema_tables: use schema version from group 0 if present
  migration_manager: store `group0_schema_version` in `scylla_local` during schema changes
  system_keyspace: make `get/set_scylla_local_param` public
  feature_service: add `GROUP0_SCHEMA_VERSIONING` feature
2023-12-11 12:17:57 +01:00
Kamil Braun
7dad31c78f feature_service: enable GROUP0_SCHEMA_VERSIONING in Raft mode
As promised in earlier commits:
Fixes: #7620
Fixes: #13957

Also modify two test cases in `schema_change_test` which depend on
the digest calculation method in their checks. Details are explained in
the comments.
2023-12-08 17:46:31 +01:00
Botond Dénes
d2a88cd8de Merge 'Typos: fix typos in code' from Yaniv Kaul
Fixes some more typos as found by codespell run on the code. In this commit, there are more user-visible errors.

Refs: https://github.com/scylladb/scylladb/issues/16255

Closes scylladb/scylladb#16289

* github.com:scylladb/scylladb:
  Update unified/build_unified.sh
  Update main.cc
  Update dist/common/scripts/scylla-housekeeping
  Typos: fix typos in code
2023-12-06 07:36:41 +02:00
Avi Kivity
12f160045b Merge 'Get rid of fb_utilities' from Benny Halevy
utils::fb_utilities is a global in-memory registry for storing and retrieving broadcast_address and broadcat_rpc_address.
As part of the effort to get rid of all global state, this series gets rid of fb_utilities.
This will eventually allow e.g. cql_test_env to instantiate multiple scylla server nodes, each serving on its own address.

Closes scylladb/scylladb#16250

* github.com:scylladb/scylladb:
  treewide: get rid of now unused fb_utilities
  tracing: use locator::topology rather than fb_utilities
  streaming: use locator::topology rather than fb_utilities
  raft: use locator::topology/messaging rather than fb_utilities
  storage_service: use locator::topology rather than fb_utilities
  storage_proxy: use locator::topology rather than fb_utilities
  service_level_controller: use locator::topology rather than fb_utilities
  misc_services: use locator::topology rather than fb_utilities
  migration_manager: use messaging rather than fb_utilities
  forward_service: use messaging rather than fb_utilities
  messaging_service: accept broadcast_addr in config rather than via fb_utilities
  messaging_service: move listen_address and port getters inline
  test: manual: modernize message test
  table: use gossiper rather than fb_utilities
  repair: use locator::topology rather than fb_utilities
  dht/range_streamer: use locator::topology rather than fb_utilities
  db/view: use locator::topology rather than fb_utilities
  database: use locator::topology rather than fb_utilities
  db/system_keyspace: use topology via db rather than fb_utilities
  db/system_keyspace: save_local_info: get broadcast addresses from caller
  db/hints/manager: use locator::topology rather than fb_utilities
  db/consistency_level: use locator::topology rather than fb_utilities
  api: use locator::topology rather than fb_utilities
  alternator: ttl: use locator::topology rather than fb_utilities
  gossiper: use locator::topology rather than fb_utilities
  gossiper: add get_this_endpoint_state_ptr
  test: lib: cql_test_env: pass broadcast_address in cql_test_config
  init: get_seeds_from_db_config: accept broadcast_address
  locator: replication strategies: use locator::topology rather than fb_utilities
  locator: topology: add helpers to retrieve this host_id and address
  snitch: pass broadcast_address in snitch_config
  snitch: add optional get_broadcast_address method
  locator: ec2_multi_region_snitch: keep local public address as member
  ec2_multi_region_snitch: reindent load_config
  ec2_multi_region_snitch: coroutinize load_config
  ec2_snitch: reindent load_config
  ec2_snitch: coroutinize load_config
  thrift: thrift_validation: use std::numeric_limits rather than fb_utilities
2023-12-05 19:40:14 +02:00
Benny Halevy
0bcce35abd treewide: get rid of now unused fb_utilities
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-05 16:22:49 +02:00
Yaniv Kaul
ae2ab6000a Typos: fix typos in code
Fixes some more typos as found by codespell run on the code.
In this commit, there are more user-visible errors.

Refs: https://github.com/scylladb/scylladb/issues/16255
2023-12-05 15:18:11 +02:00
Kamil Braun
07984215a3 feature_service: add GROUP0_SCHEMA_VERSIONING feature
This feature, when enabled, will modify how schema versions
are calculated and stored.

- In group 0 mode, schema versions are persisted by the group 0 command
  that performs the schema change, then reused by each node instead of
  being calculated as a digest (hash) by each node independently.
- In RECOVERY mode or before Raft upgrade procedure finishes, when we
  perform a schema change, we revert to the old digest-based way, taking
  into account the possibility of having performed group0-mode schema
  changes (that used persistent versions). As we will see in future
  commits, this will be done by storing additional flags and tombstones
  in system tables.

By "schema versions" we mean both the UUIDs returned from
`schema::version()` and the "global" schema version (the one we gossip
as `application_state::SCHEMA`).

For now, in this commit, the feature is always disabled. Once all
necessary code is setup in following commits, we will enable it together
with Raft.
2023-12-05 13:03:28 +01:00
Kefu Chai
e88bd9c5bd gms/inet_address: pass sstring param by std::move()
less overhead this way. the caller of lookup() always passes
a rvalue reference. and seastar::dns::get_host_by_name() actually
moves away from the parameter, so let's pass by std::move() for
slightly better performance, and to match the expectation of
the underlying seastar API.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16280
2023-12-05 12:05:21 +03:00
Benny Halevy
f3e0358563 gossiper: use locator::topology rather than fb_utilities
And add `get_endpoint_state_ptr` for this_node.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-05 08:42:49 +02:00
Benny Halevy
25754f843b gossiper: add get_this_endpoint_state_ptr
Returns this node's endpoint_state_ptr.
With this entry point, the caller doesn't need to
get_broadcast_address.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2023-12-05 08:42:49 +02:00
Kefu Chai
3608d9be97 gms/inet_address: remove unused '#include'
neither <iomanip> nor "utils/to_string.hh" is used in
`gms/inet_address.cc`, so let's remove their "#include"s.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#16281
2023-12-05 08:30:03 +03:00
Patryk Jędrzejczak
c8ee7d4499 db: make schema commitlog feature mandatory
Using consistent cluster management and not using schema commitlog
ends with a bad configuration throw during bootstrap. Soon, we
will make consistent cluster management mandatory. This forces us
to also make schema commitlog mandatory, which we do in this patch.

A booting node decides to use schema commitlog if at least one of
the two statements below is true:
- the node has `force_schema_commitlog=true` config,
- the node knows that the cluster supports the `SCHEMA_COMMITLOG`
  cluster feature.

The `SCHEMA_COMMITLOG` cluster feature has been added in version
5.1. This patch is supposed to be a part of version 6.0. We don't
support a direct upgrade from 5.1 to 6.0 because it skips two
versions - 5.2 and 5.4. So, in a supported upgrade we can assume
that the version which we upgrade from has schema commitlog. This
means that we don't need to check the `SCHEMA_COMMITLOG` feature
during an upgrade.

The reasoning above also applies to Scylla Enterprise. Version
2024.2 will be based on 6.0. Probably, we will only support
an upgrade to 2024.2 from 2024.1, which is based on 5.4. But even
if we support an upgrade from 2023.x, this patch won't break
anything because 2023.1 is based on 5.2, which has schema
commitlog. Upgrades from 2022.x definitely won't be supported.

When we populate a new cluster, we can use the
`force_schema_commitlog=true` config to use schema commitlog
unconditionally. Then, the cluster feature check is irrelevant.
This check could fail because we initiate schema commitlog before
we learn about the features. The `force_schema_commitlog=true`
config is especially useful when we want to use consistent cluster
management. Failing feature checks would lead to crashes during
initial bootstraps. Moreover, there is no point in creating a new
cluster with `consistent_cluster_management=true` and
`force_schema_commitlog=false`. It would just cause some initial
bootstraps to fail, and after successful restarts, the result would
be the same as if we used `force_schema_commitlog=true` from the
start.

In conclusion, we can unconditionally use schema commitlog without
any checks in 6.0 because we can always safely upgrade a cluster
and start a new cluster.

Apart from making schema commitlog mandatory, this patch adds two
changes that are its consequences:
- making the unneeded `force_schema_commitlog` config unused,
- deprecating the `SCHEMA_COMMITLOG` feature, which is always
  assumed to be true.

Closes scylladb/scylladb#16254
2023-12-04 21:02:16 +02:00