Commit Graph

1148 Commits

Author SHA1 Message Date
Avi Kivity
7eb3b15fff Merge 'utils/tagged_integer: remove conversion to underlying integer' from Laszlo Ersek
~~~
utils/tagged_integer: remove conversion to underlying integer

Silently converting a tagged (i.e., "dimension-ful") integer to a naked
("dimensionless") integer defeats the purpose of having tagged integers,
and is a source of practical bugs, such as
<https://github.com/scylladb/scylladb/issues/20080>.

We could make the conversion operator explicit, for enforcing

  static_cast<TAGGED_INTEGER_TYPE::value_type>(TAGGED_INTEGER_VALUE)

in every conversion location -- but that's a mouthful to write. Instead,
remove the conversion operator, and let clients call the (identically
behaving) value() member function.
~~~

No backport needed (refactoring).

The series is supposed to solve #20081.

Two patches in the series touch up code that is known to be (orthogonally) buggy; see
- `service/raft_sys_table_storage: tweak dead code` (#20080)
- `test/raft/replication: untag index_t in test_case::get_first_val()` (#20151)

Fixes for those (independent) issues will have to be rebased on this series, or this series will have to be rebased on those (due to context conflicts).

The series builds at every stage. The debug and release unit test suites pass at the end.

Closes scylladb/scylladb#20159

* github.com:scylladb/scylladb:
  utils/tagged_integer: remove conversion to underlying integer
  test/raft/randomized_nemesis_test: clean up remaining index_t usage
  test/raft/randomized_nemesis_test: clean up index_t usage in store_snapshot()
  test/raft/replication: clean up remaining index_t usage
  test/raft/replication: take an "index_t start_idx" in create_log()
  test/raft/replication: untag index_t in test_case::get_first_val()
  test/raft/etcd_test: tag index_t and term_t for comparisons and subtractions
  test/raft/fsm_test: tag index_t and term_t for comparisons and subtractions
  test/raft/helpers: tighten compare_log_entries() param types
  service/raft_sys_table_storage: tweak dead code
  service/raft_sys_table_storage: simplify (snap.idx - preserve_log_entries)
  service/raft_sys_table_storage: untag index_t and term_t for queries
  raft/server: clean up index_t usage
  raft/tracker: don't drop out of index_t space for subtraction
  raft/fsm: clean up index_t and term_t usage
  raft/log: clean up index_t usage
  db/system_keyspace: promise a tagged integer from increment_and_get_generation()
  gms/gossiper: return "strong_ordering" from compare_endpoint_startup()
  gms/gossiper: get "int32_t" value of "gms::version_type" explicitly
2024-08-19 19:52:54 +03:00
Laszlo Ersek
baccbc09c5 gms/gossiper: return "strong_ordering" from compare_endpoint_startup()
The callers of gossiper::compare_endpoint_startup() need not (should not)
learn of any particular (tagged or untagged) difference of generations;
they only care about the ordering of generations. Change the return type
of compare_endpoint_startup() to "std::strong_ordering", and delegate the
comparison to tagged_tagged_integer::operator<=>.

Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
2024-08-14 13:35:08 +02:00
Laszlo Ersek
3bb608056c gms/gossiper: get "int32_t" value of "gms::version_type" explicitly
In do_sort(), we need to drop to "int32_t" temporarily, so that we can
call ::abs() on the version difference. Do that explicitly.

Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
2024-08-14 13:35:08 +02:00
Łukasz Paszkowski
9690785112 features: add native_reverse_queries
Enabled when all replicas support the native_reversed command slice
and return the result in reverse order in this case.
2024-08-13 10:03:42 +02:00
Michał Jadwiszczak
3745d0a534 gms/feature_service: allow to suppress features
This patch adds the `suppress_features` error injection. It allows revoking
support for some features, and it can be used to simulate the upgrade process
in test.py.

Features to suppress are passed as injection's value, separated by `;`.
Example: `PARALLELIZED_AGGREGATION;UDA_NATIVE_PARALLELIZED_AGGREGATION`
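
For illustration only, a `;`-separated injection value could be split as follows. The helper name `parse_suppressed_features` is an assumption and is not taken from the patch.

```cpp
#include <set>
#include <string>
#include <string_view>

// Hypothetical helper: split an injection value such as "A;B" into names.
inline std::set<std::string> parse_suppressed_features(std::string_view value) {
    std::set<std::string> out;
    while (!value.empty()) {
        const auto pos = value.find(';');
        const auto token = value.substr(0, pos);
        if (!token.empty()) {
            out.emplace(token);  // empty tokens (e.g. "A;;B") are skipped
        }
        if (pos == std::string_view::npos) {
            break;               // last token consumed
        }
        value.remove_prefix(pos + 1);
    }
    return out;
}
```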

Fixes scylladb/scylladb#20034

Closes scylladb/scylladb#20055
2024-08-09 19:15:19 +02:00
Avi Kivity
aa1270a00c treewide: change assert() to SCYLLA_ASSERT()
assert() is traditionally disabled in release builds, but not in
scylladb. This hasn't caused problems so far, but the latest abseil
release includes a commit [1] that causes a 1000 insn/op regression when
NDEBUG is not defined.

Clearly, we must move towards a build system where NDEBUG is defined in
release builds. But we can't just define it blindly without vetting
all the assert() calls, as some were written with the expectation that
they are enabled in release mode.

To solve the conundrum, change all assert() calls to a new SCYLLA_ASSERT()
macro in utils/assert.hh. This macro is always defined and is not conditional
on NDEBUG, so we can later (after vetting Seastar) enable NDEBUG in release
mode.
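
As a sketch of what "not conditional on NDEBUG" means: the macro below is a hypothetical `ALWAYS_ASSERT`, not the definition of SCYLLA_ASSERT in utils/assert.hh (which is not shown in this message). Unlike assert(), it keeps checking its condition regardless of how NDEBUG is set.

```cpp
#include <cstdio>
#include <cstdlib>

// Hypothetical always-on assertion; the name and message format are
// illustrative assumptions.
#define ALWAYS_ASSERT(cond)                                          \
    do {                                                             \
        if (!(cond)) {                                               \
            std::fprintf(stderr, "%s:%d: assertion failed: %s\n",    \
                         __FILE__, __LINE__, #cond);                 \
            std::abort();                                            \
        }                                                            \
    } while (0)

// The check below stays active even when NDEBUG is defined.
inline int checked_div(int a, int b) {
    ALWAYS_ASSERT(b != 0);
    return a / b;
}
```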

[1] 66ef711d68

Closes scylladb/scylladb#20006
2024-08-05 08:23:35 +03:00
Aleksandra Martyniuk
880058073b db: service: add request type column to topology_requests
topology_requests table will be used by task manager node ops tasks,
but it loses info about request type, which is required by tasks.

Add request_type column to topology_requests.
2024-07-23 13:35:01 +02:00
Avi Kivity
d50ba03965 gossiper: remove initializer-list overload of add_local_application_state()
The initializer_list overload uses a too-clever technique to avoid copies.
While copies here are unlikely to pose any real problem (we're allocating
map nodes anyway), it's simple enough to provide a copy-less replacement
that doesn't require questionable tricks.

We replace the initializer_list<..., in<>> overload with a variadic
template that constructs a temporary map.
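
A small sketch of the replacement technique, using hypothetical names (`state_map`, `make_state_map`) and simplified key/value types. Elements of a `std::initializer_list` are const and can only be copied out, whereas a variadic template can forward each argument and move it into the map.

```cpp
#include <map>
#include <string>
#include <utility>

// Hypothetical key/value types standing in for application states.
using state_map = std::map<int, std::string>;

// Each argument is a std::pair; rvalue pairs are moved into map nodes.
template <typename... Pairs>
state_map make_state_map(Pairs&&... pairs) {
    state_map m;
    (m.emplace(std::forward<Pairs>(pairs)), ...);
    return m;
}
```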
2024-07-10 14:11:27 +03:00
Nadav Har'El
2a2e8167c8 gossiper: fix get_rpc_address() for this node
Commit dd46a92e23 introduced a function gossiper::get_rpc_address()
as a shortcut for get_application_state_ptr(endpoint, RPC_ADDRESS) -
i.e., it fetches the endpoint's configured broadcast_rpc_address
(despite its confusing name, this is the endpoint's external IP address
that clients can use to make CQL connections).

But strangely, the implementation of get_rpc_address() made an exception
for asking about the *current* host - where instead of getting this
node's broadcast_rpc_address, it returns its internal address, which
is not what this function was supposed to do - it's not useful for
it to do one thing for this node, and a different thing for other
nodes, and when I wrote code that uses this function (see the next
patch), this resulted in wrong results for the current node.

The fix is simple - drop the wrong if(), and get the
broadcast_rpc_address stored by the gossiper unconditionally - the
gossiper knows it for this node just like for other nodes.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
2024-06-30 18:38:15 +03:00
Kamil Braun
13fc2bd854 Merge notify other nodes on boot from Gleb
The series adds a step during node's boot process, just before completing
the initialization, in which the node sends a notification to all other
normal nodes in the cluster that it is UP now. Other nodes wait for this
node to be UP and in normal state before replying. This ensures that,
in a healthy cluster, when a node starts serving queries the entire
cluster knows its up-to-date state. The notification is a best effort
though. If some nodes are down or do not reply in time the boot process
continues. It is somewhat similar to shutdown notification in this regard.

* 'gleb/notify-up-v2' of github.com:scylladb/scylla-dev:
  gossiper: wait for a bootstrapping node to be seen as normal on all nodes before completing initialization
  Wait for booting node to be marked UP before complete booting.
  gossiper: move gossip verbs to the idl
2024-06-25 17:58:17 +02:00
Kamil Braun
627d566811 Merge 'join_token_ring, gossip topology: recalculate sync nodes in wait_alive' from Patryk Jędrzejczak
The node booting in gossip topology waits until all NORMAL
nodes are UP. If we removed a different node just before,
the booting node could still see it as NORMAL and wait for
it to be UP, which would time out and fail the bootstrap.

This issue caused scylladb/scylladb#17526.

Fix it by recalculating the nodes to wait for in every step of the
`wait_alive` loop.
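
The shape of the fix can be sketched as a polling loop that recomputes the expected set on every step instead of once up front. All names here are hypothetical, and a real loop would sleep between steps and enforce a timeout.

```cpp
#include <functional>
#include <set>
#include <string>

using node_set = std::set<std::string>;

// Hypothetical polling loop: the set of nodes to wait for is recalculated
// in every step, so a node removed in the meantime stops blocking the wait.
inline bool wait_alive(const std::function<node_set()>& normal_nodes,
                       const std::function<node_set()>& live_nodes,
                       int max_steps) {
    for (int step = 0; step < max_steps; ++step) {
        const node_set expected = normal_nodes();  // recalculated each step
        const node_set live = live_nodes();
        bool all_up = true;
        for (const auto& n : expected) {
            if (live.count(n) == 0) {
                all_up = false;
                break;
            }
        }
        if (all_up) {
            return true;
        }
    }
    return false;  // gave up: some expected node never became live
}
```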

Although the issue fixed by this PR caused only test flakiness,
it could also manifest in real clusters. It's best to backport this
PR to 5.4 and 6.0.

Fixes scylladb/scylladb#17526

Closes scylladb/scylladb#19387

* github.com:scylladb/scylladb:
  join_token_ring, gossip topology: update obsolete comment
  join_token_ring, gossip topology: fix indendation after previous patch
  join_token_ring, gossip topology: recalculate sync nodes in wait_alive
2024-06-21 10:22:32 +02:00
Gleb Natapov
7bc05c3880 gossiper: wait for a bootstrapping node to be seen as normal on all nodes before completing initialization
When a node bootstraps it may happen that some nodes still see it as
bootstrapping while the node itself already is in normal state and ready
to serve queries. We want to delay the bootstrap completion until all
nodes see the new node as normal. Piggyback on the UP notification to do so
and wait for the node that sent the notification to be seen as normal.

Fixes #18678
2024-06-20 16:37:56 +03:00
Gleb Natapov
28c0a27467 Wait for booting node to be marked UP before complete booting.
Currently a node does not wait to be marked UP by other nodes before
completing boot, which creates a usability issue: during a rolling restart
it is not enough to wait for the local CQL port to be opened before
restarting the next node; one must also check that all other
nodes already see this node as alive. Otherwise, if the next node is
restarted, some nodes may see two nodes as dead instead of one.

This patch improves the situation by making sure that the boot process does
not complete until all other nodes see the booting one as alive.
This is still a best effort thing: if some nodes are unreachable or
gossiper propagation takes too much time the boot process continues
anyway.

Fixes scylladb/scylladb#19206
2024-06-20 14:55:40 +03:00
Patryk Jędrzejczak
017134fd38 join_token_ring, gossip topology: recalculate sync nodes in wait_alive
Before this patch, if we booted a node just after removing
a different node, the booting node may still see the removed node
as NORMAL and wait for it to be UP, which would time out and fail
the bootstrap.

This issue caused scylladb/scylladb#17526.

Fix it by recalculating the nodes to wait for in every step of the
`wait_alive` loop.
2024-06-20 10:59:49 +02:00
Kefu Chai
ec5f0fccce gms: remove unused operator<<
since we've switched almost all callers of the operator<< to {fmt},
let's drop the unused operator<<:s.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-06-18 15:55:22 +08:00
Gleb Natapov
09556bff0e gossiper: move gossip verbs to the idl
2024-06-17 12:47:17 +03:00
Botond Dénes
cd10beb89d Merge 'Don't use db::config by gossiper' from Pavel Emelyanov
All sharded<service>'s are supposed to have their own config and not use the global db::config one. The service config, in turn, is to be created by main/cql_test_env/whatever out of db::config and, maybe, other data. Gossiper is almost there, but it still uses db::config in a few places.

Closes scylladb/scylladb#19051

* github.com:scylladb/scylladb:
  gossiper: Stop using db::config
  gossiper: Move force_gossip_generation on gossip_config
  gossiper: Move failure_detector_timeout_ms on gossip_config
  main: Fix indentation after previous patch
  main: Make gossiper config a sharded parameter
  main: Add local variable for set of seeds
  main: Add local variable for group0 id
  main: Add local variable for cluster_name
2024-06-06 09:12:51 +03:00
Benny Halevy
b2fa954d82 gms: endpoint_state: get_dc_rack: do not assign to uninitialized memory
Assigning to a member of an uninitialized optional
does not initialize the object before assigning to it.
This resulted in the AddressSanitizer detecting an attempted
double-free when the uninitialized string apparently contained
a bogus pointer.

The change emplaces the returned optional when needed
without resorting to the copy-assignment operator.
So it's not susceptible to assigning to uninitialized
memory, and it's more efficient as well.
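
A minimal sketch of the pattern described above, with a hypothetical `dc_rack` struct and `make_dc_rack` helper rather than the actual endpoint_state code. The commented-out line shows the kind of assignment that touches an object that was never constructed.

```cpp
#include <optional>
#include <string>

// Hypothetical struct standing in for a datacenter/rack pair.
struct dc_rack {
    std::string dc;
    std::string rack;
};

inline std::optional<dc_rack> make_dc_rack(const std::string* dc,
                                           const std::string* rack) {
    std::optional<dc_rack> ret;
    if (dc && rack) {
        // Wrong (undefined behaviour): "ret->dc = *dc;" while ret is still
        // empty would copy-assign to a string that was never constructed.
        // emplace() constructs the contained object in place instead.
        ret.emplace(dc_rack{*dc, *rack});
    }
    return ret;
}
```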

Fixes scylladb/scylladb#19041

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#19043
2024-06-05 13:09:01 +03:00
Pavel Emelyanov
dcc083110d gossiper: Stop using db::config
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-06-04 20:19:47 +03:00
Pavel Emelyanov
00d8590d7e gossiper: Move force_gossip_generation on gossip_config
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-06-04 20:19:47 +03:00
Pavel Emelyanov
e3abc5d2fd gossiper: Move failure_detector_timeout_ms on gossip_config
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-06-04 20:19:47 +03:00
Pavel Emelyanov
83d491af02 config: Remove experimental TABLETS feature
... and replace it with boolean enable_tablets option. All the places
in the code are patched to check the latter option instead of the former
feature.

The option is OFF by default, but the default scylla.yaml file sets this
to true, so that newly installed clusters turn tablets ON.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#18898
2024-05-30 18:03:51 +03:00
Pavel Emelyanov
b24fb8dc87 inet_address: Remove to_sstring() in favor of fmt::to_string
The existing inet_address::to_string() calls fmt::format("{}", *this)
anyway. However, the to_string() method is declared in a .cc file, while
the formatter is in the header and is equipped with constexprs so
that converting an address to string is done as much as possible
at compile time.

Also, though minor, fmt::to_string(foo) is believed to be even faster
than fmt::format("{}", foo).

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#18712
2024-05-21 09:43:08 +03:00
Avi Kivity
54a82fed6b feature, index: grandfather CORRECT_IDX_TOKEN_IN_SECONDARY_INDEX
This feature corrected how we store the token in secondary indexes. It
was introduced in 7ff72b0ba5 (2020; 4.4) and can now be assumed present
everywhere. Note that we still support indexes created with the old format.
2024-05-18 00:24:11 +03:00
Avi Kivity
2fbd78c769 feature: grandfather DIGEST_FOR_NULL_VALUES
The DIGEST_FOR_NULL_VALUES feature was added in 21a77612b3 (2020; 4.4)
and can now be assumed to be always present. The hasher which it invoked
is removed.
2024-05-18 00:24:00 +03:00
Avi Kivity
7c264e8a71 feature: grandfather PER_TABLE_CACHING
The PER_TABLE_CACHING feature was added in 0475dab359 (2020; 4.2)
and can now be assumed to be always present.
2024-05-18 00:23:30 +03:00
Avi Kivity
d52c424a5f feature: grandfather LWT
LWT was made non-experimental in 9948f548a5 (2020; 4.1) and can now be
assumed to be always present.
2024-05-18 00:20:53 +03:00
Avi Kivity
93088d0921 feature: grandfather HINTED_HANDOFF_SEPARATE_CONNECTION
The HINTED_HANDOFF_SEPARATE_CONNECTION feature was introduced in 3a46b1bb2b (2019; 3.3)
and can be assumed always present.
2024-05-18 00:18:27 +03:00
Avi Kivity
3bead8cea0 feature: grandfather PER_TABLE_PARTITIONERS
The PER_TABLE_PARTITIONERS feature was added in 90df9a44ce (2020; 4.0)
and can now be assumed to be always present. We also remove the associated
schema_feature.
2024-05-18 00:15:07 +03:00
Avi Kivity
93113da01b feature: grandfather NONFROZEN_UDTS
The NONFROZEN_UDTS feature was added in e74b5deb5d (2019; 3.2)
and can now be assumed to be always present.
2024-05-17 20:41:20 +03:00
Avi Kivity
c7d7ca2c23 feature: grandfather CDC
The CDC feature was made non-experimental in e9072542c1 (2020; 4.4)
and can now be assumed to be always present. We also remove the corresponding
schema_feature.
2024-05-17 20:41:20 +03:00
Avi Kivity
82ad2913ca feature: grandfather DIGEST_INSENSITIVE_TO_EXPIRY
The DIGEST_INSENSITIVE_TO_EXPIRY feature was added in 9de071d214 (2019; 3.2)
and can now be assumed to be always present. We enable the corresponding
schema_feature unconditionally.

We do not remove the corresponding schema feature, because it can be disabled
when the related TABLE_DIGEST_INSENSITIVE_TO_EXPIRY is present.
2024-05-17 20:41:19 +03:00
Avi Kivity
b5f6021a6b feature: grandfather VIEW_VIRTUAL_COLUMNS
The VIEW_VIRTUAL_COLUMNS feature was added in a108df09f9 (2019; 3.1)
and can now be assumed to be always present.

The corresponding schema_feature is removed. Note schema_features are not sent
over the wire. A digest calculation without VIEW_VIRTUAL_COLUMNS is no longer tested.
2024-05-17 20:41:19 +03:00
Avi Kivity
7952200c8c feature: grandfather ME_SSTABLE feature
"me" format sstables were introduced in d370558279 (Jan 2022; 5.1)
and so can be assumed always present. The listener that checks when
the cluster understands ME_SSTABLE was removed and in its place
we default to sstable_version_types::me (and call on_enabled()
immediately).
2024-05-17 20:41:19 +03:00
Avi Kivity
6d0c0b542c feature: grandfather MD_SSTABLE_FORMAT
"md" sstable support was introduced in e8d7744040 (2020; 4.4)
and so can be assumed to be present on all versions we upgrade from.
Nothing appears to depend on it.
2024-05-17 20:41:19 +03:00
Benny Halevy
796ca367d1 gossiper: rename topo_sm member to _topo_sm
Follow scylla convention for class member naming.

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>

Closes scylladb/scylladb#18528
2024-05-12 11:02:35 +03:00
Gleb Natapov
3b40d450e5 gossiper: try to locate an endpoint by the host id when applying state if search by IP fails
Even if there is no endpoint for the given IP, the state can still belong to an existing endpoint that
was restarted with a different IP, so let's try to locate the endpoint by host id as well. Do it in raft
topology mode only, to avoid any impact on gossiper mode.

Also make the test more robust in detecting wrong amount of entries in
the peers table. Today it may miss that there is a wrong entry there
because the map will squash two entries for the same host id into one.

Fixes: scylladb/scylladb#18419
Fixes: scylladb/scylladb#18457
2024-05-09 13:14:54 +02:00
Botond Dénes
96a7ed7efb Merge 'sstables: add dead row count when issuing warning to system.large_partitions' from Ferenc Szili
This is the second half of the fix for issue #13968. The first half is already merged with PR #18346

Scylla issues warnings for partitions containing more rows than a configured threshold. The warning is issued by inserting a row into the `system.large_partitions` table. This row contains the information about the partition for which the warning is issued: keyspace, table, sstable, partition key and size, compaction time and the number of rows in the partition. A previous PR #18346 also added range tombstone count to this row.

This change adds a new counter for dead rows to the large_partitions table.

This change also adds cluster feature protection for writing into these new counters. This is needed in case a cluster is in the process of being upgraded to this new version, after which an upgraded node writes data with the new schema into `system.large_partitions`, and finally a node is then rolled back to an old version. This node will then revert the schema to the old version, but the written sstables will still contain data with the new counters, causing any readers of this table to throw errors when they encounter these cells.

This is an enhancement, and backporting is not needed.

Fixes #13968

Closes scylladb/scylladb#18458

* github.com:scylladb/scylladb:
  sstable: added test for counting dead rows
  sstable: added docs for system.large_partitions.dead_rows
  sstable: added cluster feature for dead rows and range tombstones
  sstable: write dead_rows count to system.large_partitions
  sstable: added counter for dead rows
2024-05-09 08:26:43 +03:00
Piotr Dulikowski
64ba620dc2 Merge 'hinted handoff: Use host IDs instead of IPs in the module' from Dawid Mędrek
This pull request introduces host ID in the Hinted Handoff module. Nodes are now identified by their host IDs instead of their IPs. The conversion occurs on the boundary between the module and `storage_proxy.hh`, but aside from that, IPs have been erased.

The changes take into consideration that there might still be old hints, still identified by IPs, on disk – at start-up, we map them to host IDs if it's possible so that they're not lost.

Refs scylladb/scylladb#6403
Fixes scylladb/scylladb#12278

Closes scylladb/scylladb#15567

* github.com:scylladb/scylladb:
  docs: Update Hinted Handoff documentation
  db/hints: Add endpoint_downtime_not_bigger_than()
  db/hints: Migrate hinted handoff when cluster feature is enabled
  db/hints: Handle arbitrary directories in resource manager
  db/hints: Start using hint_directory_manager
  db/hints: Enforce providing IP in get_ep_manager()
  db/hints: Introduce hint_directory_manager
  db/hints/resource_manager: Update function description
  db/hints: Coroutinize space_watchdog::scan_one_ep_dir()
  db/hints: Expose update lock of space watchdog
  db/hints: Add function for migrating hint directories to host ID
  db/hints: Take both IP and host ID when storing hints
  db/hints: Prepare initializing endpoint managers for migrating from IP to host ID
  db/hints: Migrate to locator::host_id
  db/hints: Remove noexcept in do_send_one_mutation()
  service: Add locator::host_id to on_leave_cluster
  service: Fix indentation
  db/hints: Fix indentation
2024-05-06 09:58:18 +02:00
Kefu Chai
0b0e661a85 build: bring abseil submodule back
because of https://bugzilla.redhat.com/show_bug.cgi?id=2278689,
the rebuilt abseil package provided by fedora is built with different
settings than the tree uses when it is built with the sanitizer enabled.
this inconsistency leads to a crash.

to address this problem, we have to reinstate the abseil submodule, so
we can build it with the same compiler options with which we build the
tree.

in this change

* Revert "build: drop abseil submodule, replace with distribution abseil"
* update CMake building system with abseil header include settings
* bump up the abseil submodule to the latest LTS branch of abseil:
  lts_2024_01_16
* update scylla-gdb.py to adapt to the new structure of
  flat_hash_map

This reverts commit 8635d24424.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18511
2024-05-05 23:31:09 +03:00
Ferenc Szili
90634b419c sstable: added cluster feature for dead rows and range tombstones
Previously, writing into system.large_partitions was done by calling
record_large_partition(). In order to write different data based on
the cluster feature flag, another level of indirection was added by
calling _record_large_partitions which is initialized to a lambda
which calls internal_record_large_partitions(). This function does
not record the values of the two new columns (dead_rows and
range_tombstones). After the cluster feature flag becomes true,
_record_large_partitions is set to a lambda which calls
internal_record_large_partitions_all_data() which records the values
of the two new columns.
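
The indirection can be pictured roughly as below. This is a simplified sketch with hypothetical names (`large_partition_recorder`, `enable_extended_columns`) and a string log instead of a table write; it is not the code from the patch.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical recorder: a std::function member selects which columns are
// written, and is swapped once the cluster feature becomes enabled.
class large_partition_recorder {
    std::vector<std::string> _log;
    std::function<void(int)> _record;
public:
    large_partition_recorder()
        : _record([this](int rows) {
              _log.push_back("rows=" + std::to_string(rows));
          }) {}

    // The lambdas capture "this", so copying would leave a dangling pointer.
    large_partition_recorder(const large_partition_recorder&) = delete;
    large_partition_recorder& operator=(const large_partition_recorder&) = delete;

    // Called once the cluster feature becomes enabled.
    void enable_extended_columns() {
        _record = [this](int rows) {
            _log.push_back("rows=" + std::to_string(rows) +
                           ",dead_rows,range_tombstones");
        };
    }

    void record(int rows) { _record(rows); }
    const std::vector<std::string>& log() const { return _log; }
};
```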
2024-05-02 11:49:46 +02:00
Dawid Medrek
0ef8d67d32 db/hints: Migrate hinted handoff when cluster feature is enabled
These changes migrate hinted handoff to using
host ID as soon as the corresponding cluster
feature is enabled.

When a node starts, it defaults to creating
directories naming them after IP addresses.
When the whole cluster has upgraded
to a version of Scylla that can handle
directories representing host IDs,
we perform a migration of the IP folders,
i.e. we try to rename them to host IDs.
Invalid directories, i.e. those that
represent neither an IP address, nor a host
ID, are removed.

During the migration, hinted handoff is
disabled. It is necessary because we have
to modify the disk's contents, so new hints
cannot be saved until the migration finishes.
2024-04-28 01:22:57 +02:00
Patryk Jędrzejczak
0d428a3857 treewide: fix indentation after the previous patch
2024-04-25 14:33:21 +02:00
Patryk Jędrzejczak
3a34bb18cd db: config: make consistent-topology-changes unused
We make the `consistent-topology-changes` experimental feature
unused and assumed to be true in 6.0. We remove code branches that
executed if `consistent-topology-changes` was disabled.
2024-04-25 14:33:21 +02:00
Kamil Braun
3363f6e1e8 Merge 'Fix write failures during node replace with same IP with topology over raft' from Gleb
Currently a new node is marked as alive too late, after it is already
reported as a pending node. The patch series changes replace procedure
to be the same as what node_ops do: first stop reporting the IP of the
node that is being replaced as a natural replica for writes, then mark
the IP as alive, and only after that report the IP as a pending endpoint.

Fixes: scylladb/scylladb#17421

* 'gleb/17421-fix-v2' of github.com:scylladb/scylla-dev:
  test_replace_reuse_ip: add data plane load
  sync_raft_topology_nodes: make replace procedure similar to nodeops one
  storage_service: topology_coordinator: fix indentation after previous patch
  storage_service: topology coordinator: drop ring check in node_state::replacing state
2024-04-24 17:09:01 +02:00
Gleb Natapov
4614fedd22 sync_raft_topology_nodes: make replace procedure similar to nodeops one
In replace-with-same-ip a new node calls gossiper.start_gossiping
from join_token_ring with the 'advertise' parameter set to false.
This means that this node will fail echo RPC-s from other nodes,
making it appear as not alive to them. The node changes this only
in storage_service::join_node_response_handler, when the topology
coordinator notifies it that it's actually allowed to join the
cluster. The node calls _gossiper.advertise_to_nodes({}), and
only from this moment other nodes can see it as alive.

The problem is that topology coordinator sends this notification
in topology::transition_state::join_group0 state. In this state
nodes of the cluster already see the new node as pending,
they react with calling tmpr->add_replacing_endpoint and
update_topology_change_info when they process the corresponding
raft notification in sync_raft_topology_nodes. When the new
token_metadata is published, assure_sufficient_live_nodes
sees the new node in pending_endpoints. All of this happens
before the new node has handled the successful join notification,
so it's not alive yet. Suppose we had a cluster with three
nodes and we're replacing one of them with a fourth node.
For cl=quorum assure_sufficient_live_nodes throws if
live < need + pending, which in our case becomes 2 < 2 + 1.
The end effect is that during replace-with-same-ip
data plane requests can fail with unavailable_exception,
breaking availability.
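
The arithmetic above can be checked with a small sketch. The helper names are hypothetical, and a replication factor of 3 is assumed so that quorum needs 2 replicas.

```cpp
#include <cstddef>
#include <stdexcept>

// Quorum of a replication factor: a majority of replicas.
inline std::size_t quorum_for(std::size_t rf) {
    return rf / 2 + 1;
}

// Hypothetical helper mirroring the check "live < need + pending".
inline void check_sufficient_live_nodes(std::size_t live, std::size_t need,
                                        std::size_t pending) {
    if (live < need + pending) {
        throw std::runtime_error("unavailable");
    }
}
```

With two live replicas, `need = quorum_for(3) = 2` and one pending node, the check evaluates 2 < 2 + 1 and throws. Once the new node counts as live (3 live), the same check passes.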

The patch makes the boot procedure more similar to the node ops one.
It splits marking a node as "being replaced" and adding it to the
pending set into two different steps, and marks it as alive in the middle.
So when the node is in the topology::transition_state::join_group0 state
it is marked as "being replaced", which means it will no longer be used for
reads and writes. Then, in the next state, the new node is marked as alive and
is added to the pending list.

fixes scylladb/scylladb#17421
2024-04-24 16:59:22 +03:00
Gleb Natapov
06e6ed09ed gossiper: disable status check for endpoints in raft mode
Gossiper automatically removes endpoints that do not have tokens in
normal state and either do not send gossiper updates or are dead for a
long time. We do not need this with topology coordinator mode since in
this mode the coordinator is responsible to manage the set of nodes in
the cluster. In addition the patch disables quarantined endpoint
maintenance in gossiper in raft mode and uses left node list from the
topology coordinator to ignore updates for nodes that are no longer part
of the topology.
2024-04-21 16:36:07 +03:00
Kefu Chai
372a4d1b79 treewide: do not define FMT_DEPRECATED_OSTREAM
since we do not rely on FMT_DEPRECATED_OSTREAM to define the
fmt::formatter for us anymore, let's stop defining `FMT_DEPRECATED_OSTREAM`.

in this change,

* utils: drop the range formatters in to_string.hh and to_string.c, as
  we don't use them anymore. and the tests for them in
  test/boost/string_format_test.cc are removed accordingly.
* utils: use fmt to print chunk_vector and small_vector. as
  we are not able to print the elements using operator<< anymore
  after switching to {fmt} formatters.
* test/boost: specialize fmt::details::is_std_string_like<bytes>
  due to a bug in {fmt} v9, {fmt} fails to format a range whose
  element type is `basic_sstring<uint8_t>`, as it considers it
  as a string-like type, but `basic_sstring<uint8_t>`'s char type
  is signed char, not char. this issue does not exist in {fmt} v10,
  so, in this change, we add a workaround to explicitly specialize
  the type trait to assure that {fmt} format this type using its
  `fmt::formatter` specialization instead of trying to format it
  as a string. also, {fmt}'s generic ranges formatter calls the
  pair formatter's `set_brackets()` and `set_separator()` methods
  when printing the range, but operator<< based formatter does not
  provide these method, we have to include this change in the change
  switching to {fmt}, otherwise the change specializing
  `fmt::details::is_std_string_like<bytes>` won't compile.
* test/boost: in tests, we use `BOOST_REQUIRE_EQUAL()` and its friends
  for comparing values. but without the operator<< based formatters,
  Boost.Test would not be able to print them. after removing
  the homebrew formatters, we need to use the generic
  `boost_test_print_type()` helper to do this job. so we are
  including `test_utils.hh` in tests so that we can print
  the formattable types.
* treewide: add "#include "utils/to_string.hh" where
  `fmt::formatter<optional<>>` is used.
* configure.py: do not define FMT_DEPRECATED_OSTREAM
* cmake: do not define FMT_DEPRECATED_OSTREAM

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-04-19 22:57:36 +08:00
Kefu Chai
a439ebcfce treewide: include fmt/ranges.h and/or fmt/std.h
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we include `fmt/ranges.h` and/or `fmt/std.h`
for formatting the container types, like vector, map
optional and variant using {fmt} instead of the homebrew
formatter based on operator<<.
with this change, the changes adding fmt::formatter and
the changes using ostream formatter explicitly, we are
allowed to drop `FMT_DEPRECATED_OSTREAM` macro.

Refs scylladb#13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-04-19 22:56:16 +08:00
Kefu Chai
168ade72f8 treewide: replace formatter<std::string_view> with formatter<string_view>
before v10, {fmt} provides a specialization of `fmt::formatter<..>`
for `std::string_view` as well as a specialization of `fmt::formatter<..>`
for `fmt::string_view`, which is an implementation built into {fmt} for
compatibility with pre-C++17 code. this type is used even if the code is
compiled with a C++ standard greater than or equal to C++17. also, before v10,
`fmt::formatter<std::string_view>::format()` is defined so it accepts
`std::string_view`. after v10, `fmt::formatter<std::string_view>` still
exists, but it is now defined using the `format_as()` machinery, so its
`format()` method does not actually accept `std::string_view`; it
accepts `fmt::string_view`, as the former can be converted to
`fmt::string_view`.

this is why we can inherit from `fmt::formatter<std::string_view>` and
use `formatter<std::string_view>::format(foo, ctx);` to implement the
`format()` method with {fmt} v9, but we cannot do this with {fmt} v10,
and we would have following compilation failure:

```
FAILED: service/CMakeFiles/service.dir/RelWithDebInfo/topology_state_machine.cc.o
/home/kefu/.local/bin/clang++ -DFMT_DEPRECATED_OSTREAM -DFMT_SHARED -DSCYLLA_BUILD_MODE=release -DSEASTAR_API_LEVEL=7 -DSEASTAR_LOGGER_COMPILE_TIME_FMT -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_SSTRING -DXXH_PRIVATE_API -DCMAKE_INTDIR=\"RelWithDebInfo\" -I/home/kefu/dev/scylladb -I/home/kefu/dev/scylladb/build/gen -I/home/kefu/dev/scylladb/seastar/include -I/home/kefu/dev/scylladb/build/seastar/gen/include -I/home/kefu/dev/scylladb/build/seastar/gen/src -ffunction-sections -fdata-sections -O3 -g -gz -std=gnu++20 -fvisibility=hidden -Wall -Werror -Wextra -Wno-error=deprecated-declarations -Wimplicit-fallthrough -Wno-c++11-narrowing -Wno-deprecated-copy -Wno-mismatched-tags -Wno-missing-field-initializers -Wno-overloaded-virtual -Wno-unsupported-friend -Wno-enum-constexpr-conversion -Wno-unused-parameter -ffile-prefix-map=/home/kefu/dev/scylladb=. -march=westmere -mllvm -inline-threshold=2500 -fno-slp-vectorize -U_FORTIFY_SOURCE -Werror=unused-result -MD -MT service/CMakeFiles/service.dir/RelWithDebInfo/topology_state_machine.cc.o -MF service/CMakeFiles/service.dir/RelWithDebInfo/topology_state_machine.cc.o.d -o service/CMakeFiles/service.dir/RelWithDebInfo/topology_state_machine.cc.o -c /home/kefu/dev/scylladb/service/topology_state_machine.cc
/home/kefu/dev/scylladb/service/topology_state_machine.cc:254:41: error: no matching member function for call to 'format'
  254 |     return formatter<std::string_view>::format(it->second, ctx);
      |            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~
/usr/include/fmt/core.h:2759:22: note: candidate function template not viable: no known conversion from 'seastar::basic_sstring<char, unsigned int, 15>' to 'const fmt::basic_string_view<char>' for 1st argument
 2759 |   FMT_CONSTEXPR auto format(const T& val, FormatContext& ctx) const
      |                      ^      ~~~~~~~~~~~~
```

because the inherited `format()` method actually comes from
`fmt::formatter<fmt::string_view>`. to reduce the confusion, in this
change, we just inherit from `fmt::formatter<string_view>`, where
`string_view` is actually `fmt::string_view`. this follows
the document at
https://fmt.dev/latest/api.html#formatting-user-defined-types,
and since there is less indirection under the hood (we do not
use the specialization created by `FMT_FORMAT_AS`, which inherits
from `formatter<fmt::string_view>`), hopefully this can improve
the compilation speed a little bit. also, this change addresses
the build failure with {fmt} v10.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18299
2024-04-19 07:44:07 +03:00