Commit Graph

796 Commits

Author SHA1 Message Date
Kefu Chai
372a4d1b79 treewide: do not define FMT_DEPRECATED_OSTREAM
since we do not rely on FMT_DEPRECATED_OSTREAM to define the
fmt::formatter for us anymore, let's stop defining `FMT_DEPRECATED_OSTREAM`.

in this change,

* utils: drop the range formatters in to_string.hh and to_string.c, as
  we don't use them anymore. and the tests for them in
  test/boost/string_format_test.cc are removed accordingly.
* utils: use fmt to print chunked_vector and small_vector, as
  we are not able to print the elements using operator<< anymore
  after switching to {fmt} formatters.
* test/boost: specialize fmt::detail::is_std_string_like<bytes>.
  due to a bug in {fmt} v9, {fmt} fails to format a range whose
  element type is `basic_sstring<int8_t>`: it considers the element
  a string-like type, but the char type of `basic_sstring<int8_t>`
  is signed char, not char. this issue does not exist in {fmt} v10,
  so in this change we add a workaround that explicitly specializes
  the type trait, ensuring that {fmt} formats this type using its
  `fmt::formatter` specialization instead of trying to format it
  as a string. also, {fmt}'s generic range formatter calls the
  pair formatter's `set_brackets()` and `set_separator()` methods
  when printing the range, but the operator<<-based formatter does not
  provide these methods, so this workaround has to be included in the
  same change that switches to {fmt}; otherwise the specialization of
  `fmt::detail::is_std_string_like<bytes>` won't compile.
* test/boost: in tests, we use `BOOST_REQUIRE_EQUAL()` and its friends
  for comparing values. but without the operator<< based formatters,
  Boost.Test would not be able to print them. after removing
  the homebrew formatters, we need to use the generic
  `boost_test_print_type()` helper to do this job. so we are
  including `test_utils.hh` in tests so that we can print
  the formattable types.
* treewide: add `#include "utils/to_string.hh"` where
  `fmt::formatter<optional<>>` is used.
* configure.py: do not define FMT_DEPRECATED_OSTREAM
* cmake: do not define FMT_DEPRECATED_OSTREAM

Refs #13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-04-19 22:57:36 +08:00
Kefu Chai
a439ebcfce treewide: include fmt/ranges.h and/or fmt/std.h
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we include `fmt/ranges.h` and/or `fmt/std.h`
for formatting container types like vector, map,
optional and variant using {fmt} instead of the homebrew
formatter based on operator<<.
together with the changes adding fmt::formatter specializations and
the changes using the ostream formatter explicitly, this change
allows us to drop the `FMT_DEPRECATED_OSTREAM` macro.

Refs scylladb#13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
2024-04-19 22:56:16 +08:00
Kefu Chai
168ade72f8 treewide: replace formatter<std::string_view> with formatter<string_view>
in {fmt} before v10, there is a specialization of `fmt::formatter<..>`
for `std::string_view` as well as a specialization of `fmt::formatter<..>`
for `fmt::string_view`, which is {fmt}'s builtin string view implementation
for pre-C++17 compatibility. `fmt::string_view` is used even if the code is
compiled with C++17 or later. also, before v10,
`fmt::formatter<std::string_view>::format()` is defined so that it accepts
`std::string_view`. since v10, `fmt::formatter<std::string_view>` still
exists, but it is now defined using the `format_as()` machinery, so its
`format()` method does not actually accept `std::string_view`; it
accepts `fmt::string_view`, to which `std::string_view` can be
converted.

this is why we can inherit from `fmt::formatter<std::string_view>` and
use `formatter<std::string_view>::format(foo, ctx);` to implement the
`format()` method with {fmt} v9, but we cannot do this with {fmt} v10,
where we get the following compilation failure:

```
FAILED: service/CMakeFiles/service.dir/RelWithDebInfo/topology_state_machine.cc.o
/home/kefu/.local/bin/clang++ -DFMT_DEPRECATED_OSTREAM -DFMT_SHARED -DSCYLLA_BUILD_MODE=release -DSEASTAR_API_LEVEL=7 -DSEASTAR_LOGGER_COMPILE_TIME_FMT -DSEASTAR_LOGGER_TYPE_STDOUT -DSEASTAR_SCHEDULING_GROUPS_COUNT=16 -DSEASTAR_SSTRING -DXXH_PRIVATE_API -DCMAKE_INTDIR=\"RelWithDebInfo\" -I/home/kefu/dev/scylladb -I/home/kefu/dev/scylladb/build/gen -I/home/kefu/dev/scylladb/seastar/include -I/home/kefu/dev/scylladb/build/seastar/gen/include -I/home/kefu/dev/scylladb/build/seastar/gen/src -ffunction-sections -fdata-sections -O3 -g -gz -std=gnu++20 -fvisibility=hidden -Wall -Werror -Wextra -Wno-error=deprecated-declarations -Wimplicit-fallthrough -Wno-c++11-narrowing -Wno-deprecated-copy -Wno-mismatched-tags -Wno-missing-field-initializers -Wno-overloaded-virtual -Wno-unsupported-friend -Wno-enum-constexpr-conversion -Wno-unused-parameter -ffile-prefix-map=/home/kefu/dev/scylladb=. -march=westmere -mllvm -inline-threshold=2500 -fno-slp-vectorize -U_FORTIFY_SOURCE -Werror=unused-result -MD -MT service/CMakeFiles/service.dir/RelWithDebInfo/topology_state_machine.cc.o -MF service/CMakeFiles/service.dir/RelWithDebInfo/topology_state_machine.cc.o.d -o service/CMakeFiles/service.dir/RelWithDebInfo/topology_state_machine.cc.o -c /home/kefu/dev/scylladb/service/topology_state_machine.cc
/home/kefu/dev/scylladb/service/topology_state_machine.cc:254:41: error: no matching member function for call to 'format'
  254 |     return formatter<std::string_view>::format(it->second, ctx);
      |            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~
/usr/include/fmt/core.h:2759:22: note: candidate function template not viable: no known conversion from 'seastar::basic_sstring<char, unsigned int, 15>' to 'const fmt::basic_string_view<char>' for 1st argument
 2759 |   FMT_CONSTEXPR auto format(const T& val, FormatContext& ctx) const
      |                      ^      ~~~~~~~~~~~~
```

this fails because the inherited `format()` method actually comes from
`fmt::formatter<fmt::string_view>`. to reduce the confusion, in this
change we just inherit from `fmt::formatter<string_view>`, where
`string_view` is actually `fmt::string_view`. this follows
the documentation at
https://fmt.dev/latest/api.html#formatting-user-defined-types.
since there is less indirection under the hood (we do not
use the specialization created by `FMT_FORMAT_AS`, which inherits
from `formatter<fmt::string_view>`), this can hopefully improve
the compilation speed a little bit. this change also addresses
the build failure with {fmt} v10.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18299
2024-04-19 07:44:07 +03:00
Kamil Braun
eb9ba914a3 Merge 'Set dc and rack in gossiper when loaded from system.peers and load the ignored nodes state for replace' from Benny Halevy
The problem this series solves is correctly ignoring DOWN nodes state
when replacing a node.

When a node is replaced and there are other nodes that are down, the
replacing node is told to ignore those DOWN nodes using the
`ignore_dead_nodes_for_replace` option.

Since the replacing node is bootstrapping, it starts with an empty
system.peers table so it has no notion about any node state and it
learns about all other nodes via gossip shadow round done in
`storage_service::prepare_replacement_info`.

Normally, since the DOWN nodes to ignore already joined the ring, the
remaining nodes will have their endpoint state already in gossip, but if
the whole cluster was restarted while those DOWN nodes did not start,
the remaining nodes will only have a partial endpoint state for them,
which is loaded from system.peers.

Currently, the partial endpoint state contains only `HOST_ID` and
`TOKENS`, and in particular it lacks `STATUS`, `DC`, and `RACK`.

The first part of this series also loads `DC` and `RACK` from
system.peers to make them available to the replacing node as they are
crucial for building a correct replication map with network topology
replication strategy.

But still, without a `STATUS`, those nodes are not yet considered normal
token owners, and they do not go through handle_state_normal, which
adds them to the topology and token_metadata.

The second part of this series uses the endpoint state retrieved in the
gossip shadow round to explicitly add the ignored nodes' state to
topology (including dc and rack) and token_metadata (tokens) in
`prepare_replacement_info`. If there are more DOWN nodes that are not
explicitly ignored, replace will fail (as it should).

Fixes scylladb/scylladb#15787

Closes scylladb/scylladb#15788

* github.com:scylladb/scylladb:
  storage_service: join_token_ring: load ignored nodes state if replacing
  storage_service: replacement_info: return ignore_nodes state
  locator: host_id_or_endpoint: keep value as variant
  gms: endpoint_state: add getters for host_id, dc_rack, and tokens
  storage_service: topology_state_load: set local STATUS state using add_saved_endpoint
  gossiper: add_saved_endpoint: set dc and rack
  gossiper: add_saved_endpoint: fixup indentation
  gossiper: add_saved_endpoint: make host_id mandatory
  gossiper: add load_endpoint_state
  gossiper: start_gossiping: log local state
2024-04-16 10:27:36 +02:00
Pavel Emelyanov
725b2863d2 tablet: Make pending replica optional
Just like the leaving replica can be optional when adding a replica to
a tablet, the pending replica can be optional too when removing a
replica from a tablet.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-04-15 16:31:07 +03:00
Benny Halevy
7c2bd8dc34 locator: host_id_or_endpoint: keep value as variant
Rather than allowing both host_id and endpoint to be kept,
keep only one of them and provide resolve functions that use
the token_metadata to resolve the host_id into an inet_address
or vice versa.
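
To illustrate the "keep only one of them" design, here is a minimal C++17 sketch built on `std::variant`. The types `host_id`, `inet_address` and `token_metadata_stub` are simplified, hypothetical stand-ins. `resolve_id()` and `resolve_endpoint()` are illustrative names, not the actual locator API.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <variant>

// Hypothetical stand-ins for locator::host_id and gms::inet_address.
struct host_id {
    std::string uuid;
    bool operator<(const host_id& o) const { return uuid < o.uuid; }
    bool operator==(const host_id& o) const { return uuid == o.uuid; }
};
struct inet_address {
    std::string ip;
    bool operator<(const inet_address& o) const { return ip < o.ip; }
};

// Hypothetical substitute for token_metadata: a two-way host_id <-> address map.
struct token_metadata_stub {
    std::map<host_id, inet_address> ip_of;
    std::map<inet_address, host_id> id_of;
};

// Holds exactly one of host_id or endpoint; the other is resolved on demand.
class host_id_or_endpoint {
    std::variant<host_id, inet_address> _value;
public:
    explicit host_id_or_endpoint(host_id id) : _value(std::move(id)) {}
    explicit host_id_or_endpoint(inet_address ep) : _value(std::move(ep)) {}

    bool has_host_id() const { return std::holds_alternative<host_id>(_value); }

    host_id resolve_id(const token_metadata_stub& tm) const {
        if (const auto* id = std::get_if<host_id>(&_value)) {
            return *id;  // stored directly, no lookup needed
        }
        auto it = tm.id_of.find(std::get<inet_address>(_value));
        if (it == tm.id_of.end()) {
            throw std::runtime_error("endpoint not found in token metadata");
        }
        return it->second;
    }

    inet_address resolve_endpoint(const token_metadata_stub& tm) const {
        if (const auto* ep = std::get_if<inet_address>(&_value)) {
            return *ep;  // stored directly, no lookup needed
        }
        auto it = tm.ip_of.find(std::get<host_id>(_value));
        if (it == tm.ip_of.end()) {
            throw std::runtime_error("host_id not found in token metadata");
        }
        return it->second;
    }
};
```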

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-04-14 15:25:50 +03:00
Pavel Emelyanov
b0cba57e29 tablet: Make leaving replica optional
When getting the leaving replica from tablet info and transition info,
the getter code assumes that this replica always exists. This is not going
to be the case soon, so make the return value optional.

There are four places that mess with leaving replica:

- stream tablet handler: this place checks that the leaving replica is
  _not_ current host. If leaving replica is missing, the check should
  pass

- cleanup tablet handler: this place checks that the leaving replica
  _is_ current host. If leaving replica is missing, the check should
  fail as well

- topology coordinator: it gets leaving replica to call cleanup on. If
  leaving replica is missing, the cleanup call is short-circuited to
  succeed immediately

- load-stats calculator: it checks if the leaving replica is self. This
  check is not patched, as it's automatically satisfied by the std::optional
  comparison operator overload for the wrapped type
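
Here is a minimal sketch of how these call sites can rely on `std::optional` comparisons. It assumes a hypothetical `host_id` alias for `std::string`, and the function names are illustrative, not the real handlers. Comparing an empty `std::optional<T>` with a `T` yields `false` for `==` and `true` for `!=`. That is exactly the missing-replica behavior listed above.

```cpp
#include <cassert>
#include <optional>
#include <string>

using host_id = std::string;  // hypothetical stand-in for locator::host_id

// Stream handler: must not run on the leaving replica.
// optional != value is true when the optional is empty, so a missing
// leaving replica makes the check pass.
bool stream_check_ok(const std::optional<host_id>& leaving, const host_id& self) {
    return leaving != self;
}

// Cleanup handler: must run on the leaving replica.
// optional == value is false when the optional is empty, so a missing
// leaving replica makes the check fail. A load-stats style "is it me?"
// check behaves the same way without any change.
bool cleanup_check_ok(const std::optional<host_id>& leaving, const host_id& self) {
    return leaving == self;
}

// Topology coordinator: with no leaving replica there is nothing to clean
// up, so the cleanup step can succeed immediately.
bool cleanup_needed(const std::optional<host_id>& leaving) {
    return leaving.has_value();
}
```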

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-04-04 09:03:36 +03:00
Raphael S. Carvalho
12714a4123 locator: Avoid tablet map lookup on every write for getting replicas
We can cache tablet map in erm, to avoid looking it up on every write for
getting write replicas. We do that in tablet_sharder, but not in tablet
erm. Tablet map is immutable in the context of a given erm, so the
address of the map is stable during erm lifetime.

This caught my attention when looking at perf diff output
(comparing tablet and vnode modes).

It also helps when erm is called again on write completion for
checking locality, used for forwarding info to the driver if needed.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#18158
2024-04-03 10:28:04 +02:00
Pavel Emelyanov
1adf16ce73 Merge 'network_topology_strategy: reallocate_tablets: support for rf changes' from Benny Halevy
This series provides a reallocate_tablets function, which is initially called by allocate_tablets_for_new_table.
The new allocation implementation is independent of vnodes/token ownership.
Rather than using the natural_endpoints_tracker, it implements its own tracking
based on dc/rack load (== number of replicas in rack), with the additional benefit
that tablet allocation balances the allocation across racks, using a heap structure
similar to the one we use to balance tablet allocation across shards in each node.

reallocate_tablets may also be called with an optional parameter pointing to the current tablet_map.
In this case the function either allocates more tablet replicas in datacenters for which the replication factor was increased,
or deallocates tablet replicas from datacenters for which the replication factor was decreased.
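
The rack balancing described above can be sketched with a min-heap keyed by per-rack load. This is a simplified illustration, not the code in network_topology_strategy. `rack_load` and `pick_racks()` are hypothetical names, and the sketch only picks racks; node and shard selection are left out.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical: a rack and the number of replicas already placed in it.
struct rack_load {
    std::string rack;
    std::size_t load;
};

// Heap comparator: the least-loaded rack ends up on top of the
// std::priority_queue; ties are broken by rack name for determinism.
struct more_loaded {
    bool operator()(const rack_load& a, const rack_load& b) const {
        if (a.load != b.load) {
            return a.load > b.load;
        }
        return a.rack > b.rack;
    }
};

// Pick racks for `rf` replicas of one tablet in one DC: always take the
// least-loaded rack, then push it back with its load incremented. When rf
// exceeds the number of racks, a rack can be picked more than once.
std::vector<std::string> pick_racks(std::vector<rack_load> racks, std::size_t rf) {
    std::priority_queue<rack_load, std::vector<rack_load>, more_loaded> heap(more_loaded{}, std::move(racks));
    std::vector<std::string> chosen;
    while (chosen.size() < rf && !heap.empty()) {
        rack_load top = heap.top();
        heap.pop();
        chosen.push_back(top.rack);
        ++top.load;
        heap.push(std::move(top));
    }
    return chosen;
}
```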

The NetworkTopologyStrategy_tablets_test unit test was extended to cover replication factor changes.

Closes scylladb/scylladb#17846

* github.com:scylladb/scylladb:
  network_topology_strategy: reallocate_tablets: consider new_racks before existing racks
  network_topology_startegy_test: add NetworkTopologyStrategy_tablet_allocation_balancing_test
  network_topology_strategy: reallocate_tablets: support deallocation via rf change
  network_topology_startegy_test: tablets_test: randomize cases
  network_topology_strategy: allocate_tablets_for_new_table: do not rely on token ownership
  network_topology_startegy_test: add NetworkTopologyStrategy_tablets_negative_test
  network_topology_strategy_test: endpoints_check: use particular BOOST_CHECK_* functions
  network_topology_strategy_test: endpoints_check: verify that replicas are placed on unique nodes
  network_topology_strategy_test: endpoints_check: strictly check rf for tablets
  network_topology_strategy_test: full_ring_check for tablets: drop unused options param
2024-03-28 11:19:11 +03:00
Benny Halevy
8a77319cb7 network_topology_strategy: reallocate_tablets: consider new_racks before existing racks
Allocate first from new (unpopulated) racks before
allocating from racks that are already populated
with replicas.

Still, rotate both new and existing racks by tablet id
to ensure fairness.
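
Here is a minimal sketch of this candidate ordering. It assumes racks are identified by name and uses a rotation by `tablet_id % size` for fairness. `candidate_racks()` is a hypothetical helper, not the actual implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Rotate left by tablet_id % size, so consecutive tablets start from
// different racks.
std::vector<std::string> rotated(std::vector<std::string> racks, std::size_t tablet_id) {
    if (!racks.empty()) {
        auto shift = static_cast<std::ptrdiff_t>(tablet_id % racks.size());
        std::rotate(racks.begin(), racks.begin() + shift, racks.end());
    }
    return racks;
}

// Candidate order: racks not yet populated with replicas come first, then
// already-populated racks; each group is rotated by the tablet id.
std::vector<std::string> candidate_racks(const std::vector<std::string>& new_racks,
                                         const std::vector<std::string>& existing_racks,
                                         std::size_t tablet_id) {
    std::vector<std::string> order = rotated(new_racks, tablet_id);
    std::vector<std::string> rest = rotated(existing_racks, tablet_id);
    order.insert(order.end(), rest.begin(), rest.end());
    return order;
}
```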

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-03-27 12:06:24 +02:00
Benny Halevy
4a7d57525e network_topology_strategy: reallocate_tablets: support deallocation via rf change
Add support for deallocating tablet replicas when the
datacenter replication factor is decreased.

We deallocate replicas in back-to-front order to maintain
replica pairing between the base table and
its materialized views.
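
This small sketch shows why removing from the back can preserve pairing. For illustration only, it assumes that base and view replicas are paired by their position in the replica list and that replicas are appended in allocation order. `deallocate_back_to_front()` and `pair_by_position()` are hypothetical helpers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Shrink a DC's replica list to `new_rf` by removing replicas from the
// back (assumed here to be the most recently allocated ones).
std::vector<std::string> deallocate_back_to_front(std::vector<std::string> replicas, std::size_t new_rf) {
    while (replicas.size() > new_rf) {
        replicas.pop_back();
    }
    return replicas;
}

// Pair base replica i with view replica i (the assumed pairing rule).
std::vector<std::pair<std::string, std::string>> pair_by_position(const std::vector<std::string>& base,
                                                                  const std::vector<std::string>& view) {
    std::vector<std::pair<std::string, std::string>> pairs;
    for (std::size_t i = 0; i < std::min(base.size(), view.size()); ++i) {
        pairs.emplace_back(base[i], view[i]);
    }
    return pairs;
}
```

Because both lists lose the same tail, the surviving pairs are a prefix of the original pairs.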

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-03-27 12:06:24 +02:00
Benny Halevy
898cd1d404 network_topology_strategy: allocate_tablets_for_new_table: do not rely on token ownership
Base the initial tablet allocation for a new table
on the dc/rack topology, rather than on the token ring,
to remove the dependency on token ownership.

We keep the rack ordinal order in each dc
to facilitate in-rack pairing of base/view
replicas, and we apply load-balancing
principles by sorting the nodes in each rack
by their load (number of tablets allocated to
the node) and attempting to fill least-loaded
nodes first.

This method is more efficient than circling
the token ring and attempting to insert the endpoints
into the natural_endpoints_tracker until the replication
factor per dc is fulfilled, and it provides an easier
way to incrementally allocate more replicas after
rf is increased.
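
The per-rack "least-loaded first" selection can be sketched as follows. `node_load` and `pick_least_loaded()` are hypothetical names. A stable sort is used so that nodes with equal load keep their original order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical: a node and the number of tablets already allocated to it.
struct node_load {
    std::string node;
    std::size_t tablets;
};

// Pick up to `n` nodes from one rack, least-loaded first.
std::vector<std::string> pick_least_loaded(std::vector<node_load> nodes, std::size_t n) {
    std::stable_sort(nodes.begin(), nodes.end(), [](const node_load& a, const node_load& b) {
        return a.tablets < b.tablets;
    });
    std::vector<std::string> picked;
    for (std::size_t i = 0; i < nodes.size() && picked.size() < n; ++i) {
        picked.push_back(nodes[i].node);
    }
    return picked;
}
```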

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2024-03-27 12:06:21 +02:00
Kefu Chai
2e2c3a5fea locator: fix a typo in comment
s/Substracts/Subtracts/

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#18048
2024-03-27 10:15:18 +02:00
Pavel Emelyanov
04370dc8a4 tablets: Introduce substract_sets()
There are several places in the code that calculate replica sets associated
with a specific tablet transition. Having a helper to subtract two sets
improves code readability.
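
This generic sketch shows what such a helper computes. The real `substract_sets()` works on tablet replica sets. The template `subtract_sets()` below is a hypothetical stand-in using vectors and a linear search, which is adequate for replica sets of a few elements.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Returns the elements of `a` that are not present in `b`, in `a`'s order.
template <typename T>
std::vector<T> subtract_sets(const std::vector<T>& a, const std::vector<T>& b) {
    std::vector<T> result;
    for (const auto& x : a) {
        if (std::find(b.begin(), b.end(), x) == b.end()) {
            result.push_back(x);
        }
    }
    return result;
}
```

For a transition, `subtract_sets(old_replicas, new_replicas)` would give the leaving replicas. The reverse call would give the pending ones.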

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#18033
2024-03-26 23:33:06 +02:00
Avi Kivity
4ddf82e58b treewide: don't #include "gms/feature_service.hh" from other headers
feature_service.hh is a high-level header that integrates much
of the system functionality, so including it in lower-level headers
causes unnecessary rebuilds, specifically when retiring features.

Fix by removing feature_service.hh from headers, and supply forward
declarations and includes in .cc files where needed.

Closes scylladb/scylladb#18005
2024-03-26 15:31:18 +02:00
Raphael S. Carvalho
6bdb456fad sstables_loader: Fix loader when write selector is previous during tablet migration
The loader writes to the pending replica even when the write selector is set
to previous. If the migration is reverted, the writes won't be rolled
back, as it assumes the pending replicas weren't written to yet. That can
cause data resurrection if the tablet is later migrated back into the same
replica.

NOTE: write selector is handled correctly when set to next, because
get_natural_endpoints() will return the next replica set, and none
of the replicas will be considered leaving. And of course, selector
set to both is also handled correctly.

Fixes #17892.

Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>

Closes scylladb/scylladb#17902
2024-03-24 01:20:50 +01:00
Tomasz Grabiec
1c71f44e63 tablets, raft topology: Rebuild tablets after replacing node is normal
This fixes a problem with replacing a node with tablets when
RF=N. Currently, this will fail because new tablet replica allocation
will not be able to find a viable destination, as the replacing node
is not considered a candidate. It cannot be a candidate because
replace rolls back on failure and we cannot roll back after tablets
were migrated.

The solution taken here is to not drain tablet replicas from the replaced
node during the topology request, but to leave it to happen later, after the
replaced node has left and the replacing node is normal.

The replacing node waits for this draining to be complete on boot
before the node is considered booted.

Fixes #17025
2024-03-15 13:20:08 +01:00
Tomasz Grabiec
888dc41d66 effective_replication_map: Introduce host_id-based get_replicas()
2024-03-15 11:05:29 +01:00
Tomasz Grabiec
61b3453552 raft topology: Keep nodes in the left state to topology
Those nodes will be kept in tablet replica sets for a while after node
replace is done, until the new replica is rebuilt. So we need to know
those nodes' location (dc, rack) for two reasons:

 1) algorithms which work with replica sets filter nodes based on
 their location. For example materialized views code which pairs base
 replicas with view replicas filters by datacenter first.

 2) tablet scheduler needs to identify each node's location in order
 to make decisions about new replica placement.

It's ok to not know the IP, and we don't keep it. Those nodes will not
be present in the IP-based replica sets, e.g. those returned by
get_natural_endpoints(), only in host_id-based replica
sets. storage_proxy request coordination is not affected.

Nodes in the left state are still not present in the token ring, and are not
considered to be members of the ring (datacenter endpoints exclude them).

In the future we could make the change even more transparent by only
loading locator::node* for those nodes and keeping node* in tablet
replica sets.

We load topology information only for left nodes which are actually
referenced by any tablet. To achieve that, topology loading code
queries system.tablet for the set of hosts. This set is then passed to
system.topology loading method which decides whether to load
replica_state for a left node or not.
2024-03-15 11:05:29 +01:00
Patryk Wrobel
75aadeb32f locator/effective_replication_map: make 'get_ranges(inet_address ep)' virtual
Before this patch, the mentioned function was a specific
member of the vnode_effective_replication_map class.
To allow its usage also when tablets are enabled, it was
moved to the base class, effective_replication_map,
and made pure virtual to force the derived classes to
implement it.

It is used by 'storage_service::get_ranges_for_endpoint()',
which is used to calculate effective ownership. This
calculation also needs to be performed when tablets are
enabled.

Refs: scylladb#17342
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
2024-03-11 09:50:20 +01:00
Patryk Wrobel
3fff6bd407 locator/tablets: add tablet_map::get_sorted_tokens()
This change introduces a new member function that
returns a vector of sorted tokens where each pair of adjacent
elements depicts a range of tokens that belongs to a tablet.

It will be used to produce the equivalent of sorted_tokens() of
vnodes when trying to use dht::describe_ownership() for tablets.
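
This sketch assumes the returned vector holds each tablet's last token in ascending order, with `token` reduced to `int64_t`. Under that assumption, adjacent elements can be turned into per-tablet ranges, and the tablet owning a token can be found with a binary search. `ranges_from_sorted_tokens()` and `owning_tablet()` are hypothetical helpers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <utility>
#include <vector>

using token = std::int64_t;  // hypothetical stand-in for dht::token

// Tablet i owns (sorted[i-1], sorted[i]]; tablet 0 starts at the minimum token.
std::vector<std::pair<token, token>> ranges_from_sorted_tokens(const std::vector<token>& sorted) {
    std::vector<std::pair<token, token>> ranges;
    token start = std::numeric_limits<token>::min();
    for (token end : sorted) {
        ranges.emplace_back(start, end);
        start = end;
    }
    return ranges;
}

// Index of the tablet owning `t`: the first last-token that is >= t.
// Returns sorted.size() if `t` lies past the final token.
std::size_t owning_tablet(const std::vector<token>& sorted, token t) {
    auto it = std::lower_bound(sorted.begin(), sorted.end(), t);
    return static_cast<std::size_t>(it - sorted.begin());
}
```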

Refs: scylladb#17342
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>
2024-03-11 09:50:20 +01:00
Kefu Chai
64e14d21db locator/tablets: add fmt::formatter for tablet_*
before this change, we rely on the default-generated fmt::formatter
created from operator<<, but fmt v10 dropped the default-generated
formatter.

in this change, we define formatters for

* tablet_id
* tablet_replica
* tablet_metadata
* tablet_map

their operator<<:s are dropped

Refs scylladb/scylladb#13245

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17504
2024-03-07 09:00:49 +03:00
Kefu Chai
7e9b0d3d9e network_topology_strategy: use structured binding when appropriate
for better readability

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17642
2024-03-06 09:52:20 +02:00
Kefu Chai
643c01fd80 locator: fix typo in comment -- s/slecting/selecting/
fix a typo

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17470
2024-02-22 13:28:18 +02:00
Tomasz Grabiec
ef9e5e64a3 locator: token_metadata: Introduce topology barrier stall detector
When the topology barrier is blocked for longer than the configured threshold
(2s), stale versions are marked as stalled, and when they get released
they report a backtrace to the logs. This should help to identify what
was holding the token metadata pointer for too long.

Example log:

  token_metadata - topology version 30 held for 299.159 [s] past expiry, released at:  0x2397ae1 0x23a36b6 ...

Closes scylladb/scylladb#17427
2024-02-21 15:05:34 +02:00
Botond Dénes
7bdd0c2cae locator: introduce tablet_range_spliter
Given a list of partition ranges, yields the intersection of this
range list with the ranges of the tablets located on the
given host.
This will be used in multishard_mutation_query.cc to obtain the ranges
to read from the local node: given the read ranges, obtain the ranges
belonging to tablets that have replicas on the local node.
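
This simplified sketch of the intersection assumes closed integer token intervals and a flat list of tablets with their replica hosts. `token_range`, `tablet_info_sketch` and `split_for_host()` are hypothetical. It uses a nested loop for clarity; an implementation over sorted inputs would more likely walk both lists in a single merge-style pass.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical closed token interval [first, last].
struct token_range {
    std::int64_t first;
    std::int64_t last;
    bool operator==(const token_range& o) const { return first == o.first && last == o.last; }
};

// Hypothetical tablet description: its token range and replica hosts.
struct tablet_info_sketch {
    token_range range;
    std::vector<std::string> replicas;
};

// For each read range, emit its intersection with every tablet that has a
// replica on `host`, in read-range order, then tablet order.
std::vector<token_range> split_for_host(const std::vector<token_range>& read_ranges,
                                        const std::vector<tablet_info_sketch>& tablets,
                                        const std::string& host) {
    std::vector<token_range> out;
    for (const auto& r : read_ranges) {
        for (const auto& t : tablets) {
            if (std::find(t.replicas.begin(), t.replicas.end(), host) == t.replicas.end()) {
                continue;  // tablet has no replica on this host
            }
            std::int64_t lo = std::max(r.first, t.range.first);
            std::int64_t hi = std::min(r.last, t.range.last);
            if (lo <= hi) {
                out.push_back({lo, hi});
            }
        }
    }
    return out;
}
```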
2024-02-21 02:08:48 -05:00
Avi Kivity
605bf6e221 range.hh: retire
range.hh was deprecated in bd794629f9 (2020) since its names
conflict with the C++ library concept of an iterator range. The name
::range also mapped to the dangerous wrapping_interval rather than
nonwrapping_interval.

Complete the deprecation by removing range.hh and replacing all the
aliases by the names they point to from the interval library. Note
this now exposes uses of wrapping intervals as they are now explicit.

The unit tests are renamed and range.hh is deleted.

Closes scylladb/scylladb#17428
2024-02-21 00:24:25 +02:00
Tomasz Grabiec
e63d8ae272 Merge 'Handle tablet migration failure while streaming' from Pavel Emelyanov
It can happen that a node is lost during tablet migration involving that node. Migration will be stuck, blocking the topology state machine. To recover from this, the current procedure is for the admin to execute nodetool removenode or to replace the node. This marks the node as "ignored", and the tablet state machine can pick this up and abort the migration.

This PR implements the handling for the streaming stage only and adds a test for it. Checking other stages needs more work with failure injection to inject failures into specific barriers.

To handle streaming failure, two new stages are introduced: cleanup_target and revert_migration. The former cleans up the pending replica, which could have received some data by the time streaming stopped working; the latter is like end_migration, but doesn't commit new_replicas into the replicas field.

refs: #16527

Closes scylladb/scylladb#17360

* github.com:scylladb/scylladb:
  test/topology: Add checking error paths for failed migration
  topology.tablets_migration: Handle failed streaming
  topology.tablets_migration: Add cleanup_target transition stage
  topology.tablets_migration: Add revert_migration transition stage
  storage_service: Rewrap cleanup stage checking in cleanup_tablet()
  test/topology: Move helpers to get tablet replicas to pylib
2024-02-20 18:50:55 +01:00
Kefu Chai
b0bb3ab5b0 topology: print node* with node_printer
in da53854b66, we added a formatter for printing a `node*`, and switched
to this formatter when printing `node*`. but we failed to update some
call sites when migrating to the new formatter, where a
`unique_ptr<node>` is printed instead. this is not the behavior before
the change, and is not expected.

so, in this change, we explicitly instantiate `node_printer` instances
with the pointer held by `unique_ptr<node>`, to restore the behavior
before da53854b66.

this issue was identified when compiling the tree using {fmt} v10 and
the compile-time format-string check enabled, which is not yet upstreamed to
Seastar.

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#17418
2024-02-20 14:35:56 +03:00
Pavel Emelyanov
72f3b1d5fe topology.tablets_migration: Add cleanup_target transition stage
The new stage will be used to revert a migration that fails at some
stages. The goal is to clean up the pending replica, which may have already
received some writes, by issuing the cleanup RPC to the pending replica and
then jumping to the "revert_migration" stage introduced earlier.

If pending node is dead, the call to cleanup RPC is skipped.

Coordinators use old replicas.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-20 08:59:06 +03:00
Pavel Emelyanov
ced5bf56eb topology.tablets_migration: Add revert_migration transition stage
It's like end_migration, but it leaves the old replicas intact and just
removes the transition (including the new replicas).

Coordinators use old replicas.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-20 08:53:36 +03:00
Botond Dénes
42a76ca568 Merge 'Improve printing of nodes and backtraces in topology' from Pavel Emelyanov
There's a bunch of debug- and trace-level logging of locator::node-s that also includes current_backtrace(). Printing a node is done via the debug_format() helper, which generates and returns an sstring to print. Backtrace printing is not very lightweight on its own because of backtrace collection. To avoid slowing things down at the info log level, which is the default, all such prints are wrapped with explicit checks of whether the log level is enabled.

This PR removes those level checks by introducing a lazy_backtrace() helper and by providing a formatter for nodes, which also results in lazy calculation of the node format string.

Closes scylladb/scylladb#17235

* github.com:scylladb/scylladb:
  topology: Restore indentation after previous patch
  topology: Drop if_enabled checks for logging
  topology: Add lazy_backtrace() helper
  topology: Add printer wrapper for node* and formatter for it
  topology: Expand formatter<locator::node>
2024-02-19 09:32:53 +02:00
Patryk Wrobel
a3fb44cbca Rename keyspace::get_effective_replication_map()
This commit renames keyspace::get_effective_replication_map()
to keyspace::get_vnode_effective_replication_map(). This change
is required to ease the analysis of the usage of this function.

When tablets are enabled, this function shall not be used.
The per-table replication map should be used instead of the per-keyspace one.
The rename was performed to distinguish between those two calls.
The next step will be an audit of usages of
keyspace::get_vnode_effective_replication_map().

Refs: scylladb#16626
Signed-off-by: Patryk Wrobel <patryk.wrobel@scylladb.com>

Closes scylladb/scylladb#17314
2024-02-13 20:22:02 +02:00
Botond Dénes
3f2d7e8b25 tree: remove unnecessary yields around for_each_tablet()
Commit 904bafd069 consolidated the two
existing for_each_tablet() overloads, to the one which has a future<>
returning callback. It also added yields to the bodies of said
callbacks. This is unnecessary: the loop in for_each_tablet() already
has a yield per tablet, which should be enough to prevent stalls.

This patch is a follow-up to #17118

Closes scylladb/scylladb#17284
2024-02-12 17:10:25 +01:00
Pavel Emelyanov
309d34a147 topology: Restore indentation after previous patch
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-09 13:49:15 +03:00
Pavel Emelyanov
f7a13b9bb0 topology: Drop if_enabled checks for logging
Now all the logged arguments are lazily evaluated (node* format string
and backtrace) so the preliminary log-level checks are not needed.

indentation is deliberately left broken

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-09 13:49:15 +03:00
Pavel Emelyanov
c1ea6c8acf topology: Add lazy_backtrace() helper
This helper returns a lazy_eval-ed current_backtrace(), so the backtrace
is generated and printed only if the logger is really going to print it
at its current log level.
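
The lazy-evaluation idea can be sketched with a wrapper that stores a callable and invokes it only when the wrapper is written to a stream. `lazy_eval_sketch`, `make_lazy()` and `logger_sketch` are hypothetical stand-ins, not Seastar's logger or lazy_eval types.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>

// Wraps a callable; the callable runs only when the wrapper is formatted.
template <typename Func>
struct lazy_eval_sketch {
    Func func;
    friend std::ostream& operator<<(std::ostream& os, const lazy_eval_sketch& l) {
        return os << l.func();
    }
};

template <typename Func>
lazy_eval_sketch<Func> make_lazy(Func f) {
    return {std::move(f)};
}

enum class log_level { info, debug };

// Minimal logger: arguments are formatted only if the level is enabled.
struct logger_sketch {
    log_level threshold = log_level::info;
    std::ostringstream sink;

    template <typename... Args>
    void log(log_level lvl, const Args&... args) {
        if (lvl > threshold) {
            return;  // filtered out: no argument is formatted
        }
        ((sink << args), ...);
        sink << '\n';
    }
};
```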

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-09 13:49:15 +03:00
Pavel Emelyanov
da53854b66 topology: Add printer wrapper for node* and formatter for it
Currently, to print node information there's a debug_format(node*) helper
function that returns an sstring object. This patch adds a formatter,
which is more flexible and convenient, and a node_printer wrapper, since
{fmt} formatters cannot format non-void pointers.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-09 13:49:15 +03:00
Pavel Emelyanov
aa0293f411 topology: Expand formatter<locator::node>
Equip it with a :v specifier that turns verbose mode on and prints much
more data about the node. Main user will appear in the next patch.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-09 13:49:15 +03:00
Botond Dénes
35da9551fb Merge 'storage_service: Add describe_ring support for tablet table' from Asias He
The table query param is added to get the describe_ring result for a
given table.

Both vnode tables and tablet tables can use this table param, so it is
easier for users to use.

If the table param is not provided by the user and the keyspace contains
a tablet table, the request will be rejected.

E.g.,
curl "http://127.0.0.1:10000/storage_service/describe_ring/system_auth?table=roles"
curl "http://127.0.0.1:10000/storage_service/describe_ring/ks1?table=standard1"

Refs #16509

Closes scylladb/scylladb#17118

* github.com:scylladb/scylladb:
  tablets: Convert to use the new version of for_each_tablet
  storage_service: Add describe_ring support for tablet table
  storage_service: Mark host2ip as const
  tablets: Add for_each_tablet_gently
2024-02-07 10:41:36 +02:00
Tomasz Grabiec
032c1a3d04 Merge 'tablets: Make sure topology has enough endpoints for RF' from Pavel Emelyanov
When creating a keyspace, scylla allows setting an RF value larger than the number of nodes in the DC. With vnodes, when new nodes are bootstrapped, new tokens are inserted, thus catching up with RF. With tablets, this is not the case, as the replica set remains unchanged.

With tablets, it's a good chance not to mimic the vnodes behavior and to require as many nodes to be up and running as the requested RF. This patch implements this in a lazy manner: when creating a keyspace, RF can be any value, but when a new table is created, the topology should meet the RF requirements. If they are not met, the user can bootstrap new nodes or ALTER KEYSPACE.

closes: #16529

Closes scylladb/scylladb#17079

* github.com:scylladb/scylladb:
  tablets: Make sure topology has enough endpoints for RF
  cql-pytest: Disable tablets when RF > nodes-in-DC
  test: Remove test that configures RF larger than the number of nodes
  keyspace_metadata: Include tablets property in DESCRIBE
2024-02-06 22:38:11 +01:00
Botond Dénes
a3d4131918 Merge 'Sanitize replication factor parsing by strategies' from Pavel Emelyanov
RF values appear as strings, and strategy classes convert them to integers. This PR removes some duplicated effort in the conversion code.

Closes scylladb/scylladb#17132

* github.com:scylladb/scylladb:
  network_topology_strategy: Do not walk list of datacenters twice
  replication_strategy: Do not convert string RF into int twise
  abstract_replication_strategy: Make validate_replication_factor return value
2024-02-06 13:26:31 +02:00
Asias He
904bafd069 tablets: Convert to use the new version of for_each_tablet
It is gentler than the old one.
2024-02-05 18:45:40 +08:00
Pavel Emelyanov
45dbe38658 tablets: Make sure topology has enough endpoints for RF
When creating a keyspace, scylla allows setting an RF value larger than
the number of nodes in the DC. With vnodes, when new nodes are bootstrapped,
new tokens are inserted, thus catching up with RF. With tablets, this is not
the case, as the replica set remains unchanged.

With tablets, it's a good chance not to mimic the vnodes behavior and to
require as many nodes to be up and running as the requested RF. This
patch implements this in a lazy manner: when creating a keyspace, RF
can be any value, but when a new table is created, the topology should meet
the RF requirements. If they are not met, the user can bootstrap new nodes or ALTER KEYSPACE.
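
Here is a minimal sketch of the check performed when a table is created. It assumes the per-DC RF and the number of nodes per DC are already available as plain maps. `check_rf_against_topology()` is a hypothetical name, and the error text is illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>

// Throws if any DC requests more replicas than it has nodes.
void check_rf_against_topology(const std::map<std::string, std::size_t>& rf_per_dc,
                               const std::map<std::string, std::size_t>& nodes_per_dc) {
    for (const auto& [dc, rf] : rf_per_dc) {
        auto it = nodes_per_dc.find(dc);
        std::size_t nodes = it == nodes_per_dc.end() ? 0 : it->second;
        if (rf > nodes) {
            throw std::invalid_argument("Datacenter " + dc + " has " + std::to_string(nodes) +
                                        " nodes, fewer than replication factor " + std::to_string(rf));
        }
    }
}
```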

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-05 12:50:04 +03:00
Asias He
fab0d33d08 tablets: Add for_each_tablet_gently
In this version, the callback returns a future<>, so it can yield itself
to avoid stalls in func itself.
2024-02-05 13:42:08 +08:00
Avi Kivity
784c2f8ad2 Merge 'treewide: replace calls to future::get0() by calls to future::get()' from Kefu Chai
get0() dates back to the days when Seastar futures carried tuples, and get0() was a way to get the first (and usually only) element. Now it's a distraction, and Seastar is likely to deprecate and remove it.

Replace with seastar::future::get(), which does the same thing.

Closes scylladb/scylladb#17130

* github.com:scylladb/scylladb:
  treewide: replace seastar::future::get0() with seastar::future::get()
  sstable: capture return value of get0() using auto
  utils: result_loop: define result_type with decayed type

[avi: add another one that snuck in while this was cooking]
2024-02-04 15:23:33 +02:00
Avi Kivity
7cb1c10fed treewide: replace seastar::future::get0() with seastar::future::get()
get0() dates back to the days when Seastar futures carried tuples, and
get0() was a way to get the first (and usually only) element. Now
it's a distraction, and Seastar is likely to deprecate and remove it.

Replace with seastar::future::get(), which does the same thing.
2024-02-02 22:12:57 +08:00
Pavel Emelyanov
afda0f6ddf network_topology_strategy: Do not walk list of datacenters twice
The constructor of that class walks the provided options to get per-DC
replication factors. It does it twice: first to populate the dc:rf
map, second to calculate the sum of the provided RF values. The latter loop
can be optimized away.
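
Here is a sketch of the single-pass version. It assumes the options have already been parsed into per-DC integers. `collect_rfs()` and `dc_rf_sketch` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

struct dc_rf_sketch {
    std::map<std::string, std::size_t> rf_per_dc;
    std::size_t total_rf = 0;
};

// Builds the dc:rf map and the RF sum in the same loop.
dc_rf_sketch collect_rfs(const std::map<std::string, std::size_t>& options) {
    dc_rf_sketch r;
    for (const auto& [dc, rf] : options) {
        r.rf_per_dc.emplace(dc, rf);
        r.total_rf += rf;
    }
    return r;
}
```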

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-02 14:39:24 +03:00
Pavel Emelyanov
06f9e7367c replication_strategy: Do not convert string RF into int twise
There are two replication strategy classes that validate the string RF and
then convert it into an integer. Since the validation helper returns the parsed
value, it can be used directly, avoiding the second conversion.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-02 14:38:17 +03:00
Pavel Emelyanov
a8cd3bc636 abstract_replication_strategy: Make validate_replication_factor return value
The helper in question checks if string RF is indeed an integer. Make
this helper return the "checked" integer value, because it does this
conversion. And rename it to parse_... to reflect what it now does. Next
patches will make use of this change.
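
Here is a minimal sketch of a parse-and-validate helper that returns the parsed value, using `std::from_chars`. `parse_replication_factor_sketch()` is hypothetical. The validation rules and error message of the real helper may differ.

```cpp
#include <cassert>
#include <charconv>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <system_error>

// Validates that `rf` is a non-negative integer and returns the parsed
// value, so callers do not have to convert the string a second time.
std::size_t parse_replication_factor_sketch(const std::string& rf) {
    std::size_t value = 0;
    const char* begin = rf.data();
    const char* end = rf.data() + rf.size();
    // For an unsigned target, from_chars accepts only digits (no sign).
    auto [ptr, ec] = std::from_chars(begin, end, value);
    if (ec != std::errc() || ptr != end) {
        throw std::invalid_argument("Replication factor must be a non-negative integer, got '" + rf + "'");
    }
    return value;
}
```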

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2024-02-02 14:36:47 +03:00